Remote Data Scientist

Remote
Roles:
Data
Must-have skills:
Data Science, NLP, Python
One of skills:
AWS, GCP
Nice-to-have skills:
Docker, Go, Jenkins, Kubernetes, Terraform
Considering candidates from:
Central Asia, Europe, South Caucasus, Belarus, Bosnia and Herzegovina, Moldova, Montenegro, Russia, Serbia, Turkey, Ukraine, and the United Kingdom
Work arrangement: Remote
Industry: Security Systems Services
Language: English
Level: Senior
Required experience: 5+ years
Size: 11 - 50 employees

Archipelo is building a code security platform that lets organizations verify the authenticity and provenance of code within their software development lifecycle. They are solving a painful problem that affects every software developer on the planet: ensuring software security, authenticity, integrity, and compliance by providing the context for how the code was created.

They ensure that secure coding best practices are implemented proactively at the earliest stages of the SDLC, from research and design to development and deployment.
Right now they are looking for a Senior NLP Data Scientist to lead technology development on the frontier of code discovery and developer productivity. A successful applicant is an expert in data science, machine learning, software engineering, and complex data analysis spanning natural language, code syntax, and networks. You will help their team identify, analyze, and process large heterogeneous data sets. You will develop prototypes, tools, and methods that inform decision-making for software developers (e.g. “Is this the right solution to my coding problem?”, “How do I implement this specific code in my application?”, or “What code libraries are other developers using to solve my problem?”).

Tasks:
  • Design and train production-grade NLP models
  • Build complete data processing systems that drive products and applications
  • Lead experimentation processes that accelerate prototyping and maximize resource utilization
  • Build and operate data pipelines for machine learning operations: scheduling, ETL, dataflow programming, SQL, data labeling, representation learning, hyperparameter tuning, and model management
  • Produce and deploy internal and external APIs
  • Design and implement predictive models on multiple decision platforms
  • Apply the latest techniques from academic research to real-world problems in a production environment
  • Review code, mentor other engineers and support the data science and engineering teams
  • Attract, recruit and retain top data science and engineering talent
Must-have:
  • 7+ years of experience in Data Science
  • Expertise in Natural Language Processing and Understanding (NLP & NLU)
  • Expertise in microservices and cloud computing on at least one cloud platform
  • Familiarity with distributed systems and with orchestrating large numbers of independent commodity machines into complete, functional systems that handle diverse workloads
  • Expertise performing data science research
  • Expertise writing world-class Python code
Nice-to-have:
  • PhD in computer science, artificial intelligence, machine learning or related technical field
  • Advanced working knowledge of information retrieval and search technologies, including experience setting up and using open-source search systems to query and understand data
  • Expertise with Go
  • Experience with many of the following technologies:
    • Modern ML Models (e.g. BERT)
    • Elasticsearch, Solr, or equivalent
    • Kubernetes, Docker, Terraform
    • Machine learning infrastructure
    • Deep learning, GNNs
    • CircleCI, GitHub Actions, Jenkins or equivalent
    • Graph databases
Benefits and conditions:
  • Stock options
  • Paid vacation and sick leave
  • A strong remote work culture that includes group activities and local gatherings
Interview process:
  1. Intro call with Toughbyte
  2. Screening Interview
  3. Founder Interview
  4. ML Interview
  5. Data Science Interview
  6. Final interview