
Recruiter
Vera Bekker
Roles:
Machine Learning
Must-have skills:
Python
Nice-to-have skills:
Embedded
Considering candidates from:
Eastern Europe
Work arrangement: Remote
Industry: Software Development
Language: English, Russian
Level: Senior
Required experience: 5+ years
Size: 2 - 10 employees
Company
Solving AI inference economics through intelligent orchestration, real-time telemetry & automatic runtime optimization.
Description
The company is looking for an engineer to support model optimization and inference for large language models, working mainly with Python and NVIDIA GPUs (CUDA).
Tasks:
- Work with NVIDIA GPUs (CUDA) to run and optimize ML workloads
- Apply quantization techniques to LLMs using existing libraries (e.g., GPTQ)
- Integrate and run off-the-shelf tools for model optimization and inference
- Optimize performance of models on modern GPU architectures (e.g., Hopper, Blackwell)
- Collaborate with the team to validate approaches and results
- Quickly prototype and validate technical solutions
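The quantization task above refers to library-based methods such as GPTQ. As a rough illustration of the underlying idea only (plain round-to-nearest weight quantization, not GPTQ's error-compensating algorithm), a minimal NumPy sketch:

```python
import numpy as np

def quantize_rtn(weights: np.ndarray, bits: int = 4):
    """Symmetric per-row round-to-nearest quantization.

    Maps each row of `weights` to signed integers in
    [-(2**(bits-1) - 1), 2**(bits-1) - 1] with one float scale per row.
    """
    qmax = 2 ** (bits - 1) - 1  # e.g. 7 for 4-bit
    scale = np.abs(weights).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    # Recover an approximation of the original weights.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 16)).astype(np.float32)
q, scale = quantize_rtn(w, bits=4)
w_hat = dequantize(q, scale)
err = np.abs(w - w_hat).max()
print(f"max abs reconstruction error: {err:.4f}")
```

GPTQ improves on this baseline by updating the remaining unquantized weights to compensate for each rounding error, which is why it preserves LLM accuracy far better at the same bit width.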
Must-have:
- 5+ years of experience in software engineering / ML / GPU-related roles
- Strong hands-on experience with NVIDIA GPUs and CUDA
- Solid Python skills
- Experience working with ML frameworks and running models in production or near-production environments
- Ability to work independently
- Educational background in applied mathematics
Nice-to-have:
- Experience with LLM optimization and inference pipelines
- Familiarity with modern GPU architectures (Hopper, Blackwell)
- Experience with quantization techniques (e.g., GPTQ or similar)
- Good command of English
- Embedded systems or low-level optimization
Benefits:
- Remote, flexible engagement
- Opportunity to expand into a larger role if collaboration is successful
- Work on modern AI / LLM optimization problems
Interview process:
- Intro call with Toughbyte
- First interview with the architect
- Follow-up interview with the company executives (if needed)
