GPU Site Reliability Engineer

Stockholm, Sweden
2-day average response time from company
Recruiter
Milada Gudkova
Roles:
DevOps
Must-have skills:
C++
One of skills:
AWS, Azure, GCP
Nice-to-have skills:
C, Docker, Kubernetes, Rust
Considering candidates from:
Europe
Work arrangement: Onsite or hybrid
Industry: Software Development
Language: English
Level: Any
Required experience: 1+ years
Relocation: Not paid
Visa support: Not provided
Size: 11 - 50 employees

UX Stream builds ultra-fast streaming solutions to power the cloud app ecosystem of the future. They host apps on servers and stream them to any device in real time, unlocking more powerful experiences on smartphones, watches, VR glasses, and other devices with the help of their unique Mobile Interactive Real-Time Streaming (MIRS™) technology, which is based on AI and 5G.
As they expand, they are now looking for a talented Super Computer Developer to lead the deployment and orchestration of their software on edge servers. You will plan and execute the deployment on rented and/or owned GPU-powered servers or instances. Specifically, you will evaluate container options and other deployment methods for the streaming software and the underlying streamed apps, handle load balancing, deploy and terminate instances to match the load, and collaborate with the development team to optimize the streamed experience for the end user.

Tasks:
  • Collaborate with the company's streaming development team to understand software deployment and balancing requirements
  • Design, configure, and deploy software applications on rented GPU-powered instances, ensuring optimal performance and resource utilization. Evaluate alternatives for owned, dedicated hardware.
  • Develop and implement automation scripts and tools to streamline the software deployment and balancing process
  • Monitor, manage, and troubleshoot GPU instances to ensure smooth operations and minimize downtime
  • Collaborate with cross-functional teams to identify and resolve performance bottlenecks, scalability issues, and other technical challenges
  • Stay up to date with the latest advancements in edge computing and GPU technologies, and propose innovative solutions to improve software deployment processes
  • Document deployment procedures, best practices, and lessons learned to facilitate knowledge sharing and future reference
Must-have:
  • Proven experience in cloud development, preferably with rented GPU instances (e.g., AWS EC2 GPU instances, Google Cloud GPUs, Azure NV instances) and their associated services (ECS, EKS, GKE)
  • Proficiency in C/C++/Rust
  • Proficiency in scripting languages (Python, Bash, PowerShell) for automation and deployment tasks
  • Proficiency with containerization technologies and orchestration frameworks (Docker, Kubernetes, etc.)
  • Experience with networking systems and solutions
  • Experience with Linux distributions and/or Windows environments for cloud deployment
  • Experience with software performance profiling, optimization, and debugging techniques
  • Excellent problem-solving and analytical skills, with the ability to diagnose and resolve complex technical issues
  • Strong communication and collaboration skills, with the ability to work effectively in a team environment
Nice-to-have:
  • Solid understanding of GPU architectures, CUDA programming, and GPU-accelerated libraries
  • Familiarity with version control systems (Git) and code review processes
  • Certifications in cloud computing (AWS, Azure, Google Cloud) and GPU technologies are a plus
Benefits and conditions:
  • Flexible working hours
  • Onsite or hybrid work arrangement
  • 30 days of vacation per year
Interview process:
  1. Intro call with Toughbyte
  2. Interview with Alex, a Board Member
  3. Interview with the CEO