GPU Site Reliability Engineer

UX Stream

Stockholm, Sweden

Roles:

DevOps

Must-have skills:

C++

One of skills:

AWS, GCP, Azure

Nice-to-have skills:

C, Docker, Kubernetes, Rust

Considering candidates from:
Europe

Work arrangement: Onsite or hybrid

Industry: Software Development

Language: English

Level: Any

Required experience: 1+ years

Relocation: Not paid

Visa support: Not provided

Size: 11 - 50 employees

Company

UX Stream builds ultra-fast streaming solutions to power the cloud app ecosystem of the future. They host apps on servers and stream them to any device in real time, unlocking more powerful experiences on smartphones, watches, VR glasses, and other devices with the help of their unique Mobile Interactive Real-Time Streaming (MIRS™) technology based on AI and 5G.

Description

As they expand, they are now looking for a talented Super Computer Developer to lead the deployment and orchestration of their software on edge servers. You will plan and execute the deployment on rented and/or owned GPU-powered servers or instances. Specifically, you will evaluate container options and other deployment methods for the streaming software and the underlying streamed apps, handle load balancing, deploy and terminate instances to match the load, and collaborate with the development team to optimize the streamed experience for the end user.

Tasks:

Collaborate with the company's streaming development team to understand software deployment and balancing requirements
Design, configure, and deploy software applications on rented GPU-powered instances, ensuring optimal performance and resource utilization. Evaluate alternatives for owned, dedicated hardware.
Develop and implement automation scripts and tools to streamline the software deployment and balancing process
Monitor, manage, and troubleshoot GPU instances to ensure smooth operations and minimize downtime
Collaborate with cross-functional teams to identify and resolve performance bottlenecks, scalability issues, and other technical challenges
Stay up to date with the latest advancements in edge computing and GPU technologies, and propose innovative solutions to improve software deployment processes
Document deployment procedures, best practices, and lessons learned to facilitate knowledge sharing and future reference

Must-have:

Proven experience in cloud development, preferably with rented GPU instances (e.g., AWS EC2 GPU instances, Google Cloud GPUs, Azure NV instances) and their associated services (ECS, EKS, GKE)
Proficiency in C/C++/Rust
Proficiency in scripting languages (Python, Bash, PowerShell) for automation and deployment tasks
Proficiency with containerization technologies and orchestration frameworks (Docker, Kubernetes, etc.)
Experience with networking systems and solutions
Experience with Linux and/or Windows environments for cloud deployment
Experience with software performance profiling, optimization, and debugging techniques
Excellent problem-solving and analytical skills, with the ability to diagnose and resolve complex technical issues
Strong communication and collaboration skills, with the ability to work effectively in a team environment

Nice-to-have:

Solid understanding of GPU architectures, CUDA programming, and GPU-accelerated libraries
Familiarity with version control systems (Git) and code review processes
Certifications in cloud computing (AWS, Azure, Google Cloud) and GPU technologies are a plus

Benefits and conditions:

Flexible working hours
Onsite or hybrid work arrangement
30 days of vacation per year

Interview process:

Intro call with Toughbyte
Interview with Alex, a Board Member
Interview with the CEO
