This job is no longer actively hiring.

Confidential company

Job listing

Gloucestershire, United Kingdom · Not Disclosed

Platform Site Reliability Engineer at an AI infrastructure platform startup

Are you a seasoned Platform Site Reliability Engineer passionate about AI infrastructure? Join a pioneering platform startup revolutionizing how software connects with hardware for the AI era. You'll be instrumental in running and evolving a globally scaled platform, deploying Kubernetes for AI workloads, and ensuring 24/7 stability and security. This is a chance to make a significant impact, drive automation, and mentor others in a fast-paced, innovative environment.

Role overview

We're seeking an experienced Platform Site Reliability Engineer to manage and evolve our AI infrastructure platform. You'll ensure 24/7 stability and security across bare-metal, virtualization, and orchestration layers, deploying and optimizing Kubernetes for AI workloads. This role involves significant automation, incident management, and mentoring, contributing to a scalable and efficient AI ecosystem.

About the company

AI infrastructure platform startup

What you will do

  • Deploy and manage Kubernetes clusters at scale, supporting AI-centric workloads across diverse infrastructure.
  • Optimize Linux system configurations and build automation scripts for platform lifecycle and incident resolution.
  • Apply ITSM frameworks, maintain observability with Prometheus and Grafana, and operate services in 24/7 production environments.

Who this is a fit for

  • 5+ years of proven experience in globally scaled, performance-intensive SRE environments with 24/7 support.
  • 3+ years of experience running, deploying, and optimizing orchestration platforms, with strong Kubernetes expertise.
  • Expert-level Linux administration (especially Ubuntu), system tuning, and strong networking fundamentals.

Why this role is remarkable

  • Drive the evolution of cutting-edge AI infrastructure, connecting software and hardware for the AI era.
  • Work across bare-metal, virtualization, and large-scale Kubernetes deployments supporting critical AI workloads.
  • Make a significant impact on 24/7 operations, automation, and mentorship within a growing, well-funded tech company.
