Role Overview
We are seeking a Senior Software Engineer, Backend (8–10 years of experience) to own the core platform powering a GPU infrastructure service: scheduling, resource pooling, billing/metering, provisioning, and the APIs that tie it all together. The role demands strong distributed systems fundamentals and comfort working in an agentic development environment, where you decide how much to write by hand and how much to delegate to agents, while remaining fully accountable for correctness, observability, and quality. The ideal candidate is a self-directed engineer who writes specs when none exist, reviews agent output with real technical judgment, and holds both their own work and the team's work to a high bar.
Key Responsibilities
- Own the core platform: scheduling, resource pooling, billing/metering, provisioning, and the APIs connecting them.
- Build systems that stay correct and fast at infrastructure scale, where latency, consistency, and throughput all matter simultaneously.
- Decide how much of the implementation to write by hand versus delegate to agents, while remaining accountable for the outcome.
- Write specs that agents can execute against, and build checks that keep agent output honest.
- Review generated code with enough depth to defend every line, catching subtle errors a less experienced reviewer would miss.
- Ensure systems are observable, well-tested, and production-ready.
- Collaborate with the broader engineering team to raise the bar on architecture and code quality.
- Push back on unclear or flawed specs, and write the missing spec yourself when needed.
Requirements
- 8–10 years of backend engineering experience, with meaningful time at a product company or early-stage startup.
- Go (Golang) proficiency deep enough to hold strong opinions about the language and to spot when generated Go code is subtly wrong.
- Strong distributed systems fundamentals: concurrency, consistency, fault tolerance, and observability.
- Experience with cloud-native infrastructure such as Kubernetes, bare-metal provisioning, or similar.
- A genuine agentic workflow, not casual use: writing specs agents can execute, building checks that keep agent output honest, and holding real opinions on the tools you have shipped with (Claude Code, Cursor, or similar).
- Degree from a tier 1 or tier 2 engineering institution.
Nice to Have
- Direct experience building or operating multi-tenant infrastructure platforms at scale.
- Familiarity with billing, metering, or resource scheduling systems in cloud or infrastructure contexts.
- A track record of setting engineering standards or mentoring on code quality within a team.
Why Join Us?
- Build and own the core systems of a production GPU infrastructure platform operating at real scale, not a side project.
- Join a team that builds through agentic development by design, with humans owning judgment, architecture, and verification.
- Take full ownership of outcomes, with real latitude in how you get there.
- Competitive compensation, growth opportunities, and a culture built around technical excellence and high standards.