You will own the agent platform for an enterprise AI startup already generating seven-figure revenue. Your work involves building the orchestration and reliability infrastructure that turns frontier models into high-stakes production features. You will define the technical standard for agentic systems across financial services, healthcare, and regulated technology sectors.
Founding Engineer, Agent Systems at Helmguard
As a Founding Engineer, you will own the agent orchestration platform and build the reliability and evaluation infrastructure that makes AI agents trustworthy enough for the world’s most regulated industries. This is more than another model wrapper: you will push frontier APIs to their limits and help define agent-native risk management. If you have 1–2 years of production LLM experience and want outsized equity upside on a London-based team backed by tech industry titans, this is the role for you.
About this role
About the company
Helmguard
HelmGuard Technologies, Inc. provides an AI-native enterprise trust and risk assurance platform that functions as an "Enterprise AI Risk Operating System" for security, compliance, and operations teams.[1][3] The company’s platform consolidates risk, security, and compliance data into a unified intelligence layer and uses specialized, autonomous AI agents to execute tasks such as risk assessments, compliance mapping, third‑party/vendor risk management, incident and exposure detection, and customer assurance reporting.[1][3][5] By orchestrating AI agents across multiple risk domains—including cybersecurity, IT, legal, finance, and regulatory compliance—HelmGuard enables continuous monitoring, predictive exposure detection, and generation of stakeholder‑ready, evidence‑backed reports, helping enterprises make faster, clearer, and more accountable security and risk decisions at scale.[1][3][4][5]
What you'll do
- Architect and scale agent scaffolding including tool use, context management, sandboxing, and prompt-injection defense mechanisms.
- Develop production-grade evaluation infrastructure for fuzzy, high-stakes outputs using dataset curation and LLM-as-judge methodologies.
- Build robust reliability systems including circuit breakers, fallbacks, and prompt versioning to ensure enterprise-grade stability.
Who you are
- Professional backend experience in TypeScript with at least 1–2 years specifically shipping production-level LLM features and agent frameworks.
- Deep understanding of systems engineering principles like async queues, idempotency, and regression testing under model swaps.
- Proven ability to own AI quality end-to-end, with the technical confidence to define and enforce shipping standards for complex agentic workflows.
Why this role
- Join an elite founding team of Palantir, Oxford, and Stanford alumni at a company that achieved seven-figure revenue within months of launch.
- Work at the extreme frontier of LLM application; the team recently identified and reported critical bugs in Anthropic's core API during production stress-testing.
- Benefit from top-decile London compensation, meaningful EMI-eligible equity, and significant budgets for specialized AI tooling and API experimentation.