TL;DR: We are looking for an ML Infrastructure Engineer to build the systems behind our LLM post-training, RL, evaluation, inference, and agentic development workflows. You will work close to researchers, GPUs, training loops, data control systems, evals, inference stacks, and the infrastructure decisions that directly affect model learning and product quality.

About us

White Circle is an AI Safety company building the safety, reliability, and optimization layer for AI systems. At the core of our platform are policies – simple natural-language rules that define what an AI model should and shouldn’t do. We automatically test, enforce, and continuously improve these policies at scale.

We’ve raised $11M from top funds, founders, and senior leaders at OpenAI, Anthropic, HuggingFace, Mistral, DeepMind, Datadog, Sentry, and others
We process over 100M API calls every month
We fine-tune and train our own LLMs so they run faster and cheaper than any open or proprietary model

We’re a small, highly focused team. If you want to work deeply on hard problems, see your work ship to production quickly, and influence how AI safety is actually built – you’re the one we need.

You will:

Build robust, flexible, and scalable RL and post-training pipelines, including smoke tuning runs for quality testing and approach ablations
Design data control systems that govern what the model sees, when it sees it, and how training data flows through rollouts, replay, filtering, evaluation, and policy updates
Tune training and inference end-to-end for high throughput across the systems that matter: networking, memory, compute scheduling, data loading, storage, checkpointing, and I/O
Investigate how infrastructure choices affect learning dynamics, eval quality, model behavior, and training stability – staying close to the state of the art in LLMs, RL, and post-training
Build infrastructure for model iteration: experiment runs, artifacts, evals, dashboards, failure inspection, reproducibility, and cost visibility
Work on inference infrastructure where it affects post-training and evaluation loops
Build and improve agentic development environments: coding-agent harnesses, browser/tool integrations, terminal/runtime sandboxes, repo-aware workflows, and multi-agent orchestration
Work closely with the team: plan future steps, discuss tradeoffs, share context early, and stay in touch while building

You’ll fit right in if you:

Have designed, built, or maintained distributed RL/post-training systems at scale and are fluent in their moving parts: rollouts, replay buffers, reward signals, data filtering, policy updates, evaluation loops, and failure analysis
Are familiar with deep learning frameworks such as PyTorch or JAX
Are proficient in Python, including concurrency, asynchronous programming, multiprocessing, and performance optimization
Can debug distributed GPU workloads across CUDA runtime, container runtime, driver versions, NCCL or equivalent communication layers, networking, storage, scheduling, and checkpointing
Have experience with profiling tools across the stack, for example py-spy, PyTorch profiler, Nsight, perf, tracing, metrics, logs, or custom instrumentation
Have experience with inference stacks such as vLLM, SGLang, TensorRT-LLM, Dynamo, or custom serving infrastructure
Can reason from system metrics back to model behavior: when latency, queueing, sampling, data order, rollout throughput, or infrastructure failures affect learning
Have a strong ownership mindset: you can take an ambiguous infrastructure problem, make it concrete, ship a working system, and improve it from real feedback

A big plus:

A public builder footprint: open-source contributions to RL, distributed ML, LLM training, inference, eval, or agent infrastructure – repos, PRs, benchmarks, papers with code, technical posts – and a good technical X/Twitter presence with live building, debugging threads, and useful interaction with strong builders
Experience in a high-bar AI infra, research, or model environment such as xAI/Grok, Qwen, ByteDance AI infra/research, Prime Intellect, or similar teams
Custom training framework support or ownership: distributed training, fine-tuning pipelines, trainers, schedulers, checkpointing, data loaders, model/eval integration, or performance tooling
Serious use of Claude Code, Codex, Kimi Code, Pi Agent, Droid, or similar agentic coding systems as a development surface
Experience with GPU clusters on Kubernetes, Slurm, Ray, custom schedulers, or cloud GPU orchestration
NCCL, UCX, NVSHMEM, RDMA, InfiniBand, RoCE, or EFA
Rust, C++, CUDA, Go, or systems-level performance work

Why White Circle

You will be able to propose and run your own experiments and research ideas on modern ML infrastructure with very little friction
You will work on current ML infra problems, close to research, product needs, and real model iteration – not maintaining legacy systems for the sake of keeping them alive
You will have an unusually high contribution level for the size of the team; your systems decisions can change how quickly we train, evaluate, ship, and improve models
You will have room to dig into areas of your own interest, as long as they help the company build better, faster, safer AI systems
Paid time off in line with your local regulations, no matter where you work from.
Work from Paris (hybrid) with a relocation package available, or work from London (note: we are currently unable to provide relocation support or medical insurance for London-based roles).
Comprehensive medical insurance for our France-based team.
All the hardware, tools, and services you need.
Covered subscriptions for AI agents and IDEs.
Team off-sites twice a year: we’ve recently been to the Alps and to Saint-Tropez.

How we hire

Introductory call with HR (25 min)
Take-home test task
Technical interview with Head of Applied Research (60 min)
Final conversation with our CEO (45 min)

Please submit your application in English.
