Site Reliability Engineer (SRE)
As an SRE at Mercor, you will automate the systems, tooling, and runbooks that keep our platform reliable at scale.
You will partner closely with product and platform teams to bake reliability into every launch while leading observability and incident programs.
Key Responsibilities
- Mentor engineers on observability, alert management, and instrumentation best practices.
- Lead incident response from triage through post-mortem analysis and remediation.
- Own load-testing, disaster-recovery, and chaos-engineering initiatives.
- Automate reliability checks, capacity planning, and service-level monitoring for core services.
Skills We're Looking For
- Hands-on SRE experience in high-throughput or large-scale distributed systems.
- Proficiency with Terraform, Python, and Go in production environments.
- Deep familiarity with AWS and modern observability stacks.
- Ability to diagnose complex systems quickly and communicate findings clearly.
Bonus Experience
- Data: Familiarity with MySQL, MongoDB, Redis, or Snowflake.
- Scale: Prior work inside a high-growth startup environment.
- Architecture: Experience designing reliability guardrails during product development.
- Collaboration: Comfort partnering across engineering squads and product stakeholders.
Join Mercor to build the reliability foundation that supports thousands of professionals working on frontier AI projects.