Senior AIOps Engineer
Simpplr
Who We Are
Simpplr is the AI-powered platform that unifies the digital workplace – bringing together engagement, enablement, and services to transform the employee experience. It streamlines communication, simplifies interactions, automates workflows, and elevates the everyday experience of work. The platform is intuitive, highly extensible, and built to integrate seamlessly with your existing technology.
More than 1,000 leading organizations – including AAA, the NHS, Penske, and Moderna – trust Simpplr to foster a more aligned and productive workforce. Headquartered in Silicon Valley with global offices, Simpplr is backed by Norwest Ventures, Sapphire Ventures, Salesforce Ventures, and Tola Capital. Learn more at simpplr.com.
JOB DESCRIPTION
AI/LLM Ops Specialist
(Agentic AI & AI‑Native Infrastructure)
Role Summary
We are seeking an experienced AI/LLM Ops Specialist to own the deployment and operations of agentic AI systems at enterprise scale. You will accelerate our shift from AI‑enabled workflows to AI‑native infrastructure, implementing and running autonomous multi‑agent LLM solutions that drive mission‑critical processes.
Key Responsibilities
- Architect & Deploy Agentic AI Systems: Design, implement, and maintain multi‑agent orchestration frameworks (e.g., FastAgent, FastMCP, LangGraph, AutoGen, CrewAI) to coordinate autonomous LLM workflows.
- LLMOps & RAG Integration: Develop and operationalize RAG workflows, vector stores, and knowledge‑graph connectors.
- Monitoring, Observability & Reliability: Implement end‑to‑end observability using Prometheus/Grafana, OpenTelemetry, and incident playbooks.
- Governance, Security & Compliance: Enforce data governance, PII handling, and prompt‑audit trails across agent interactions.
- Collaboration & Enablement: Partner with data engineers, MLOps, and DevOps to align CI/CD for AI workloads and mentor teams.
- Continuous Improvement: Benchmark agent performance, optimize token usage, and research emerging agentic AI frameworks.
Core Qualifications
- 5+ years in production ML/LLM operations, with 2+ years in autonomous agent systems.
- Hands‑on experience with FastAgent, FastMCP, LangGraph, AutoGen, CrewAI, or similar.
- Expertise in Kubernetes, Docker, Terraform (or Pulumi), and GitOps workflows.
- Proven track record implementing RAG pipelines with Pinecone, Elasticsearch, or similar.
- Proficient in observability tools (Prometheus, Grafana, Jaeger/OpenTelemetry).
Bonus Qualifications
- Open‑source contributions or conference talks on agentic AI or LLMOps.
- Experience optimizing inference with vLLM, TensorRT‑LLM, or similar libraries.
- Certifications in cloud architecture or AI security.
- Familiarity with multi‑modal agent workflows or edge‑deployed agents.
- Experience leading AI‑first governance and decision‑loop monitoring.
Technologies & Frameworks
- Agentic AI Frameworks: FastMCP, FastAgent, Agent Builder, LangGraph, AutoGen, CrewAI
- Protocols & Standards: Model Context Protocol (MCP), Agent2Agent (A2A)
- LLM Providers: OpenAI, Anthropic, Google Gemini
- Vector Stores & Indexing: Pinecone, Redis Vector DB, Weaviate, PGVector, LlamaIndex
- Monitoring & Observability: LangSmith, LangFuse, Arize, Coralogix
- Cloud & Orchestration: Kubernetes, Docker, AWS Bedrock/SageMaker, GCP Vertex AI, Azure ML
- Inference Optimization: vLLM, TensorRT‑LLM, Ollama
- IaC & CI/CD Tools: Terraform, Helm, GitHub Actions
Join us and be the driving force behind our evolution into an AI‑native, agentic future, crafting the autonomous systems that will power tomorrow’s innovation.
Simpplr’s Hub-Hybrid-Remote Model:
At Simpplr, we believe that when work is good, life is better, and that belief guides all we do, including how we approach our flexible work model. Simpplr operates with a Hub-Hybrid-Remote model. This model is role-based with exceptions and provides employees with the flexibility that many have told us they want.
- Hub - 100% work from Simpplr office. Role requires Simpplifier to be in the office full-time.
- Hybrid - Work split between home and office. Role allows working from home while benefiting from in-person collaboration on a regular basis.
- Remote - 100% remote. Role can be done anywhere within your country of hire, as long as the requirements of the role are met.