Job Title: Lead/Senior DevOps Engineer (Scripting/Public Cloud)
Location: India – Hyderabad / Bangalore
Work Type: Full-time
Date Posted: 06 June 2025
Job ID: JR283130
Salesforce’s Tech and Product team is responsible for building and maintaining a large-scale, distributed systems platform that supports millions of users. The platform must be secure, customizable, reliable, and efficient, supporting continuous feature deployment at scale. As a Site Reliability Engineer, you'll help manage the Kubernetes-based infrastructure behind Salesforce’s Core CRM and other services.
Manage and ensure high availability of Kubernetes clusters and microservices-based infrastructure.
Troubleshoot complex production issues in real time across the Kubernetes ecosystem.
Contribute to internal codebases to improve platform reliability and developer experience.
Automate routine operational tasks using Python, Go, Terraform, Spinnaker, Puppet, and Jenkins.
Enhance observability by using monitoring tools and implementing metrics and alerting.
Implement proactive, self-healing mechanisms to resolve issues before they impact users.
Collaborate with Infrastructure, Development, and Product teams across Salesforce.
Evaluate and adopt new technologies to improve platform scalability and stability.
Work on projects involving scalable distributed systems, high-availability data storage, and clustering.
Interact with and support internal development teams to enable faster and safer deployments.
Experience managing large-scale distributed systems in cloud environments (AWS preferred).
Strong troubleshooting abilities and a mindset for continuous learning.
Solid Linux systems administration experience, with good knowledge of Linux internals.
Proficient in scripting or programming using Python or Go.
Basic understanding of networking (TCP/IP, switches, routers, load balancers).
Hands-on experience with configuration management tools (Puppet, Chef, Ansible).
Familiarity with monitoring tools (Nagios, Grafana, Zabbix, etc.).
Experience with Kubernetes, Docker, or service mesh technologies.
Knowledge of infrastructure-as-code and deployment tools (Terraform, Spinnaker).
Strong communication, collaboration, and problem-solving skills.
Experience working with clustering, system programming, APIs, and public cloud services.
Ability to manage the lifecycle of infrastructure software across multiple disciplines.
A passion for service ownership and platform reliability.
Practical knowledge of working with cloud APIs, Kubernetes APIs, and distributed data systems.
Ability to learn and implement new technologies in real-world production environments.
Previous experience in a DevOps, SRE, or Cloud Engineering role at scale.
Comprehensive benefits including wellness reimbursement, parental leave, and fertility/adoption support.
World-class enablement through Trailhead.com.
Regular 1:1 mentorship and access to leadership.
Opportunities to volunteer and engage in Salesforce’s 1:1:1 philanthropic model.