Lead, SRE

Requisition Number: 48982

Job Location: Bangalore, IND

Work Type: Office Working

Employment Type: Permanent

Posting Start Date: 17/02/2026

Posting End Date: 03/03/2026

Job Description:

Job Summary

• Design, implement, and maintain scalable, reliable, and highly available systems.
• Develop and maintain automation tools for infrastructure provisioning, monitoring, and incident response.
• Collaborate with development teams to improve system operability and ensure reliability best practices are followed.
• Monitor system performance, identify bottlenecks, and implement solutions to improve reliability and scalability.
• Troubleshoot production issues, perform root cause analysis, and implement fixes to prevent recurrence.
• Collaborate with the DevOps team to build and maintain CI/CD pipelines that automate deployments and testing.
• Implement and manage monitoring, alerting, and logging solutions using tools like Prometheus, Grafana, and Loki.
• Ensure systems are secure and compliant with organizational policies and standards.
• Conduct post-incident reviews and drive improvements to reduce mean time to recovery (MTTR).
• Champion a culture of reliability, automation, and continuous improvement within the team.
• To be successful, the candidate will have a strong understanding of system reliability principles and will work to achieve the following related business objectives:
• Effectively manage themselves and their tasks during the project lifecycle.
• Identify and mitigate risks that could impact system reliability and availability.
• Engage with multiple stakeholders and vendors to ensure alignment on reliability goals.

Key Responsibilities

Strategy
• 8+ years of experience in Site Reliability Engineering or DevOps, with a strong focus on automation, monitoring, and system reliability.
Business
• Strong experience in designing and implementing scalable, reliable, and fault-tolerant systems.
• Proficient in infrastructure automation tools like Terraform, Ansible, or equivalent.
• Hands-on experience with CI/CD tools like Jenkins, Azure DevOps (ADO), or GitLab CI/CD.
• Strong knowledge of monitoring and observability tools such as Prometheus, Grafana, Loki, or equivalent.
• Proficient in scripting and automation using Python, Bash, or similar languages.
• Experience with containerization (Docker, Podman) and orchestration platforms (Kubernetes).
• Strong understanding of cloud platforms (AWS, Azure, or GCP) and infrastructure as code (IaC) principles.
• Experience in troubleshooting and optimizing Linux-based systems.
• Hands-on experience in setting up and managing logging and alerting systems.
• Experience in conducting post-incident reviews and implementing reliability improvements.
• Familiarity with security best practices and compliance standards.
Desired skills (good to have):
• Exposure to Generative AI and knowledge/experience in implementing AI solutions for system reliability.
• Experience with chaos engineering tools to test system resilience.
• Knowledge of database performance tuning and optimization.
• Experience with service mesh technologies like Istio or Linkerd.

Processes
• Responsible for implementing end-to-end SRE solutions.
People & Talent
• This role is not a people management role.
Risk Management
• This role is not a people management role.
Governance
• Responsible for assessing the effectiveness of the Group's arrangements to deliver effective governance, oversight and controls in the business and, if necessary, overseeing changes in these areas.
• Awareness and understanding of the regulatory framework in which the Group operates, and of the regulatory requirements and expectations relevant to the role.
Regulatory & Business Conduct
• Display exemplary conduct and live by the Group’s Values and Code of Conduct.
• Take personal responsibility for embedding the highest standards of ethics, including regulatory and business conduct, across Standard Chartered Bank. This includes understanding and ensuring compliance with, in letter and spirit, all applicable laws, regulations, guidelines and the Group Code of Conduct.
• Effectively and collaboratively identify, escalate, mitigate and resolve risk, conduct and compliance matters.

Skills and Experience

• Site Reliability Engineering
• Infrastructure Automation (Terraform, Ansible)
• CI/CD Tools (Jenkins, Azure DevOps)
• Monitoring and Observability (Prometheus, Grafana, Loki)
• Java and/or Python
• Linux System Administration
• Scripting (Python, Bash)
• Containerization and Orchestration (Docker, Kubernetes, Podman)
• Cloud Platforms (AWS, Azure, GCP)
• Chaos Engineering
• AI/Generative AI/LLMs/SLMs

Qualifications

Education
• Bachelor’s degree or higher
Training
• DevOps and AI certifications are good to have
Languages
• English

About Standard Chartered

We're an international bank, nimble enough to act, big enough for impact. For more than 170 years, we've worked to make a positive difference for our clients, communities, and each other. We question the status quo, love a challenge and enjoy finding new opportunities to grow and do better than before. If you're looking for a career with purpose and you want to work for a bank making a difference, we want to hear from you. You can count on us to celebrate your unique talents and we can't wait to see the talents you can bring us.

Our purpose, to drive commerce and prosperity through our unique diversity, together with our brand promise, to be here for good, are achieved by how we each live our valued behaviours. When you work with us, you'll see how we value difference and advocate inclusion.

Together we:

Do the right thing and are assertive, challenge one another, and live with integrity, while putting the client at the heart of what we do
Never settle, continuously striving to improve and innovate, keeping things simple and learning from doing well, and not so well
Are better together, we can be ourselves, be inclusive, see more good in others, and work collectively to build for the long term

What we offer

In line with our Fair Pay Charter, we offer a competitive salary and benefits to support your mental, physical, financial and social wellbeing.

Core bank funding for retirement savings, medical and life insurance, with flexible and voluntary benefits available in some locations.
Time off including annual leave, parental/maternity leave (20 weeks), sabbatical (12 months maximum) and volunteering leave (3 days), along with minimum global standards for annual and public holidays, which combine to a minimum of 30 days.
Flexible working options based around home and office locations, with flexible working patterns.
Proactive wellbeing support through Unmind, a market-leading digital wellbeing platform, development courses for resilience and other human skills, a global Employee Assistance Programme, sick leave, mental health first-aiders and a range of self-help toolkits.
A continuous learning culture to support your growth, with opportunities to reskill and upskill and access to physical, virtual and digital learning.
Being part of an inclusive and values-driven organisation, one that embraces and celebrates our unique diversity across our teams, business functions and geographies, where everyone feels respected and can realise their full potential.
