
Title: VP - Site Reliability Engineering (Chennai/Kuala Lumpur/Bangalore)
Chennai, IN
Job Summary
- This role could be based in India or Malaysia. When you start the application process, you will be presented with a drop-down menu showing all countries. Please ensure that you select a country where the role is based.
RESPONSIBILITIES
- Lead the implementation and advocacy of Site Reliability Engineering (SRE) principles to improve the reliability and availability of our applications
- Drive work on setting and maintaining SLIs, SLOs, and error budgets for our applications
- Responsible for developing and executing on the Chapter Vision together with the other Chapter Leads
- Drive technology strategy, technology stack selection, and implementation for a future-ready technology stack, to achieve highly scalable, robust, and resilient systems.
- Experienced former practitioner with leadership ability.
- Oversees the execution of functional standards and best practices
- Provide thought leadership on the craft, and inspire and retain talent by developing and nurturing an extensive internal and external network of practitioners.
- This role focuses on capability building; it does not own applications or delivery
- Creates a strategy roadmap of technical work
- Works to drive technology convergence and simplification across their chapter area
Technical Responsibilities
- Service Reliability: Monitor and maintain the reliability, availability, and performance of production services and infrastructure.
- Automation and Tooling: Develop and maintain automation tools and processes to streamline system provisioning, configuration management, deployment, and monitoring.
- Incident Management: Respond to and troubleshoot incidents, outages, and performance issues in production environments, ensuring timely resolution and minimal impact on users.
- Blameless Postmortems and Learning from Incidents: Participate in wider root cause analysis, and support and drive collaborative actions.
- Capacity Planning: Analyze system performance and capacity trends to forecast future resource requirements and optimize infrastructure utilization.
- Performance Optimization: Identify and address performance bottlenecks and optimization opportunities across the software stack, from application code to underlying infrastructure.
- Security and Compliance: Implement security best practices and ensure compliance with regulatory requirements, collaborating with security and compliance teams as needed.
- Continuous Improvement: Continuously evaluate and improve system reliability, scalability, and performance through automation, process refinement, and technology upgrades.
- Documentation and Knowledge Sharing: Document system designs, configurations, and procedures, and share knowledge with team members through documentation, training, and mentoring.
Strategy
- Reliability Engineering Strategy – Develop and execute a comprehensive reliability engineering strategy to ensure high availability, fault tolerance and disaster recovery capabilities for critical systems and services
- Scalability Planning – Design and implement scalable architecture solutions that can accommodate growth in user traffic and data volume over time
- Monitoring and Alerting Strategy – Define and implement monitoring and alerting strategies to proactively identify and address issues before they reach end users
- Capacity Planning Strategies – Develop capacity planning strategies to ensure that systems have sufficient resources to handle current and future workloads
Business
- Experienced practitioner making hands-on contributions to squad delivery for their craft (e.g., SRE).
- Responsible for balancing skills and capabilities across teams (squads) and hives in partnership with the Chief Product Owner & Hive Leadership, and in alignment with the fixed capacity model.
- Responsible for evolving the craft towards greater automation, simplification, and innovative use of the latest market trends.
- Trusted advisor to the business. Work hand in hand with the Business, taking product programs from investment decisions, into design, specification, and solution phases, all the way to operations on the ground and securing support services from other teams.
- Provide leadership and technical expertise for the subdomain to achieve goals and outcomes
- Support respective businesses in the commercialisation of capabilities, bid teams, monitoring of usage, improving client experience, and collecting defects for future improvements.
- Manage business partner expectations. Ensure delivery to the business on time, within cost, and with high quality
Processes
- Chapter Lead responsibilities may vary based on the specific chapter domain being led.
- Define standards to ensure that applications are designed with scale, resilience and performance in mind
- Enforce and streamline sound development practices and establish and maintain effective governance processes including training, advice and support, to assure the platforms are developed, implemented and maintained aligning with the Group’s standards
- Responsible for overall governance of the subdomain that includes financial management, risk management, representation in steering committee reviews and engagement with business for strategy, change management and timely course correction as required
- Ensure compliance with the highest standards of business conduct, regulatory requirements and practices defined by internal and external requirements. This includes compliance with local banking laws and anti-money laundering stipulations
People & Talent
- Accountable for people management and capability development of their Chapter members.
- Reviews metrics on capabilities and performance across their area, maintains an improvement backlog for their Chapters, and drives continual improvement of their chapter.
- Focuses on the development of people and capabilities as the highest priority.
- Ensure that the organisation works in a proactive way to upgrade capacity well in advance and predict future capacity needs
- Responsible for building an engineering culture where application and infrastructure scalability is paramount for on-going capacity management with an aim to reduce the need for capacity reviews using monitoring and auto-scale properties
- Empower the engineers so that they can provide economy of scale focused on delivering value, speed to market, availability, monitoring & system management
- Foster a culture of innovation, transparency, and accountability end to end in the subdomain while promoting a “business-first” mentality at all levels
- Develop and maintain a plan that provides for succession and continuity in the most critical delivery and management positions
Risk Management
- Responsible for effective capacity risk management across the Chapter with regards to attrition and leave plans.
- Ensures the chapter follows the standards with respect to risk management as applicable to their chapter domain.
- Adheres to common practices to mitigate risk in their respective domain.
- Effectively and collaboratively identify, escalate, mitigate, and resolve risk, conduct and compliance matters.
- Incident Response Planning – Develop incident response plans and procedures to effectively mitigate and manage risks when they materialize
- Risk Monitoring and Alerting – Implement monitoring and alerting systems to detect early warning signs of potential risks
- Root Cause Analysis – Conduct thorough root cause analysis of incidents and outages to understand the underlying causes and contributing factors
Regulatory & Governance
- Ensure all artefacts and assurance deliverables are as per the required standards and policies (e.g., SCB Governance Standards, ESDLC).
- Display exemplary conduct and live by the Group’s Values and Code of Conduct.
- Take personal responsibility for embedding the highest standards of ethics, including regulatory and business conduct, across Standard Chartered Bank. This includes understanding and ensuring compliance with, in letter and spirit, all applicable laws, regulations, guidelines and the Group Code of Conduct.
Key Stakeholders
- Chief Product Owner, Hive Lead, Product Owners, Engineering Leads
- WRB Application Teams
Other Responsibilities
- Embed Here for good and the Group’s brand and values in the digital sales/commerce team
- Perform other responsibilities assigned under Group, Country, Business or Functional policies and procedures
Qualification
Requirements & Skills
- Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent experience).
- Overall experience of 15+ years
- At least 10 years of experience as a Site Reliability Engineer or in a similar role, with a proven track record of leadership.
- Strong understanding of SRE principles and practices.
- Proficiency in troubleshooting complex issues and exceptional problem-solving skills.
- Deep knowledge of a wide array of software applications and infrastructure.
- Experience with monitoring and observability tools (e.g., Prometheus, Grafana, AppDynamics, Splunk, PagerDuty).
- Proficiency in scripting and automation (e.g., Python, Bash, Ansible).
- Familiarity with cloud platforms (e.g., AWS, Azure) and containerization technologies (e.g., Docker, Kubernetes).
- Excellent communication and collaboration skills.
- Ability to work in a fast-paced, dynamic environment.
- Strong attention to detail and a commitment to delivering high-quality results.
- Ability to debug and troubleshoot Java applications.
- Proficiency in using Splunk for log management and analysis.
- Familiarity with CI/CD tools and practices.
- Experience in the banking or financial services industry.
- Certification in relevant technologies (e.g., AWS Certified Solutions Architect, Google Cloud Professional DevOps Engineer).
- Knowledge of security best practices and compliance requirements.
- Ability to articulate the overall vision for the Chapters and ensure upskilling of the organisation holistically
- Experience in identifying skill gaps and mitigating risks to deliverables
- Ensure all solutions are as per Architecture Standards
- Strong experience in software development, system administration, or a related technical field.
- Proficiency in programming/scripting languages such as Python, Go, Java, or Shell scripting.
- Experience with containerization and orchestration technologies such as Docker, Kubernetes, or similar.
- Deep understanding of Linux/Unix systems and networking fundamentals.
- Experience with cloud platforms such as AWS, GCP, or Azure.
- Strong analytical and problem-solving skills, with a keen attention to detail.
- Excellent communication and collaboration skills, with the ability to work effectively in a cross-functional team environment.
- Prior experience with DevOps practices, continuous integration/continuous delivery (CI/CD) pipelines, and infrastructure as code (IaC) is a plus.
Role Specific Technical Competencies
- Software Engineering
- Systems Software Infrastructure
- Platform Architecture
- Programming & Scripting (Java / Python or Similar Programming Language)
- Cloud (AWS, Azure, GCP)
- Database Development
- Service Excellence
- Agile Application Delivery Process
- Operating Systems
- Network Fundamentals
- Security Fundamentals
- Credit Card and Lending Domain Knowledge
About Standard Chartered
We're an international bank, nimble enough to act, big enough for impact. For more than 170 years, we've worked to make a positive difference for our clients, communities, and each other. We question the status quo, love a challenge and enjoy finding new opportunities to grow and do better than before. If you're looking for a career with purpose and you want to work for a bank making a difference, we want to hear from you. You can count on us to celebrate your unique talents and we can't wait to see the talents you can bring us.
Our purpose, to drive commerce and prosperity through our unique diversity, together with our brand promise, to be here for good, are achieved by how we each live our valued behaviours. When you work with us, you'll see how we value difference and advocate inclusion.
Together we:
- Do the right thing and are assertive, challenge one another, and live with integrity, while putting the client at the heart of what we do
- Never settle, continuously striving to improve and innovate, keeping things simple and learning from doing well, and not so well
- Are better together, we can be ourselves, be inclusive, see more good in others, and work collectively to build for the long term
What we offer
In line with our Fair Pay Charter, we offer a competitive salary and benefits to support your mental, physical, financial and social wellbeing.
- Core bank funding for retirement savings, medical and life insurance, with flexible and voluntary benefits available in some locations.
- Time off including annual leave, parental/maternity (20 weeks), sabbatical (12 months maximum) and volunteering leave (3 days), along with minimum global standards for annual leave and public holidays, which combine to a minimum of 30 days.
- Flexible working options based around home and office locations, with flexible working patterns.
- Proactive wellbeing support through Unmind, a market-leading digital wellbeing platform, development courses for resilience and other human skills, a global Employee Assistance Programme, sick leave, mental health first-aiders and a range of self-help toolkits.
- A continuous learning culture to support your growth, with opportunities to reskill and upskill and access to physical, virtual and digital learning.
- Being part of an inclusive and values-driven organisation, one that embraces and celebrates our unique diversity across our teams, business functions and geographies, where everyone feels respected and can realise their full potential.