Site Reliability Engineer with Housing and Development Board


$60K - $108K a year

The mission of the Housing & Development Board (HDB) is to provide affordable, quality housing and a great living environment where communities thrive. To achieve its mission, HDB aims to be data-driven to the core and to adopt evidence-based decision making in developing better housing policies, improving service delivery and optimising operations.

What you will be working on

  • You will be part of the Information Services Group that leads the development and implementation of enterprise-wide ICT solutions for HDB, working closely with in-house or outsourced development teams to create and maintain scalable and highly reliable systems.
  • Your goal is to ensure the smooth delivery of digital services to delight our customers.
  • You will work in cross-functional teams, so you should be results-oriented, with a strong ability to collaborate with and engage stakeholders.
  • You should also possess good problem-solving skills and an analytical mind; have excellent communication skills, both verbal and written; and be resilient enough to work in a fast-paced environment.
  • Define and implement systems' metrics and perform monitoring activities
  • Define and implement automations, visualisations and alerts on systems' health
  • Oversee staging releases to Production, ensuring stability and maintaining quality and efficiency in large-scale cloud environments
  • Respond to and troubleshoot incidents, providing post-mortem analysis and areas for improvement
  • Define and document best practices and strategies regarding application deployment and infrastructure maintenance
  • Collaborate with stakeholders and management to identify and implement improvements towards efficient daily operations
  • Analyse and identify trends and opportunities across the cloud environment to improve the performance of the applications

What we are looking for

  • Strong background in computer science, computer engineering, information technology or related field.
  • Minimum 3 years of experience in an SRE (Site Reliability Engineer) role, infrastructure engineering or application support with DevOps.
  • Minimum 3 years of experience in one or more programming languages (Python/Java) and configuration management/IaC tools (Ansible/Terraform).
  • Strong experience in Continuous Integration/Continuous Delivery (CI/CD), with hands-on working knowledge of Jira, Confluence and GitLab.
  • Experience in container technologies using Docker or Kubernetes.
  • Experience in AWS cloud architectures and infrastructure management (Terraform / CloudFormation).
  • Experience in design and implementation of observability platforms.
  • Experience driving the response to major production incidents and organising incident retrospective meetings.
  • Good understanding of the AWS Well-Architected Framework, with experience in reviewing against it and providing recommendations.
  • Understanding of IT Service Management and Operations for Cloud.
  • Familiarity with the Singapore Government Tech Stack is preferred but not a must.

Good to have:

  • Team player; we work together as a team
  • Independent, taking ownership of work responsibilities

Successful candidates will be offered a 1+1 year contract in the first instance. Conversion to a permanent position is dependent on good performance.