Site Reliability Engineer with Oracle


$66K - $99K a year

Oracle is leading the digital revolution. We are empowering nearly half a million businesses to thrive in the age of skyrocketing connections. Join us and play an instrumental role in masterminding the software that will have a truly global impact.


What You’ll Do

  • Engage in and improve the whole Java Management Service (JMS) lifecycle of application deployment and operation
  • Improve the existing continuous deployment pipeline for a wide range of functionalities across geographically separated zones
  • Improve the JMS observability platform, security, and incident management to meet the SLAs and SLOs defined for all Oracle cloud services
  • Architect highly available and scalable services
  • Troubleshoot issues and trace symptoms back to the root cause
  • Document and present methodologies to operations, engineering, and executive teams 
  • Educate the wider engineering organization on design and operational best practices for distributed computing 
  • Help meet the SLAs/SLOs for internal and external services and continually improve operational processes (weekly ops meetings, metrics, etc.)
  • Build tools and automation to improve system observability, availability, reliability, performance/latency, monitoring, and emergency response
  • Participate in on-call duties

Required Skills/Experience

What You’ll Bring

  • Strong track record of implementing services on OCI/AWS/GCP/Azure in a variety of distributed computing environments, with a good understanding of Docker and Kubernetes
  • Understanding of the CNI/CNCF landscape is good to have
  • Strong knowledge of storage, RDBMS, and NoSQL database runtimes
  • Experience implementing multi-cloud networking and deployment architectures
  • Good understanding of the L3/4/7 network layers (including SDN) 
  • Hands-on design and coding in any one of Python, Shell, Go, or Java
  • Strong debugging/troubleshooting skills
  • Experience implementing observability platforms using product suites such as Datadog, New Relic, ELK, or Prometheus, preferably with Grafana
  • Strong experience with infrastructure automation and monitoring tools: Terraform, Helm, Ansible, Puppet, Chef, etc.
  • Experience with modern cloud development practices (microservices architectures, REST interfaces, etc.) 
  • Deep working knowledge of Linux servers and networking, preferably Oracle Linux