Customer Reliability Engineer with NetApp

Remote (US)

$166.5K - 203.5K a year

We're forward-thinking technology people with heart. We make our own rules, drive our own opportunities, and try to approach every challenge with fresh eyes. Of course, we can't do it alone. We know when to ask for help, collaborate with others, and partner with smart people. We embrace diversity and openness because it's in our DNA. We push limits and reward great ideas. What is your great idea?

"At NetApp, we fully embrace and advance a diverse, inclusive global workforce with a culture of belonging that leverages the backgrounds and perspectives of all employees, customers, partners, and communities to foster a higher performing organization." -George Kurian, CEO

Job Summary

As a Customer Reliability Engineer, you'll manage a portfolio of customer-facing cloud services (SaaS/IaaS), ensuring their overall availability, performance, and security. You'll work in a highly collaborative environment with NetApp and Google/AWS/Microsoft teams from all over the world (RTP, Reykjavík, Bangalore, Sunnyvale, Redmond, and more). Because of the critical nature of the services we support, this position includes rotational on-call work as part of a global team.

Job Requirements

You will be working in a fast-paced organization as an engineer on the Customer Reliability Engineering (CRE) team. This team is responsible for assisting NetApp Cloud Volume Services (CVS) and Astra customers in resolving complex technical issues in production environments.

We are looking for a CRE with a deep understanding of complex distributed system platforms and cloud technologies, and the ability to explain them simply to customers and to SREs within a customer organization.

You will have the opportunity to work with your teammates and our customers to support many new, leading-edge technologies that solve real challenges. You will work to provide robust feedback and guidance to our Product and Engineering teams while being a voice for our customers. You want to make our customers successful while strengthening their relationship with NetApp. You can make a huge impact and have real ownership for the work you do.

Essential Responsibilities

  • Work with external customers and partners to help make them successful
  • Respond to, troubleshoot, and drive root cause analysis (RCA) of complex live production incidents and cross-platform issues involving OS, networking, and databases in cloud-based SaaS/IaaS environments, following and implementing SRE best practices
  • Continuously monitor, analyze, and measure availability, latency, and overall system health using tools like Prometheus, Stackdriver, Elasticsearch, Grafana, and SolarWinds, and develop steps to improve system and application performance, availability, and reliability
  • Document your system knowledge as you acquire it over time, create runbooks, and ensure critical system information is readily available
  • Keep up to date with security developments and proactively identify, diagnose, and solve complex security issues
  • Maintain and monitor the deployment and orchestration of servers, Docker containers, databases, and general backend infrastructure
  • Automate manual tasks and any other parts of the system that would benefit from automation
  • Utilize Atlassian Jira to track issues to resolution based on their priority

Qualifications

  • Advanced knowledge of incident management processes and the ability to resolve issues within the organization's agreed SLAs/SLOs
  • Advanced knowledge of Linux operating systems (Ubuntu, CentOS, etc.)
  • Advanced knowledge of container-based architecture (Kubernetes)
  • Advanced knowledge of tools and languages such as Ansible, Python, Bash, Go, PowerShell, and other scripting languages
  • Intermediate knowledge of algorithms, data structures, and databases (SQL/NoSQL)
  • Intermediate knowledge of networking concepts
  • Intermediate understanding of cloud environments such as GCP or AWS
  • Intermediate knowledge of site reliability engineering principles

Education

  • BS in computer science or equivalent, or 10+ years of professional experience