Site Reliability Engineer with ByteDance


$120K - $240K a year

Founded in 2012, ByteDance's mission is to inspire creativity and enrich life. With a suite of more than a dozen products, including TikTok, Helo, and Resso, as well as platforms specific to the China market, including Toutiao, Douyin, and Xigua, ByteDance has made it easier and more fun for people to connect with, consume, and create content.

The security team at ByteDance builds infrastructure, platforms, and technologies, and supports cross-functional teams in protecting our users, products, and infrastructure. On this team you'll get first-hand exposure to the company's strategy on key security initiatives, especially building scalable, secure-by-design systems and solutions. Our challenges are not regular day-to-day technical problems; you'll be part of a team developing new solutions to new challenges of a kind not previously addressed by big tech. We work fast, at scale, and we're making a difference.


Responsibilities

  1. Work with the product engineering team on system design, software platforms, capacity planning, and launch reviews throughout the whole lifecycle of services.
  2. Maintain sustainable scalability of software systems by building automation to measure and monitor availability, scalability, latency, and overall system health.
  3. Continuously evolve systems by pushing for changes that improve reliability and velocity.
  4. Practice sustainable incident response and postmortems.


Qualifications

  1. BS degree in Computer Science, Computer Engineering, Electrical Engineering, or a related major, with 3+ years of work experience.
  2. Excellent programming, debugging, and optimization skills in general-purpose programming languages, including but not limited to Go, Python, C/C++, Rust, or Java.
  3. Experience analyzing and debugging production issues at scale.
  4. Experience and understanding of infrastructure-as-code concepts, approaches, methods, and tooling.

Preferred Requirements

  1. Hands-on experience with large cloud providers such as AWS, Azure, GCP, Alibaba, or IBM.
  2. Experience managing infrastructure as code with tools such as Kubernetes, Terraform, Ansible, Puppet, Chef, or SaltStack.
  3. Experience securing infrastructure in distributed systems through automation, or practicing chaos engineering.
  4. Experience in two or more of the following areas: web application development, Unix/Linux environments, distributed and parallel systems, developing large software systems, mobile application development, and/or security software development.