Site Reliability Engineer with ByteDance

Singapore

$120K - $240K a year

Founded in 2012, ByteDance's mission is to inspire creativity and enrich life. With a suite of more than a dozen products, including TikTok, Helo, and Resso, as well as platforms specific to the China market, including Toutiao, Douyin, and Xigua, ByteDance has made it easier and more fun for people to connect with, consume, and create content.

The Datacenter Infrastructure Engineering team supports the company's fast growth by building and operating hyperscale datacenters. The team manages the end-to-end lifecycle of the server fleet and provides cloud solutions and infrastructure services that are scalable and reliable.

Responsibilities

  • Build, expand, and operate ByteDance's global traffic platform, including large-scale systems in public and private clouds, edge data centers, and content delivery networks.
  • Build tools, automation, visualizations, and monitors to support the operation and optimization of the global traffic platform.
  • Work in a fast-paced environment and participate in technical operations and rotations that respond to performance and reliability issues.
  • Help improve the whole lifecycle of infrastructure services, from inception and design through development and deployment to user support and refinement.

Qualifications

  • Master's degree in Computer Engineering, Electrical Engineering, Computer Science, or a related major, or a Bachelor's degree in one of these fields with 3+ years of experience.
  • 3+ years of experience working with Linux systems, from kernel to shell and beyond, including system libraries, file systems, and client-server protocols.
  • 3+ years of experience in one or more programming languages, such as Go, Python, and shell scripting.
  • Familiarity with cloud and CI/CD frameworks and tools, such as Git, Docker, and Kubernetes.
  • Self-driven and capable of coping with ambiguity and moving projects from concept to delivery.
  • Strong analytical skills and the ability to solve real-world problems in a fast-moving environment.

Preferred qualifications

  • Experience in designing, analyzing, and building automation and tools for large-scale systems.
  • Experience in building solutions with AWS, Google Cloud, Azure, and other cloud services.
  • Experience with networking technologies such as TCP/IP, HTTP, and DNS in a carrier-grade environment.
  • Experience in developing and operating one or more of the following systems: OpenStack, Kubernetes, Nginx, IPVS, the ELK stack, Hadoop, etc.