Livepeer is the video layer of the web3 stack. It processes millions of minutes of video every week for applications in production. As a blockchain-based protocol that has been live on Ethereum since 2018, the Livepeer Protocol is one of the first web3 networks to deliver real-world value to an existing multi-billion-dollar industry. It achieves this through an open-source, blockchain-coordinated, peer-to-peer network of hardware operators who run video-processing infrastructure around the world.
What we're looking for
Livepeer Inc, the team behind Livepeer, is looking for an experienced, self-driven Site Reliability Engineer (SRE): someone who loves to automate everything and deliver the best production experience for end users. You are passionate about keeping all user-facing services and Livepeer production systems running smoothly. You specialize in systems (operating systems, storage subsystems, networking) and implement best practices for availability, reliability and scalability, with varied interests in algorithms and distributed systems.
We value reliability. We approach the infrastructure with craft and think a lot about form and function. You should feel equally at home talking to developers and designers. We are looking for someone who cares about the reliability of the infrastructure as much as we do. You will ensure the final product is high quality and works as intended.
You should:
- Think about systems: edge cases, failure modes, behaviours, specific implementations.
- Know your way around Linux and the Unix Shell.
- Know what configuration management systems like Chef and Ansible are used for.
- Have strong programming skills: Shell, Python and/or Go.
- Have an urge to collaborate and communicate asynchronously.
- Have an urge to document all the things so you don't need to learn the same thing twice.
- Have an enthusiastic, go-for-it attitude. When you see something broken, you can't help but fix it. Have an urge to deliver quickly and effectively, and to iterate fast.
- Have experience with Nginx, HAProxy, Docker, Kubernetes, Terraform, or similar technologies.
- Be able to use GitHub.
You will:
- Join an on-call (PagerDuty) rotation to respond to incidents that impact Livepeer availability, and support service engineers with customer incidents.
- Use your on-call shift to prevent incidents from ever happening.
- Run our infrastructure with Chef, Ansible, Terraform, GitHub CI/CD, and Kubernetes.
- Build monitoring that alerts on symptoms rather than on outages.
- Document every action so your findings turn into repeatable actions and then into automation.
- Improve operational processes (such as deployments and upgrades) to make them as boring as possible.
- Design, build and maintain core infrastructure that lets Livepeer scale to hundreds of thousands of concurrent users.
- Debug production issues across services and levels of the stack.
- Plan the growth of Livepeer infrastructure.