Site Reliability Engineer

Posted 2 years ago

We are looking for an SRE who thrives on solving complex problems through innovation and driving change at scale. You will join an elite team of SREs driving technology’s transformation by launching new platforms, building tools, automating away complex issues, and integrating with the latest technology. Site Reliability Engineers leverage their experience as software and systems engineers to ensure that applications managed by SRE are available, have full-stack observability, and are continuously improved through code and automation.

  • Design and deliver software to improve the availability, scalability, latency, and efficiency of large-scale systems. No fear of complexity or scale.
  • Influence and create new designs, architectures, standards & best practices in support of service level objectives. Relish change.
  • Share support responsibilities and participate in 24×7 support coverage for critical applications. Problems are just opportunities to improve.
  • Design, code, test, and deliver software to automate manual operational work. Approach each challenge with the mindset that “anything can be fixed with software.”
  • Exercise failure cases regularly to validate resilience assumptions.
  • Automate key SRE metrics, including SLOs/SLAs and error budgets.

Requirements:
  • Strong understanding of cloud, API, event-driven, and microservices technologies for large-scale environments.
  • Experience in performance engineering and monitoring using tools such as Geneos, AppDynamics, Splunk, Apica, JMeter, Dynatrace, Capacity Manager, BlazeMeter, etc.
  • Significant experience building and/or configuring metrics, distributed tracing, and logging systems, preferably using CNCF technologies and standards.
