Site Reliability Engineer Remote

Site Reliability Engineer Description

Job #: 57651
EPAM is committed to providing our global team of more than 41,150 EPAMers with inspiring careers from day one. EPAMers think creatively and lead with passion and honesty. Our people are the source of our success. We value collaboration, work in partnership with our customers, and strive for the highest standards of excellence. In today’s market conditions, we’re supporting operations for hundreds of clients around the world remotely. No matter where you are located, you’ll join a dedicated, diverse community that will help you discover your fullest potential.

You are curious, persistent, logical and clever – a true techie at heart. You enjoy living by the code of your craft and developing elegant solutions for complex problems. If this sounds like you, this could be the perfect opportunity to join EPAM as a Site Reliability Engineer. Scroll down to learn more about the position’s responsibilities and requirements.

What You’ll Do

  • Take ownership of, and accountability for, product/site reliability
  • Champion DevSecTestOps (DTSO) and analyze reliability issues at the code, component, infrastructure and system levels
  • Work with architects, tech leads, test leads and stakeholders to identify points of failure
  • Define a Blue-Green deployment approach to enable zero-downtime deployment
  • Define strategies, patterns and solutions that improve system reliability by reducing or eliminating points of failure
  • Define automatic healing and recovery strategies, and work with the development team to implement them
  • Define the types of errors, exceptions and messages to monitor that will trigger alerts and recovery
  • Establish best practices for system logging, monitoring, health checks, and recovery
  • Define the approach for scaling up and down, and ensure infrastructure provisioning scripts implement it as required
  • Work with the QA lead, tech leads and architects to ensure test automation and security testing are integrated into the DTSO pipeline
  • Define key operational metrics and apply tools to gauge product health in the development, test and production environments

What You Have

  • Strong technologist with hands-on experience in DTSO automation, including CI/CD pipelines
  • AWS Cloud, Fargate or other container experience
  • Ability to zoom in and zoom out to understand the holistic architecture and design of the product/site
  • Strong full-stack architecture experience (Front End, APIs, Database, etc.)
  • Solid experience with infrastructure provisioning and automated deployment
  • Hands-on knowledge of monitoring tools such as Splunk, CloudWatch, etc.
  • Ability to proactively identify product/site operational issues
  • Willingness to challenge the status quo and navigate ambiguity with ease
  • Strong communication, collaboration and leadership skills; self-driven
  • Ability to prioritize effectively among competing demands
  • DTSO automation (huge plus)

What We Offer

  • Medical, Dental and Vision Insurance (Subsidized)
  • Health Savings Account
  • Flexible Spending Accounts (Healthcare, Dependent Care, Commuter)
  • Short-Term and Long-Term Disability (Company Provided)
  • Life and AD&D Insurance (Company Provided)
  • Employee Assistance Program
  • Unlimited access to LinkedIn learning solutions
  • Matched 401(k) Retirement Savings Plan
  • Paid Time Off
  • Legal Plan and Identity Theft Protection
  • Accident Insurance
  • Employee Discounts
  • Pet Insurance

REQ #: 209997482
