CORE – Senior Site Reliability Engineer
Within Site Reliability Engineering, our goal is to provide technical solutions to complex production problems, with a focus on reducing incident and problem toil, speeding detection and recovery of critical incidents through observability, and driving continuous improvement through operational health measurement and sharing.
What You Will Work On
A Site Reliability Engineer’s responsibilities in this role include, but are not limited to:
- Drive reliability throughout the Engineering Organizations through Observability, informed architectural improvements, and automation.
- Collaborate closely with Engineering teams to build a cohesive service operation solution into the overall service design.
- Build and enhance the DevOps processes, environments, and toolchains for high service reliability and availability.
- Exercise and optimize the service operation process to support the whole service with all partner teams. Mitigate and recover from live site incidents efficiently.
- Bachelor’s degree in Computer Science, Engineering, Math, Science or another technical field
- 5+ years of working experience in the IT industry building large-scale applications/services on platforms like AWS or Azure.
- Proficient in building microservices using Java on a cloud platform.
- Understanding of distributed systems architecture.
- 3+ years of experience in software development automating business processes using Java, Node.js, or Python on a cloud platform
- Experience supporting highly available and scalable systems, with the ability to debug and troubleshoot live systems
- Adaptable and flexible, able to manage multiple tasks with changing priorities.
- Hands-on experience with observability tools like Splunk, New Relic, Azure Monitor, or CloudWatch.
- Good troubleshooting skills and a deep understanding of metrics, logs, and traces.
- Experience with, interest in, and adaptability to working in a Lean Scaled Agile delivery environment.
- Exceptional written, verbal, and interpersonal communication skills with management, technical peers, and business stakeholders.
What You Bring
- 3+ years of software development experience
- 2-3 years’ experience building cloud-based enterprise systems, ideally on AWS.
- Expertise in Infrastructure automation using open-source tools like Terraform.
- Expertise in Monitoring & Alerting concepts
- Demonstrable knowledge of Linux operating system internals, TCP/IP, filesystems, disk/storage technologies
- Basic understanding of DNS, Networking, Virtualization
- Experience with Docker and/or Serverless patterns.
- Experience building scalable microservices in the cloud
- Experience with, or expertise in, other modern enterprise languages, functional or otherwise (Scala, Python, Golang, etc.)