Applications are no longer being accepted for this position.
- OS: Linux Ubuntu;
- Web server: Nginx;
- Monitoring: Grafana, Prometheus, Graylog, Jaeger;
- CI/CD: Jenkins, Git, GitLab, Docker;
- Automation: Python, Bash;
- SCM: Ansible, Chef;
- IaC: Terraform, Pulumi;
- DB: PostgreSQL, Redis, KeyDB, MySQL;
- Cloud: OpenStack, AWS, GCP, DO.
- Review of processes, platform, and infrastructure;
- Implementation of Grafana OnCall;
- Review and, if needed, rework of ITSM processes.
- Identification of bottlenecks and preparation of recommendations to improve the reliability of services;
- Responding to platform emergencies, localizing and resolving the causes of failures, compiling postmortem reports;
- Development of monitoring and alerting tools that ensure high availability and quick detection of potential issues (Grafana, Grafana OnCall, Prometheus Alertmanager, etc.);
- Active participation in change management processes, including assessment and coordination of changes to the infrastructure within Change Advisory Board (CAB) sessions;
- Implementation and support of ITSM processes to optimize team workflow and enhance service quality;
- Development of documentation and keeping it up to date.
- 3+ years of experience in SRE/DevOps;
- Understanding of SRE principles, practical experience in implementing SRE practices;
- Understanding of principles and practical experience in building resilient systems;
- Experience with monitoring and logging systems (Prometheus, Graylog, Grafana);
- Experience with automation tools for software build and deployment (CI/CD): GitLab, Jenkins;
- Understanding of virtualization and containerization principles;
- Understanding of Infrastructure as Code (IaC) approaches and hands-on experience applying them;
- Proficiency in a programming language for writing automation scripts (Python, Node.js, Golang, etc.) and the ability to read and understand service code;
- Understanding of network protocols, topologies, and network models;
- Experience with configuration management tools: Ansible, Chef;
- Basic experience with relational databases, such as PostgreSQL;
- Experience in administering Linux operating systems;
- Fluency in English and Russian (B2 minimum).
- Experience in implementing monitoring and logging systems from scratch;
- Experience with Kubernetes (k8s) and OpenStack;
- Advanced programming skills in any language.
Senior Site Reliability Engineer - Amman, Jordan - Quadcode
Description
Senior Site Reliability Engineer
Tech stack:
Examples of first tasks in the role:
Responsibilities in the role:
Requirements:
As an advantage: