We will audit the entire infrastructure, test it for failures and fault tolerance, examine the monitoring and alerting systems, and check code quality.
We give recommendations on optimizing and improving the infrastructure, consult on monitoring systems, and provide detailed instructions.
- install applications for round-the-clock monitoring;
- respond to incidents within 15 minutes;
- maintain project documentation;
- provide technical support 24/7;
- perform backups and data integrity checks.
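The 15-minute response commitment in the list above can be checked mechanically. The following is a minimal sketch, assuming the alert and acknowledgement timestamps are available as `datetime` values; the function name `responded_in_time` and the constant `RESPONSE_TARGET` are hypothetical and not part of any specific monitoring tool.

```python
from datetime import datetime, timedelta

# Response target taken from the service list; the name is illustrative.
RESPONSE_TARGET = timedelta(minutes=15)


def responded_in_time(alert_fired_at: datetime,
                      acknowledged_at: datetime,
                      target: timedelta = RESPONSE_TARGET) -> bool:
    """Return True if the incident was acknowledged within the target window."""
    if acknowledged_at < alert_fired_at:
        raise ValueError("acknowledgement cannot precede the alert")
    return acknowledged_at - alert_fired_at <= target
```

A report built on such a check could, for example, count how many incidents per month met the target.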
Three SRE engineers and a project manager work on the project.
- Set up servers.
- Transfer projects and test them after the move.
- Switch traffic to new servers without downtime.
- Carry out an audit and comprehensive optimization of the infrastructure.
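One common way to switch traffic without downtime is to move it in stages rather than all at once. The text does not say which method is used, so the sketch below is only an illustration under that assumption; the function `traffic_shift_plan` is hypothetical and simply computes the percentage split between old and new servers at each stage.

```python
def traffic_shift_plan(steps: int) -> list[tuple[int, int]]:
    """Return (old_percent, new_percent) pairs that move traffic in equal stages."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    plan = []
    for i in range(1, steps + 1):
        new_percent = round(100 * i / steps)  # share sent to the new servers
        plan.append((100 - new_percent, new_percent))
    return plan


# Four stages: 25%, 50%, 75%, then all traffic on the new servers.
print(traffic_shift_plan(4))
```

Each stage would normally be held until monitoring confirms the new servers are healthy before moving to the next one.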
- Manage infrastructure via CI/CD
- containerize applications for Kubernetes
- create Helm charts
- set up automatic deployment and rollback of your application
- provide dynamic review environments that developers can launch at the push of a button
- run unit tests
- work with Docker Registry
- Design, create and maintain a Kubernetes cluster
- Monitor cluster health with Prometheus and Grafana
- Collect, clean up and store application and cluster logs with Elasticsearch and Kibana
- Configure network policies
- Configure and maintain a CephFS cluster
- Manage secrets with HashiCorp Vault
- Perform planned system updates
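The automatic deployment and rollback item in the list above follows a simple control flow: deploy, check health, and revert if the checks fail. The sketch below illustrates that flow only; `deploy_with_rollback` and its callback parameters are hypothetical names, and in a real pipeline the callbacks would wrap whatever the CI/CD system and Helm provide.

```python
import time
from typing import Callable


def deploy_with_rollback(deploy: Callable[[], None],
                         is_healthy: Callable[[], bool],
                         rollback: Callable[[], None],
                         checks: int = 3,
                         wait_seconds: float = 0.0) -> str:
    """Deploy, poll health up to `checks` times, and roll back if never healthy."""
    deploy()
    for _ in range(checks):
        if is_healthy():
            return "deployed"
        time.sleep(wait_seconds)  # pause before the next health check
    rollback()
    return "rolled_back"
```

Passing the three steps in as callbacks keeps the decision logic independent of the tooling, so the same flow can be exercised in tests without touching a cluster.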