Course Outline
Introduction
- How SRE bridges traditional IT and software development.
- The need for automation and observability.
- The roles of software engineers versus system administrators.
- Site Reliability Engineers versus DevOps engineers.
Overview of an IT System
- System architecture, including on-premise and cloud environments.
Overview of SRE Principles and Practices
- Infrastructure as Code.
- The role of containerization and orchestration (e.g., Docker, Kubernetes).
- Continuous Integration, Continuous Deployment, and Continuous Delivery.
- Observability.
Evaluating an IT System
- Assessing team and organizational resources.
- Mapping out systems and processes.
- Estimating the potential impact of SRE.
- The role of the software engineering team.
- The role of the operational team.
- The role of management.
Maintaining System Reliability
- Describing and measuring desired service reliability.
- Understanding Service Level Objectives (SLOs).
- Understanding Service Level Indicators (SLIs) and Service Level Agreements (SLAs).
- Working with Error Budgets.
- Developing SLOs.
Optimizing System Administration
- Setting up a development environment.
- Evaluating SRE tools.
- Prioritizing tasks for automation.
- Writing software.
Deploying "Infrastructure as Code"
- Testing and iterating code.
- Creating anti-fragile systems.
- Learning from failure.
Monitoring a System
- Observing system performance.
- SRE tools and techniques.
The Future of SRE
Summary and Conclusion
Requirements
- A foundational understanding of IT infrastructure.
- A general awareness of the software development lifecycle.
- Programming or scripting experience in any language.
Audience
- Developers
- System Administrators
- Software Architects
- DevOps Engineers
- IT Managers
Testimonials (7)
How detailed subjects are explained with real world examples
Brian Hlabane - African Bank
Course - Site Reliability Engineering (SRE) Fundamentals
She is an expert in the area and provides really nice training. The material and training were a real mix of examples, discussion and
Peter Tutka - Deutsche Telekom IT & Telecommunications Slovakia s.r.o.
A view of SRE/DevOps from a more business-oriented, theoretical point of view. Most helpful for people who already have the practical view.
Michael Varhol - Deutsche Telekom IT & Telecommunications Slovakia s.r.o.
The approach of sending a questionnaire before the training, so that the training was planned according to expectations. It makes the participants more active.
Stefan Girman - Deutsche Telekom IT & Telecommunications Slovakia s.r.o.
Sticking to the initial survey from attendees about what the focus of the training should be.
Denis Majorsky - Deutsche Telekom IT & Telecommunications Slovakia s.r.o.
Discussions, SRE definition
Daniel Horvath - Deutsche Telekom IT & Telecommunications Slovakia s.r.o.
The concept of the training: keeping people focused by asking them questions and triggering discussions. The group breakout sessions were also great for thinking things through in groups and seeing the different outcomes from other groups.