Must-Know Site Reliability Engineering (SRE) Terminology

13 February 2023

Charlotte Binstead

Familiarise yourself with the key terms in Site Reliability Engineering and test yourself with our wordsearch!

Toil

Toil is manual, repetitive work (for example, creating user accounts, handling access requests or clearing browser caches) that is necessary but of low value. The SRE approach is to automate as much toil as possible, freeing engineers to spend more of their time on higher-value work.

Auto-remediation

The ability to use automation to detect issues, apply a fix and restore service without manual intervention, which reduces toil.
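A minimal sketch of the detect, fix and re-check loop behind auto-remediation, assuming the health check and the remediation action (for example, a restart) are supplied as plain functions. The names `auto_remediate`, `check_health`, `remediate` and the `FakeService` class are hypothetical and used only for illustration; a real setup would call your own monitoring and orchestration tooling.

```python
import time
from typing import Callable


def auto_remediate(
    check_health: Callable[[], bool],
    remediate: Callable[[], None],
    max_attempts: int = 3,
    wait_seconds: float = 0.0,
) -> bool:
    """Return True if the service ends up healthy, False if a human is needed.

    Hypothetical helper: checks health, and if the check fails, runs the
    remediation action and re-checks, up to max_attempts times.
    """
    if check_health():
        return True
    for _ in range(max_attempts):
        remediate()               # e.g. restart a process or fail over
        time.sleep(wait_seconds)  # give the service time to come back
        if check_health():
            return True
    return False  # automation could not fix it: escalate to on-call


class FakeService:
    """Hypothetical simulated service that recovers after one restart."""

    def __init__(self) -> None:
        self.healthy = False
        self.restarts = 0

    def is_healthy(self) -> bool:
        return self.healthy

    def restart(self) -> None:
        self.restarts += 1
        self.healthy = True


svc = FakeService()
print(auto_remediate(svc.is_healthy, svc.restart))  # True
print(svc.restarts)                                 # 1
```

Capping the number of attempts and returning `False` keeps the automation from looping forever and leaves a clear point at which a person takes over.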

SLO

Service-level objective (SLO): a target value or range of values for a service level that is measured by an SLI; for example, 99.9% of requests served successfully over a 30-day period.

SLI

Service-level indicators (SLIs) are quantitative measures of a service's behaviour, such as availability, error rate or request latency, that allow SREs to tell whether the system is meeting its SLOs. For example, an SLI could be the uptime metric for a particular service.
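To show how an SLI and an SLO fit together, here is a minimal sketch assuming a request-based availability SLI (successful requests divided by total requests) and an SLO expressed as a fraction. The function names and the request counts are hypothetical.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """Fraction of requests served successfully (a common request-based SLI)."""
    if total_requests == 0:
        return 1.0  # assumption: a period with no traffic counts as meeting the SLO
    return good_requests / total_requests


def meets_slo(sli: float, slo_target: float) -> bool:
    """True if the measured SLI is at or above the SLO target."""
    return sli >= slo_target


# Hypothetical numbers: 999,500 successful requests out of 1,000,000.
sli = availability_sli(999_500, 1_000_000)
print(sli)                    # 0.9995
print(meets_slo(sli, 0.999))  # True
```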

Mean Time to Detect (MTTD)

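The average time between the start of an incident and its detection. A lower MTTD means problems are spotted sooner, ideally before they have a significant impact on users.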
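A minimal sketch of the MTTD calculation, assuming each incident is recorded as a pair of timestamps (when it started, when it was detected). The incident log and the `mean_time_to_detect` name are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical incident log: (incident started, incident detected).
incidents = [
    (datetime(2023, 2, 1, 9, 0), datetime(2023, 2, 1, 9, 6)),
    (datetime(2023, 2, 7, 14, 30), datetime(2023, 2, 7, 14, 34)),
    (datetime(2023, 2, 10, 22, 15), datetime(2023, 2, 10, 22, 17)),
]


def mean_time_to_detect(log: list[tuple[datetime, datetime]]) -> timedelta:
    """Average of (detected - started) across all incidents in the log."""
    if not log:
        raise ValueError("MTTD is undefined for an empty incident log")
    total = sum((detected - started for started, detected in log), timedelta())
    return total / len(log)


print(mean_time_to_detect(incidents))  # 0:04:00
```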

Mean Time to Recover (MTTR)

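The average time it takes to recover from a failure and get the service back up and running, usually measured from the start of an incident until service is restored. It is a critical metric for SREs.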
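A minimal sketch of MTTR, assuming it is measured from the start of each incident to the moment service is restored (some teams measure from detection instead). The incident log and the `mean_time_to_recover` name are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical incident log: (incident started, service restored).
incidents = [
    (datetime(2023, 2, 1, 9, 0), datetime(2023, 2, 1, 9, 45)),
    (datetime(2023, 2, 7, 14, 30), datetime(2023, 2, 7, 15, 0)),
]


def mean_time_to_recover(log: list[tuple[datetime, datetime]]) -> timedelta:
    """Average of (restored - started) across all incidents in the log."""
    if not log:
        raise ValueError("MTTR is undefined for an empty incident log")
    total = sum((restored - started for started, restored in log), timedelta())
    return total / len(log)


print(mean_time_to_recover(incidents))  # 0:37:30
```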

Incident response automation (IRA)

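The ability to streamline the incident response process through automation, for example by automatically alerting the on-call engineer or collecting diagnostics as soon as an issue is detected.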

Mean Time Between Failures (MTBF)

The average time between one failure and the next, calculated over a series of failures. A higher MTBF indicates a more reliable system.
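A minimal sketch of one common way to calculate MTBF: total operating (up) time in an observation window divided by the number of failures in that window, so repair time is excluded. The window, the outages and the `mtbf` function name are hypothetical.

```python
from datetime import datetime, timedelta


def mtbf(
    window_start: datetime,
    window_end: datetime,
    outages: list[tuple[datetime, datetime]],
) -> timedelta:
    """Total uptime in the window divided by the number of failures.

    outages: (failure started, service restored) pairs inside the window.
    """
    if not outages:
        raise ValueError("MTBF is undefined with no failures in the window")
    downtime = sum((end - start for start, end in outages), timedelta())
    uptime = (window_end - window_start) - downtime
    return uptime / len(outages)


# Hypothetical 30-day window containing two one-hour outages.
outages = [
    (datetime(2023, 1, 10, 0, 0), datetime(2023, 1, 10, 1, 0)),
    (datetime(2023, 1, 20, 0, 0), datetime(2023, 1, 20, 1, 0)),
]
print(mtbf(datetime(2023, 1, 1), datetime(2023, 1, 31), outages))  # 14 days, 23:00:00
```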

Error budget

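The amount of unreliability (such as downtime or failed requests) a service can tolerate over a given period without breaching its SLO; it is equal to 100% minus the SLO target. For example, a 99.9% availability SLO over 30 days leaves an error budget of 0.1%, or about 43 minutes of downtime.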
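A minimal sketch of turning an availability SLO into a time-based error budget and tracking how much of it is left, assuming the SLO is given as a fraction and downtime is the only thing that consumes the budget. The function names are hypothetical.

```python
from datetime import timedelta


def error_budget(slo_target: float, period: timedelta) -> timedelta:
    """Allowed downtime in the period for an availability SLO such as 0.999."""
    if not 0.0 <= slo_target <= 1.0:
        raise ValueError("slo_target must be a fraction between 0 and 1")
    return period * (1.0 - slo_target)


def budget_remaining(
    slo_target: float, period: timedelta, downtime_so_far: timedelta
) -> timedelta:
    """Error budget left after subtracting downtime already incurred."""
    return error_budget(slo_target, period) - downtime_so_far


print(error_budget(0.999, timedelta(days=30)))                              # 0:43:12
print(budget_remaining(0.999, timedelta(days=30), timedelta(minutes=20)))   # 0:23:12
```

A negative remaining budget means the SLO has been breached for the period.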

Can you find all these words, and more, in our SRE wordsearch?

