MTTR
What is MTTR
MTTR (Mean Time to Recovery) is one of the key metrics developed by the DevOps Research and Assessment (DORA) organization to measure the performance of software development and delivery teams. It measures the average time it takes to recover from a product or system failure. This covers the full duration of the outage, from the moment the system or product fails to the moment it becomes fully operational again.
The MTTR metric is used to understand the effectiveness of DevOps and ITOps teams and to identify opportunities to improve their processes and capabilities.
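The definition above amounts to a simple average over outage durations. The following is a minimal sketch in Python; the incident timestamps and the function name `mean_time_to_recovery` are hypothetical and serve only as an illustration.

```python
from datetime import datetime, timedelta

# Hypothetical incident log: (time the failure began, time service was fully restored).
incidents = [
    (datetime(2023, 3, 1, 9, 0), datetime(2023, 3, 1, 11, 30)),    # 2 h 30 min outage
    (datetime(2023, 3, 9, 22, 15), datetime(2023, 3, 10, 1, 15)),  # 3 h outage
    (datetime(2023, 3, 20, 14, 0), datetime(2023, 3, 20, 14, 30)), # 30 min outage
]


def mean_time_to_recovery(incidents):
    """Return the average outage duration across all incidents."""
    if not incidents:
        raise ValueError("at least one incident is required")
    # Sum the individual outage durations, starting from a zero timedelta.
    total = sum((restored - failed for failed, restored in incidents), timedelta())
    return total / len(incidents)


mttr = mean_time_to_recovery(incidents)
print(mttr)  # 2:00:00
```

The three outages add up to six hours, so the MTTR for this sample is two hours.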
Interpreting MTTR
System outages and downtime heavily impact customer experience, so it’s important for MTTR to be as short as possible. A higher MTTR means that the organization and its customers experience longer outages whenever something fails, which can lead to complaints, cancellations, and non-renewals.
A good MTTR depends directly on how quickly you can detect a problem and identify its root cause (the detection part is measured by the mean time to detect, or MTTD). The longer it takes to identify a problem, the longer it will take you to restore the system to full operation.
However, MTTR is a very high-level metric that doesn’t show which part of the process actually takes the most time. Since MTTR includes everything from incident detection and alerting to repairs and resolution, the number alone cannot tell you which part of the incident management process can or should be improved.
For example, a high recovery time can be caused by a misconfigured alerting system that takes longer than it should to notify the right person. But it can also be caused by issues in the repair process. Without more data, it’s impossible to tell.
To solve this problem, we need to use other metrics that allow for the analysis of specific parts of the process.
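One way to make this concrete is to record a timestamp for each stage of an incident and average the stages separately. The sketch below assumes three stages (detection, alerting, repair) taken from the description above; the `Incident` fields, the `phase_breakdown` helper, and the sample data are all hypothetical, and a real incident tracker may expose different stages.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """Hypothetical incident record with one timestamp per stage."""
    failed_at: datetime    # the failure begins
    detected_at: datetime  # monitoring notices the failure
    alerted_at: datetime   # the right person is notified
    resolved_at: datetime  # the system is fully operational again


def mean(durations):
    """Average a non-empty list of timedelta values."""
    return sum(durations, timedelta()) / len(durations)


def phase_breakdown(incidents):
    """Return the mean duration of each stage plus the overall MTTR."""
    return {
        "detect": mean([i.detected_at - i.failed_at for i in incidents]),
        "alert": mean([i.alerted_at - i.detected_at for i in incidents]),
        "repair": mean([i.resolved_at - i.alerted_at for i in incidents]),
        "mttr": mean([i.resolved_at - i.failed_at for i in incidents]),
    }


# Illustrative data only.
incidents = [
    Incident(datetime(2023, 5, 2, 10, 0), datetime(2023, 5, 2, 10, 5),
             datetime(2023, 5, 2, 10, 50), datetime(2023, 5, 2, 11, 0)),
    Incident(datetime(2023, 5, 8, 14, 0), datetime(2023, 5, 8, 14, 15),
             datetime(2023, 5, 8, 15, 10), datetime(2023, 5, 8, 15, 30)),
]

breakdown = phase_breakdown(incidents)
for phase, duration in breakdown.items():
    print(f"{phase}: {duration}")
```

In this sample the MTTR is 75 minutes, of which 50 minutes on average pass between detection and the alert reaching the right person. That points at alerting rather than the repair itself, which the single MTTR figure could not show.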
MTTR according to Google’s Accelerate State of DevOps reports
According to Google’s 2022 Accelerate State of DevOps report, performance can be evaluated as follows:
Low: MTTR is between one week and one month.
Medium: MTTR is between one day and one week.
High: MTTR is less than one day.
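The tiers above can be expressed as a small lookup. This is only a sketch: the function name `classify_mttr` is hypothetical, one month is assumed to be 30 days, and the handling of exact boundary values and of an MTTR longer than one month is an assumption, since the tiers listed here do not cover those cases.

```python
from datetime import timedelta

ONE_DAY = timedelta(days=1)
ONE_WEEK = timedelta(weeks=1)
ONE_MONTH = timedelta(days=30)  # assumption: a month is treated as 30 days


def classify_mttr(mttr):
    """Map an MTTR value to the performance tier listed above."""
    if mttr < ONE_DAY:
        return "High"
    if mttr <= ONE_WEEK:
        return "Medium"
    if mttr <= ONE_MONTH:
        return "Low"
    # Assumption: anything beyond one month falls outside the listed tiers.
    return None


print(classify_mttr(timedelta(hours=2)))  # High
print(classify_mttr(timedelta(days=3)))   # Medium
print(classify_mttr(timedelta(days=10)))  # Low
```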