What are Error Budgets
Site reliability engineering (SRE) is a discipline that allows teams to design and operate scalable, resilient systems using a software engineering approach. Gartner defines SRE as a collection of systems and software engineering principles used to build and operate resilient distributed systems at scale. SRE complements DevOps practices: it manages the risks of rapid change by promoting resilience, accountability and innovation.
Error Budgets help a team answer the question ‘are we focussing on the right things as a team?’. They let the team see whether the time spent on features is taking a toll on production.
When the error budget runs out, the team needs to change direction: it drops any feature work and huddles to make the systems stable again.
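As a rough illustration of how an error budget can be tracked, the sketch below derives the budget from a service level objective (SLO) and reports how much of it is left. It assumes a request-based SLO, where the budget is the share of requests that may fail in a period. The function names and the 99.9% target are hypothetical examples and are not part of the Agile Analytics product.

```python
def error_budget(slo_target: float, total_events: int) -> float:
    """Return how many events may fail in the period without breaking the SLO.

    slo_target is a fraction, e.g. 0.999 for a 99.9% SLO (illustrative value).
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1 (exclusive)")
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return (1.0 - slo_target) * total_events


def budget_remaining(slo_target: float, total_events: int, failed_events: int) -> float:
    """Return the fraction of the error budget still unspent.

    1.0 means nothing has been spent; 0.0 or less means the budget has run out.
    """
    budget = error_budget(slo_target, total_events)
    return (budget - failed_events) / budget


def is_exhausted(slo_target: float, total_events: int, failed_events: int) -> bool:
    """Return True when the failures have used up the whole budget."""
    return budget_remaining(slo_target, total_events, failed_events) <= 0.0


if __name__ == "__main__":
    # Hypothetical month: 1,000,000 requests, 250 of them failed, 99.9% SLO.
    left = budget_remaining(0.999, 1_000_000, 250)
    print(f"Error budget remaining: {left:.0%}")
    print("Budget exhausted:", is_exhausted(0.999, 1_000_000, 250))
```

With a 99.9% target and one million requests, roughly 1,000 requests may fail before the budget is spent. Once is_exhausted returns True, the team is in the situation described above and should prioritise stability over feature work.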
Setting up Error Budgets
Step 1. Connect Agile Analytics to your backend
Connect to Google Cloud Monitoring: [Google Cloud Monitoring] Connect Agile Analytics to Google Cloud Monitoring
Connect to AWS CloudWatch: (?????)
Connect to Prometheus: (coming soon)
Connect to Datadog: (coming soon)
Connect to Dynatrace: (coming soon)
Connect to Elasticsearch: (coming soon)