Monitor your service
Dashboard Hierarchy
Following the design principle of "Hierarchical dashboards with drill-downs to the next level" [1], we have developed a four-tier dashboard structure to meet the needs of different personas:

| Dashboard | Description | Persona / User | Dashboard Title |
| --- | --- | --- | --- |
| Overview | Observability of all products and tenants running on a platform. | Service Manager | SRE MaC / Overview |
| Product View | Observability of all the user journeys running on an individual product. | Product Manager and Team | SRE MaC / {Product Name} |
| User Journey View | Observability of all the SLIs in a single user journey. | Engineers / Analyst | SRE MaC / {Product Name} / {User Journey Name} |
| Detail View | Observability of all whitebox and blackbox metrics which contribute to SLIs and Service Health. For troubleshooting. | Engineers / Analyst | SRE MaC / {Product Name} / {User Journey Name} / Detail |

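
The title pattern used across the tiers can be generated rather than typed by hand. The following is a minimal Python sketch of that idea. The function name `dashboard_title` and the example product and journey names are hypothetical and serve only to illustrate the naming convention.

```python
from typing import Optional

PREFIX = "SRE MaC"  # common title prefix used by every tier


def dashboard_title(product: Optional[str] = None,
                    journey: Optional[str] = None,
                    detail: bool = False) -> str:
    """Build the dashboard title for one tier of the hierarchy."""
    if product is None:
        # Only the Overview tier has no product in its title.
        if journey is not None or detail:
            raise ValueError("a product is required below the Overview tier")
        return f"{PREFIX} / Overview"
    parts = [PREFIX, product]
    if journey is None:
        if detail:
            raise ValueError("the Detail View requires a user journey")
        return " / ".join(parts)  # Product View
    parts.append(journey)         # User Journey View
    if detail:
        parts.append("Detail")    # Detail View
    return " / ".join(parts)


if __name__ == "__main__":
    # Hypothetical product and journey names.
    print(dashboard_title())
    print(dashboard_title("Example Product"))
    print(dashboard_title("Example Product", "Sign In"))
    print(dashboard_title("Example Product", "Sign In", detail=True))
```

Deriving every title from one function keeps the drill-down path predictable, since each tier's title is a prefix of the tier below it.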
Dashboard Design Principles
1.0 Methodology

| ID | Principles |
| --- | --- |
| 1.1 | Methodical dashboards according to DDaT SLI/SLO standards. |
| | - Dashboards focused on symptoms rather than causes. |
| | - The ability to visualise adherence to SLOs in a dashboard. |
| | - The ability to visualise error budget in a dashboard. |
| | - The ability to visualise burn rate in a dashboard. |
| 1.2 | Align SLI/SLO dashboards to standard Google SLI categories. |

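
Principle 1.1 asks for SLO adherence, error budget, and burn rate to be visible on a dashboard. The sketch below shows one common way these quantities are derived from good and total event counts, assuming a request-based SLI. The function names are hypothetical, and the standards referenced above may define the calculations differently.

```python
def error_budget(slo_target: float) -> float:
    """Fraction of events allowed to fail, e.g. roughly 0.01 for a 99% SLO."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return 1.0 - slo_target


def sli(good: int, total: int) -> float:
    """Proportion of good events; treated as 1.0 when there was no traffic."""
    return good / total if total else 1.0


def budget_remaining(good: int, total: int, slo_target: float) -> float:
    """Share of the error budget still unspent (negative once overspent)."""
    if total == 0:
        return 1.0
    allowed_bad = error_budget(slo_target) * total
    actual_bad = total - good
    return 1.0 - actual_bad / allowed_bad


def burn_rate(good: int, total: int, slo_target: float) -> float:
    """Observed error rate relative to the rate the SLO allows.

    A value of 1.0 spends the budget exactly over the SLO period;
    values above 1.0 exhaust it early.
    """
    return (1.0 - sli(good, total)) / error_budget(slo_target)


if __name__ == "__main__":
    # Hypothetical counts: 995 good events out of 1000 against a 99% SLO.
    print(sli(995, 1000) >= 0.99)              # SLO adherence
    print(budget_remaining(995, 1000, 0.99))   # about half the budget left
    print(burn_rate(995, 1000, 0.99))          # about half the allowed rate
```

All three values are symptom-level signals: they describe what users experience rather than which component is at fault.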
2.0 Automation

| ID | Principles |
| --- | --- |
| 2.1 | Use scripting libraries to generate dashboards, ensuring consistency in pattern and style. |
| | - No editing in the browser. Dashboard viewers change views with variables. |
| 2.2 | Keep dashboards under version control and iterate on them in line with code management best practices. |
| 2.3 | Reuse dashboards and enforce consistency by using templates and variables. |
| 2.4 | Alerts should link to dashboards. |

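
The automation principles can be illustrated with a small generator script. This is only a sketch: the text does not name a dashboard tool, so the dictionary layout below is an illustrative schema loosely modelled on JSON dashboard formats such as Grafana's, and every function and field name is an assumption that would need to match the tool actually in use.

```python
import json
from typing import Any, Dict, List, Optional


def build_dashboard(title: str,
                    variables: List[str],
                    panel_titles: List[str],
                    drill_down_url: Optional[str] = None) -> Dict[str, Any]:
    """Return a dashboard definition as a plain dict (illustrative schema)."""
    dashboard: Dict[str, Any] = {
        "title": title,
        "editable": False,  # 2.1: no editing in the browser
        # 2.3: viewers change views through variables, not by editing panels
        "templating": {"list": [{"name": name} for name in variables]},
        "panels": [
            {"id": index, "title": panel_title}
            for index, panel_title in enumerate(panel_titles, start=1)
        ],
        "links": [],
    }
    if drill_down_url:
        dashboard["links"].append({"title": "Next level", "url": drill_down_url})
    return dashboard


def render(dashboard: Dict[str, Any]) -> str:
    """Serialise deterministically so version-control diffs stay small (2.2)."""
    return json.dumps(dashboard, indent=2, sort_keys=True) + "\n"


def alert_with_dashboard(alert_name: str, dashboard_url: str) -> Dict[str, Any]:
    """Attach the relevant dashboard URL to an alert definition (2.4)."""
    return {"alert": alert_name, "annotations": {"dashboard": dashboard_url}}


if __name__ == "__main__":
    # Hypothetical product, variable, panel names, and URL.
    product_view = build_dashboard(
        "SRE MaC / Example Product",
        variables=["user_journey"],
        panel_titles=["Availability SLI", "Error budget remaining"],
    )
    print(render(product_view))
    print(alert_with_dashboard("HighBurnRate", "https://dashboards.example/product"))
```

Because the output is sorted and indented the same way on every run, regenerating an unchanged dashboard produces no diff, which makes review of real changes easier.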
3.0 Visualisation

| ID | Principles |
| --- | --- |
| 3.1 | Keep graphs simple and focused on answering one question. |
| 3.2 | Dashboards should reduce cognitive load and be quick to understand. |
| 3.3 | Use expressive charts with meaningful use of colour, and normalise axes where you can. |
| | - Example of meaningful colour: green or blue means it's good, red means it's bad. |
| | - Example of normalising axes: when comparing CPU usage, measure by percentage rather than raw number. |
| 3.4 | Use a meaningful name. |
| 3.5 | Browsing should be directed with links. |
| 3.6 | Add documentation to dashboards and panels. |

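
The two examples under principle 3.3 can be sketched in a few lines of Python. The function names and the sample numbers are hypothetical. The point is only that a percentage makes hosts of different sizes comparable, and that colour is derived from whether a target is met.

```python
def cpu_percent(used_cores: float, total_cores: float) -> float:
    """Normalise CPU usage to a percentage so differently sized hosts compare."""
    if total_cores <= 0:
        raise ValueError("total_cores must be positive")
    return 100.0 * used_cores / total_cores


def status_colour(sli_value: float, slo_target: float) -> str:
    """Green when the SLI meets its target, red otherwise."""
    return "green" if sli_value >= slo_target else "red"


if __name__ == "__main__":
    # Raw numbers suggest the large host is busier; percentages show otherwise.
    print(cpu_percent(2, 8))      # small host: 2 of 8 cores in use
    print(cpu_percent(12, 64))    # large host: 12 of 64 cores in use
    print(status_colour(0.995, 0.99))
    print(status_colour(0.95, 0.99))
```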