Monitor your service

Dashboard Hierarchy

Following the design principle of "Hierarchical dashboards with drill-downs to the next level" [1], we have developed a four-tier dashboard structure to meet the needs of different personas:

| Dashboard | Description | Persona / User | Dashboard Title |
| --- | --- | --- | --- |
| Overview | Observability of all products and tenants running on a platform. | Service Manager | SRE MaC / Overview |
| Product View | Observability of all the user journeys running on an individual product. | Product Manager and Team | SRE MaC / {Product Name} |
| User Journey View | Observability of all the SLIs in a single user journey. | Engineers / Analyst | SRE MaC / {Product Name} / {User Journey Name} |
| Detail View | Observability of all whitebox and blackbox metrics which contribute to SLIs and Service Health. For troubleshooting. | Engineers / Analyst | SRE MaC / {Product Name} / {User Journey Name} / Detail |
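
The title pattern in the table can be generated rather than typed by hand, which keeps the drill-down path predictable. The following is a minimal sketch, assuming titles are joined with " / " exactly as shown above; the function names `dashboard_title` and `parent_title` are hypothetical and not part of any existing library.

```python
from typing import Optional

PREFIX = "SRE MaC"  # title prefix taken from the table above


def dashboard_title(product: Optional[str] = None,
                    journey: Optional[str] = None,
                    detail: bool = False) -> str:
    """Build the dashboard title for one tier of the hierarchy."""
    if product is None:
        if journey is not None or detail:
            raise ValueError("journey and detail tiers require a product")
        return f"{PREFIX} / Overview"
    parts = [PREFIX, product]
    if journey is not None:
        parts.append(journey)
        if detail:
            parts.append("Detail")
    elif detail:
        raise ValueError("the detail tier requires a user journey")
    return " / ".join(parts)


def parent_title(title: str) -> Optional[str]:
    """Return the title one tier up, or None for the overview."""
    parts = title.split(" / ")
    if len(parts) <= 2:
        # The overview has no parent; a product view drills up to the overview.
        return None if parts[-1] == "Overview" else f"{PREFIX} / Overview"
    return " / ".join(parts[:-1])
```

For example, `dashboard_title("Payments", "Checkout", detail=True)` yields a Detail View title, and `parent_title` on that title returns the User Journey View it drills down from. The sketch assumes product and journey names do not themselves contain " / ".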

Dashboard Design Principles

1.0 Methodology

ID Principles
1.1 Methodical dashboards according to DDaT SLI/SLO standards.
  - Dashboards focused on symptoms rather than causes.
  - The ability to visualise adherence to SLOs in a dashboard.
  - The ability to visualise error budget in a dashboard.
  - The ability to visualise burn rate in a dashboard.
1.2 Align SLI/SLO dashboards to the standard Google SLI categories.
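
To make the three quantities in principle 1.1 concrete, here is a minimal sketch of how SLO adherence, error budget and burn rate are commonly derived from good and total event counts. It uses the widely used event-based definitions, not necessarily the exact formulas in the DDaT standards; the function names are hypothetical.

```python
def sli(good_events: int, total_events: int) -> float:
    """Fraction of events that were good; treated as 1.0 when there is no traffic."""
    if total_events == 0:
        return 1.0
    return good_events / total_events


def _check_target(slo_target: float) -> None:
    # A target of exactly 1.0 leaves no error budget, so it is rejected here.
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1 (exclusive)")


def error_budget_remaining(slo_target: float, good_events: int,
                           total_events: int) -> float:
    """Share of the error budget still unspent (1.0 = untouched, below 0 = overspent)."""
    _check_target(slo_target)
    if total_events == 0:
        return 1.0
    allowed_bad = (1.0 - slo_target) * total_events
    actual_bad = total_events - good_events
    return 1.0 - actual_bad / allowed_bad


def burn_rate(slo_target: float, good_events: int, total_events: int) -> float:
    """Observed error rate divided by the error rate the SLO allows.

    A value of 1.0 means the budget is being spent exactly as fast as allowed.
    """
    _check_target(slo_target)
    error_rate = 1.0 - sli(good_events, total_events)
    return error_rate / (1.0 - slo_target)
```

With a 99% target and 9,950 good events out of 10,000, the SLI is 0.995, about half of the error budget remains, and the burn rate is about 0.5. A dashboard panel would plot each of these values over the SLO window.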

2.0 Automation

ID Principles
2.1 Use scripting libraries to generate dashboards, ensuring consistency in pattern and style.
  - No editing in the browser. Dashboard viewers change views with variables.
2.2 Version-controlled dashboards, iterated in line with code management best practices.
2.3 Reuse dashboards and enforce consistency by using templates and variables.
2.4 Dashboards should be linked to by alerts.
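
The automation principles above can be sketched as a small generator. This is an illustration only: the JSON shape, the field names (`editable`, `variables`, `panels`) and the `/d/` URL scheme are assumptions made for the example and do not describe any particular dashboarding tool.

```python
import json
from typing import Dict, List
from urllib.parse import quote


def build_dashboard(title: str,
                    variables: Dict[str, List[str]],
                    panels: List[dict]) -> dict:
    """Build a dashboard definition from one shared template (principles 2.1, 2.3)."""
    return {
        "title": title,
        "editable": False,  # principle 2.1: no editing in the browser
        # Viewers change what they see through variables, not by editing panels.
        "variables": [
            {"name": name, "options": list(options)}
            for name, options in sorted(variables.items())
        ],
        "panels": [dict(panel) for panel in panels],
    }


def render(dashboard: dict) -> str:
    """Serialise deterministically so version control diffs stay small (principle 2.2)."""
    return json.dumps(dashboard, indent=2, sort_keys=True) + "\n"


def alert_annotation(base_url: str, dashboard: dict) -> dict:
    """Return an annotation an alert could carry to link to its dashboard (principle 2.4).

    The URL layout is hypothetical.
    """
    slug = quote(dashboard["title"], safe="")
    return {"dashboard_url": f"{base_url.rstrip('/')}/d/{slug}"}
```

Committing the output of `render` to the repository, rather than exporting from the browser, means every dashboard change goes through the same review process as code.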

3.0 Visualisation

ID Principles
3.1 Keep graphs simple and focused on answering one question.
3.2 Dashboards should reduce cognitive load and be quick to understand.
3.3 Use expressive charts, with meaningful use of colour and normalised axes where you can.
  - Example of meaningful colour: Green/Blue means it’s good, red means it’s bad.
  - Example of normalising axes: When comparing CPU usage, measure by percentage rather than raw number.
3.4 Give each dashboard a meaningful name.
3.5 Browsing should be directed with links.
3.6 Add documentation to dashboards and panels
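
The two examples under principle 3.3 can be sketched in a few lines: convert raw CPU usage to a percentage so that hosts of different sizes share one axis, then map the value to a colour. The 90% threshold and the function names are assumptions chosen for illustration.

```python
def cpu_percent(used_cores: float, total_cores: float) -> float:
    """Normalise raw CPU usage to a percentage so different hosts are comparable."""
    if total_cores <= 0:
        raise ValueError("total_cores must be positive")
    return 100.0 * used_cores / total_cores


def status_colour(value_percent: float, bad_at: float = 90.0) -> str:
    """Map a normalised value to a colour: green is good, red is bad.

    The default threshold is an illustrative assumption, not a recommendation.
    """
    return "red" if value_percent >= bad_at else "green"
```

A 4-core host using 2 cores and a 32-core host using 16 cores both plot at 50% and appear green, whereas their raw core counts would suggest very different load.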