Observability

Last updated: 21 March 2025

To operate software effectively, you need insight into how it is behaving and performing. You gain this by capturing logs and metrics about significant events, which allows you to detect, diagnose, and resolve issues quickly, or even prevent them from affecting users.

Requirement(s)

You MUST capture meaningful log data
You MUST have a way to inspect and search log data
You MUST have a way to automatically alert or take action from log data
You MUST capture data at platform and technology appropriate levels
You MUST define a useful lifetime for log data
You MUST have appropriate access controls

You MUST capture meaningful log data

Appropriate log data is essential to understanding how your software is behaving. However, you should not log excessively: doing so increases storage and indexing costs, makes relevant logs harder to find, and increases the risk of personally identifiable information (PII) being exposed inadvertently.
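
As an illustration of balancing meaningful logs against the risk of exposing PII, the following is a minimal sketch using Python's standard logging module. It assumes email addresses are the only PII of concern and that one JSON object per line suits your tooling. The names RedactEmailFilter, JsonFormatter, build_logger, and the "orders" logger are hypothetical and not part of this standard.

```python
import io
import json
import logging
import re

# Assumption: email addresses are the PII to redact. Real services will
# usually need a broader, reviewed set of patterns.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class RedactEmailFilter(logging.Filter):
    """Replace email addresses in the rendered message before it is emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = EMAIL_PATTERN.sub("[REDACTED]", record.getMessage())
        record.args = None  # the message has already been rendered
        return True  # keep the record, now redacted


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON line so it is easy to index."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })


def build_logger(stream) -> logging.Logger:
    """Create a logger that writes redacted JSON lines to the given stream."""
    logger = logging.getLogger("orders")  # hypothetical service name
    logger.setLevel(logging.INFO)  # drop DEBUG noise to limit log volume
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    handler.addFilter(RedactEmailFilter())
    logger.addHandler(handler)
    return logger
```

With this sketch, a call such as build_logger(sys.stdout).info("Order %s placed by %s", 42, "jane@example.com") would emit a JSON line whose message reads "Order 42 placed by [REDACTED]", while DEBUG messages are not emitted at all.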

You MUST have a way to inspect and search log data

Appropriate log management and analytics tooling allows you to find relevant information and resolve issues more quickly. The capability to build customised visualisations or dashboards allows you to focus on the logs of interest.
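
Dedicated log management tooling would normally provide search for you. Purely to illustrate why structured logs are easier to inspect, the sketch below filters JSON log lines by field values. The function name search_logs and the field names in the usage note are assumptions for illustration only.

```python
import json
from typing import Iterable, Iterator


def search_logs(lines: Iterable[str], **criteria: str) -> Iterator[dict]:
    """Yield parsed log entries whose fields equal every given criterion."""
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(entry, dict):
            continue  # skip JSON values that are not objects
        if all(entry.get(key) == value for key, value in criteria.items()):
            yield entry
```

For example, search_logs(lines, level="ERROR", logger="orders") would yield only the error entries written by a hypothetical "orders" logger.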

You MUST have a way to automatically alert or take action from log data

Manual monitoring of logs, visualisations, or dashboards is time-consuming, difficult, and error-prone. Automated alerting tools allow you to specify the conditions under which action is required to protect service availability. These conditions should be configurable to suit each application’s requirements.

Alerts should always be actionable. Spurious alerts make it less likely that real alerts will be given appropriate attention, increasing the probability and severity of user-impacting incidents.
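
One way to express a configurable alert condition is an error-rate threshold over a sliding time window, with a minimum event count so that a single failure on a quiet service does not raise a spurious alert. The sketch below is illustrative only: the class name ErrorRateAlert and its parameters are hypothetical, and in practice this logic would usually live in your alerting tool's rule configuration.

```python
from collections import deque


class ErrorRateAlert:
    """Signal an alert when the error rate in a sliding window is too high."""

    def __init__(self, window_seconds: float, threshold: float, min_events: int):
        self.window_seconds = window_seconds  # how far back to look
        self.threshold = threshold            # error fraction that triggers
        self.min_events = min_events          # avoids alerts on tiny samples
        self._events = deque()                # (timestamp, is_error) pairs

    def record(self, timestamp: float, is_error: bool) -> bool:
        """Record one event; return True if the alert condition is met."""
        self._events.append((timestamp, is_error))
        cutoff = timestamp - self.window_seconds
        # Discard events that have fallen out of the window.
        while self._events and self._events[0][0] < cutoff:
            self._events.popleft()
        total = len(self._events)
        if total < self.min_events:
            return False
        errors = sum(1 for _, failed in self._events if failed)
        return errors / total >= self.threshold
```

The window, threshold, and minimum event count are the values you would tune per application so that every alert raised is worth acting on.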

You MUST capture data at platform and technology appropriate levels

Depending on the technology you use, you need to consider what information is relevant to log and monitor. For example, you may need to log:

User interaction with a frontend component
Database health indicators
API and network logs
Container, virtual machine, or hardware utilisation metrics
Business level management information

You MUST define a useful lifetime for log data

Retaining logs for longer than they are useful wastes storage and compute resources. It also makes it harder to identify relevant information and potentially increases the risk associated with any data breach.
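
A defined lifetime can be recorded as an explicit retention period per log category. The sketch below assumes two hypothetical categories with made-up periods; the right values depend on your service and any legal or regulatory obligations. Most log platforms offer retention settings that should be preferred over hand-written deletion logic.

```python
from datetime import datetime, timedelta

# Hypothetical categories and periods, for illustration only.
RETENTION = {
    "application": timedelta(days=30),
    "audit": timedelta(days=365),
}


def is_expired(category: str, created_at: datetime, now: datetime) -> bool:
    """Return True if a log entry has outlived its category's lifetime.

    An unknown category raises KeyError, so every category must have a
    lifetime defined before its logs can be evaluated.
    """
    return now - created_at > RETENTION[category]
```

Raising an error for an unknown category is a deliberate choice in this sketch: it prevents logs from being kept indefinitely simply because nobody defined their lifetime.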

You MUST have appropriate access controls

Depending on the sensitivity of the data you are logging, you must identify who should have access to it and implement appropriate access controls. You should agree these in collaboration with your Security Information Business Partner (SIBP).
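
As a rough sketch of matching access to data sensitivity, the example below maps roles to a clearance level and denies access by default. The sensitivity levels, role names, and the can_read function are all hypothetical; real access control should be enforced by your log platform's own permissions model.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    """Hypothetical sensitivity levels, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    RESTRICTED = 2


# Hypothetical roles and the highest sensitivity each may read.
ROLE_CLEARANCE = {
    "developer": Sensitivity.INTERNAL,
    "security_analyst": Sensitivity.RESTRICTED,
}


def can_read(role: str, sensitivity: Sensitivity) -> bool:
    """Return True only if the role is known and cleared for the data."""
    clearance = ROLE_CLEARANCE.get(role)
    # Unknown roles are denied by default.
    return clearance is not None and clearance >= sensitivity
```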