Continuous Delivery: Gold Standard for Software Development
Any business trying to deliver best-in-class products and services to its customers should willingly embrace change. Of course, this can be difficult. Change often brings uncertainty, as established processes and services have to shift to meet new standards. And change can’t be for change’s sake. Businesses must tie these new approaches, products and different ways of working to measurable, specific outcomes.
A highly effective framework for measuring how these changes affect performance is championed by DevOps Research and Assessment (DORA), a long-term research program that “seeks to understand the capabilities that drive software delivery and operations performance.”
One performance metric that stands out in this framework is “releasability.” It emphasizes that software development teams should always be able to go to market with their products, and it serves as a concrete measure of how effective a business’ continuous delivery (CD) pipeline is.
In practice, achieving a high releasability score means working to a consistently high standard of quality, with quick incident detection and resolution, and rapid feedback and recovery, built into the platforms developers use to deliver software for their customers.
The Core DORA Metrics for Continuous Delivery
Within the DORA framework, a vast range of metrics can be measured, along with the impact they have on an organization’s software development performance. Some describe technical capabilities and processes, such as artificial intelligence (AI), continuous integration (CI) or code review speed. Others are more closely linked to performance.
This second group is vital for measuring CD performance. Its two key metrics are stability and throughput. Stability reflects an organization’s change failure rate and the time it takes to recover from those failures, while throughput measures the lead time for any change and the frequency of changes. Together, these paint a picture of an organization’s software development successes, as well as the time it takes to recover whenever a change is unsuccessful. As organizations aim for the gold standard of CD, understanding their performance, especially when changes fail, is vital.
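To make the stability and throughput metrics concrete, here is a minimal sketch of how they could be computed from a log of deployments. The `Deployment` record, its field names and the `delivery_metrics` function are hypothetical and chosen for illustration; they are not part of any DORA tooling, and real pipelines will record this data differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical record of one change reaching production."""
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # whether it degraded the service
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def delivery_metrics(deployments, period_days):
    """Summarize throughput and stability for a non-empty list of deployments."""
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    # Only failures that have already been resolved contribute a recovery time.
    recoveries = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        # Throughput: how often changes ship, and how long each takes to ship.
        "deployments_per_day": len(deployments) / period_days,
        "median_lead_time": median(lead_times),
        # Stability: how often changes fail, and how long recovery takes.
        "change_failure_rate": len(failures) / len(deployments),
        "median_time_to_recover": median(recoveries) if recoveries else None,
    }
```

Medians are used here rather than means so that a single unusually slow change does not dominate the summary; that is a design choice for this sketch, not a requirement of the framework.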
The Challenge of Knowing Why Your Product Broke
In the context of CD, developers must be able to easily and quickly understand why a product or update has failed. Given that between 50% and 80% of updates to software fail, developers need to be able to rapidly identify the exact point of failure and resolve it. Reducing incident resolution time, or the time spent fixing bugs, is one of the significant benefits of developers consistently working toward releasability: when problems arise, they are easy to fix and recovery cycles are quick.
To meet increasingly quick development targets, developers need to find ways to reduce the time they spend on incident response and troubleshooting. To help with this, they need access to real-time insights that allow them to identify, diagnose and resolve any incidents as they arise. These insights can give developers an instant, digestible understanding of how changes affect their software development pipelines, even when changes may not be significant enough to cause an incident.
Records of these changes, known as “change events,” offer a trail of breadcrumbs through every change made to a product throughout its development cycle, allowing developers to see the direct effects of each update. They cover everything from how application code is deployed to how scaling a service up or down affects its performance. Perhaps most usefully, they are available not only throughout a product’s development cycle but also after it has gone to market. This allows any incidents or performance drops to be addressed and resolved as soon as they arise.
Change Correlation: Key to Continuous, Quality Delivery
While the information provided by change events is useful for developers, change correlation takes incident resolution one step further. It gives developers working on returning a product to a releasable state the recent change events that are most relevant to an incident. The data from these events can then be fed into a machine learning model and analyzed to draw correlations between past change events and incidents, allowing for rapid diagnosis and resolution. This strategy works best when all of the data sits within a single platform that tracks change events and also helps correlate individual changes with breakages in products.
It is possible to detect the exact impact of a change on a product because every single change event contains contextual information related to its time and the service to which it was applied. This is highly valuable data and, when it is available at a glance, it allows incident response teams to triage incidents quickly and reduce time to resolution, ensuring that the services provided by an organization are stable.
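As a rough illustration of the idea, the sketch below matches an incident to recent change events using only the two pieces of context mentioned above: the time of the change and the service it was applied to. It is a simple time-window filter, not the machine learning approach described earlier, and the `ChangeEvent`, `Incident` and `correlate_changes` names, along with the two-hour default window, are assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ChangeEvent:
    """Hypothetical change event: what changed, where and when."""
    service: str
    timestamp: datetime
    summary: str


@dataclass
class Incident:
    """Hypothetical incident affecting a single service."""
    service: str
    started_at: datetime


def correlate_changes(incident, events, window=timedelta(hours=2)):
    """Return change events on the incident's service that happened within
    `window` before the incident started, most recent first."""
    candidates = [
        e for e in events
        if e.service == incident.service
        and timedelta(0) <= incident.started_at - e.timestamp <= window
    ]
    # The most recent change is listed first as the likeliest suspect.
    return sorted(candidates, key=lambda e: e.timestamp, reverse=True)
```

Given an incident on a service at noon, a call such as `correlate_changes(incident, events)` would return only that service's changes from the preceding two hours, which gives a responder a short list to triage instead of the full change history.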
The Benefits of Understanding Change
The insights provided by change events allow developers to pinpoint exactly which changes have affected their products and how. The direct result is a significant reduction in unplanned work. With intelligent change correlation, the time-consuming process of identifying exactly which changes have broken a product can be automated, further reducing the workload for developers in critical moments. This frees developers to focus on innovation and bringing value to their customers, rather than simply fixing broken products.