
4 Ways To Facilitate a Successful Learning Review

Building a culture of blamelessness for learning reviews transforms each incident from a problem into an opportunity for organizational growth.
Dec 4th, 2024 9:00am by
Image from Owlie Productions on Shutterstock.

When conducting a learning review after an incident, it is often necessary to balance thoroughness with efficiency. With incident management teams often working at capacity, time can be scarce, especially in the immediate aftermath of an incident. But organizations need to analyze performance around incidents to fully understand what happened and to generate insights to improve the resilience of systems and the ability to respond to future incidents.

Learning reviews are a relatively new process; quick postmortems that try to identify a “root cause” remain more common. Even so, learning reviews have come to involve extensive preparation, including time-consuming one-on-one interviews. An alternative gaining popularity is to assemble the available data before the post-incident meeting, so that the meeting itself can focus on prompting responders to share their experiences of the incident. This approach changes the dynamics by integrating knowledge elicitation into the post-incident meeting.

What Does a Learning Review Look Like?

Standard incident postmortems are simple. They aim to identify what went wrong to cause an incident and generate action items for teams and individuals to focus on to ensure that the same, or highly similar, incidents do not recur. At times, this process might also include an attempt to understand what led to an incident and the steps taken for remediation.

In contrast, mature learning reviews prioritize an improved understanding of the complex system in which the incident occurred. The core aim of these reviews is to answer the questions: “Now that we know that this incident happened, what have we learned about the system, including the people in it? And how can we use this information to improve that entire system?”

With these questions and the goal of continuous learning from incidents in mind, there are several steps organizations can take to improve their culture of post-incident learning reviews. These steps aim to establish a culture where every incident is an opportunity for learning and process improvement.

Step 1: Establish a Blameless Culture

Psychological safety is foundational to any learning review. This requires organizations to adopt the mindset that no one person is responsible for an incident. According to the Harvard Business Review, psychological safety refers to the “shared belief held by members of a team that it’s OK to take risks, to express their ideas and concerns, to speak up with questions and to admit mistakes — all without fear of negative consequences.”

In a one-on-one interview, psychological safety is easier to establish, but in a group setting, it requires more effort. The facilitator’s job is to remind all attendees they are in a safe environment where they can feel comfortable sharing their experiences without fear of embarrassment or retribution.

One way to promote this safe environment is to adopt the Prime Directive of Retrospectives as a ground rule: the assumption that everyone did the best job they could, given what they knew at the time. Doing so heads off a blame game and prioritizes an atmosphere of learning and of finding ways to improve working processes.

Once the incident review starts, the facilitator must establish ground rules that encourage curiosity over judgment. This helps participants address uncomfortable topics candidly and builds a culture of accountability with no blame. This environment is crucial to driving a better understanding of each incident from the point of view of the responders who resolved it.

Step 2: Prioritize Conversation-Driven Reviews

If the goal of a post-incident learning review is to gather insights on how responders experienced an incident and why they took the steps they did to resolve it, then communicating with them is key. A learning review cannot simply be a reporting exercise. Instead, interactivity is crucial to encourage responders to learn from the review process.

Creating dedicated channels for the incident review is essential for collaboration. These provide an avenue for discussions to take place in a less formal context, while also allowing a huge volume of data surrounding an incident to be captured and used to support the takeaways from a learning review.

Additionally, throughout the review process, the facilitator should prioritize conversations with responders. Doing so discourages the simple sharing of facts and action items and instead helps the facilitator understand the human component of their organization’s system. With that understanding, they can drive process improvements that have a meaningful impact on responders’ roles.

Step 3: Action Items Are Necessary But Not Sufficient

Most incident postmortems take a binary approach to incident management, with individual action items to remedy individual issues. The challenge is that this approach only prevents the same incidents from recurring; it does not encourage organizations to improve their processes so they can better manage future incidents.

Making sure the exact same incident doesn’t recur is low-value work, because each incident has so many contributing factors. Instead, learning reviews must improve processes and the system’s resilience to drive an improved response to future incidents. To achieve this aim, reviews must go beyond generating action items as takeaways and glean insights from each responder’s experience of the incident. While action items provide useful first steps to resolve technological issues, the insights provided by responders allow the wider organization to understand its complex sociotechnical system and how individuals respond and act within it.

By understanding how critical moments, such as incidents and outages, influence human behaviors, organizations can improve their incident management processes by directly responding to individuals’ pain points.

Step 4: Run Diagnostics on Your Entire System 

This step doesn’t mean running diagnostics on your technical systems. Instead, to gain significant benefits from post-incident learning reviews, organizations must approach them as an opportunity to assess how their people and their technology interact — and the effects these relationships have on incident management.

Outside of any technical issues, several people-centric challenges can affect incident response at any given time. Production pressure, an incident’s time of day, team members being at an offsite and the effects of wider business decisions can all affect incident management, and these factors vary from one incident to the next.

Learning reviews offer an opportunity for responders to share their hands-on experience of an incident, and how differing factors affected their ability to effectively remediate any damage caused. This can help organizations build an action plan for future incidents, containing contingencies and accounting for differing levels of pressure on the people involved in each incident management process.

Refine, Refine, Refine

The groundwork for facilitating a successful learning review is building a culture of blamelessness. In this environment, responders can get to grips with the reality of their incident management processes, assess these objectively and make concrete improvements. This allows each learning review to truly become a learning experience and transforms each incident from a problem into an opportunity for organizational growth. By taking this approach to post-incident reviews, organizations can continuously iterate on their incident management plans, to ensure each incident is resolved effectively and without a significant burden on incident responders.
