Beyond Monitoring: How Observability 2.0 Is Revolutionizing Developer Experience

Understand how Observability 2.0 addresses technical debt and optimizes developer workflows.

Oct 21st, 2024 10:00am by Thomas Johnson

Observability has become a cornerstone of modern engineering strategies, though it can be argued that we never fully settled on a unanimous definition. That’s why this latest evolution—with “observability 2.0”—is so thrilling: we might finally align observability’s true meaning and potential with its name.

It’s worth going back in time to see how we got to the point of needing to version the name.

With its roots in control systems theory, the term “observability” was popularized by the Honeycomb team in 2016. They expanded upon Rudolf E. Kálmán’s definition (“a measure of how well internal states of a system can be inferred from knowledge of its external outputs”) and redefined it to mean “the power to ask new questions of your system, without having to ship new code or gather new data to ask those new questions.”

While this concept was gaining traction, in 2017, Peter Bourgon suggested that observability consists of “three pillars”—metrics, logs, and traces—a definition that found strong support within the application performance monitoring (APM) tooling industry, as it coincidentally aligned perfectly with their product offerings.

In the subsequent years, valiant efforts were made to clarify the true scope of observability, such as Ben Sigelman’s 2021 “Debunking the ‘Three Pillars of Observability’ Myth,” in which he explained that “metrics, logs, and traces aren’t ‘the observability’; they’re just the telemetry.”

Unfortunately, the tendency to confuse observability with telemetry or monitoring has persisted, no matter how many tech industry leaders try to clarify that monitoring may tell you when something is wrong, but observability enables you to understand why.

To put it plainly, observability extends beyond traditional monitoring and is about unveiling system behaviors, empowering teams to reveal ‘unknown unknowns,’ and gaining a complete understanding of complex systems.

In August of this year, Charity Majors proposed that we refer to the solutions that closely align with the three pillars and APM tools as observability 1.0. Observability 2.0 represents a move beyond traditional monitoring frameworks and APM tools and a shift in how developers approach system understanding and debugging.

In this post, I’ll explore the implications of observability’s evolution on developer happiness, productivity, and day-to-day experience.

Observability 1.0 vs. Observability 2.0

Let’s do a quick recap and clarify the difference between the two types of observability:

Observability 1.0

Observability 1.0, closely tied to APM tools, refers to the traditional approach where vast amounts of telemetry data (metrics, logs, and traces) are collected and then displayed with dashboards—or the often aspired-to “single pane of glass.”

Observability 1.0 focuses on operations at its core: it highlights known issues once the software is in production. It’s useful if you already know what to look for—the ‘known unknown’—but production failures in complex distributed systems are often non-linear and difficult to predict, requiring manual exploration to find the root cause of an issue.

As a result, developers don’t like relying on APM tools for tasks like debugging, because these tools return a large volume of aggregate data instead of the specific information needed to solve the problem. Debugging this way often feels like searching for a needle in a haystack.

For example, say you were monitoring your website’s errors and suddenly saw a spike. Observability 1.0 dashboards will alert you to the problem and help you understand the where, when, and what, but you need to dig deeper to understand the why.

In short, while observability 1.0 is still an indispensable tool for monitoring and managing distributed systems, it doesn’t fully address the day-to-day challenges developers face, nor does it aid them in proactively understanding their systems.

Observability 2.0

Observability 2.0 represents a shift in focus beyond simply identifying operational issues to empowering developers throughout the entire software development lifecycle. It’s the acknowledgment that our definition of “observability” is evolving—or better yet—finally meeting the promise of its original definition.

While observability 1.0 emphasized identifying fires and monitoring system health, observability 2.0 is more developer-focused. It’s about addressing the root causes of issues and reducing incident frequency by embedding observability into the development process itself. In other words, it’s about solving problems before they appear on the observability 1.0 dashboards!

The two main needs observability 2.0 addresses for developers are:

Precise, real-time, context-rich insights into their system, so they can depend on a single source of truth to understand unknown unknowns.
Faster debugging, so they can easily introspect and understand complex systems.

This is possible because the building block of observability 2.0 is log events, which are more powerful, practical, and cost-effective than metrics (the workhorse of observability 1.0), as they preserve context and relationships between data.
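To make the difference concrete, here is a minimal sketch in plain Python. The event fields (route, status, region, app_version, duration_ms) and the errors_by helper are hypothetical, chosen only for illustration; they are not taken from any particular tool. The point is that a pre-aggregated counter can only answer the question it was built for, while the raw events can be regrouped by any field after the fact.

```python
from collections import Counter

# Hypothetical wide events: one record per request, with all context kept together.
events = [
    {"route": "/checkout", "status": 500, "region": "eu-west", "app_version": "2.3.1", "duration_ms": 912},
    {"route": "/checkout", "status": 200, "region": "us-east", "app_version": "2.3.0", "duration_ms": 120},
    {"route": "/checkout", "status": 500, "region": "eu-west", "app_version": "2.3.1", "duration_ms": 874},
    {"route": "/search", "status": 200, "region": "eu-west", "app_version": "2.3.1", "duration_ms": 45},
    {"route": "/checkout", "status": 500, "region": "us-east", "app_version": "2.3.1", "duration_ms": 901},
    {"route": "/checkout", "status": 200, "region": "eu-west", "app_version": "2.3.0", "duration_ms": 133},
]

# What a pre-aggregated metric retains: a single number, with the context discarded.
error_count = sum(1 for e in events if e["status"] >= 500)


def errors_by(field, events):
    """Group error events by any field, including ones nobody planned to query."""
    return Counter(e[field] for e in events if e["status"] >= 500)


# New questions, asked after the fact, without shipping new instrumentation.
errors_by_version = errors_by("app_version", events)
errors_by_region = errors_by("region", events)

print(error_count)
print(dict(errors_by_version))
print(dict(errors_by_region))
```

In this toy data set the counter only says that errors happened, while grouping by app_version shows that every error came from a single release. That is the kind of “why” a spike on a dashboard cannot provide on its own.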

Furthermore, observability 2.0 is built on open standards like OpenTelemetry, which allows devs to use a common standard for traces, logs, and metrics.

Adopting an open, portable means of gathering the telemetry data is not an insignificant innovation: I still remember the days when the only two alternatives were to staff up a major company initiative to gather telemetry data across our infrastructure or to pay a small fortune to a vendor to deploy their proprietary agents to gather said data.

How Observability 2.0 Will Change the Developer Experience

Developer Experience (DX) shapes how engineers perceive their work, impacting productivity, engagement, happiness, and retention. A strong DX fosters an environment where teams can perform at their best, tackling challenges efficiently and enthusiastically.

In this context, having the right tooling to manage the complexity of their software has a huge impact on DX: a recent Atlassian survey revealed that developers can lose 8+ hours a week to inefficiencies such as technical debt, poor documentation, and insufficient debugging tools.

To improve DX (and, hence, the team’s ability to deliver reliable, scalable, and maintainable software), research has identified three core areas:

Feedback Loops: Enable continuous improvement through rapid learning and adjustments.
Cognitive Load Management: Provide accurate, accessible documentation.
Optimal Flow State: Minimize disruptions to maintain deep work.

Observability 2.0 addresses all three of these areas by empowering developers with increased visibility and reduced manual tasks:

Real-Time, Context-Rich Insights: Developers gain immediate feedback on system changes, helping them ship code faster and more confidently. With observability 1.0, I’ve often felt like debugging is an archaeological dig: painstakingly uncovering layer after layer to understand the system design, architecture, and design decisions before pinpointing the root cause of the problem. With observability 2.0, you get precise, real-time visibility into all components and their relationships, and you can more easily avoid adding unintentional architectural technical debt with your changes.
Reduced Manual Work: Using observability frameworks like OpenTelemetry for documentation means your running system stays consistent with your documentation without making manual updates. Debugging also becomes more efficient with context-rich data, allowing developers to diagnose issues without sifting through overwhelming volumes of data.
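As a rough illustration of how telemetry can double as documentation, the sketch below derives a service dependency map from trace spans. The span records are hypothetical and heavily simplified (span_id, parent_id, and service are illustrative field names, not the OpenTelemetry data model), and service_dependencies is a helper invented for this example.

```python
from collections import defaultdict

# Hypothetical, simplified spans; real trace data carries many more fields.
spans = [
    {"span_id": "a1", "parent_id": None, "service": "web-frontend"},
    {"span_id": "b2", "parent_id": "a1", "service": "checkout-api"},
    {"span_id": "c3", "parent_id": "b2", "service": "payments"},
    {"span_id": "d4", "parent_id": "b2", "service": "inventory"},
    {"span_id": "e5", "parent_id": "b2", "service": "checkout-api"},  # in-process call
]


def service_dependencies(spans):
    """Return {caller: [callees]} inferred from parent/child span relationships."""
    by_id = {s["span_id"]: s for s in spans}
    deps = defaultdict(set)
    for span in spans:
        parent = by_id.get(span["parent_id"])
        # Only a parent/child pair that crosses a service boundary is a dependency.
        if parent and parent["service"] != span["service"]:
            deps[parent["service"]].add(span["service"])
    return {caller: sorted(callees) for caller, callees in deps.items()}


dependency_map = service_dependencies(spans)
print(dependency_map)
```

Because the map is computed from what the running system actually did, it changes when the system changes, which is the property hand-maintained diagrams tend to lose.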

Practical Example: Debugging with Observability 2.0

Observability 2.0 opens the doors for new use cases and tools that can solve developers’ daily problems and save them significant time and headaches.

This is my experience with the current debugging process and how I believe it will evolve with observability 2.0.

Traditional debugging involves a search-first approach: you sift through telemetry data, search through endless logs and traces, pattern-match using intuition, and rely on experience, educated guesses, and a (possibly outdated) mental model of the system.
You attempt to reproduce the issue, although that might not always be possible due to vague and incomplete reports—if any details are provided.
Lastly, you may also have to navigate across scattered resources (documentation, architecture diagrams, decision records, APIs, and repos) just to comprehensively understand the system.

As I hinted at, issues in complex, distributed systems are rarely isolated. Understanding not just what went wrong but also why and how something went wrong requires correlating data across various system layers, which is time-consuming and prone to human error.
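The correlation step can be sketched as follows, assuming every layer stamps its records with a shared trace identifier. The log records, the trace_id and ts field names, and the correlate helper are all hypothetical; the sketch only shows why a shared identifier turns a manual search into a simple join.

```python
from collections import defaultdict

# Hypothetical records from three layers, each tagged with the same trace_id.
frontend_logs = [
    {"trace_id": "t-42", "ts": 1.00, "layer": "frontend", "msg": "user clicked Pay"},
    {"trace_id": "t-77", "ts": 1.10, "layer": "frontend", "msg": "user opened cart"},
]
api_logs = [
    {"trace_id": "t-42", "ts": 1.05, "layer": "api", "msg": "POST /checkout"},
    {"trace_id": "t-42", "ts": 1.40, "layer": "api", "msg": "500 returned"},
]
db_logs = [
    {"trace_id": "t-42", "ts": 1.35, "layer": "database", "msg": "lock timeout on orders"},
]


def correlate(*sources):
    """Merge records from every layer into one time-ordered timeline per trace."""
    timelines = defaultdict(list)
    for source in sources:
        for record in source:
            timelines[record["trace_id"]].append(record)
    for records in timelines.values():
        records.sort(key=lambda r: r["ts"])
    return dict(timelines)


timelines = correlate(frontend_logs, api_logs, db_logs)
story = [f'{r["layer"]}: {r["msg"]}' for r in timelines["t-42"]]
for line in story:
    print(line)
```

Read top to bottom, the merged timeline for the failing request shows the database lock timeout immediately before the API returned its error, a connection that would otherwise require cross-referencing three separate log streams by hand.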

Not to mention that sometimes teams are pressured to “just fix the problem as fast as possible” because of business needs and deadlines. With observability 1.0, this may result in addressing the “symptoms” of an issue but not the actual core problems.

With Observability 2.0, new developer tools like Multiplayer.app, leveraging OpenTelemetry, allow for platform-level debugging with deep session replays. In one click, developers can capture sessions showing steps to reproduce a bug, accompanied by data from frontend screens to deep platform traces, metrics, and logs.

Obtaining accurate, real-time data and documentation about your actual system architecture drastically reduces the time spent on troubleshooting and debugging.

Conclusion

The evolution of observability reflects the growing complexity of modern software systems and the need for more sophisticated tools to manage and understand them.

By leveraging observability 2.0 and frameworks like OpenTelemetry, organizations can develop a holistic, real-time view of their software, improving the developer experience and boosting overall productivity and long-term sustainability.

They are no longer just addressing “symptoms” — like constantly taking painkillers because you have headaches — but addressing the root problem and preventing the headache before it forms.

Looking ahead to 2025 and beyond, observability 2.0 will continue to fuel developer teams by automating complex troubleshooting processes, enabling rapid onboarding, and reducing knowledge silos—ultimately saving organizations time, money, and engineering effort.

Thomas Johnson is co-founder and CTO at Multiplayer, with 30+ years of experience as a backend developer building large-scale distributed software (and robots!).