Modern Apps Demand Advanced Observability and Live Debugging
Has there been a more surprising buzzword over the past half decade than observability? Perhaps Kubernetes… But in general, who could have predicted that monitoring 2.0 would be something that vendors were clamoring to attach themselves to?
And yet, it makes sense. Systems have become more complex, so the tools needed to monitor and heal those systems have had to become more sophisticated. And in a world where we want answers and we want them now, where we expect five nines of uptime and will be outraged at even minor inconveniences, it’s not terribly surprising that the industry needed to evolve beyond the reactive practice of monitoring to the more proactive practice of observability.
And this gets to a topic of interest in the roundtable discussion that I hosted recently (you can watch the video at the end of this article): What is observability, exactly? The technical definition is the ability to determine the internal state and health of a system by looking at its outputs. But going beyond that a bit, what I think everyone on the panel agreed on is that the industry needed to move past monitoring alone and arrive at a point where developers or Ops folks can get real-time answers to their questions.
Monitoring is still a critical piece of running modern systems, and perhaps you could argue it’s even a subcategory of observability. But if you were to oversimplify it, monitoring involves knowing ahead of time what you think may go wrong and setting threshold alerts for those conditions. The most caricatured version of this persona is someone sitting in a network operations center, staring at a bunch of dashboards and waiting for a spike.
But in today’s complex, distributed systems, it’s not always possible to know ahead of time what monitors should be in place. The promise of modern observability, in its purest form, is the ability to ask questions about your complex systems and to get accurate answers in close to real time. That way, you can resolve an issue, do probable cause analysis and hopefully keep customers from complaining.
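To make that contrast a bit more concrete, here is a minimal sketch in Python. Nothing in it comes from the panel or from any vendor's product: the event fields, the sample data and the `threshold_alert` and `query` helpers are all hypothetical, made up purely for illustration. The first function is the monitoring-style check, a question decided ahead of time. The second is the observability-style question, asked after the fact against whatever fields the events happen to carry.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical structured events, one per request. The field names and
# values are invented for this sketch, not taken from any real system.
EVENTS = [
    {"service": "checkout", "region": "us-east", "version": "1.4.2", "latency_ms": 120, "status": 200},
    {"service": "checkout", "region": "us-east", "version": "1.4.3", "latency_ms": 900, "status": 500},
    {"service": "checkout", "region": "eu-west", "version": "1.4.2", "latency_ms": 110, "status": 200},
    {"service": "checkout", "region": "eu-west", "version": "1.4.3", "latency_ms": 950, "status": 500},
    {"service": "search", "region": "us-east", "version": "2.0.0", "latency_ms": 80, "status": 200},
    {"service": "search", "region": "eu-west", "version": "2.0.0", "latency_ms": 90, "status": 200},
]


def threshold_alert(events, max_error_rate=0.5):
    """Monitoring-style check: one fixed question, chosen in advance.

    Returns True when the share of 5xx responses exceeds max_error_rate.
    """
    errors = sum(1 for e in events if e["status"] >= 500)
    return errors / len(events) > max_error_rate


def query(events, group_by, where=lambda e: True):
    """Observability-style question: average latency grouped by any field.

    The caller picks the grouping field and the filter at question time,
    so nothing has to be anticipated when the events are recorded.
    """
    groups = defaultdict(list)
    for e in events:
        if where(e):
            groups[e[group_by]].append(e["latency_ms"])
    return {key: mean(values) for key, values in groups.items()}


if __name__ == "__main__":
    # The global alert stays quiet because the overall error rate is
    # below the preset threshold.
    print("alert fired:", threshold_alert(EVENTS))

    # An ad hoc question: is checkout slow for one version in particular?
    print(query(EVENTS, "version", where=lambda e: e["service"] == "checkout"))
```

In this toy data the preset alert never fires, yet slicing checkout latency by version points straight at one release. Real observability tooling does this at a very different scale, of course, but the shape of the difference is the same: the threshold has to be anticipated, while the question can be invented on the spot.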
This was a great panel to dig into all of these issues and more. We talked about the reality of OpenTelemetry and its role in observability; we discussed shift-left and network observability; we talked about best-of-breed point solutions versus a single pane of glass. And, of course, we covered the role of AI, both in how it can help with observability and in how AI itself may need to be observed.
Panelists
- Jason Yee, a staff advocate at Datadog
- Avi Freedman, CEO and founder of Kentik
- Ashley Sawatsky, a reliability advocate at Rootly
- Paige Cruz, a principal developer advocate at Chronosphere
- Tal Yitzhak, a solutions architect at Lightrun
Watch the modern observability roundtable discussion below: