How Conviva Uses Endpoint Event Data to Measure UX at Scale
Want to get your heart rate up quickly? Imagine you’re deep into work when your manager calls saying people are ranting on social media that the checkout process in your app isn’t working properly. The company is losing dollars every second, and the CEO wants to know when it’ll be fixed.
This sounds like a great use case for an observability solution. The challenge? Everything might look good in the tool’s UI, or excessive false alerts may mean teams don’t trust it or consider it noisy and confusing to work with. Regardless, you quickly dive into troubleshooting mode, pulling a team together to diagnose the problem and implement a fix that you hope will address the issue. But you are not sure if it will because you don’t have visibility into what users experience when they go through the checkout process in the real world. Meanwhile, you’re frustrated that, despite your investment in observability tooling, you weren’t aware of the problem, so you couldn’t predict the outage and prevent all the trouble — for your users, your CEO, your manager and now for you and your team.
Poor user experience is more than just a hassle for your IT team. It’s a major cause of customer churn, which hurts company profitability and, in turn, your job security. About half of customers will switch to a competitor after just one bad user experience, and 80% will walk away after several bad experiences, according to Zendesk research. That study also indicates the average customer shares a bad user experience with eight to 16 people, so one person’s negative interaction with your app or website can snowball quickly.
Companies have long used monitoring and observability tools to approximate user experience, but all too often, IT, engineering and product teams who build, deploy and optimize web and mobile apps don’t hear about user problems until they get that phone call from their manager.
What Observability Gets Wrong about User Experience
Traditional observability tools monitor core components of infrastructure, backend applications and web and mobile apps, then alert you about problems you need to fix. They provide detailed measurements of system performance, including real user monitoring (RUM) for web and mobile apps, synthetic monitoring, crash analysis, JavaScript errors and session replay. Monitoring and observability are crucial to an optimally functioning app or web service, but they measure the performance of software components. What is missing is real-time measurement of the user experience from the real-life users’ point of view.
There are three problems with the systems-centric monitoring paradigm.
First, observability tools produce an enormous amount of data. It’s common knowledge that adding more metrics, traces and events on top of other infrastructure and application data increases noise for IT and engineering teams. But the bigger concern is that none of that data is connected to the actual user experience. This makes it more difficult to separate customer-impacting events from other alerts, potentially creating alert fatigue that causes you to overlook the insights that really matter.
Second, most observability tools originated with backend services or infrastructure layers as their primary focus. IT and engineering teams using these solutions may not know there’s a customer problem until after users report it. With microservices-based architectures, distributed tracing seeks to provide visibility of requests spanning frontends and backends in distributed systems. However, teams receive minimal context to bring their user experience to life, and must often sift through mountains of duplicate or broken traces when problem-solving.
Looking from the Wrong Angle
But the biggest problem with using observability tools to measure user experience is that they look at the issue from the wrong angle, said Vyas Sekar, chief scientist at operational data company Conviva and a faculty member at Carnegie Mellon University’s College of Engineering.
“Observability can’t identify the experience issues users face, and can’t tie these to backend problems or business outcomes that you care about in the real world,” Sekar said.
Using observability to infer a user experience outcome is like trying to determine whether an e-commerce customer received a shipment by looking at the warehouse’s condition, said Jens Koerner, Conviva’s head of product management. There are a lot of things that can happen between when the product is packaged at the warehouse and when the customer opens the box in their kitchen, and if you don’t have visibility on all of them, you won’t know why the package was damaged or missing. The warehouse is simply not as close to the user as the desired business outcome demands.
Just like Amazon needs to make sure its warehouses are well stocked, staffed and organized, monitoring how your servers, API response codes, network requests, Kubernetes health and other parts of the infrastructure are performing is essential, Koerner said. But that is different from monitoring the quality of the user experience.
“Experience has to consider the technical, as well as user perception and engagement. It’s how people interact with an application to get something done,” he explained.
Whereas observability tools provide performance measurements of system components, quality of experience takes in all user data and models events across critical user flows with business context. This eliminates blind spots and produces data that reflects what real customers experience, precisely aligned to business workflows.
For a more relevant example in the software world, imagine a web browser update deletes the user’s authentication token, so they are not logged in automatically but are redirected to log in with their username and password. Not only is that inconvenient, many users don’t remember their password, so they can’t log in to your e-commerce site. Your system does everything correctly on the backend — it recognizes the missing token and redirects to the login page, then recognizes the wrong password, pushes the password reset page and authenticates the reset request — but the user still has a bad customer experience. Your observability and monitoring tools will give you an A+, but your customer will downgrade you to a C-.
In Measuring User Experience, Sequence Matters
When Conviva’s data scientists began trying to solve the disconnect between observability and user experience, they started out using existing big data tools including Apache Spark, Flink and Kafka. But they found fundamental challenges in using them to track user experience at scale. Those tools rely on traditional tabular data abstractions such as SQL, which work well for stateless analysis where event order isn’t critical. But to measure the quality of user experience, you need to account for the order of events, the time between each event and the state of the system overall. This requires a stateful approach.
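To make the stateful requirement concrete, here is a minimal Python sketch of a metric that depends on both event order and the time between events: how long each user takes to get from starting checkout to completing it. The event names, the `Event` fields and the `checkout_durations` helper are illustrative assumptions, not part of any Conviva product or of the tools named above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    user_id: str
    name: str   # hypothetical event names, e.g. "checkout_start"
    ts: float   # timestamp in seconds (assumed unit)


def checkout_durations(events):
    """Map each user who started checkout to the seconds it took to complete.

    Users who started but never completed map to None.
    """
    open_attempts = {}  # per-user state: timestamp of the unfinished attempt
    durations = {}
    # The result depends on processing events in time order.
    for ev in sorted(events, key=lambda e: e.ts):
        if ev.name == "checkout_start":
            open_attempts[ev.user_id] = ev.ts
            durations.setdefault(ev.user_id, None)
        elif ev.name == "checkout_complete" and ev.user_id in open_attempts:
            durations[ev.user_id] = ev.ts - open_attempts.pop(ev.user_id)
    return durations


events = [
    Event("a", "checkout_start", 0.0),
    Event("b", "checkout_start", 3.0),
    Event("c", "checkout_complete", 4.0),   # no matching start: ignored
    Event("a", "checkout_complete", 12.5),
]
durations = checkout_durations(events)
```

The per-user dictionary is the state that a row-at-a-time aggregate would not carry: the same four events in a different order would yield a different answer.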
“Almost every existing solution is built on an idea from the 1970s: relational databases and tabular processing,” explained Sekar. “That’s the foundation of modern data processing systems. But when you have to track stateful behaviors to model user experience and connect the dots between experience and backend issues at scale, that classical technology completely breaks.”
Conviva’s data scientists created a new approach to calculating and storing data called time-state analytics that uses an abstraction called timelines. This concept, which they presented as a research paper at January’s Conference on Innovative Data Systems Research (CIDR), considers all the factors that can affect user experience, including the device, the network, the application burden and more, along with the sequence of those operations.
“Timelines offer a more efficient way to write queries compared to the conventional approaches,” wrote Aditya Ganjam, co-founder of Conviva, in a blog post. This technology foundation “represents all event stream data as timelines with a set of timeline operators to compute stateful metrics.”
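The article does not show what a timeline or a timeline operator looks like, so the following Python sketch is only one possible reading: a timeline as a piecewise-constant state over time, with an operator that totals the time spent in a given state. The `Timeline` class, its method names and the player states are hypothetical and are not taken from Conviva’s paper or platform.

```python
from bisect import bisect_right


class Timeline:
    """Piecewise-constant state over time, built from (timestamp, state) changes."""

    def __init__(self, changes):
        self.changes = sorted(changes)
        self.times = [t for t, _ in self.changes]

    def state_at(self, t):
        """State in effect at time t, or None before the first change."""
        i = bisect_right(self.times, t) - 1
        return self.changes[i][1] if i >= 0 else None

    def duration_in(self, state, start, end):
        """Total time spent in `state` within the window [start, end)."""
        total = 0.0
        for i, (t, s) in enumerate(self.changes):
            # Each state lasts until the next change, or until the window ends.
            nxt = self.times[i + 1] if i + 1 < len(self.times) else end
            lo, hi = max(t, start), min(nxt, end)
            if s == state and hi > lo:
                total += hi - lo
        return total


# Hypothetical video-player session: state changes at given seconds.
session = Timeline([
    (0, "playing"), (10, "buffering"), (14, "playing"),
    (30, "buffering"), (31, "playing"),
])
buffering_seconds = session.duration_in("buffering", 0, 60)
```

Expressing the metric as an operator on the timeline keeps the question (“how long was the user buffering?”) separate from the bookkeeping of individual events.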
This approach drives IT efficiency in three ways, said Sekar. First, it reduces false alarms by focusing attention on the high-priority issues that affect user experience; second, it removes blind spots that wouldn’t be identified by traditional observability tools; and third, by zooming into the exact problem impacting user experience, you need fewer people and less time to resolve issues, decreasing costs and improving your mean time to recovery (MTTR).
A ‘Radically Different Approach’ to Observability
Conviva is “taking a radically different approach in redefining observability,” said Sekar. “We call it experience-centric observability (ECO) and make the quality of experience that people observe in the real world the ‘first-class citizen’ of observability.”
The company cut its teeth in the media streaming industry, with customers such as Paramount+, NBC, Sling and Univision, which process streaming data at enormous scale — thousands of times what companies like BMW and Google ingest. Latency is a serious issue in this industry — if a televised game starts buffering during the last two minutes of the Final Four basketball championship, college hoops fans will blow up social media.
Or, as Koerner explained, if your checkout process lags for, say, Android users with Samsung devices in North America, cart abandonment will spike. If that causes 10,000 users to not complete a $50 purchase, that could cost your company a half million dollars in one day.
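Koerner’s arithmetic, and the idea of isolating a segment such as one operating system, device brand and region, can be sketched in a few lines of Python. The session fields and function names below are assumptions made for illustration; only the 10,000 users and the $50 purchase come from the example above.

```python
from collections import defaultdict


def abandonment_by_segment(sessions):
    """Return {(os, device, region): fraction of checkout sessions abandoned}."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [abandoned, total]
    for s in sessions:
        segment = (s["os"], s["device"], s["region"])
        counts[segment][1] += 1
        if not s["completed"]:
            counts[segment][0] += 1
    return {seg: abandoned / total for seg, (abandoned, total) in counts.items()}


def revenue_at_risk(abandoned_purchases, order_value):
    """Revenue lost if each abandoned purchase was worth `order_value`."""
    return abandoned_purchases * order_value


# Hypothetical sessions: three of four Samsung sessions abandon checkout.
sessions = [
    {"os": "Android", "device": "Samsung", "region": "NA", "completed": False},
    {"os": "Android", "device": "Samsung", "region": "NA", "completed": False},
    {"os": "Android", "device": "Samsung", "region": "NA", "completed": False},
    {"os": "Android", "device": "Samsung", "region": "NA", "completed": True},
    {"os": "iOS", "device": "iPhone", "region": "NA", "completed": True},
    {"os": "iOS", "device": "iPhone", "region": "NA", "completed": True},
]
rates = abandonment_by_segment(sessions)
lost = revenue_at_risk(10_000, 50)  # the figures from Koerner's example
```

Here `lost` works out to 500,000, the half million dollars in the example.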
Instead of looking at the backend infrastructure and trying to infer user experience, Conviva collects data directly from user events, the closest signal to the customer. It deploys a simple, lightweight software development kit (SDK) that collects every user event as the user moves through an application. The data is sent to the backend where it is correlated and mapped, meaning that it doesn’t create extra work for the device or impact the quality of the user experience. The SDK connects the app to Conviva’s platform, which applies time-state analytics to map that data across all user flows with business and service context.
The UI also enables any team in the business to customize the metrics that matter to them in just a few clicks, democratizing data across the organization. For example, if you know you’ll have an 80% abandonment rate if user login takes two minutes or more, the user experience team can create a metric for that.
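The article describes this as a few clicks in the UI, but the underlying metric is simple enough to sketch in Python. The function name and the input format are assumptions; only the two-minute threshold comes from the example above.

```python
LOGIN_THRESHOLD_S = 120.0  # two minutes, from the example above


def slow_login_rate(login_durations_s, threshold_s=LOGIN_THRESHOLD_S):
    """Fraction of logins that took `threshold_s` seconds or longer."""
    if not login_durations_s:
        return 0.0
    slow = sum(1 for d in login_durations_s if d >= threshold_s)
    return slow / len(login_durations_s)


# Hypothetical login durations in seconds for four sessions.
rate = slow_login_rate([5.0, 130.0, 120.0, 60.0])
```

A team could alert on this rate directly, since it tracks the condition the business already knows drives abandonment.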
By processing data from millions of devices in real time and at scale, Conviva can directly measure user experience across all devices connected to a service. If it identifies trouble with users’ quality of experience, the platform uses AI to power root-cause analysis, focusing attention on the biggest customer problems relative to business goals and reducing the time and effort required for its customers to resolve incidents.
Looking Ahead
Conviva has made its mark on the media and entertainment industry, working directly with 12 of the 15 largest media streamers. Now the team has its sights on bringing the technology to other industries.
“We have solved this problem for the leading media and entertainment customers at a scale that nobody else can. If we can solve that problem, we can democratize this technology to everyone,” said Sekar.
To learn more about harnessing time-state analytics to more efficiently measure user experience and align observability with business outcomes, register for Conviva’s webinar “5 Strategies to Escape the Observability Money Pit.”