
Metrics, Traces, Logs — And Now, OpenTelemetry Profile Data

With the addition of profiling to OpenTelemetry, we expect continuous production profiling to hit the mainstream.
May 31st, 2024 6:28am by

OpenTelemetry’s profiler represents what its creators call another milestone for one of the more dynamic and important projects in open source generally, and in observability especially. If it lives up to the ambitions of the OpenTelemetry (OTel) creators, continuous profiling signals could become at least as critical as metrics, traces and logs.

OTel profiling isn’t available for use yet; general availability is targeted for the end of the year, with version 1.0. Continuous profiling itself, however, has been available to the public to “some degree” for over six years, Morgan McLean, senior director of product management for Splunk and one of OpenTelemetry’s main creators, told The New Stack.

“While it provides a simple and powerful way for developers and their organizations to reduce infrastructure costs and improve performance by giving them visibility down to individual code functions, profiling still isn’t incredibly well known and isn’t used across the industry to the degree that metric, log and trace analytics are,” McLean said. “With the addition of profiling to OpenTelemetry, we expect continuous production profiling to hit the mainstream.”


To understand why the profiler matters, it helps to put it in context. First came telemetry data in the form of logs, metrics and, more recently, traces. But collecting that data and watching it through monitoring means little unless it has been parsed and filtered to eliminate irrelevant telemetry.

Observing events or performance as an operator through these different telemetry types is useful to a point. But it falls short of observability, which involves drawing actionable insights from inferences based on the data that monitoring collects.

OpenTelemetry offers a standardized process for observability. It’s vendor-neutral and is used to make sense of telemetry data consisting of metrics, logs and traces. It’s also more than merely vendor-neutral in that it is designed to allow the user to integrate the observability tools of their choice into a common approach, thereby unifying them.

In this way, OpenTelemetry plays a pivotal role in integration, serving as a central component for monitoring and analyzing data across environments. Profiling is the latest step in that evolution, giving users deeper insight into system performance and resource utilization.

Going Deeper

OpenTelemetry’s profiler should prove useful because it extends observability analysis down to the code level. It deepens the analysis of metrics, traces and logs by adding code-level profile data to the unified telemetry stream for applications throughout the network. That code-level data is analyzed and stored.

In practice, this means that when a problem arises, or when examining a performance symptom surfaced by an observability data stream (such as a CPU that is running slow or an end user’s request for data that is taking too long), the profile identifies the code at issue. With the right additional observability tools, fixes should come faster, as users pinpoint problem code more easily through their queries.

The relationships between the messages. Source: OpenTelemetry Project

In a blog post, Austin Parker, director of open source for Honeycomb, described profiling and offered examples, noting that profiles offer support for bidirectional links. This means that the user can dig deeper on a code level from the aspects provided by telemetry data to the corresponding profile. Examples Parker communicated include:

  • Metrics to profiles: Spikes in CPU or memory usage are translated into the code consuming the resources at runtime.
  • Traces to profiles: In addition to being able to pinpoint where high latency is manifested across the network, the profile attached to a trace or span reveals the code responsible for the high latency.
  • Logs to profiles: Logs remain a crucial part of observability along with metrics and traces, but beyond using logs for tracking such issues as out-of-memory errors, the code responsible for the extra memory consumption is shown for further analysis.
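
The bidirectional links Parker describes can be pictured with a small sketch. It is a minimal illustration in Python, not the OpenTelemetry profiles data model: the `Sample` class, its fields and both helper functions are hypothetical names, and the only assumption is that each profile sample can carry the trace and span identifiers of the request during which it was captured.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Sample:
    """One profile sample (hypothetical shape, for illustration only)."""
    stack: tuple                     # function names, outermost call first
    trace_id: Optional[str] = None   # set when the sample was taken inside a trace
    span_id: Optional[str] = None


def hot_functions_for_span(samples, trace_id, span_id, top=3):
    """Trace -> profile: which leaf functions were running during this span?"""
    counts = Counter(
        s.stack[-1]
        for s in samples
        if s.stack and s.trace_id == trace_id and s.span_id == span_id
    )
    return counts.most_common(top)


def spans_for_function(samples, function):
    """Profile -> trace: which spans were executing this function?"""
    return {
        (s.trace_id, s.span_id)
        for s in samples
        if function in s.stack and s.span_id is not None
    }


# Made-up samples: two requests plus one sample taken outside any trace.
samples = [
    Sample(("handle_request", "load_cart", "parse_json"), "t1", "s1"),
    Sample(("handle_request", "load_cart", "parse_json"), "t1", "s1"),
    Sample(("handle_request", "render"), "t1", "s1"),
    Sample(("handle_request", "render"), "t2", "s9"),
    Sample(("gc",)),
]

slow_span_hotspots = hot_functions_for_span(samples, "t1", "s1")
spans_running_render = spans_for_function(samples, "render")
```

Starting from a slow span, the first helper surfaces the functions that dominated its samples; starting from a hot function, the second helper leads back to the requests that executed it. That two-way lookup is the idea behind the links, whatever form the final data model takes.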

Big Contributions

The project should be finalized, or at least nearing general availability, this year, thanks to the continued effort of the community. The project’s creators highlighted some key contributors from the OpenTelemetry community, including:

  • Felix Geisendörfer (Datadog)
  • Alexey Alexandrov (Google)
  • Dmitry Filimonov (Grafana Labs)
  • Ryan Perry (Grafana Labs)
  • Jonathan Halliday (Red Hat)

Additionally, Elastic and Splunk are making significant donations. According to the proposal documentation, the donation of the Elastic profiling agent will:

“Fill the gap in OpenTelemetry’s component landscape/architecture with a mature, feature-rich and efficient profiling solution. With that, cutting-edge technologies in eBPF and profiling would become a standard through OpenTelemetry for collecting in-production profiling data. Collecting profiling data with OpenTelemetry across a broad range of languages/technologies would come with a frictionless deployment experience.”

The donation follows the “marriage” between the observability tools Elastic Common Schema (ECS) and OpenTelemetry Semantic Conventions. Specifically, the creators of open source Elastic are contributing ECS to OpenTelemetry and are committed to the joint development of the two projects.

Both Elastic and Splunk contributions “are critical to making profiles a first-class signal in OpenTelemetry,” McLean said.

As McLean explained, most profilers don’t use eBPF, as language runtimes like the JVM, .NET CLR and Go runtime have this functionality built in. OpenTelemetry will pursue both direct language profiling and eBPF-based profiling. Directly profiling a language runtime typically provides more data and requires less processing, while eBPF-based profiling can be applied to languages that lack built-in profiling features, is easier to set up, and requires very little processing (slightly more than direct profiling), McLean said.

The integration of the Elastic profiling agent, as well as ECS, with OTel underscores Elastic’s and OTel’s combined reach, and its creators’ commitment to allow users to merge telemetry data into a single panel for a more comprehensive analysis for observability. Indeed, the integration of ECS with OTel helps the OTel project move toward the ultimate goal of total compatibility and standardization with any observability tool or process.

In other words, both Elasticsearch and OpenTelemetry (especially since their general availability release a few weeks before 2024) are very popular platforms for integrating and working with logs, metrics and traces from various sources. Their further integration should be welcomed by many.

Splunk has begun the process of donating its .NET profiler. This, the project’s creators explain, will allow OTel to capture profiles from C#, F# and other .NET applications.

Work on Splunk’s profiler for OpenTelemetry remains ongoing, as does work on Elastic’s contribution. According to the project’s documentation, the OpenTelemetry .NET Automatic Instrumentation logs the continuous profiling configuration at the debug log level during startup. The profiler leverages .NET profiling to perform periodic call stack sampling. For every sampling period, the runtime is suspended, the samples for all managed threads are saved into a buffer, and then the runtime resumes.
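
The sampling loop described above can be sketched in a few lines. This is a minimal Python analogy, not Splunk’s .NET implementation: it relies on `sys._current_frames()`, a CPython-specific function, and unlike the .NET profiler it does not suspend the runtime while it records stacks. The function names, the sampling interval and the `busy_worker` workload are all assumptions made for illustration.

```python
import sys
import threading
import time
from collections import Counter


def sample_all_threads(skip_thread_id=None):
    """Take one snapshot: the current call stack of every live thread."""
    stacks = []
    for thread_id, frame in sys._current_frames().items():
        if thread_id == skip_thread_id:
            continue  # do not profile the profiler itself
        names = []
        while frame is not None:          # walk from the leaf frame outward
            names.append(frame.f_code.co_name)
            frame = frame.f_back
        names.reverse()                   # store outermost call first
        stacks.append(tuple(names))
    return stacks


def profile(duration_s=0.2, interval_s=0.005):
    """Periodically sample all other threads and collect stacks in a buffer."""
    buffer = []
    own_id = threading.get_ident()
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        buffer.extend(sample_all_threads(skip_thread_id=own_id))
        time.sleep(interval_s)            # wait for the next sampling period
    return buffer


def busy_worker(stop):
    """Illustrative CPU-bound workload to be profiled."""
    while not stop.is_set():
        sum(i * i for i in range(1000))


stop = threading.Event()
worker = threading.Thread(target=busy_worker, args=(stop,), daemon=True)
worker.start()
collected = profile()
stop.set()
worker.join()

# Aggregate: how often was each function at the top of a sampled stack?
leaf_counts = Counter(stack[-1] for stack in collected if stack)
```

Because a function that runs longer shows up in more samples, counts like `leaf_counts` approximate where time is spent without instrumenting every call, which is what keeps sampling cheap enough for continuous use in production.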

Status and Future

Again, the OpenTelemetry profiler should be finalized this year. It is the project’s latest milestone, following the completion of OpenTelemetry’s logs capabilities in 2023. The project’s creators list these as “future capabilities” in the documentation for the OTel specification:

  • Profiles Data Model
  • Profiles API
  • Profiles SDK

“Speaking for myself, back when we were forming OpenTelemetry, the focus was on traces and metrics, and logs were the obvious next step after that. At the time, I had also helped launch what I think was the first publicly available distributed profiling product, and I was really excited about giving all developers insight into how their actual code performs in production,” McLean said. “Making profiles a first-class signal in OpenTelemetry and making these kinds of tools accessible has been a dream and a goal of mine since starting OTel, and it’s incredible to see it realized.”

TNS owner Insight Partners is an investor in: Honeycomb.