How OpenTelemetry (OTel) is underpinning Observability

What is Observability?

Observability lets us examine a system from the outside by letting us ask questions about that system without knowing its inner workings.

The main difference between observability and monitoring is the level of insight they provide into a system. Monitoring involves collecting data and tracking predefined metrics or events to detect when something goes wrong or deviates from expected behaviour. Observability, on the other hand, involves capturing and analysing all available data to gain a complete understanding of the system’s behaviour, even when unexpected events occur.

Observability provides a more comprehensive view of a system, enabling engineers to ask any question about its behaviour and get meaningful answers, even for issues they were not explicitly monitoring. Monitoring, in contrast, is more focused on predefined metrics and may miss issues that fall outside of those metrics.

To observe a system, the application must be properly instrumented. That is, the application code must emit signals such as traces, metrics, and logs. An application is properly instrumented when developers don’t need to add more instrumentation to troubleshoot an issue because they have all of the information they need.

OpenTelemetry is the framework for instrumenting application code to help make a system observable.

What is OpenTelemetry?

OpenTelemetry (also referred to as OTel) is an open-source observability framework made up of a collection of tools, APIs, and SDKs.

The primary goal of OpenTelemetry (OTEL) is to offer vendor-neutral ways of application instrumentation such that customers are able to switch between Telemetry backends. There are three main components of OpenTelemetry: OpenTelemetry SDK, OpenTelemetry API, and OpenTelemetry Collector.

Further, tracing distributing systems could be complex, especially when you have so many services in between and multiple telemetric vendors like Jaeger, Prometheus, AppDynamics, Datadog, Elastic, etc. The pain point is you have to deal with specific protocols or formats that all these vendors used, and the lack of standardisation causes a burden on the instrumentation maintenance and lack of data portability. That’s why OpenTelemetry(OTel) has been created as a CNCF(Cloud Native Computing Foundation) project to standardise the telemetry data model, architecture, and implementation for observable software.

There are OpenTelemetry concepts we need to understand before implementing distributing tracing using OpenTelemetry, let’s review them:

OTel Collector

The OTel collector is a vendor-agnostic proxy/middleman between your application and a distributed tracing tool. The collector receives telemetry data, processes it, and then exports the data to tracing tools that can store it permanently.

Also, the OpenTelemetry Collector is written in Go and licensed under Apache 2.0 license which allows you to change the source code and install custom extensions. That comes at a cost of running and maintaining your OpenTelemetry Collector instances.

Otel configuration Image source : https://opentelemetry.io/docs/collector/

Receivers

Receivers get the data into the collector. There are two kinds of receivers: push-based and pull-based. A receiver takes data in a format, converts it to the internal format, and delivers it to the processors and exporters described in the relevant pipelines. The format of the available traces and metrics varies depending on the receiver.

Processors

Processors are used to process data before it is sent to exporters. Processors may be used to change the name of the span or to transform the metrics. You may also batch the data before sending it out, retry if the exporting fails, add metadata, and use tail-based sampling.

Exporters

An exporter is a component in the OpenTelemetry Collector configured to send data to different systems/back-ends. Different exporters converts OpenTelemetry protocol (OTLP) formatted data to their respective predefined back-end format and exports this data to be interpreted by the back-end or system.

Extensions

Extensions provide capabilities on top of the primary functionality of the collector. They do not require direct access to Telemetry data and are mostly used for managing and monitoring an OTEL collector. Extensions are optional.

Service

The service section is used to enable the components which are configured within receivers, processors, exporters, and extensions sections. The service section consists of two sub-sections: extensions and pipelines.

The extensions consist of a list of all of the extensions to enable. Pipelines can be of traces, metrics, and logs type and consist of a set of receivers, processors, and exporters. Each receiver/processor/exporter must be defined in the configuration outside of the service section to be included in a pipeline.

Configuration

Otel configuration

The above is an example of otel collector configuration YAML file. There are a few components here in the configuration of the otel collector. These components are receivers, processors, exporters, extensions and services, which were described in the previous section.

The OTel collector configuration file contains an OTLP - this stands for OpenTelemetry protocol, which defines the encoding, transport, and delivery mechanism used to exchange data between the client and the server.

Consider this use–case: assume you need to export data to several observability vendors such as AppDynamics, Jaeger, and Datadog; how do you adapt to the various forms required by each platform? Of course, you may manually convert the raw data to the format required by each vendor, or you can simply use the OTLP format, which is the universal, standardised format accepted by the majority of telemetric providers currently. In summary, the data collected by the collector will be in OTLP format if OTLP is used as a receiver. OTLP is a request/response protocol; you can receive traces via HTTP/JSON **or **gRPC. Refer to What is gRPC for details.

Conclusion

The primary benefit of OpenTelemetry is that it provides a unified standard for creating and ingesting telemetry data, much like container orchestration standards set years ago by Kubernetes. OpenTelemetry provides a consistent collection mechanism and format without locking technologists to a specific vendor to store and analyse this data.

Using metrics, events, logging, and tracing (MELT) to practice observability, OpenTelemetry can provide developers and SREs with a complete impression of app performance previously provided by basic monitoring.