Kubefeeds Team A dedicated and highly skilled team at Kubefeeds, driven by a passion for Kubernetes and Cloud-Native technologies, delivering innovative solutions with expertise and enthusiasm.

Observability in 2025: OpenTelemetry and AI to Fill In Gaps

An open box with a star atop it is encircled by various lines coming out from it to symbols representing growth and trends to watch.

Observability, like software testing, should be a way to detect and analyze issues in any code anywhere in the supply chain or on a network. It should predict impending errors and even disasters, or gauge the feasibility of a particular project. It should also increasingly automate these tasks, for example when a bad actor has gained access to a network, stack or container.

Gone are the days when observability was mainly handled by operations engineers, who sifted through firehoses of logs, metrics and traces to debug and figure out when and how things went wrong. In 2024, observability expanded in scope: it began earlier in the cycle as developers shifted left, gained greater capabilities across the stack, and now extends to the network and out to the edge in highly distributed systems.

OpenTelemetry made huge strides in 2024, both in visibility and in the benefits it delivers. We’ve already begun to see the effects, and 2025 should bring even more from what has become one of the most ambitious and successful open source projects.

AI, of course, is a big story everywhere, but now we’re starting to see through the hype. Its revolutionary aspects, at least in observability, will not be immediate. Stung by shrinking budgets and rising cloud and observability costs, organizations are demanding not only lower costs but also that observability platforms live up to their paid-for promises with added features, while providers look for ways to lower the cost of what they offer.

In this context, 2025 should be a good year for observability, at least in terms of growth. The market is expected to grow 15% from 2022 through 2027, according to Gartner. Enterprises will rely on observability for productivity improvement, revenue growth and organizational culture transformation, Gartner analysts Pankaj Prasad and Matt Crossley write in “Gartner’s Hype Cycle for Monitoring and Observability, 2024.”

Here are five predictions on what to expect in the observability space in 2025.

1. OpenTelemetry Scales Up

OpenTelemetry was the success story of 2024, standardizing instrumentation and tooling for observability across metrics, traces, logs and much more.

OTel has already delivered immense benefits to user organizations as an open source standard that gives them greater freedom to interchange observability solutions and tools. Observability providers increasingly support OpenTelemetry as a de facto standard, making it simpler to move between providers.

“It’s clear that many are seeing the standardized benefits of OTel’s offerings in collecting observability data,” Morgan McLean, senior director of product management at Splunk, a Cisco company, and co-founder of OpenTelemetry, told The New Stack. “By 2025, OpenTelemetry will be firmly established as the industry standard, with major companies across industries (airlines, banks and other businesses) utilizing OpenTelemetry and adopting it widely.”

OpenTelemetry is seen as a key component in cost optimization. Earlier this year, OpenTelemetry’s profiling signal was shown to be as important as metrics, traces and logs. General availability of OTel’s profiling signals is targeted for mid-2025, although the profiler has been available for use to some degree for over six years, McLean told The New Stack in November.

“With the upcoming release of OpenTelemetry profiling signals, organizations of all sizes will soon have access to the tools necessary to identify code inefficiencies without the need for custom-built solutions,” he said in a separate email to TNS.

As it becomes more widely adopted, OpenTelemetry will be cemented as a key driver of innovation in the observability space, McLean said. “This shift will mark the beginning of a new era in observability maturity, which will be characterized by seamless, standardized data collection,” he said. “This will enable organizations to gain deeper insights and streamline data management, ultimately supporting more effective decision-making and enhanced operational efficiency.”
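OpenTelemetry’s profiling signal itself is a data format for continuous profiling and is not shown here. As a rough, hypothetical illustration of the per-function data that profiling surfaces, the sketch below runs Python’s standard-library `cProfile` and `pstats` modules over a deliberately inefficient function. The names `slow_sum`, `fast_sum` and `profile_call` are made up for this example.

```python
import cProfile
import io
import pstats
from typing import Any, Callable, Tuple


def slow_sum(n: int) -> int:
    # Deliberately inefficient: builds a fresh list and sums it on every iteration.
    total = 0
    for i in range(n):
        total += sum(list(range(i % 100)))
    return total


def fast_sum(n: int) -> int:
    # Same result using the closed form 0 + 1 + ... + (k - 1) = k * (k - 1) / 2.
    return sum((i % 100) * (i % 100 - 1) // 2 for i in range(n))


def profile_call(func: Callable[..., Any], *args: Any) -> Tuple[Any, str]:
    """Run func under cProfile and return its result plus a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    # Sort by cumulative time and show only the five most expensive entries.
    pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


if __name__ == "__main__":
    result, report = profile_call(slow_sum, 10_000)
    print("results match:", result == fast_sum(10_000))
    print(report)
```

In the report, `slow_sum` and the built-in `sum` it calls 10,000 times should appear near the top, pointing at the loop as the hot spot; `fast_sum` shows the kind of fix that such data motivates.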

2. Observability Shifts Right

The number of devices available in consumer and industrial use for edge-computing environments is expected to increase rapidly. These devices continue to offer more powerful computing and connectivity capabilities.

Their increased use also means that observability and monitoring must extend to edge devices. For observability companies that do not already offer this functionality, addressing this need in 2025 will be critical to serving the growing number of customers who are extending their stacks to edge environments.

Better frontend observability offers a direct window into the user experience. Thanks to the standardization that OpenTelemetry offers, users should benefit from tools and platforms that not only allow more dynamic debugging of errors that users encounter in applications or on connected edge devices such as sensors, but also detect potential problems before they occur, while feeding telemetry data back for improved backend-performance analysis.

The idea is to help provide real-time fixes and improvements to the customer experience. The need to improve the customer experience has always existed, from the millions of mobile apps in use to edge devices and deployments.

Now that observability has moved beyond simple monitoring of logs, traces, and metrics — and thanks to OpenTelemetry — observability providers will increasingly be able to offer organizations much more visibility into the user experience than before, said Torsten Volk, principal analyst of application modernization at Enterprise Strategy Group.

“OpenTelemetry provides and devotes resources to building an observability platform so that providers can concentrate on creating features and supporting frontend services more than they could before standardization and other benefits OpenTelemetry provides became more widely available,” Volk said.

3. Observability Shifts Left

Platform engineers, operations engineers, DevOps teams and other stakeholders are realizing that observability can be useful to developers during the development cycle. This is especially important for highly distributed and interconnected services and applications, such as those running on Kubernetes.

Going beyond testing, detailed observability into the stack, including how each layer interacts with the rest of an application across the development cycle, is another critical aspect of observability. This practice should finally see wider-scale deployment in 2025.

Again, thanks to OpenTelemetry and profiling signals, organizations of all sizes will soon have access to the tools necessary to identify code inefficiencies without the need for custom-built solutions, McLean said.

“This enhancement will also help improve the developer journey as profiling offers an unparalleled view of their code’s impact on facilitating faster, more cost-effective optimizations,” he said. “Observability leaders specifically are experiencing lower observability costs due to OpenTelemetry.”

Gartner describes this shift-left trend as observability-driven development (ODD), an engineering practice that “provides fine-grained visibility and context into system state and behavior by designing observable systems,” Gartner analysts Prasad and Crossley wrote in their “Hype Cycle” report, cited previously. “ODD works by instrumenting code to unravel a system’s internal state with externally observable data. As part of a shift-left approach to software development, ODD makes it easier to detect, diagnose and resolve unexpected anomalies early in the development life cycle and in production environments.”
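To make “instrumenting code to unravel a system’s internal state with externally observable data” concrete, here is a minimal, standard-library-only sketch. The `Span` class, the `traced` decorator, the in-memory `RECORDED_SPANS` list and the `apply_discount` function are all hypothetical stand-ins, not the OpenTelemetry API; in practice an OTel SDK and exporter would play these roles.

```python
import functools
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Span:
    """Hypothetical, minimal stand-in for a trace span (not the OTel API)."""
    name: str
    duration_s: float = 0.0
    attributes: Dict[str, Any] = field(default_factory=dict)
    error: Optional[str] = None


# In-memory "exporter": a real system would ship spans to a backend.
RECORDED_SPANS: List[Span] = []


def traced(name: str) -> Callable:
    """Decorator that records timing and failures of each call as a Span."""
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            span = Span(name=name, attributes={"function": func.__qualname__})
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                # Record the failure as externally observable data, then re-raise.
                span.error = f"{type(exc).__name__}: {exc}"
                raise
            finally:
                span.duration_s = time.perf_counter() - start
                RECORDED_SPANS.append(span)
        return wrapper
    return decorator


@traced("checkout.apply_discount")
def apply_discount(price: float, percent: float) -> float:
    if not 0 <= percent <= 100:
        raise ValueError(f"invalid percent {percent}")
    return round(price * (1 - percent / 100), 2)


if __name__ == "__main__":
    apply_discount(80.0, 25)
    try:
        apply_discount(80.0, 150)
    except ValueError:
        pass
    for span in RECORDED_SPANS:
        print(span.name, f"{span.duration_s:.6f}s", span.error)
```

Because every call leaves a span behind, a developer can see the failing input during development, not only after an incident in production, which is the early-detection benefit the Gartner analysts describe.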

4. AI: Still Hyped, but Now More Relevant

AI/machine learning and generative AI will, of course, continue to have a potentially profound impact on how observability is developed and used. While 2025 will invariably see new offerings that use AI/ML and well-trained LLMs to analyze and process telemetry, we are just at the beginning stages of their use and adoption.

So far, there is a lack of clarity on how GenAI might be used to create low-code observability artifacts, according to Gartner’s Prasad and Crossley. Business focus is shifting from excitement around foundation models to use cases that drive ROI.

“Most GenAI implementations are currently low-risk and internal. With the rapid progress of productivity tools and AI governance practices, organizations will be deploying GenAI for more critical use cases in industry verticals and scientific discovery,” wrote Prasad and Crossley. “In the longer term, GenAI-enabled conversational interfaces will facilitate technology commercialization, democratizing AI and other technologies.”

Indeed, we are already experiencing an AI bubble, but that is par for the course, Honeycomb.io co-founder and CTO Charity Majors and principal product manager Phillip Carter wrote in a post on their company’s blog: “Is there an AI bubble? Yes, almost certainly. However, in technology, the size of the bubble often correlates with the magnitude of its ultimate impact. AI is not magic, but it is a tool with many powerful applications.”

At least for observability, the expectation is that AI will keep improving, both as a less-than-perfect co-pilot and as an aid to observability analysis and predictive outcomes.

“The ultimate goal is to give engineers more time to innovate rather than troubleshoot. AI/ML should be a co-pilot, not an autopilot, and will help junior developers perform at the level of a senior SRE,” Tom Wilkie, Grafana’s CTO, told The New Stack. “2025 will get us closer to this reality, but we don’t see a future in which AI/ML replaces humans — it just makes them smarter. AI/ML in observability is about scaling human intelligence, not replacing it.”

In 2025, Wilkie said, the integration of AI/ML should offer:

  • Cost optimization: Manually analyzing usage patterns for millions of time series is not feasible, which is why Grafana has created a suite of Adaptive telemetry tools powered by AI/ML. These solutions automatically aggregate unused and partially used data (metrics, logs and traces) into lower-cardinality versions of themselves to reduce costs.
  • Reduced operational toil with predictive capabilities: Anomaly detection was the most sought-after feature among respondents in a Grafana study. “We believe there is a lot of time-saving potential in this area, which is why we’re investing deeply in it. AI can automate routine tasks that traditionally consume engineers’ time,” Wilkie said. “Instead of manually sifting through logs and metrics, engineers can leverage AI/ML to quickly surface anomalies, and potential root causes.”
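Both ideas in Wilkie’s list can be sketched in a few lines. This is not Grafana’s implementation: `aggregate_unused_labels` assumes a hypothetical policy in which any label outside the `keep` set is never queried, and `rolling_zscore_anomalies` uses a simple rolling z-score in place of any ML model. Both function names are made up for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

# A series is identified by its set of (label name, label value) pairs.
Labels = FrozenSet[Tuple[str, str]]


def aggregate_unused_labels(series: Dict[Labels, float], keep: Set[str]) -> Dict[Labels, float]:
    """Merge series that differ only in unqueried labels, lowering cardinality.

    Summing is appropriate for counters or rates; gauges would need another
    aggregation such as max or average.
    """
    reduced: Dict[Labels, float] = defaultdict(float)
    for labels, value in series.items():
        kept = frozenset((k, v) for k, v in labels if k in keep)
        reduced[kept] += value
    return dict(reduced)


def rolling_zscore_anomalies(values: Sequence[float], window: int = 10,
                             threshold: float = 3.0) -> List[int]:
    """Flag indexes that deviate strongly from the preceding `window` points."""
    anomalies = []
    for i in range(window, len(values)):
        baseline = values[i - window:i]
        mu, sigma = mean(baseline), pstdev(baseline)
        # A flat baseline has no spread to compare against, so skip it.
        if sigma > 0 and abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


if __name__ == "__main__":
    series = {
        frozenset({("service", "api"), ("pod", "a")}): 5.0,
        frozenset({("service", "api"), ("pod", "b")}): 7.0,
        frozenset({("service", "web"), ("pod", "c")}): 1.0,
    }
    print(aggregate_unused_labels(series, keep={"service"}))
    latency_ms = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101, 250, 100]
    print(rolling_zscore_anomalies(latency_ms))
```

The baseline deliberately excludes the current point, so a single spike cannot inflate the standard deviation it is measured against.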

Data lakes are an emerging technology for observability that should see wide-scale adoption in 2025, driven by GenAI. To make the most of data lakes, a platform that can observe LLM-based applications is required. With a data lake, an observability platform can be used to analyze the data without the customer having to forgo data sovereignty and security, ensuring control over where data resides and enabling compliance with regulations.

A single data lake can also scale almost without limit, while continually supplying data to train LLMs for improved AI-assisted data analysis. With the rise of LLMs and generative AI, data lakes are becoming essential for troubleshooting these models. More organizations will likely look to take advantage of these benefits in 2025.

“Generative AI and LLMs are on the rise, but monitoring them requires processing large volumes of data in real time, which can be costly with [Software as a Service] observability vendors,” Krishna Yadappanavar, co-founder and CEO of Kloudfuse, told The New Stack. “Deploying observability on a customer’s private cloud infrastructure offers a more cost-effective and scalable solution.”

These models also rely on multistep reasoning, sourcing data from various LLMs, databases, vector embeddings, function calls, and more, Yadappanavar said. To effectively analyze and troubleshoot performance across LLM chains, Yadappanavar said, a unified data observability platform is essential. It must consolidate and analyze distributed tracing, logs, metrics and events, with broad data source integrations and an open architecture to bring data from all sources.

5. Observability Costs Should Decline

During the past couple of years, there has been a back-and-forth over telemetry data feeds, which are highly valuable for developers and operations teams but come at a cost. It’s not unheard of for some large customers to spend tens of millions of dollars annually on an observability solution. In certain cases, depending on the observability provider, these costs include security coverage.

This pay-as-you-go model is increasingly scrutinized by CFOs and other financial decision-makers, who are under pressure to reduce spending. As a result, DevOps teams are being asked to be more selective about the telemetry data they pay for, focusing on observability and service analysis. In 2025, as customers and organizations demand more advanced features, they certainly won’t want to pay more. Instead, they will look for ways observability providers can help them reduce costs through better tools or practices.

Said Wilkie, “Users are right to demand not only more features for the price they pay but for data feeds that discern between costly waste and bills and metrics and other telemetry data they really need.”
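One common way to be selective about telemetry is sampling. The sketch below keeps every span of any trace that contains an error, plus a deterministic, hash-based fraction of the remaining traces. `SpanRecord`, `keep_trace` and `select_telemetry` are hypothetical names, and the policy is an assumption for illustration; it is similar in spirit to trace-ID-ratio sampling but is not OpenTelemetry’s sampler implementation.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SpanRecord:
    """Hypothetical, minimal span record used only for this sketch."""
    trace_id: str
    name: str
    is_error: bool


def keep_trace(trace_id: str, ratio: float) -> bool:
    """Deterministically keep roughly `ratio` of traces based on a hash of the ID."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Use the top 53 bits so the division is exact and the bucket is in [0, 1).
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return bucket < ratio


def select_telemetry(spans: Iterable[SpanRecord], ratio: float) -> List[SpanRecord]:
    """Keep whole traces that contain an error, plus a sampled share of the rest."""
    spans = list(spans)
    error_traces = {s.trace_id for s in spans if s.is_error}
    return [s for s in spans
            if s.trace_id in error_traces or keep_trace(s.trace_id, ratio)]


if __name__ == "__main__":
    spans = [SpanRecord(f"t{i}", "GET /", False) for i in range(1000)]
    spans.append(SpanRecord("t5", "db.query", True))
    kept = select_telemetry(spans, ratio=0.1)
    print(f"kept {len(kept)} of {len(spans)} spans")
```

Because the decision depends only on the trace ID, every component applying the same ratio makes the same keep-or-drop choice for a given trace. Keeping error traces, however, requires seeing all of a trace’s spans before deciding, which is why this style of tail sampling is typically done centrally, for example in a collector, rather than in each service.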

The post Observability in 2025: OpenTelemetry and AI to Fill In Gaps appeared first on The New Stack.
