Mphasis | How to Scale Observability and Build Value for Enterprises

December 01, 2021

Nitin Gokhale

Global Vice President, Head - Transformation Solutions, and DevOps

Every day, the old ways of monitoring systems are becoming obsolete, and new tools and approaches for observability are gaining momentum. In a recent VMware study, more than 86% of the engineers surveyed agreed that cloud environments are more complex than they were just five years ago, and as many as 80% agreed that legacy monitoring systems are no longer enough. In such an environment, how do organizations successfully scale observability for enterprise customers? In this blog post, I discuss the observability challenges facing modern enterprises and, drawing on the Mphasis experience, put forward an approach to guide organizations through this difficult landscape.

Observability in enterprises today

Three typical challenges limit an enterprise's ability to scale observability. First, there is the choice of an IT paradigm. Next, organizations are confronted by a fragmented tool chain. Finally, organizations find that linking “observations” to “resolutions” requires manual intervention. Let us consider each of these problems.

Choosing an IT paradigm: The observability solution an organization frames is partly the result of its choice of IT paradigm. Often, teams within an enterprise focus solely on uptime and infrastructure assurance. Their focus is entirely on avoiding a system-wide shutdown, so, understandably, all their efforts go into assessing and mitigating the negative impacts that could bring their systems to a grinding halt. Yet some transactions that do not carry high workloads still help the business make money and create high value. This makes it imperative for organizations to ensure such transactions run smoothly, and to convert telemetry data into insights that keep their flow from being interrupted. In our experience, organizations consistently struggle to balance these two points of view: infrastructure availability on one side and business transaction flow on the other. But it is essential for organizations to be able to switch quickly between the two paradigms.

Navigating a fragmented tool chain: Monitoring transactions is a complex process in organizations with thousands of applications, some of which may have been built decades ago. These transactions flow through legacy applications, COTS products, and third-party applications. How do organizations overcome these technological barriers to create transaction-oriented telemetry for better intelligence and understanding? That is where tools that support standardization become critically important for next-generation enterprises. Ten years ago, there were not enough open standards to support observability; today there are many, such as OpenTelemetry. It is essential to build products around these standards so that they remain valuable in the long run.
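To make “transaction-oriented telemetry” more concrete: standards such as OpenTelemetry propagate a shared trace ID across every system a transaction touches, so the spans each system emits can later be joined into one end-to-end view. The sketch below is a minimal, standard-library-only illustration of that joining step, assuming spans have already been collected; it is not the OpenTelemetry API, and the Span record, the stitch_transactions function, and the system names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Span:
    trace_id: str  # shared ID propagated across every system the transaction touches
    system: str    # hypothetical name: a legacy app, a COTS product, or a third-party service
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def stitch_transactions(spans):
    """Group spans by trace ID and summarize each end-to-end transaction."""
    by_trace = defaultdict(list)
    for span in spans:
        by_trace[span.trace_id].append(span)

    summary = {}
    for trace_id, group in by_trace.items():
        start = min(s.start_ms for s in group)
        end = max(s.end_ms for s in group)
        slowest = max(group, key=lambda s: s.duration_ms)  # likely bottleneck hop
        summary[trace_id] = {
            "end_to_end_ms": end - start,
            "slowest_system": slowest.system,
            "hops": len(group),
        }
    return summary


spans = [
    Span("t1", "web-frontend", 0, 40),
    Span("t1", "legacy-mainframe", 40, 310),
    Span("t1", "third-party-pricing", 310, 360),
    Span("t2", "web-frontend", 5, 30),
    Span("t2", "cots-order-mgmt", 30, 90),
]
for trace_id, info in stitch_transactions(spans).items():
    print(trace_id, info)
```

For trace t1 this reports a 360 ms end-to-end time with the legacy mainframe as the slowest hop, which is the kind of cross-system insight that per-application monitoring alone would not surface.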

Linking “observations” to “resolutions” requires manual intervention: In a digital world powered by intelligent technology, “single pane of glass” observability is no longer sufficient. Enterprises need to move toward self-healing deployments. Traditionally, when support teams receive a system alert, they try to identify the problem and take remedial action manually. Today, solutions such as transaction-oriented telemetry put us in a much better position to decode the problem, propose a solution, and even automate the remediation. This goes beyond a single pane of glass to a self-sustaining, self-healing environment, which is clearly our preference.
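A self-healing loop of the kind described above can be sketched as a playbook that maps a diagnosed symptom to an automated remediation and falls back to human escalation when no fix is known. This is a minimal sketch under stated assumptions: the Alert type, the symptom names, and the remediation functions are hypothetical placeholders, and a real deployment would invoke orchestration or runbook-automation tooling rather than return strings.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Alert:
    transaction: str  # business transaction the alert is tied to
    symptom: str      # diagnosed problem, e.g. "queue_backlog" (hypothetical label)


# Hypothetical remediation actions; real ones would call automation tooling.
def restart_consumer(alert: Alert) -> str:
    return f"restarted consumer for {alert.transaction}"


def scale_out(alert: Alert) -> str:
    return f"added capacity for {alert.transaction}"


# Playbook: known symptom -> automated fix.
PLAYBOOK: Dict[str, Callable[[Alert], str]] = {
    "queue_backlog": restart_consumer,
    "cpu_saturation": scale_out,
}


def handle(alert: Alert) -> str:
    """Run the automated fix if one is known; otherwise escalate to a human."""
    action = PLAYBOOK.get(alert.symptom)
    if action is None:
        return f"escalated {alert.transaction}/{alert.symptom} to support team"
    return action(alert)


print(handle(Alert("trade-settlement", "queue_backlog")))
print(handle(Alert("trade-settlement", "disk_corruption")))
```

The escalation fallback keeps humans in the loop for unfamiliar failures, while each newly automated fix added to the playbook removes one more manual step between observation and resolution.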

The Mphasis approach to observability in enterprises

There is clearly a need to simplify observability for organizations in today’s complex application landscape. For large-scale deployments, we approach this by defining team topology, measuring flows, and improving flow metrics. Let me summarize what each of these means for organizations.

Defining team topology: We assign a system engineering team to the task, focused on keeping the production environment running and ensuring transaction flows are moving smoothly.

Measuring flows: No matter which team is working or which primary applications are running, our principal goal is always to measure transaction flows. An application may have very high uptime, yet that may not be very relevant in the context of specific transactions. Understanding the context of transaction flows is essential to ensuring that business assets sustain high flow and create high value. At the heart of our measurement framework are transaction flow and resource health. To measure transaction flows, we assess call-flow bottlenecks by identifying critical end-user transactions, tracing the process flow and setting up measurements, and using dynamic thresholding techniques to generate proactive alerts. To track resource health, we spot resource hotspots by identifying the critical resources participating in key end-user transactions, setting up measurements for those resources, and using dynamic thresholding techniques to generate proactive alerts.
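The dynamic thresholding technique used for proactive alerts is not specified here. One simple, common form compares each new measurement against a rolling baseline and alerts when it exceeds the recent mean by k standard deviations. The sketch below assumes that technique; the DynamicThreshold class and its window, k, and min_samples parameters are illustrative choices, not Mphasis's implementation. The same code applies to a transaction's response time or to a resource metric such as CPU utilization.

```python
from collections import deque
from statistics import mean, stdev


class DynamicThreshold:
    """Flag a sample that exceeds the rolling mean by k standard deviations."""

    def __init__(self, window: int = 20, k: float = 3.0, min_samples: int = 5):
        self.history = deque(maxlen=window)  # most recent samples form the baseline
        self.k = k
        self.min_samples = min_samples       # stay silent until a baseline exists

    def observe(self, value: float) -> bool:
        alert = False
        if len(self.history) >= self.min_samples:
            baseline = mean(self.history)
            spread = stdev(self.history)
            alert = value > baseline + self.k * spread
        self.history.append(value)  # every sample, spikes included, updates the baseline
        return alert


# Response times (ms) for one critical transaction; the last sample is a spike.
latencies = [100, 102, 98, 101, 99, 100, 103, 97, 100, 250]
detector = DynamicThreshold(window=20, k=3.0)
alerts = [detector.observe(v) for v in latencies]
print(alerts)  # only the final sample is flagged
```

Because the baseline moves with recent history, the threshold adapts to flows with very different normal ranges instead of relying on one fixed limit. A production version might also exclude flagged samples from the baseline or account for daily and weekly seasonality.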

Improving flow metrics: Our success is measured by three metrics we constantly strive to improve: Mean Time To Recovery (MTTR), Mean Time Between Failures (MTBF), and Turnaround Time (TAT). We use these measures to continuously assess what we can do to keep our applications healthy and to quickly fix issues related to system failure. The best practices an organization adopts must focus on improving these three metrics; when we focus on them, we improve observability across the organization.
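MTTR and MTBF can be computed directly from an incident log. Under the common definitions, MTTR is total downtime divided by the number of failures, and MTBF is total operating time divided by the number of failures. The sketch below assumes a hypothetical Incident record with failure and recovery timestamps in hours over a fixed observation window; TAT could be tracked in a similar way from request and completion timestamps.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Incident:
    failed_at: float     # hours since the start of the observation window
    recovered_at: float  # hours since the start of the observation window


def flow_metrics(incidents: List[Incident], window_hours: float) -> dict:
    """Compute MTTR and MTBF for one transaction flow over an observation window."""
    if not incidents:
        # No failures observed: MTBF is unbounded and there is nothing to recover from.
        return {"mttr_hours": 0.0, "mtbf_hours": math.inf}
    n = len(incidents)
    downtime = sum(i.recovered_at - i.failed_at for i in incidents)
    uptime = window_hours - downtime
    return {
        "mttr_hours": downtime / n,  # mean time to recovery
        "mtbf_hours": uptime / n,    # mean operating time between failures
    }


# A 30-day (720-hour) window with three incidents lasting 1, 3, and 2 hours.
log = [Incident(100, 101), Incident(300, 303), Incident(600, 602)]
print(flow_metrics(log, window_hours=720))  # {'mttr_hours': 2.0, 'mtbf_hours': 238.0}
```

Tracking these numbers per transaction flow, rather than per application, keeps them aligned with the flow-centric measurement described earlier: faster recovery lowers MTTR, and preempted incidents raise MTBF.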

We did this successfully when we brought the Mphasis approach to observability to a leading brokerage firm in the U.S. Intermittent issues with critical transactions had cost the firm millions of dollars in lost revenue. Our transaction-based measurement of flows showed dramatic results in just six months: by preempting incidents, the Mphasis approach delivered a 90% improvement in selected flows. Today, we continue to power the firm’s performance, observing 80 to 100 flows within the organization for proactive alerts and automated recovery.

At Mphasis, this is an integral part of our DevOps offering. We recognize that, in today’s world, organizations must focus on accelerating the delivery of business value. We identify the key metrics for an organization and use them to drive IT value stream optimization. Our approach scales observability and builds value for enterprises. That’s how we #leadthechange.