Intelligent Monitoring – Cluster Outlier Detection

Here at Stackdriver, we continue to improve the techniques we use to analyze your cloud deployments. We aim to highlight unusual behavior in your systems to help you focus your limited time to the interesting and potentially impactful events in your infrastructure.

Today, we have released a new analysis technique. We call this technique cluster anomaly detection. As you may know from working in our app, we define a cluster to be a collection of instances in your application that you expect to behave similarly. In a cluster, for example, you would expect network traffic or memory usage across instances to trend in concert. When the behavior of instances diverges, it is often a clue that something is failing…or about to fail. By focusing on outliers in your clusters, you can find instances that are not performing according to your expectations and take steps to address the problem.

To identify outliers in your clusters, we compare a number of metrics across the instances in your cluster. When we find an instance acting differently from the rest, we make note of this observation under the Events -> Insights link in the application.

Anomaly detection for clusters

The technique identifies the instance and metric that has diverge from it peers.

Cluster anomaly detection

The analysis can also detect changes across multiple metrics.

Cluster anomaly detection - multiple metrics

If you’re already a Stackdriver user, log into your environment and check out the Insights tab. Otherwise, we invite you to sign up for our free beta and see how we can help make DevOps easier.

Comments are closed.