Here at Stackdriver, we continue to improve the techniques we use to analyze your cloud deployments. We aim to highlight unusual behavior in your systems so that you can focus your limited time on the interesting and potentially impactful events in your infrastructure.
Today, we have released a new analysis technique that we call cluster anomaly detection. As you may know from working in our app, we define a cluster to be a collection of instances in your application that you expect to behave similarly. In a cluster, for example, you would expect network traffic or memory usage across instances to trend in concert. When the behavior of instances diverges, it is often a clue that something is failing…or about to fail. By focusing on outliers in your clusters, you can find instances that are not performing according to your expectations and take steps to address the problem.
To identify outliers in your clusters, we compare a number of metrics across the instances in your cluster. When we find an instance acting differently from the rest, we make note of this observation under the Events -> Insights link in the application.
The technique identifies the instance, and the metric on which that instance has diverged from its peers.
The analysis can also detect changes across multiple metrics.
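To make the idea concrete, here is a minimal Python sketch of one way to flag outliers in a cluster. It is not the detection logic used in the Stackdriver application, which this post does not describe. It assumes a simple median-based score (the modified z-score built on the median absolute deviation), and the `find_outliers` function, the `threshold` value, and the sample instance and metric names are all hypothetical.

```python
from statistics import median


def find_outliers(cluster, threshold=3.5):
    """Return (instance, metric) pairs whose value diverges from the cluster.

    `cluster` maps an instance id to a dict of {metric name: latest value}.
    The threshold of 3.5 is a common rule of thumb, assumed here.
    """
    outliers = []
    metrics = sorted({m for values in cluster.values() for m in values})
    for metric in metrics:
        readings = {i: v[metric] for i, v in cluster.items() if metric in v}
        if len(readings) < 3:
            continue  # too few peers to say what "normal" looks like
        med = median(readings.values())
        mad = median(abs(x - med) for x in readings.values())
        if mad == 0:
            # At least half the peers report the same value; this simple
            # sketch skips the metric rather than divide by zero.
            continue
        for instance, value in readings.items():
            # 0.6745 rescales the MAD so the score is comparable to a z-score.
            score = 0.6745 * abs(value - med) / mad
            if score > threshold:
                outliers.append((instance, metric))
    return outliers


# Hypothetical cluster: five web servers reporting two metrics each.
cluster = {
    "web-1": {"cpu": 41.0, "mem": 62.0},
    "web-2": {"cpu": 39.0, "mem": 60.0},
    "web-3": {"cpu": 43.0, "mem": 61.0},
    "web-4": {"cpu": 40.0, "mem": 95.0},
    "web-5": {"cpu": 42.0, "mem": 63.0},
}
print(find_outliers(cluster))
```

In this sample, the CPU readings stay close together, so only the memory reading on `web-4` is reported. Scoring each metric separately is what lets the result name both the instance and the metric, and an instance that diverges on several metrics simply appears once per metric.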