Announcing Policy-Driven Automation

Available to All Stackdriver Pro Customers Who Sign Up by December 31st, 2013

At Stackdriver, we focus on helping DevOps teams identify, diagnose, and address performance issues as quickly as possible.

Alerting policies play an important role in this process. Stackdriver customers rely on our system to notify them when issues occur in their environments. Typically, after receiving notifications, administrators use our dashboards and logs to diagnose issues and then take action within their environment to resolve them. In some scenarios, where the appropriate action is well known, they skip the diagnosis step altogether, and resolving the incident is just a matter of going through the motions.

Today, we are announcing Stackdriver Automation, which enables customers to use alerting policies to configure actions that the system should take when specified criteria are met.  When the policy is triggered, the Stackdriver service automatically makes appropriate API calls to initiate the actions on the customer’s behalf.


When configuring an action, you can choose a delay as well as how you want to be notified. The Action Delay specifies how long the resource must be in violation before Stackdriver executes an action. In the event of a transient issue, this gives the resource time to recover before any action is taken. Notify Me specifies how you want to be notified while actions are in progress. During a multi-step process such as Move Host, where a stop and a start API call are issued, Stackdriver will notify you after each step is completed. Alternatively, you can choose to be notified only once the entire process is complete.

We see several uses for the automation capability:

Degraded Host

A common scenario is that EBS-backed instances end up on a host with degraded performance. In nearly all cases, customers address this issue by stopping the instance and starting it again. We decided to automate this: a policy can now be configured to trigger when CPU Steal exceeds a particular threshold. In addition to generating a notification (email, SMS, PagerDuty, etc.), the policy can initiate a “Move Host” action that will stop and then restart the instance. The result is that the EBS-backed instance will be moved to a new host.
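To make the stop-then-start sequence concrete, here is a minimal Python sketch of what a Move Host action could look like. It is an illustration only, not Stackdriver's implementation: the function name `move_host` and the `stop_instance`, `start_instance`, and `notify` callables are hypothetical stand-ins for the real cloud API calls and notification channels, and the `notify_each_step` flag models the choice between per-step and end-of-process notifications.

```python
from typing import Callable


def move_host(
    instance_id: str,
    stop_instance: Callable[[str], None],   # hypothetical wrapper around a "stop" API call
    start_instance: Callable[[str], None],  # hypothetical wrapper around a "start" API call
    notify: Callable[[str], None],          # hypothetical notification hook (email, SMS, ...)
    notify_each_step: bool = True,
) -> None:
    """Stop and then start an instance, reporting progress along the way."""
    steps = [("stop", stop_instance), ("start", start_instance)]
    for name, call in steps:
        call(instance_id)
        if notify_each_step:
            # Report after every completed step of the multi-step process.
            notify(f"{name} completed for {instance_id}")
    if not notify_each_step:
        # Report only once, after the whole process has finished.
        notify(f"Move Host completed for {instance_id}")
```

Passing the API calls in as parameters keeps the sketch independent of any particular SDK, so the ordering and notification logic can be exercised with simple stand-in functions.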

RDS Capacity Issues

RDS instances frequently run low on storage capacity. Again, in these cases, the initial step is pretty clear: add more capacity. Rather than performing this manually, customers can now use Stackdriver to “autoscale” their RDS capacity.
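As a rough illustration of the kind of rule such a policy encodes, the Python sketch below decides how much storage to request when free space runs low. Everything in it is an assumption made for the example: the function name, the 10% free-space threshold, the 20% growth step, and the 1024 GB cap are hypothetical values, not Stackdriver or RDS defaults.

```python
import math


def next_allocated_storage_gb(
    allocated_gb: int,
    free_gb: float,
    min_free_percent: int = 10,  # hypothetical threshold: act when free space drops below 10%
    growth_percent: int = 20,    # hypothetical step: grow storage by 20% each time
    max_gb: int = 1024,          # hypothetical upper bound on storage
) -> int:
    """Return the storage size to request, or the current size if no change is needed."""
    if allocated_gb <= 0:
        raise ValueError("allocated_gb must be positive")
    # Enough free space left: the policy is not in violation.
    if free_gb * 100 >= min_free_percent * allocated_gb:
        return allocated_gb
    grown = allocated_gb + math.ceil(allocated_gb * growth_percent / 100)
    # Never exceed the cap, and never shrink below the current allocation.
    return max(allocated_gb, min(grown, max_gb))
```

For example, under these assumed defaults an instance with 100 GB allocated and 5 GB free would be grown to 120 GB, while one with 50 GB free would be left alone.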


Memory Issues with Applications

This arises, for example, when working with web apps running under Apache. There are times when long-running processes consume all of the available memory on a box, which can render the service unavailable. Often, a simple reboot is all that is needed to address the issue. In this scenario, you can now set a policy that will trigger a reboot action when memory usage reaches a threshold.
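The interaction between a memory threshold and the Action Delay can be sketched in a few lines of Python. This is a simplified model under our own assumptions, not the service's actual evaluation logic: the function name, the sample format of (timestamp in seconds, memory used in percent), and the rule that any sample below the threshold resets the clock are all hypothetical.

```python
from typing import Iterable, Optional, Tuple


def first_action_time(
    samples: Iterable[Tuple[float, float]],  # (timestamp_seconds, memory_used_percent), in time order
    threshold: float,
    action_delay_s: float,
) -> Optional[float]:
    """Return the timestamp at which the action would fire, or None if it never does."""
    violation_start: Optional[float] = None
    for ts, value in samples:
        if value >= threshold:
            if violation_start is None:
                violation_start = ts  # the violation begins here
            if ts - violation_start >= action_delay_s:
                return ts  # in violation for the full delay: trigger the action
        else:
            violation_start = None  # the resource recovered, so reset the clock
    return None
```

With a delay of 180 seconds, a brief spike that clears within a minute or two never triggers the reboot, while a sustained violation does once it has lasted the full three minutes.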


Initially, Stackdriver Automation will include the following actions:

  • Reboot an Instance

  • Move an EBS-Backed Instance

  • Add Additional Capacity to an RDS Instance

We are already working on more actions, and we are eager to hear what our customers come up with!

Note:  The automation feature set will be included in a new premium Stackdriver subscription that will be announced later this month.  All Stackdriver Pro customers who sign up by December 31st, 2013 will be automatically upgraded to the premium plan at no additional cost.
