Remember to build for failures 

Always be prepared for downtime in AWS or any other cloud host

The Internet is abuzz with talk about cloud failures again after a brief AWS outage yesterday in the us-east region that took some high-profile sites offline for most of an hour. According to Amazon's status report, there was a partial network outage within an availability zone which cascaded to […]

Using Amazon EC2 IDs Instead of Names 

When working with cloud resources, it’s often useful to know the Amazon EC2 ID of a resource instead of using a name. This is especially true when tying data together across multiple systems and connecting between them. This guide will show […]

5 Tips for Getting the Most Out of Stackdriver 

To celebrate our public beta launch today, I’ve assembled instructions for those of you new to intelligent monitoring on how to make the most of Stackdriver. This guide will cover creating policies, groups and dashboards, alongside tips on health checks and inviting your co-workers.

Organizing Amazon S3 

One anti-pattern that we have observed during the course of building our intelligent monitoring product has to do with how companies use Amazon S3. S3 is a great product, unmatched in its simplicity, reliability and availability. Amazon claims that customers store over 1 trillion objects in S3, so there’s no overstating the degree to which […]

6 Dos and Don’ts of AWS Elastic Load Balancers 

Whether you’re a traffic-heavy media site or a small, simple e-commerce site that foresees a wave of traffic coming its way, you will find Elastic Load Balancers incredibly useful to keep your ship from tipping. With that in mind, I’ve asked the web and our engineering team for a few tips and tricks that will ensure you’re making the most of your ELBs.

Using Persistent Job Flows with AWS EMR and mrjob 

Here at Stackdriver, we use Elastic MapReduce (EMR) to perform certain compute operations on our growing data sets. We have been using Yelp’s terrific tool, mrjob, for easy, programmatic control of our EMR jobs. One roadblock that we bumped up against is one that other mrjob users also battle with (based on the number of unanswered questions on message boards): how can one make use of persistent job flows with EMR and mrjob? We figured it out the hard way. This post describes our approach.

Computing last access time for S3 objects 

Developers occasionally want to calculate the last access time of objects on storage systems. Amazon S3 does not provide direct access to this data, but one can infer last access time from the S3 logs (provided that Server Access Logging is enabled for the bucket in question). In this post, I walk the reader through the process of configuring server access logging for S3 buckets, accessing that data, and calculating last access time using Python and Amazon Elastic MapReduce.
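
As a rough illustration of the idea (not the post's actual EMR job), the sketch below scans S3 server access log lines that have already been downloaded and keeps the most recent `REST.GET.OBJECT` timestamp per key. The function name `last_access_times` and the regular expression are assumptions made for this example; the pattern only covers the leading fields of a log record and assumes keys contain no literal spaces.

```python
import re
from datetime import datetime

# Leading fields of an S3 server access log record (assumed layout):
# owner bucket [time] remote_ip requester request_id operation key ...
LOG_PATTERN = re.compile(
    r'^\S+ \S+ \[(?P<time>[^\]]+)\] \S+ \S+ \S+ (?P<operation>\S+) (?P<key>\S+) '
)


def last_access_times(lines):
    """Return {object key: datetime of the most recent GET} (hypothetical helper)."""
    latest = {}
    for line in lines:
        match = LOG_PATTERN.match(line)
        # Skip unparseable lines and anything that is not an object read.
        if match is None or match.group("operation") != "REST.GET.OBJECT":
            continue
        # Timestamps look like 06/Feb/2019:00:00:38 +0000
        # (%b assumes an English locale for month names).
        when = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        key = match.group("key")
        if key not in latest or when > latest[key]:
            latest[key] = when
    return latest
```

Whether writes or HEAD requests should also count as "access" is a design choice; this sketch counts object GETs only.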

Using Rackspace Cloud Block Storage from Python 

Rackspace announced the availability of their cloud block storage offering for their OpenStack-based public cloud last week. This is intended to provide some parity with Amazon’s Elastic Block Store (EBS) service with some additional nice features, not the least of which is being able to back your volumes with an SSD. But while there […]

What Other Services Fail with EBS? 

Amazon had another outage today which impacted users of EBS in a single availability zone of the us-east-1 region. A number of other services, including RDS, the AWS Console, Elastic Beanstalk, CloudSearch, and ElastiCache, were affected. These simultaneous failures provide interesting insights into dependencies within the services provided by AWS.