I am here in Las Vegas at AWS re:Invent. Today, I attended a mind-bending talk called “Why Scale Matters and How the Cloud Really is Different”. The presenter was James Hamilton, Vice President & Distinguished Engineer at Amazon Web Services.
In the introduction, Hamilton observed that the industry rarely sees a shift as large as the current move to the cloud. Rarer still is the speed at which this shift is occurring. This historic shift is driving innovation to support ever larger scale.
In the first part of the talk, Hamilton provided a few data points to hint at the scale of the AWS infrastructure and services.
On the infrastructure side:
* Consider the amount of infrastructure that Amazon required when it was a $7B business. AWS is adding more server capacity than that every day. And that is every day, seven days a week, not just business days.
* They host in 9 data center regions and 25 AZs. Each AZ is at least one DC. The us-east region alone has more than 10 data centers.
On the services side:
* The S3 object store is now hosting trillions of objects. At peak, it serves 1.5M requests per second.
* The DynamoDB NoSQL database serves, in a single region, more than 2.3M requests per month.
In the second part of the talk, Hamilton outlined some of the aspects of infrastructure where AWS innovates to reduce costs.
* AWS designs custom servers. They achieve cost and performance benefits by designing specifically for the AWS workload. They also enjoy a 30% price savings by eliminating distribution channels.
* AWS designs custom high-density storage enclosures. While he would not describe the AWS design, he did compare it to a commercial offering. One of the densest commercial storage designs puts 60 disks in 4U, or 600 disks in a rack. The weight is roughly 3/4 ton per rack. The AWS design is higher capacity and denser, weighing over 1 ton. Their design uses less power and less space while being cheaper and more efficient.
* AWS designs custom networking gear. Both cost and usage patterns drive this decision. On the cost side, the relative price of hardware components is shifting. While server and storage costs are falling with Moore’s Law, the cost of networking gear is stuck. Consequently, networking is becoming a bigger fraction of the total cost each year. As for usage patterns, traditional workloads could tolerate up to a 100x oversubscription on the network without any performance impact because each server was only occasionally using the network. With newer computation models, like MapReduce, it is often the case that all nodes want to use the network at the same time. To address these problems, AWS designs their own networking gear. They have also developed a custom software networking stack to improve performance. Outside the single data center, AWS owns both the metro-area networks (which connect data centers in a region) and the long-haul fiber (which spans regions).
* AWS even optimizes the power infrastructure. In addition to negotiating power purchasing agreements, they design and build their own electric substations. The primary driver here is time. In a typical jurisdiction, it takes two years to build a new substation. By owning the design and construction, AWS can do it in less than half the time. They are also focusing on carbon-neutral power options. The multi-data center us-west (Oregon) region, one of the largest and fastest growing regions, is 100% carbon-neutral.
* AWS also innovates in the supply chain and procurement process. They reduce costs by purchasing components directly and by forecasting purchases only for the near future.
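To make the oversubscription point from the networking bullet concrete for myself, I put together a small Python sketch. Only the 100x ratio comes from the talk. The function names, the rack of 1,000 servers, and the 10 Gbps NICs are my own illustrative assumptions, and the fair-share model is a simplification, not a description of how AWS networks behave.

```python
def uplink_gbps(servers: int, nic_gbps: float, oversubscription: float) -> float:
    """Uplink capacity implied by an oversubscription ratio.

    Oversubscription = (sum of server NIC bandwidth) / (uplink bandwidth).
    """
    return servers * nic_gbps / oversubscription


def per_sender_gbps(servers: int, nic_gbps: float,
                    oversubscription: float, active_fraction: float) -> float:
    """Bandwidth each active server gets if the uplink is shared evenly.

    active_fraction is the share of servers sending at the same moment
    (a hypothetical knob for this illustration).
    """
    active = servers * active_fraction
    if active <= 0:
        return nic_gbps  # nobody is competing for the uplink
    fair_share = uplink_gbps(servers, nic_gbps, oversubscription) / active
    # A server can never send faster than its own NIC.
    return min(nic_gbps, fair_share)


if __name__ == "__main__":
    # Assumed numbers: 1,000 servers with 10 Gbps NICs, 100x oversubscribed.
    for fraction in (0.01, 0.10, 1.00):
        rate = per_sender_gbps(1000, 10.0, 100.0, fraction)
        print(f"{fraction:.0%} of servers active: {rate:.2f} Gbps each")
```

Under these assumptions, servers that only occasionally use the network still see close to full NIC speed, while a job where every node transmits at once leaves each server with a hundredth of its NIC bandwidth. That is the squeeze Hamilton described.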
Finally, Hamilton argued that utilization was one of their strongest levers for reducing costs. Typical enterprise data centers have utilization of 10-20%. 30% is considered terrific utilization. By diversifying across multiple sectors and workloads, AWS achieves much higher utilization, effectively reducing costs. Spot instances exploit short-term excess capacity to drive more profit. As long as the price is greater than the marginal cost of power, running the instance is profitable for them.
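The utilization and spot-pricing arguments boil down to simple arithmetic, so here is a rough Python sketch of both. Every number in it (server wattage, electricity price, hourly cost, the 60% figure) is a made-up assumption for illustration, and the function names are mine. Hamilton gave no such figures beyond the 10-20% and 30% utilization ranges.

```python
def cost_per_utilized_hour(hourly_cost: float, utilization: float) -> float:
    """Cost of one hour of useful work when a server is busy only part of the time."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return hourly_cost / utilization


def marginal_power_cost_per_hour(watts: float, price_per_kwh: float) -> float:
    """Electricity cost of keeping one server running for an hour."""
    return watts / 1000.0 * price_per_kwh


def spot_is_profitable(spot_price_per_hour: float, watts: float,
                       price_per_kwh: float) -> bool:
    """True if the spot price exceeds the marginal cost of power.

    The hardware is already paid for, so only the power cost is marginal here.
    """
    return spot_price_per_hour > marginal_power_cost_per_hour(watts, price_per_kwh)


if __name__ == "__main__":
    # Assumed: a server that costs $1.00 per hour to own and operate.
    for utilization in (0.15, 0.30, 0.60):
        cost = cost_per_utilized_hour(1.00, utilization)
        print(f"{utilization:.0%} utilization: ${cost:.2f} per useful hour")

    # Assumed: a 300 W server and electricity at $0.10 per kWh.
    print("Spot at $0.05/h profitable?", spot_is_profitable(0.05, 300, 0.10))
    print("Spot at $0.02/h profitable?", spot_is_profitable(0.02, 300, 0.10))
```

The first function shows why utilization is such a strong lever: the same server costs several times more per useful hour at enterprise-level utilization than at a higher one. The second shows why even a very low spot price can still be worth accepting on otherwise idle capacity.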
During the question period following the talk, Hamilton went on the record on a topic I have heard widely speculated about: “Does the Amazon retail business use AWS?” Unequivocally, yes. He added that some legacy systems have not yet migrated to AWS, but all new services and most of the old services do run in AWS.
This talk provided a peek into the amazing scale of the AWS infrastructure in a truly entertaining and mind-bending way. If you have a chance to watch the talk online in its entirety, I highly recommend it!