In the early “noughties,” I began a stint at a consulting company under the wing of a wonderful Indian gent who drilled systems thinking into my head. He taught me valuable lessons about how to perceive our clients’ complex technology environments in a holistic way.
Fast forward to 2006: I was running StyleFeeder, a startup that I had founded around that time, and my natural tendency was to avoid architecture that increased systemic complexity and, by extension, unnecessary distractions. We ran our customer-facing web infrastructure in a managed hosting facility on dedicated hardware with a private network. I kept things as simple as possible for as long as possible.
And that worked fine until we needed to put 50 million files somewhere. Amazon S3 had arrived on the scene shortly beforehand, so we were compelled to give it a try, mainly because the alternatives were horrible: large filesystems would be hard to maintain, databases were ill-suited to our use case and we needed a solution that didn’t require a massive up-front investment. We got what we needed with S3.
When we needed to integrate two applications in different datacenters, we used SQS for our plumbing. We pushed many tens of millions of messages across the country per month with adequate performance, minimal cost and a simple interface. Once again, we got what we needed.
We ran large offline Hadoop clusters on top of EC2 with great success as well. On-demand elastic computation power worked perfectly for our CPU-bound analysis jobs. Another win.
If the cloud utopia I’m describing sounds too good to be true, that’s because I’m leaving out the hard parts. Many times, we had no insight into our underlying cloud infrastructure. Monitoring, measuring and understanding our usage of cloud resources was near impossible for our small team. We were too busy building our business to dedicate time to this effort and there was a noticeable dearth of tools available to us. Somehow, we managed to skate by.
As I spoke with people who had apps running in the cloud, I came to realize that my cloud experiences were not unique: everyone was suffering from the same problems. Lack of visibility, lack of knowledge, lack of experience. All of these contribute to a degraded cloud experience.
My goal in joining Stackdriver is, in a sense, to build the modern equivalents of the tools that I wish I had at my startup. Many companies, large and small, are investing in internal tools to manage their infrastructure rather than focusing on their core business. While cloud computing has many benefits, it’s not a panacea. In fact, most applications that I have seen people building in recent years are not properly designed for distributed environments and require continual care and feeding.
The bet I’m making is that we can significantly reduce that complexity. What happens after that is particularly exciting, as I hope that people will be able to iterate on their ideas faster, spend less time on system needs and more time on the things that matter.
We’d love to hear your stories of cloud complexity and how you handled it. We’ll also be at re:Invent in Vegas at the end of November, so please find us and say hi!