Tuesday, July 05, 2011

Amazon Web Services Summit 2011: Navigate the Cloud

On June 21, 2011 I went to an Amazon Web Services conference in San Francisco. These are my notes.

In general, the conference was informative, but a bit subdued. It was about a quarter the size of Google I/O, it lasted only one day, and they weren't giving away anything huge for free. That makes sense, though, since they hold several of these conferences per year. Nonetheless, it was interesting, and I learned some stuff.


The first part of the keynote was given by Dr. Werner Vogels, Amazon's CTO.

AWS doesn't lock you into an OS or language.

There are 339 billion objects in S3, and S3 handles 200,000+ storage transactions per second.

On a daily basis, they add as much capacity as they had in total in 2000.

Spot pricing allows you to spin up an instance when the price is right.
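To make that concrete, the core of a spot strategy is just "run when the price is at or below my bid." Here's a toy Python sketch of that decision logic; the price feed and the bid are made up, and nothing here calls the real AWS API:

```python
# Illustrative sketch of spot-style bidding. The price feed is a plain
# list of observed prices; a real system would poll AWS for them.

MAX_BID = 0.08  # the most we're willing to pay per instance-hour (hypothetical)

def should_launch(current_price, max_bid=MAX_BID):
    """Launch only when the spot price is at or below our bid."""
    return current_price <= max_bid

def run_when_cheap(price_feed):
    """Walk a sequence of observed prices and keep the ones we'd act on."""
    return [price for price in price_feed if should_launch(price)]

observed = [0.12, 0.09, 0.07, 0.05, 0.11]
print(run_when_cheap(observed))  # the price points where we'd run batch jobs
```

This is also why spot works so well for interruptible batch work (like the Hadoop jobs mentioned later): if the price rises above your bid, the work just waits for the next cheap window.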

AWS CloudFormation deploys a stack of software.

AWS Elastic Beanstalk is easy to begin and impossible to outgrow.

CloudFormation and Elastic Beanstalk don't cost anything extra.

They now have Simple Email Service.

They're building features by working backwards from the customer. They start by writing the press release, then the FAQ, next the documentation, and finally the code.

They mentioned that Amazon RDS (Relational Database Service) is not as scalable as S3.

AWS is now a PCI level 1 service provider.

amazon.com (i.e. the retail business) is moving to AWS.

They mentioned Amazon VPC (Virtual Private Cloud).

This part of the keynote was about Autodesk.

There have been more than 2 million downloads of Sketchbook Mobile.

Homestyler is the fast, easy, free way to design your dream home.

Autodesk uses EC2, S3, EBS, and CloudFront.

Homestyler can now do photo realistic renderings of your home.

They mentioned project Nitrous. It provides a rich, in-browser experience. It's like Google Docs but for your 3D modeling documents.

They're doing stuff where they're rendering in the cloud and then streaming to an iPad.

This part of the keynote was about TellApart.

They handle customer data in real-time (i.e. with 100ms latencies) on AWS.

They have a primarily ex-Google team.

They provide customized ads based on user behavior.

They have some serious latency requirements. They had to hit 120ms end-to-end. They worked closely with AWS to meet their needs.

Their ad performance is incredible. They have a 7.5% click through rate.

This part of the keynote was given by Mitch Nelson, Adobe's Director of Managed Systems.

LiveCycle is a managed service.

They talked about the Flash Media Server.

Adobe Connect is a managed service. It's an online meeting product. It's based on Flash.

Cloud Computing at NASA

This part of the keynote was about the JPL (Jet Propulsion Laboratory).

They wanted to augment availability in multiple geographic regions.

Some things can be safer in the cloud.

They're using the VPC (Virtual Private Cloud). They use IPSec.

They've saved a lot of money by adopting cloud computing.

Buying infrastructure way ahead of time is expensive for NASA. Cloud computing saves NASA a lot of money by allowing them to scale to meet their needs.

He talked a lot about the Mars Rover.

They spin up a bunch of instances to handle bursty traffic, processing data from the Mars Rover, which sends data once a day (I think).

The speaker said that the "missions" are his "customers".

These projects are using cloud computing: the Mars Rovers, the Deep Space Network, a lunar mapping product, and airborne missions. None of them look like they involve astronauts.

This part of the keynote was about SmugMug.

SmugMug is a profitable, private company with no debt. They're a top-250 website, and they had $10M+ in yearly revenues in '07.

They provide unlimited storage for storing photos and unlimited bandwidth for uploading and downloading photos. Their motto is "More, better, faster" (in comparison to other photo sites).

They can handle photos up to 48 megapixels in size.

They can handle 1920x1080p video.

They use a hybrid of AWS and their own datacenter. They have 5 datacenters. They're moving completely to AWS. They have many petabytes of data stored in S3.

Amazon really listens to their customers, trying to solve their needs.

One time, SmugMug's RubberBand project tried to spin up 1920 cores in a single API call. Amazon spun them all up as requested. SmugMug renamed that project SkyNet.

SmugMug lets customers tie their Amazon and SmugMug accounts together so that SmugMug can pass the cost of storing raw photos and videos (which are expensive to store) on to those customers.

SmugMug is designed for breakage, so there was minimal impact during the "Amazonpocalypse". They made use of multiple availability zones.

EBS is just like a hard disk. If you want local redundancy, use RAID. Want geo-redundancy? Replicate. EBS can and does fail just like HDDs. EBS solves lots of problems, but use it in the appropriate way.

Customer Panel

Use Akamai for dynamically generated content, and use CloudFront for more static content.

Dealing with Akamai is a pain in the butt.

The customers on the panel made it through the Amazonpocalypse fairly well. However, they had to architect for it.

The elasticity of Amazon's services is what saves the most money. Customers only pay for resources when they use them. The customers on the panel were pretty good about spinning up and spinning down instances. Also, the customers said that they can't build their own data centers fast enough to meet their own needs.

TellApart uses Spot instances to run Hadoop jobs. That means they only spin up instances when the price is low enough.

Without AWS, a lot of the customers would never have been able to attempt certain projects.

Akamai hasn't broken for SmugMug. HAProxy also hasn't broken for them.

Autodesk is using Scala.

TellApart uses GAE (Google App Engine) for its own dashboards.

The NASA guy uses CloudFormation to set up templates for their clusters.

Amazon often builds infrastructure that their customers have already built so that all the other new customers don't have to rebuild that same infrastructure.

RDS has very unpredictable latency. None of the panel customers were using it.

Security and Compliance Overview

This talk was given by Matt Tavis, Principal Solutions Architect.

Here's a link to the AWS Security Center.

Security on AWS is based on a shared responsibility model.

AWS is responsible for facilities, physical security, network infrastructure, etc.

The customer is responsible for the OS, application, security groups, firewalls, network configuration, and accounts.

AWS has complex features for controlling who has access to which AWS APIs when you have multiple people at the same company.

AWS favors replication over backup.

I wonder if you can mess around with EC2 or EBS and try to read the raw disk to see if there is stuff left over from a previous user.

By default, there's a mandatory inbound firewall, which defaults to deny.
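Default deny means inbound traffic is dropped unless some rule explicitly allows it, which is how EC2 security groups behave. Here's a toy Python model of that evaluation order (the rules and addresses are made up, and this isn't the real AWS API or real CIDR matching):

```python
# Toy model of default-deny inbound filtering, in the spirit of EC2
# security groups. Rules are (protocol, port, source-prefix) tuples;
# traffic that matches no rule is denied. Purely illustrative.

rules = [
    ("tcp", 80, ""),     # allow HTTP from any source (empty prefix matches all)
    ("tcp", 22, "10."),  # allow SSH only from 10.x.x.x (internal network)
]

def allowed(protocol, port, source_ip):
    """Default deny: permit only if some rule explicitly matches."""
    return any(
        protocol == r_proto and port == r_port and source_ip.startswith(r_prefix)
        for r_proto, r_port, r_prefix in rules
    )

print(allowed("tcp", 80, "203.0.113.5"))  # True  (HTTP is open to the world)
print(allowed("tcp", 22, "203.0.113.5"))  # False (SSH denied from outside)
```

The key property is the fall-through: if the loop over rules finds nothing, the answer is deny, so forgetting a rule fails closed instead of open.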

VPC is a networking feature.

High Availability in the Cloud: Architectural Best Practices

This talk was by Josh Fraser, the VP of business development at RightScale.

RightScale is the world's #1 cloud management system.

RightScale serves 40,000 users and 2.5 million servers.

RightScale is a SaaS.

RightScale is the layer between the app and the cloud provider.

You need to design for failure.

Backup and replication need to be taken seriously.

A cloud is a physical datacenter entity behind an API endpoint.

He says that Amazon has 5 clouds. AWS is a cloud provider.

RightScale has ServerTemplates (which act like recipes). They don't like cloud images (because they're too fixed).

They have an integrated approach that puts together all the parts needed to build a single server or set of servers.

ServerTemplates are a server's DNA.

DR is disaster recovery. HA is high availability.

Implementing HA best practices is always about balancing cost, complexity, and risk.

You need to automate your infrastructure.

Always place at least one of each component (load balancers, app servers, databases) in at least two AZs (Availability Zones).

You need alerting and monitoring.

Use stateless apps.

It's critical to be able to programmatically switch DNS.

Use HAProxy, Zeus, etc. for load balancing.
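For flavor, here's a minimal sketch of an HAProxy config that balances HTTP across app servers in two Availability Zones. The addresses, names, and timeouts are all made up; a real config would need tuning:

```
global
    daemon
    maxconn 2048

defaults
    mode http
    timeout connect 5s
    timeout client 30s
    timeout server 30s

frontend www
    bind *:80
    default_backend app

backend app
    balance roundrobin
    # hypothetical app instances, one per Availability Zone
    server app1 10.0.1.10:8080 check
    server app2 10.0.2.10:8080 check
```

The `check` keyword enables health checks, so a dead instance (or a dead AZ) gets pulled out of rotation automatically.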

Scale up and down conservatively. Don't bring up 1000 instances because of 1 minute's demand.
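One way to read that advice: require demand to stay above the threshold for several consecutive samples before scaling up, so a brief spike never triggers a launch storm. A hypothetical sketch (threshold and window sizes are invented):

```python
# Sketch of a conservative scale-up rule: only add capacity once demand
# has exceeded the threshold for `sustain` consecutive samples, so one
# minute's demand doesn't bring up a thousand instances.

def scale_up_decisions(load_samples, threshold=0.75, sustain=5):
    """For each load sample, report whether we'd scale up at that point."""
    decisions = []
    streak = 0
    for load in load_samples:
        streak = streak + 1 if load > threshold else 0
        decisions.append(streak >= sustain)
    return decisions

# A one-sample spike, then sustained load: only the sustained period
# (from its 5th consecutive hot sample onward) triggers scaling.
samples = [0.5, 0.9, 0.5, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8]
print(scale_up_decisions(samples))
```

A symmetric rule (requiring a sustained lull before scaling down) avoids thrashing in the other direction.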

Snapshot your EBS volumes.

Consider a NoSQL solution.

They use Cassandra in multiple regions, and it works well for them.

EBS snapshots can't cross regions.

There is no one size fits all solution.

The most common approaches they see involve multi-AZ configurations with a solid DR plan.

Amazon.com's Journey to the Cloud

Retail is everything at Amazon that isn't AWS. Retail is a customer of AWS.

This talk summarized the history of Amazon retail from 1995 to 2011; the final switch of the frontend to AWS was in late 2010.

They initially ran on a single box. They housed the server in their main corporate offices. There was a "water event" that caused them to move to a datacenter.

In 2001, they started switching from Digital Tru64 to Linux.

In 2004, they had 50 million lines of C++ code.

AWS started in 2006. S3 came first, and then came EC2. AWS actually wasn't built to solve problems for the retail side. It was always built to be general purpose, and the retail side was responsible for figuring out how to use it.

IMDb is an Amazon-owned subsidiary. It's very separate from the rest of Amazon.

Amazon has strict runtime latency and scale requirements. If your widget can't meet them, your widget won't get shown.

Retail uses VPC. VPC makes AWS look just like your own datacenter.

November 10, 2010 is when they turned off the last frontend server not using AWS. All traffic for amazon.com is now served from AWS.

They've had billions of orders over the lifetime of amazon.com.

Amazon has taken really old orders out of the database and moved them into S3.

They use Oracle.

Be sure to consider compliance issues.

When moving to the cloud, start with simple applications (or simple parts of larger applications).

Iterate toward your desired end-state.

The cloud can't cover up sloppy engineering.
