Saturday, July 23, 2011

Go: A Second Impression

My initial impressions of Go have always been somewhat negative because I was comparing it to things like Scala, Haskell, and Python. However, having read a good part of the tutorial, I've changed my opinion. I really like Go!

Go doesn't try to be an extension of C to make it more like Smalltalk--that's Objective-C. Go doesn't try to be the end-all-be-all of object-oriented languages derived from C--that's Scala.

Since it neither maintains backwards compatibility with C nor adds a ton of features, I originally had a hard time getting excited about Go. Now I see that Go is a modern language that tries to follow what I think of as C's philosophy. It's simple, elegant, small, and native.

There's an old saying:

Do not seek to follow in the footsteps of the wise men of old. Seek what they sought.
- Matsuo Munefusa (“Basho”)

I think that perfectly describes Go.

Saturday, July 16, 2011

Humor: Proving Programs

I've always been wary of programming proofs.

For instance, I can mathematically prove that 4195835 / 3145727 > 1.3338. However, I know of certain Pentium processors that would disagree with me. If I try to prove that the following bit of C code prints out "Hello World":
if (4195835.0 / 3145727.0 > 1.3338)
    printf("Hello World\n");
else
    system("rm -rf /");
I might be a bit surprised when it deletes my hard drive instead ;)
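
To make the arithmetic concrete, here is a small Python check of the comparison above. It is only a sketch: the value assigned to `flawed_quotient` is the result widely reported for Pentium chips with the FDIV bug, and it is hard-coded as an assumption because a modern machine will not reproduce it.

```python
# Compare the correct quotient with the one flawed Pentiums reportedly returned.
THRESHOLD = 1.3338

correct_quotient = 4195835.0 / 3145727.0  # about 1.33382 on a correct FPU
flawed_quotient = 1.333739068902037589    # assumption: widely cited FDIV-bug result


def prints_hello(quotient: float) -> bool:
    """Mirror the branch in the C snippet: True means "Hello World" is printed."""
    return quotient > THRESHOLD


if __name__ == "__main__":
    print(prints_hello(correct_quotient))  # True: the proof holds
    print(prints_hello(flawed_quotient))   # False: the other branch runs
```

The proof is about the mathematical quotient; the program's behavior depends on whatever the hardware actually computes.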

1 bunny + 1 bunny = 2 bunnies, right? Well it depends on their sexes. It's possible that in a given time period, 1 bunny + 1 bunny might equal 5 bunnies. As I joked in a previous blog post, "All models are wrong. Some models are useful."

I really think this same thing applies to proving programs. Donald Knuth famously said, "Beware of bugs in the above code; I have only proved it correct, not tried it."

Books: Masterminds of Programming

I just finished reading Masterminds of Programming: Conversations with the Creators of Major Programming Languages.

"Masterminds of Programming features exclusive interviews with the creators of several historic and highly influential programming languages. In this unique collection, you'll learn about the processes that led to specific design decisions, including the goals they had in mind, the trade-offs they had to make, and how their experiences have left an impact on programming today."

In short, I really enjoyed it. Here's an extremely abbreviated and opinionated summary:

Adin D. Falkoff (APL) made programming as mathematical as possible.

Thomas E. Kurtz (BASIC) was generally a nice guy who wanted to bring programming to the masses.

Charles H. Moore (FORTH) frustrated the heck out of me. He stated that operating systems are the software industry's biggest con job. I disagree. Operating systems protect me to some degree from bad and malicious code. They also let me run multiple programs at the same time and allow me to keep running even when one of the programs crashes. He also said that a piece of code written in any other programming language will be 10 times as large (in number of lines of code) as the same code written in Forth. I'd like to see him try that trick with Python!

Robin Milner (ML) was completely fascinated with programming models and proving the correctness of code. That reminds me of the quote, "All models are wrong. Some models are useful."

Donald D. Chamberlin (SQL) showed me some of the history of SQL. I didn't know IBM research was such an interesting place.

Alfred Aho, Peter Weinberger, and Brian Kernighan (AWK) were as good as I expected.

Charles Geschke and John Warnock (PostScript) talked about Adobe and the history of PostScript. I just don't like that Charles guy, and I don't like Adobe. However, they're smart guys.

Bjarne Stroustrup (C++) was as frustrating as I expected.

Bertrand Meyer (Eiffel) was really interesting. He wrote a book in French that has had a profound impact on French programmers. If he had translated that book into English, it's likely he'd be as famous as, say, Richard Stevens (the author of "UNIX Network Programming").

Brad Cox and Tom Love (Objective-C) showed me that Objective-C's goal was to enhance C in the smallest way possible to make it a bit more like Smalltalk.

Larry Wall (Perl) was awesome, as usual.

Simon Peyton Jones, Paul Hudak, Philip Wadler, and John Hughes (Haskell) were fascinating, as usual.

Guido van Rossum (Python) was practical and interesting, as usual.

Luiz Henrique de Figueiredo and Roberto Ierusalimschy (Lua) were okay. (I'm a Python guy, so it's a bit hard for me to get excited about Lua.)

James Gosling (Java) appears to suffer from premature optimization.

Grady Booch, Ivar Jacobson, and James Rumbaugh (UML) left me even less interested in learning UML.

Anders Hejlsberg (Turbo Pascal, Delphi, C#) was awesome. I knew I liked this guy from previous books, but this interview made me like him even more.

Overall, I think this book was a bit drier than, say, Coders at Work: Reflections on the Craft of Programming, so you should read that one first. However, if you're a guy like me who loves programming languages, this book is a must read.

Friday, July 15, 2011

Books: Python 3 Web Development Beginner’s Guide

Packt Publishing asked me to review Python 3 Web Development Beginner's Guide. I'll have to admit, it's a bit of an odd duck. A better (albeit overly verbose) title might have been "An Introduction to Rich Internet Application Development Using jQuery UI, a Very Modern Version of Python, a Relatively Old Python Web Application Framework Named CherryPy, and an Ancient Version of HTML Written by a Guy Who Uses Windows".

The first tipoff that this book was a bit strange was that the author uses Windows and some combination of Firefox and IE. It seems like most web developers use OS X (or occasionally Linux), and they prefer Chrome over IE.

The next tipoff was the use of jQuery UI. jQuery UI is a very modern technology which is often used to build rich internet applications. RIAs really aren't the sort of thing that I would expect to see in a book for beginners. What happened to the old days when beginning web applications focused on the server dynamically generating HTML? If I took the time to count the number of lines of code, I wouldn't be surprised if this book had more JavaScript than Python.

The title of this book mentions Python 3, but if you search for "Python 3" in the book, there are extremely few mentions of it. This book really isn't about Python 3 per se (as compared to Python 2); it has a lot more to do with jQuery UI.

Whereas Python 3 and jQuery UI are very modern technologies, standing in contrast is the book's use of HTML 4 and CherryPy. HTML 4 is an *ancient* version of HTML. I would expect anyone using jQuery UI to use either XHTML or HTML5. At the very least, I would have expected one of the transitional DTDs. Similarly, he uses CherryPy. Although I agree that CherryPy is solid code, it's also fairly old. It predates any of the modern Python frameworks.

This book claims to teach web development "without having to learn another web framework" [p. 1]. That's simply not true. It makes heavy use of CherryPy. The home page for CherryPy calls it an "HTTP framework" and says that it has "everything you would expect from a decent web framework." It's not as full-featured as, say, Django, but parts in the example code such as "@cherrypy.expose" [p. 36] are certainly framework features. In fact, "@cherrypy.expose" is part of CherryPy's object publishing system, which it uses as a replacement for regex-based URL routing.

Another thing that's a bit strange about this book is that the author doesn't use a client-side or a server-side templating language. In JavaScript, he tends to use string concatenation, which is weird because there is a templating plugin for jQuery. On the server, he embeds HTML directly in the Python code, which is pretty ugly (as he mentions on p. 229).

Furthermore, the code is extremely sloppy. The code does not follow Python's style guide concerning whitespace (PEP 8) (see, for example, p. 145) even though PEP 8 is extremely standard in the Python community. I don't know of anyone who puts a space before the colon in statements such as "if not isinstance(name,str) :" [p. 146]. Nor is the code even self-consistent. The indentation in the JavaScript is not only non-standard and inconsistent, it's occasionally completely wrong [p. 118] (i.e. the indentation disagrees with the braces).
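
For what it's worth, the whitespace complaint is purely about style. A quick sketch using Python's standard `ast` module shows that the line quoted from p. 146 and a conventionally spaced version parse to the same program. The `pass` bodies and the helper name `same_program` are my own additions for illustration.

```python
import ast

# The spacing quoted from the book versus the conventional spacing.
book_style = "if not isinstance(name,str) :\n    pass\n"
pep8_style = "if not isinstance(name, str):\n    pass\n"


def same_program(a: str, b: str) -> bool:
    """True when both sources parse to identical syntax trees."""
    return ast.dump(ast.parse(a)) == ast.dump(ast.parse(b))


if __name__ == "__main__":
    print(same_program(book_style, pep8_style))  # True
```

So the interpreter doesn't care; the readers do.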

Aside from bad style, I'm a little concerned about various coding practices. For instance, the JavaScript at the bottom of p. 40 has variables that don't use var. This means they're effectively global. This is extremely bad practice. Fortunately, he does use var in other places in the book.

On the subject of security, there are several standard security vulnerabilities that web applications must protect against: XSS (cross-site scripting vulnerabilities), SQL injection attacks, XSRF (cross-site request forgeries), and session fixation (or session hijacking) attacks. Every book on web development should cover these.

The book mentions XSS, but I fear its approach may not be thorough enough. It does not mention the term "SQL injection" attack, but the ORM shown in the book does look to be somewhat safe. It mentions XSRF, but says that it's out of scope. It doesn't mention "session fixation" or "session hijacking" at all. In general, I don't think the book is good enough about "escaping things" properly. For instance, on p. 293 the author creates a URL in JavaScript using values from a form, but he doesn't take care to URL encode the parameters.
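
Here's a small sketch of why the URL encoding matters. The book's code is JavaScript (where `encodeURIComponent` does this job); I'm using Python's standard `urllib.parse` here, and the endpoint and form values are made up for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "http://example.com/search"  # hypothetical endpoint


def naive_url(params: dict) -> str:
    """Build the query string by plain concatenation (the risky way)."""
    return BASE_URL + "?" + "&".join(f"{k}={v}" for k, v in params.items())


def encoded_url(params: dict) -> str:
    """Build the query string with every key and value URL encoded."""
    return BASE_URL + "?" + urlencode(params)


def query_of(url: str) -> dict:
    """Parse the query string back the way a server would."""
    return parse_qs(urlsplit(url).query)


if __name__ == "__main__":
    form = {"q": "tom & jerry", "page": "1"}
    print(encoded_url(form))          # http://example.com/search?q=tom+%26+jerry&page=1
    print(query_of(encoded_url(form)))
    print(query_of(naive_url(form)))  # the "&" in the value splits the parameter
```

With the naive version, the server never sees the value the user typed.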

Despite all of the above, I can say this about the book. The author does a good job explaining the web to beginners. Modern web applications are fairly complicated beasts. There's the client, the web server, and the database server, and they each require their own syntaxes. The author does a decent job explaining what runs where. It can be difficult for an expert web developer, such as myself, to remember that newbies might not know all these things.

In summary, will this book help you become a competent, professional web developer? Absolutely not. Is it as well written as, say, Agile Web Development with Rails? No. However, might it be a good way for a beginner to dip his toes in web development with Python and jQuery UI? Maybe.

(Disclaimers: Packt gave me a free electronic copy of this book in trade for my review. I have not read the whole thing. I did read the first 50 pages and skimmed various key sections.)

Thursday, July 07, 2011

Google I/O 2011

I went to Google I/O May 10-12, 2011. Note that the first day of Google I/O was my second day as a Google employee. These are my personal notes. If you're interested in any of these talks, you can find them on YouTube.


One of the slides during the keynote showed a picture of an android eating an apple.

There were 5000 people in the room for the keynote.

There were 110 cities watching at viewer parties.

Android has come a long way: 100 million activations, 450,000 developers, 215 carriers, 310 devices, 112 countries, 400,000 activations per day, 200,000 apps in the Android market, 4.5 billion apps installed.

Android 3.1 is Honeycomb.

Android can now act as a USB host. Hence, you can now get a keyboard and a mouse for your tablet.

You can use an X-Box controller with it.

Android 3.1 will be used for Google TV.

Android market will be available on Google TV.

Ice Cream Sandwich will be the new Android release.

The hope is for one Android OS everywhere.

It will be open source.

They have Android software that recognizes where you're looking and can recognize your head movement too.

There are books on Android market.

There are movies on Android market. They're not tied to any specific device. They're just tied to your Google account.

You can "pin" a movie in order to download it to your device.

Verizon seems to have the coolest new Android stuff.

Google has a new music service in beta.

If you add music to your account, you can listen to that music from any of your devices.

There are Windows and Mac clients.

There's a web-based music manager.

It's completely synced on all devices.

There's an "instant mix" feature based on machine learning.

The music is kept in the cloud. There's no need for syncing.

You can cache music recently played.

You can make stuff available instantly.

The Music application is initially by invitation only.

It's free while in beta.

You can upload 20,000 songs.

You need Android 2.2 or higher for the Android app.

There's going to be an alliance to determine how quickly Android devices should be updated and how long they should be maintained.

Sprint is in the Alliance. So is Verizon. I'm a little peeved at Sprint because they won't upgrade my Samsung Moment beyond Android 2.1, and I need at least 2.2 for some of the apps I'm interested in.

Android open-accessory is a standard accessory platform. It has hardware and software components.

There was a demo of an exercise bike plugging into a phone.

The accessory development kit (ADK) is based on Arduino.

There was a demo of a full-sized (i.e. man-sized) labyrinth board controlled by tilting a tablet.

The ADK doesn't have any NDAs or fees. It's completely open.

Android @ Home is an OS for your home.

There's an Android @ Home framework. It has its own wireless protocol. It has very low cost connectivity.

He said to envision a real world Farmville ;)

He showed a demo of the lights being connected to a game of Quake.

LightingScience has prototype lightbulbs.

Project Tungsten is a hub for your home. It runs Android @ Home.

They showed a prototype where you can just touch a CD to the hub and it starts streaming music from the cloud.


They gave away free tablets!!!

There wasn't any mention of GAE. It was mostly about Android.

Life of a Google API Developer

They were showing off Google API explorer, which looks pretty neat.

There's a list of APIs. Try them.

There are lots of APIs.


The API explorer is much nicer than using curl.

They use OAuth2 with delegated auth.

The Google API Discovery Service is an API to serve docs about APIs.

There are generic client libraries for the Google stuff for Java, Python, PHP, .NET, Ruby, etc.

OAuth2 is complicated, but they have nice helper libraries.

They're using Mercurial on

Daryl Spitzer said Go is amazingly succinct.

Passwords are like underwear. Keep them secret. Change them often.

The APIs have quotas. However, they're reasonable for moderate apps.

Google is introducing some APIs that you have to pay for.

5000 tablets were given to users for free a month before the actual release of the tablets.

The talks for Google I/O are on YouTube.

Go for Web Apps

Rob Pike (of UNIX fame) gave part of the talk!

Go is fun, efficient, and open source.

Go has a web server.

It's natively compiled.

It has gdb support.

It simplifies some stuff.

It was originally intended for systems code like building web servers, but it's now more general.

It has first-class functions.

It has low level types, like float32.

I don't understand its replacement for exceptions. [Several weeks later, I read an article on Defer, Panic, and Recover.]

goinstall is its package manager.

It has closures.

It is statically typed.

Google API clients are coming for Go.

Go now has support for GAE! The apps are compiled in the cloud.

Go is the first native language on GAE.

Go is not theoretically interesting. It's just really useful.

Go's library support is improving quickly.

It has garbage collection and full control over how memory is laid out.

Programming Well With Others

The format of this talk was really entertaining. In fact, it was my favorite talk. They pretended to do a radio show called "The Ben and Fitz Show".

There are people who are friendly but steal too much time and attention.

Perfectionists suck energy out of a project.

"You are not your code."

Python at Google

This talk was by Wesley Chun.

io2011 is the tag to use for Google I/O this year.

Python just turned 20.

C++ is Google's primary language.

Python was used at Google before Google really existed. It was used in the original crawler.

Java came later.

Google has a Python training class.

YouTube gets 2 billion views/day.

There are 35 hours of video uploaded to YouTube every minute.

Welcome to Computer Programming For Kids (CP4K) looks like a fun book.

HTML5 Showcase

This talk was amazing.

Here's the video.

Here's some code.

HTML5 can handle binary data.

There's a file uploader that can upload multiple files or even a whole directory. You can drag and drop images to upload them.

The talk was overwhelmingly awesome!

There's 3D support in the browser.

There's WebGL 3D. It runs on the GPU.

They showed a 3D file browser tied to an in-browser command line.

They showed real-time audio processing in JavaScript.

They showed CSS that produces 3D effects declaratively.

The three parts of this talk were about files, graphing, and audio.

YouTube APIs iframe Player

Using the iframe player simplifies things.

HTML5 doesn't have fine-tuned video support yet: there's a content protection problem; there's no full-screen HD (except WebKit); there's no microphone and camera access; there's a format problem.

However: HTML5 has better accessibility; there's mobile support; it has faster start times; it's more reliable.

HTML5 doesn't support all the features used by YouTube.

Custom AS3 players won't work on iOS, of course.

There's a YT object you can play with on the Chrome console.

A guy named Greg wrote the HTML5 player.

WebM is a better encoding.

The performance of HTML5 video on older hardware is not necessarily better than Flash. (That disappoints me because I know Flash can be a hog. I know QuickTime ran pretty well on older Mac hardware that can't keep up with watching modern movies via Flash.)

Fireside for Developer Relations

Mike Winton leads developer relations.

There are lots of GTUGs (Google technology users groups).

The public forums need more Google attention, especially for GAE.

Google needs to reach out to unconverted fans at business conventions.

Android's scale of growth has been mind blowing.

There are a bunch of programmers who are nuts about Google.

There were lots of developer advocates in the crowd.

Closure Talk

The speaker was the one who wrote the Closure book. He worked on Google Calendar. He's no longer at Google.

He defined a large project to be 30k+ lines of code, 4+ people, and 6+ months.

How should you code a RIA like Gmail and Calendar? Use GWT? Write a ton of JS? Calendar, Gmail, Maps, Blogger, Docs, and Reader were all pure JavaScript.

Closure has many parts: the Closure library, Closure templates, and the Closure compiler (which also does static checking). They're all independent. It even makes sense to use the Closure compiler with jQuery.

Code written using the Prototype library is not easy for people to read unless they already know Prototype.

This code always returns undefined because of ";" insertion:
function f() {

Too many script includes in the head leads to slowness.

Create a special version for mobile. However, this can lead to ugly forking.

The Closure compiler can compile templates + your JavaScript + the Closure library itself into optimized JavaScript.

The compiler is a whole program optimizing compiler.

The templating language escapes HTML by default.

The closure library can compress an amazing amount. In advanced mode, it's incredible.

YUI compressor can compress JavaScript to 27% of original size. However, the Closure compiler running in advanced mode can compile JavaScript to 0.5% of its original size.

It does things like strip functions that are never called, which is useful if you're using a library like jQuery.

Don't create a mobile version of your JavaScript library. Use 1 version, and let the Closure compiler strip stuff not needed by defining which user agent you can assume. Hence, you can compile multiple different versions of the same library.

The compiler can statically enforce type hints that you give in docstring comments.

Closure tools help manage and optimize large JavaScript apps.
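
The note above about the templating language escaping HTML by default deserves a sketch, since it's the design choice that makes XSS harder to introduce by accident. This is plain Python using the standard `html` module, not Closure Templates syntax; `render` and `SafeHtml` are hypothetical names for illustration.

```python
import html


class SafeHtml(str):
    """Marks a string as trusted markup that should not be escaped again."""


def render(template: str, **values) -> str:
    """Fill in a template, escaping every value unless it is marked SafeHtml."""
    escaped = {
        key: value if isinstance(value, SafeHtml) else html.escape(str(value))
        for key, value in values.items()
    }
    return template.format(**escaped)


if __name__ == "__main__":
    # Untrusted input is neutralized without the caller doing anything.
    print(render("<p>Hello, {name}!</p>", name="<script>alert(1)</script>"))
    # Opting out has to be explicit.
    print(render("<p>{badge}</p>", badge=SafeHtml("<b>admin</b>")))
```

The point is the default: forgetting to think about escaping yields safe output, and unsafe output requires a deliberate opt-out.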

Wednesday, July 06, 2011

Software Engineering: Code Reviews vs. Peer Programming

I've been thinking lately about the benefits of code reviews vs. peer programming. In general, I think most companies that really care about code quality use either code reviews or peer programming. Various companies are famous for using either one approach or the other, which leads me to wonder which approach is better under which circumstances?

Pivotal Labs is famous for doing full-time peer programming. It's well known that they do a good job writing software. However, I wonder if full-time peer programming might be too expensive for mundane code. I also wonder if it makes sense to work as a pair when someone needs to spend a few days reading, learning, and researching.

In contrast, Google is famous for code reviewing all commits before checkin. Certainly this frees up engineers to get more work done since they can spend a high percentage of their time working in parallel on separate tasks. However, I wonder if a code reviewer really has the ability to make the same level of architectural improvements as a peer programmer. Certainly a code reviewer can catch style mistakes, but it's much harder to tell someone their entire approach is wrong (for instance, threaded code vs. asynchronous code).

Furthermore, I wonder if code reviewers in general can catch all the little assumptions that get built into code. It reminds me of a story my boss once told me. A few decades ago, he was working on satellite control software. There was a piece of code that made it through three levels of code review even though it contained a bug. It was missing a minus sign in some equations having to do with navigation. When the satellite was launched, it started spinning because it kept "thinking" it was upside down. It wasted half of its fuel before they could get the problem under control. My boss said that for satellites, the lifespan of a project is directly connected to the amount of fuel onboard. Hence, this came to be known as the three million dollar minus sign.

If this code had been peer programmed rather than code reviewed (or in addition to code review), would the peer have spotted the problem as the equation was being worked out? Certainly this taught me a valuable lesson about code review. It's far too easy to get hung up on less critical issues that are easy to spot, like style, instead of focusing on more important issues that require more brain power to understand.

In the book Professional Software Development: Shorter Schedules, Higher Quality Products, More Successful Projects, Enhanced Careers (see my blog post), Steve McConnell said that NASA found that the single most effective way to cut down on defects was to always have a second pair of eyes present. They were talking about building the shuttle, but I think the same thing applies to software.

Despite their pervasive use of code reviews or peer programming, Google and Pivotal Labs have both had their fair share of bugs. Even NASA makes mistakes. Hence, it's easy to see that neither code reviews nor peer programming can banish all defects. Given how fallible human beings are, it seems to me that the best way to keep people from dying because of mistakes is to avoid situations where mistakes are fatal. Driving is a fairly dangerous activity, and tens of thousands of people die each year in the United States because of driving errors. However, even more than that make mistakes and yet survive because of various safety precautions.

So even though I think code reviews and/or peer programming are important factors in writing high quality software, I think it's even better to avoid situations where software defects can cause serious damage. I certainly think that the more mission critical a piece of software is, the smaller and stupider it should be.

Tuesday, July 05, 2011

Amazon Web Services Summit 2011: Navigate the Cloud

On June 21, 2011 I went to an Amazon Web Services conference in San Francisco. These are my notes.

In general, the conference was informative, but a bit subdued. It lasted only one day, the crowd was about a quarter of the size of Google I/O's, and they weren't giving away anything huge for free. However, that makes sense since they hold several of these conferences per year. Nonetheless, it was interesting, and I learned some stuff.


Part of the keynote was given by Dr. Werner Vogels, the CTO.

AWS doesn't lock you into an OS or language.

There are 339 billion objects in S3, and S3 handles 200,000+ storage transactions per second.

On a daily basis, they add as much capacity as they had in total in

Spot pricing allows you to spin up an instance when the price is right.

AWS CloudFormation deploys a stack of software.

AWS Elastic Beanstalk is easy to begin and impossible to outgrow.

CloudFormation and Elastic Beanstalk don't cost anything extra.

They now have Simple Email Service.

They're building features by working from the customer backwards. They start by writing the press release, then the FAQ, next the documentation, and then finally the code.

They mentioned that Amazon RDS (Relational Database Service) is not as scalable as S3.

AWS is now a PCI level 1 service provider. Amazon's retail business is moving to AWS.

They mentioned Amazon VPC (Virtual Private Cloud).

There have been more than 2 million downloads of Sketchbook Mobile.

Homestyler is the fast, easy, free way to design your dream home.

Autodesk uses EC2, S3, EBS, and CloudFront.

Homestyler can now do photo realistic renderings of your home.

They mentioned project Nitrous. It provides a rich, in-browser experience. It's like Google Docs but for your 3D modeling documents.

They're doing stuff where they render in the cloud and then stream to an iPad.

They handle customer data in real-time (i.e. with 100ms latencies) on AWS.

They have a primarily ex-Google team.

They provide customized ads based on user behavior.

They have some serious latency requirements. They had to hit 120ms end-to-end. They worked closely with AWS to meet their needs.

Their ad performance is incredible. They have a 7.5% click through rate.

This part of the keynote was given by Mitch Nelson, Director, Managed Systems.

LiveCycle is a managed service.

They talked about the Flash Media Server.

Adobe Connect is a managed service. It's an online meeting product. It's based on Flash.

Cloud Computing at NASA

This part of the keynote was about the JPL (Jet Propulsion Laboratory).

They wanted to augment availability in multiple geographic regions.

Some things can be safer in the cloud.

They're using the VPC (Virtual Private Cloud). They use IPSec.

They've saved a lot of money by adopting cloud computing.

Buying infrastructure way ahead of time is expensive for NASA. Cloud computing saves NASA a lot of money by allowing them to scale to meet their needs.

He talked a lot about the Mars Rover.

They spin up a bunch of instances to handle bursty traffic, processing data from the Mars Rover, which sends data once a day (I think).

The speaker said that the "missions" are his "customers".

These projects are using cloud computing: Mars Rovers, Deep Space Network, Lunar mapping product, and airborne missions. None of them look like they involve astronauts.

SmugMug is profitable. They have no debt. They're a top 250 website. It's a private company. They had $10M+ in yearly revenues in '07.

They provide unlimited storage for storing photos and unlimited bandwidth for uploading and downloading photos. Their motto is "More, better, faster" (in comparison to other photo sites).

They can handle photos up to 48 megapixels in size.

They can handle 1920x1080p video.

They use a hybrid of AWS and their own datacenter. They have 5 datacenters. They're moving completely to AWS. They have many petabytes of data stored in S3.

Amazon really listens to their customers, trying to solve their needs.

One time, SmugMug's RubberBand project tried to spin up 1920 cores in a single API call. Amazon spun them all up as requested. SmugMug renamed that project SkyNet.

SmugMug lets customers tie their Amazon and SmugMug accounts together so that SmugMug can pass on the cost of storing raw photos and videos onto their customers (since they're so expensive to store).

SmugMug is designed for breakage, so there was minimal impact during the "Amazonpocalypse". They made use of multiple availability zones.

EBS is just like a hard disk. If you want local redundancy, use RAID. Want GEO redundancy? Replicate. EBS can/does fail just like HDDs. EBS does solve lots of problems, but use it in the appropriate way.

Customer Panel

Use Akamai for dynamic generation of content, and use CloudFront for more static content.

Dealing with Akamai is a pain in the butt.

The customers on the panel made it through the Amazonpocalypse fairly well. However, they had to architect for it.

The elasticity of Amazon's services is what saves the most money. Customers only pay for resources when they use them. The customers on the panel were pretty good about spinning up and spinning down instances. Also, the customers said that they can't build their own data centers fast enough to meet their own needs.

TellApart uses Spot instances to run Hadoop jobs. That means they only spin up instances when the price is low enough.

Without AWS, a lot of the customers would never have been able to attempt certain projects.

Akamai hasn't broken for SmugMug. HAProxy also hasn't broken for them.

Autodesk is using Scala.

TellApart uses GAE (Google App Engine) for its own dashboards.

The NASA guy uses CloudFormation to set up templates for their clusters.

Amazon often builds infrastructure that their customers have already built so that all the other new customers don't have to rebuild that same infrastructure.

RDS has very unpredictable latency. None of the panel customers were using it.

Security and Compliance Overview

This talk was given by Matt Tavis, Principal Solutions Architect.

Here's a link to the AWS Security Center.

Security on AWS is based on a shared responsibility model.

AWS is responsible for facilities, physical security, network infrastructure, etc.

The customer is responsible for the OS, application, security groups, firewalls, network configuration, and accounts.

AWS has complex features for who has access to interact with which AWS APIs if you have multiple people at the same company.

AWS favors replication over backup.

I wonder if you can mess around with EC2 or EBS and try to read the raw disk to see if there is stuff left over from a previous user.

By default, there's a mandatory inbound firewall, which defaults to deny.

VPC is a networking feature.

High Availability in the Cloud: Architectural Best Practices

This talk was by Josh Fraser, the VP of business development at RightScale.

RightScale is the world's #1 cloud management system.

RightScale serves 40,000 users and 2.5 million servers.

RightScale is a SaaS.

RightScale is the layer between the app and the cloud provider.

You need to design for failure.

Backup and replication need to be taken seriously.

A cloud is a physical datacenter entity behind an API endpoint.

He says that Amazon has 5 clouds. AWS is a cloud provider.

RightScale has ServerTemplates (which act like recipes). They don't like cloud images (because they're too fixed).

They have an integrated approach that puts together all the parts needed to build a single server or set of servers.

ServerTemplates are a server's DNA.

DR is disaster recovery. HA is high availability.

Implementing HA best practices is always about balancing cost, complexity, and risk.

You need to automate your infrastructure.

Always place at least one of each component (load balancers, app servers, databases) in at least two AZs (availability zones).

You need alerting and monitoring.

Use stateless apps.

It's critical to be able to programmatically switch DNS.

Use HAProxy, Zeus, etc. for load balancing.

Scale up and down conservatively. Don't bring up 1000 instances because of 1 minute's demand.

Snapshot your EBS volumes.

Consider a NoSQL solution.

They use Cassandra in multiple regions, and it works well for them.

EBS snapshots can't cross regions.

There is no one size fits all solution.

The most common approaches they see involve multi-AZ configurations with a solid DR plan.

Amazon.com's Journey to the Cloud

Retail is everything at Amazon that isn't AWS. Retail is a customer of AWS.

This talk summarizes the history of Amazon retail from 1995 to 2011. I think the switch to AWS was in 2011.

They initially ran on a single box. They housed the server in their main corporate offices. There was a "water event" that caused them to move to a datacenter.

In 2001, they started switching from Digital Tru64 to Linux.

In 2004, they had 50 million lines of C++ code.

AWS started in 2006. S3 came first, and then came EC2. AWS actually wasn't built to solve problems for the retail side. It was always built to be general purpose, and the retail side was responsible for figuring out how to use it.

IMDb is an Amazon-owned subsidiary. It's very separate from the rest of Amazon.

Amazon has strict runtime latency and scale requirements. If your widget can't meet them, your widget won't get shown.

Retail uses VPC. VPC makes AWS look just like your own datacenter.

November 10, 2010 is when they turned off the last frontend server not using AWS. All traffic for Amazon.com is now served from AWS.

They've had billions of orders over the lifetime of Amazon.com.

Amazon has taken really old orders out of the database and moved them into S3.

They use Oracle.

Be sure to consider compliance issues.

When moving to the cloud, start with simple applications (or simple parts of larger applications).

Iterate toward your desired end-state.

The cloud can't cover up sloppy engineering.


I just wrote my first blog post for the YouTube API blog: Introducing is a platform for third-party applications that enable users to create videos. The idea is simple. The third-party application runs in an HTML iframe on YouTube. The user creates a video with the application, and then the application uploads the video to YouTube for the user to watch and share.

To read more, check out the story on the YouTube API blog.