
Web Performance Short Course

I went to a tutorial on web performance at HTML5DevConf. These are my notes:

Daniel Austin was the instructor.

He worked down the hallway from Vint Cerf in the early days of the Internet, and he was the manager of the team at Yahoo that created frontend performance as a discipline. He was the manager of the guy who created YSlow. A lot of the books on web performance are from people he used to manage. He was the "chief architect of performance" at Yahoo.

He's writing a book called "Web Performance: The Definitive Guide".

He started by asking us how many hops it took to get to Google (per traceroute).

He had us install the HTTP/2 and SPDY indicator Chrome extension.

He's given this class at this conference 5 years in a row. It's changed dramatically over the years.

This is only a class on the basics.

The most important key to understanding performance problems is to understand how the web works and work within those constraints.

Understand what's going on under the covers, and identify the problems. That's half the battle.

There are lots of tools.

We're always focused on the end user's point of view (the "user narrative").

This is both an art and a science.

Most of the people doing web performance now started at Yahoo.

He didn't think the first book on web performance was very good.

All of his slides are on SlideShare.

Capacity planning and performance are opposite sides of the same coin.

Most performance problems are actually capacity problems.


Tools he mentioned:

  • spreadsheets
  • your browser's developer tools
  • YSlow
  • netmon
  • dig
  • ping
  • curl
  • Fiddler
  • there are a bunch of mobile tools

The site is fast enough when it's faster than the user is.

Theme: Ultimately, performance is about respect. He thinks Google is just making stuff up when it says that slower responses result in X amount of lost dollars. He thinks it's really just about respect.

He seems to have an anti-Google bias ;) He even asked who in the class was a Googler.

Section I: What is Performance?

It's all about response time!

Latency is about packets on a wire. Humans experience response time, not latency.

The goal: "World-class response times compared to our competitors."

We want reliable, predictable performance.

It must be efficient and scalable.

We want to delight our users.

Performance is a balancing act.

Security vs. performance is a common tradeoff.

Section II: Performance Basics

Statistics 101:
  • sort
  • mean
  • median
  • mode
  • variance
  • standard deviation
  • coefficient of variation (standard deviation divided by the mean)
  • minimum
  • maximum
  • range

He compared the mean, median, and mode. The mean is rarely used in performance work.

The median is the number in the middle. We use that more often than the mean.

The mode is the most frequent number in some set.

Performance data is full of outliers. The outliers disturb the mean which is why we can't use it.

Pay close attention that you're talking about the median, not the mean.

The mean and the median can never be more than one standard deviation apart; if your numbers say otherwise, something is wrong with the data or the calculation.

The standard deviation is the average distance between a point and the mean. It's a measure of how scattered the data is.
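Here's a quick sketch of these ideas in Python (my numbers, not his): a single long-tail outlier drags the mean way up while the median barely moves, and the two still land within one standard deviation of each other.

```python
import statistics

# Made-up response times in seconds; the 9.8 is a long-tail outlier.
times = [0.041, 0.043, 0.045, 0.048, 0.050, 0.052, 0.055, 0.060, 0.070, 9.8]

mean = statistics.mean(times)      # dragged way up by the outlier
median = statistics.median(times)  # barely affected by the outlier
stdev = statistics.pstdev(times)   # population standard deviation

print(f"mean = {mean:.3f}, median = {median:.3f}, stdev = {stdev:.3f}")

# The sanity check from class: mean and median are never more than
# one standard deviation apart.
assert abs(mean - median) <= stdev
```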

Performance is vastly different between Asia, the EU, and the US. It's a network infrastructure issue.

The margin of error is a measure of how close your measured results are likely to be to the true value.

The more data you have, the lower the margin of error.

You need 384 data points to get a 5% margin of error.

You need to gather a considerable amount of data to be confident in your analysis.
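The 384 figure falls out of the standard sample-size formula; this derivation is my addition (he didn't show it), assuming a 95% confidence level and the worst-case proportion.

```python
# Sample size for a 95% confidence level (z = 1.96), worst-case
# proportion p = 0.5, and a 5% margin of error (e = 0.05).
z, p, e = 1.96, 0.5, 0.05
n = (z ** 2) * p * (1 - p) / e ** 2

print(round(n, 2))  # 384.16, i.e. about 384 samples
```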

"5 Number Reports" consist of:

  • median
  • 1st quartile
  • 3rd quartile
  • minimum
  • maximum

The typical performance curve:

  • No users get response times less than a certain amount.
  • Most people get response times somewhere in the middle.
  • There's a long tail of people getting much longer response times.
  • Sometimes they even time out.

You know you have a problem if:

  • A lot of people are getting bad response times.
  • A lot of people are timing out.
  • There's a second hump in the curve for the people getting slower response times.

Curl is our favorite browser! ;)

curl -o /dev/null -s -w '%{time_total}\n' <url>

Run it 10 times in a row, put the numbers in a spreadsheet, and calculate a 5 Number Report.
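Here's that exercise as a Python sketch instead of a spreadsheet, with hypothetical timings loosely modeled on the class data:

```python
import statistics

# Ten hypothetical curl timings, in seconds.
timings = [0.039, 0.045, 0.045, 0.048, 0.050, 0.050, 0.080, 0.107, 0.110, 1.233]

q1, median, q3 = statistics.quantiles(timings, n=4)
report = {
    "min": min(timings),
    "Q1": round(q1, 3),
    "median": round(median, 3),
    "Q3": round(q3, 3),
    "max": max(timings),
}
print(report)
```

Note: `statistics.quantiles` defaults to the "exclusive" method, so Q1 and Q3 can differ slightly from a spreadsheet's QUARTILE or R's fivenum.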

The results are really messy!

Curl is way more widely used than you might think. It's even used in production at very large companies.

I don't think I got this completely right:

0.039 min
0.045 1st quartile
0.05 median
0.107 3rd quartile
1.233 max

25% of the data points are in each of the quartiles.

In the performance world, there's usually a big difference between the mean and the median.

You can look at Wikipedia to get the exact formulas for these things.

Across the class, we did 100 measurements, and we had a huge range.

There's a significant amount of variation on the Internet in general. "The web is subject to very high variance."

crazy ratio = dRT/dt = the derivative of the response time with respect to time

Anytime your slope is greater than 0.5, then it's crazy. It's possible that your connection is bad.

We have to figure out: Is the DNS slow? Is the SSL slow? Are the servers slow?

The first thing you want to do is calculate the crazy ratio.


In a spreadsheet:

  • Min = MIN(Data Range)
  • Q1 = QUARTILE(Data Range, 1)
  • Q2 = QUARTILE(Data Range, 2)
  • Q3 = QUARTILE(Data Range, 3)
  • Max = MAX(Data Range)


In R:

  • RT <- c(...)   # your response-time numbers
  • fivenum(RT)

Operational research:

  • Supply chains
  • Utilization Law
  • Forced Flow Law
  • Little's Law
  • Response Time Law

You have to understand how queues work. Think of freeways.

Resources and queues:

  • Service time (Si)
  • Queue residence time (Ri)
  • Queue length (Ni)

In general, systems consist of many combined queues and resources.

Workload differentiation: different lanes for different speed vehicles.

The Utilization Law: Ui = Xi * Si

The utilization (Ui) of resource i is the fraction of time that the resource is busy.

If you let your systems get to 95% load, you should be fired.

Xi: average throughput of queue i, i.e. the average number of requests that complete from queue i per unit of time.

Si: average service time of a request at queue i per visit to the resource.
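The Utilization Law in code, with made-up numbers (a server completing 50 requests/s at 12 ms average service time):

```python
# Utilization Law: U_i = X_i * S_i
X = 50.0    # throughput of the resource, requests/s
S = 0.012   # average service time, seconds per request
U = X * S   # fraction of time the resource is busy

print(f"utilization = {U:.0%}")  # 60% busy
```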

The Interactive Response Time Law: R = (N/X)-Z

R = response time
N = number of users
X = number of requests/s
Z = time the user is thinking (think time)
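Plugging made-up numbers into the Interactive Response Time Law:

```python
# Interactive Response Time Law: R = N/X - Z (hypothetical numbers)
N = 1000    # concurrent users
X = 400.0   # system throughput, requests/s
Z = 2.0     # average think time, seconds
R = N / X - Z

print(f"response time = {R:.2f} s")  # 0.50 s
```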

This doesn't make sense to me because in my mind, the number of requests/second varies a lot based on the number of concurrent requests.

He suggested that you can always increase the number of requests/second by adding more capacity. However, my understanding is that it takes a lot of work to get to a horizontally scalable architecture, and it's often the case that there is a bottleneck that throwing more servers at the problem can't immediately solve.

Figure out if you have a capacity planning problem.

Capacity and performance are intimately related.

Often, your performance problems are really capacity problems.

Antipattern: keyhole optimization: optimizing your project at the expense of everyone else.

Section III: The MPPC Model

Dimensions of performance:

  • Network location
  • Transport type
  • Browser/device type: RT varies by as much as 50%
  • Page composition: client-side rendering and execution effects
  • Network transport effects: number of connections; CDN use

You have to test on multiple types of devices.

Take some crap off of your page to make it faster.

CSS used to be benign in terms of performance. That's now no longer true. CSS can cause performance issues.

He's big on CDNs.

He talked about how hardware and routing work. It was a pretty complex slide.

The backbone is about as good as it can get. It's the last mile that is the problem.

He talked about the OSI Stack model.

He said Microsoft invented Ethernet type 2. [Actually, Ethernet II came from DEC, Intel, and Xerox.]

MTU = 1500 bytes = maximum transmission unit

MSS = 1460 bytes = maximum segment size = the size of the data in the packet

20 bytes for IP, 20 bytes for TCP.

SSL is a good example of the session layer.

He said HTTP is layer 7, application.

We care about:

  • IP (layer 3)
  • SSL (layer 5)
  • HTTP (layer 7)

OSI = Open Systems Interconnection

HTTP connection flow:
  • Make a TCP connection
  • Send a request
  • Get the response

HTTP is a request/response protocol.

MPPC = Multiple Parallel Persistent Connections

He wrote the original paper on this model.

To calculate the end-to-end time, you can use the given equation: E2E = T1 + T2 + T3 + T4

  • T1 = network connection:
    • T1 = T(DNS) + T(TCP) + T(SSL)
  • T2 = server duration = time it takes the server to respond
  • T3 = network transport
  • T4 = client processing = process the response, display the result, plus the user's think time
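The MPPC breakdown as code, with hypothetical component timings:

```python
# MPPC end-to-end model: E2E = T1 + T2 + T3 + T4 (made-up timings, seconds)
T_dns, T_tcp, T_ssl = 0.030, 0.050, 0.120
T1 = T_dns + T_tcp + T_ssl   # network connection: T(DNS) + T(TCP) + T(SSL)
T2 = 0.080                   # server duration
T3 = 0.300                   # network transport
T4 = 0.150                   # client processing
E2E = T1 + T2 + T3 + T4

print(f"T1 = {T1:.3f} s, E2E = {E2E:.3f} s")  # T1 = 0.200 s, E2E = 0.730 s
```

In this (invented) example the network transport T3 dominates, not the server duration T2.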

For Facebook, it's usually T3 (network transport) that takes the longest, whereas most developers are almost entirely focused on T2 (server duration).

Don't go chasing after T2 too quickly. Figure out all of the Ts.

He thinks Microsoft's browser, Edge, is perfectly fine.

There are two types of hyperlinks on the web:

  1. Transitive hyperlinks: The ones you click on.
  2. Intransitive hyperlinks: The ones the browser clicks on for you (images, JS, CSS, etc.).

He said that the number of intransitive hyperlinks is way more than the number of transitive hyperlinks. [I did some tests on a bunch of sites, and that turns out to often not be true.]

95% of the bytes are from intransitive hyperlinks (images, JS, CSS, etc.).

DNS is typically a larger part of the E2E than expected.

TCP is highly variable.

SSL is slow!

T1 might be bigger than you think. For PayPal, T1 accounts for 40% of their E2E.

Before DNS resolves, nothing happens and the user doesn't see anything.

Google runs their own DNS servers to improve response times. It makes it more reliable and predictable.

He had us install Dyn Dig on our phones.

Using Dyn Dig on my iPhone, it took 40 msec to resolve.

Using dig, it took 175 msec to resolve.

He worked on (?). It's the only single-letter domain where you can sign up for your own email address.

Among all the people in the class, there was very high variance in the DNS response times. There was a factor of 10 difference. A factor of 3 is more common.

A DNS lookup anywhere on earth should take less than 500 ms.

It shouldn't take you longer than 10ms to get to your ISP.

Mobile DNS lookup times are all over the map.

For popular sites, DNS lookups are fairly constant because it only involves talking to your ISP.

It takes 14 steps to make an SSL connection.

He said that wherever he said SSL, he really meant TLS.

SSL takes up the lion's share of T1.

If you're using SSL, it is likely the biggest thing in T1-T4.

EV certificates = Extended Validation certificates

EV certificates take twice as long. It's a 2048 bit key.

Banks use EV certificates.

When they're used, there's a nice green bar in your browser.

"The current internet is overly reliant on encryption and confuses encryption with security...Don't confuse being encrypted with being secure."

T2 - The Server Duration

He treats the server as a black box. He doesn't care what's inside it. He only cares about how long the server takes to respond to a request.

If there's a lot of variance in T2, it's a capacity problem.

We want servers to scale linearly with the number of users.

At some point, a server can't respond to more load in a linear way. Don't load your machines past the point where they go non-linear.

Typically, machines in production run at 70% utilization or less. 40% is actually pretty common.

You have to have enough capacity to account for machines going down.

T3 - TCP Transport Time

This is the part he likes the most since he's a network guy.

TCP is pretty predictable.

Remember that HTTP has evolved over time.

He said that HTTP/1.1 came out in 1998.

We got HTTP Keepalive in HTTP/1.1.

HTTP/2 became a standard on May 14, 2015.

Firefox will open up to 6 connections for each unique host on a page. IE will only open 2.

"There was no equation until your truly solved it...published in IEEE."

With HTTP/2, you make one connection, but then there's a bunch of streams within that one connection.

TCP is not very efficient for transferring small files.

The size distribution of objects on the internet peaks around 7k.

The Yahoo logo is always fairly small in file size. It only uses a single color.

T4 - What the Browser Does

He showed the waterfall of request times for a large site.

T4 is especially important for mobile devices. They have smaller processors, so they take longer to render pages.

The big guys have mobile versions of their sites that have less stuff on them.

Mobile devices often run JavaScript much more slowly than desktop devices. Part of this is because of how they do floating point arithmetic.

Bandwidth and latency have to do with the network.

More bandwidth is like having a wider hose.

Latency is like the length of the hose.

Adding bandwidth only helps up to about 5 Mbps.

Reducing latency helps linearly with reducing response times.

In the US, more than 90% of people have a 5 Mbps connection or better.

If the pipe is fixed, then put stuff in the pipe more efficiently.

He talked about packet loss.

He talked about the congestion window.

Every time TCP loses a packet, it cuts the bandwidth in half.

He really likes using equations with Greek characters to model things. He calls it "solving the equation".

On mobile, packet loss is typically 5-7%.

For any given user, their latency and bandwidth is fairly fixed.

Packet loss is a limiting factor for bandwidth.
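He didn't show his equation in a form I captured, but the standard way to relate packet loss to a bandwidth ceiling is the Mathis et al. approximation; this sketch (with made-up RTT) uses his MSS and mobile loss figures:

```python
import math

# Mathis et al. approximation: throughput <= MSS / (RTT * sqrt(loss))
MSS = 1460        # bytes (the MSS from earlier in the notes)
RTT = 0.100       # round-trip time, seconds (hypothetical)
loss = 0.05       # 5% packet loss, his low end for mobile

ceiling = MSS / (RTT * math.sqrt(loss))  # bytes/second
print(f"throughput ceiling ~ {ceiling / 1024:.0f} KiB/s")
```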

Packet loss almost always happens because of overflowed buffers or failure to reassemble fragmented packets.

Antipattern: saying "that's outside my control."

It's never the case that there is nothing you can do about a performance problem.

Compensate in some other part of the E2E. Think outside the box.

Section IV: Tools and Testing

I didn't get his entire list of tools. Sorry.

  • YSlow
  • HTTPWatch (very good)
  • Your browser's development tools

You must gather data from lots of users. Performance work is statistical in nature.

Remember, we have a special position here in the valley. Think about people who don't have internet connections as good as ours.

Those tools aren't going to help you make the network faster in India. But, they can help you fix problems with page composition.

For instance, what things are being loaded? What things are blocking progress on your page?

There might be an ad making your page slow.

There are commercial performance services:

  • Gomez (Compuware)
  • Keynote
  • AlertSite
  • ThousandEyes

Gomez and Keynote are super expensive corporate tools.

New Relic is a less expensive tool to try to solve some of those problems.

Performance numbers are going to vary between the backbone and the last mile (of course).

gamma = last-mile response time / backbone response time

His goal is to identify problems. How to solve them is another thing.

He's worked at a lot of the big dot coms.

RUM = real user measurements

Yahoo alone was responsible for 16% of JavaScript errors on the Internet. It was mostly because of ads.

Users were seeing response times that were 10X the response times on the backbone.

When he tests things, a test is a set of pages. A monitor is a set of tests.

He talked about YSlow. The rules were published by his team at Yahoo. People pay attention to the first 14 rules, but there were actually 105.

PageSpeed is from Google.

HTTPWatch is the commercial software.

UNIX tools: ping, dig, traceroute, curl

MSS / RTT = maximum segment size / round trip time = a good way to guess how long it'll take for your page to arrive.
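Working that rule of thumb through with made-up numbers:

```python
# Rule of thumb from the notes: MSS / RTT approximates per-connection
# throughput (roughly one segment per round trip, ignoring window growth).
MSS = 1460           # bytes
RTT = 0.050          # 50 ms round trip, measured with ping (hypothetical)
page_bytes = 500_000 # made-up page weight

rate = MSS / RTT                 # bytes/second
estimate = page_bytes / rate     # naive transfer-time guess
print(f"rate = {rate:.0f} B/s, page arrives in ~{estimate:.1f} s")
```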

Use ping to figure out the RTT. WebPageTest (?) will give you a lot of the same information that the commercial tools provide.

All the performance work in the web world came out of Yahoo.

The 14 YSlow rules are all about T3.

Here are the original 14 YSlow rules:

  1. Make fewer HTTP requests.
  2. Use a CDN.
  3. Add an expires header. (There are now better headers.)
  4. Gzip components.
  5. Put CSS at the top.
  6. Put scripts at the bottom. (We now say to put them in the head.)
  7. Avoid CSS expressions.
  8. Make JS and CSS external.
  9. Reduce DNS lookups.
  10. Minify JS.
  11. Avoid redirects.
  12. Remove duplicate scripts.
  13. Configure ETags. (He says don't bother.)
  14. Make AJAX cacheable.

Mobile devices are weak on floating point operations. Hence, they may not be as good at decompressing things.

Do not put the scripts at the bottom. The advice has changed. Chrome compiles your scripts to binary, but only if you put them at the top.

It halts the rendering process if you have JavaScript in the body. If it's in the head, it doesn't.

He's mixed on whether minification is good or not. It makes debugging harder. Maybe gzip is enough.

The rules are now different.

Unix performance testing tools:

  • ping
  • nslookup, dig (These are somewhat interchangeable.)
  • traceroute
  • netstat (This lists the network connections on the machine.)
  • curl

When you traceroute a site, the number of hops varies between runs.

If you get stars during a traceroute, that means there's a firewall that is preventing you from getting that information.


When tracerouting google, we got a range of 13-17 hops.

UNIX can't really measure things less than a millisecond.

netstat -a
netstat -A
netstat -A | grep -i HTTP

curl only returns the base page. It doesn't retrieve the images, etc. WebPageTest (?) is really good.

When you look at a waterfall diagram, find the long pole.

cache ratio = cached response time / uncached response time

Cache more. WebPageTest (?) is running real browsers.

It's okay if your base page isn't cached. Make sure the images, etc. are cached.

Task-based performance thinking: Users have use cases. They don't care about just a single page.

Look at the paths users use. Then, make those paths easier.

Users don't do what you thought they would do when you designed the website.

Focus on optimizing the 2-3 things the users do the most.

Test your competitor's performance.

Tumblr was a top 20 website, and it ran out of the founder's basement before Yahoo bought it.

Stormcat is for global performance testing.

Antipattern: design-time failure.

You can't bolt performance onto your website after you launch it.

Section V: ???

He talked about "W3C navigation timing". He almost never uses this. He doesn't think it's very good even though he worked on it.

Antipattern: we'll be done with this soon.

Performance is an ongoing activity, not a fire and forget activity.

Antipattern: not treating performance as a property of the system, or only testing at release time.

Pattern: establishing a long-term performance management plan as part of your cycle.

Native apps run 5X faster than HTML5.

Mobile is 10X slower than desktop.

HTML5 on mobile devices can be 50X slower:
  • 10X from the ARM chip
  • 5X from JavaScript

However, chips have gotten a lot better lately.

3G adds 2000ms of latency.

3G is not very common here, but it's very common overseas.

4G is much better.

Since 2009, mobile browsers went from 30X to 5X slower than desktop browsers.

In the US, we're generally on LTE, not 4G.

HTTPWatch is a good app for mobile.

Amazon's home page makes 322 requests. It's insane.

74% of users will leave if a mobile website takes more than 5 seconds to load.

Use the right tool for the right job:

  • Server
  • HTML
  • CSS
  • JavaScript

Nick Zakas architected the Yahoo homepage.

Doug Crockford said, "Don't touch the DOM!" [Not sure about that.]

TTFB = time to first byte

TTFB is not a good measure of server duration.

Use web workers for preloading.

Test performance on different transport types.

Test battery consumption.

The NYT website eats up your battery life.

Mobile networking is a big challenge, so design for delay tolerance.


There's iCurl for the iPhone.

Antipattern: Failing to recognize that the distribution of the mobile E2E is very different from a desktop performance profile.

The server duration is about 35% of the total E2E.

Section VI: Psychology of Performance

100ms to identify distinct objects
150ms to respond
250ms for user "think time"

TVs delay the sound by 30ms.

Th = Tp + Tc + Tm
T(human) = T(perceptual processing) + T(cognitive) + T(motor)

When faced with N choices, users will take O(log N) cycles to proceed (Hick's law).
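The usual formulation behind the "O(log N) cycles" claim is Hick's law; a sketch with a hypothetical coefficient:

```python
import math

# Hick's law: decision time ~ b * log2(N + 1) for N equally likely choices.
b = 0.15  # seconds per bit, a hypothetical coefficient
for n in (2, 8, 32):
    t = b * math.log2(n + 1)
    print(f"{n} choices -> {t:.2f} s")
```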

The size of UI objects on small screens limits your accuracy.

Wearables and small devices are near the point of minimum usability for visual interactions.

My initial response time was 244ms ;)

His closing advice:

  1. Make performance a priority.
  2. Test, measure, test again.
  3. Learn about tools.
  4. Balance performance with features.
  5. Track results over time.
  6. Set targets.
  7. Ask questions; check for yourself!

He pointed at me and said, "This guy has been asking me questions all day, and he's not entirely sure I'm right about everything, which is good...I'm not right about everything...I can be wrong."

Tim Berners-Lee invented the WWW, HTTP, and the URL addressing scheme.

Doug Engelbart invented the mouse and hypertext. He died 2 years ago.

Dr. Charles Nelson (?) invented SGML. [Hmm, Wikipedia credits Charles Goldfarb.]

HTML is based on CALS which is an SGML dialect.

Tim Berners-Lee wrote the original code for all of this, although he's not very good at writing code. His genius was assembling all the parts into a working system.

