Extreme Web Performance

I went to an all-day tutorial on web performance at ForwardJS. These are my notes:

Here are the slides.

Max Firtman gave the tutorial. The thing I love about Max's tutorials is that there's lots of good, well-organized, approachable content, but there's minimal ego.

He's the author of "High Performance Mobile Web".

I missed the first 20 minutes.

Things we'll cover:
  • Performance and Perception
  • Understanding the Web
  • Basic Optimizations
  • Extreme Optimizations
He gave homage to Steve Souders. He said that Steve started the discipline of web performance while he was at Yahoo. He wrote "High Performance Web Sites". However, that book is a bit dated.

Performance affects your search ranking.

#perfmatters

He recommended the Velocity Conference.

The biggest problem isn't the server. Based on the top 100,000 sites, only 16% of the time is spent on the backend. If you cut your backend time by 50%, that might only impact the frontend time by 10%. Optimizing the frontend is easier and has a bigger impact.

In a waterfall chart, for each resource, there are the following:
  • Queuing
  • Stalled
  • DNS lookup
  • Initial connection
  • Request/Response
  • Request sent
  • Waiting (TTFB)
  • Content downloaded
There's a difference between the first view and the repeat view.

In a waterfall chart, there are lines for milestones:
  • Start Render
  • msFirstPaint
  • DOM Content Loaded
  • On Load
  • Document Complete
He showed the Filmstrip view to show what your site looks like when it's loading.

He mentioned WebPageTest.

In web performance, perception is more important than reality.

Chrome DevTools >> Network tab

There's a capture screenshots icon.

He used whitehouse.gov for his example. He said this administration and the last administration both had terrible web sites from a performance point of view. It took 8 seconds to get the first paint.

There's an icon to select a device to emulate.

WebPageTest

It's sponsored by Google and the community. It's an open source project.

It also lets you select a device. It uses real devices and real networks.

It has a "Capture Video" check box.

Interesting: webpagetest.org/compare will take your website and optimize it automatically to see how much better your website would be just with automatic tools.

Looking at the filmstrip is a great way to analyze the user's pain. In the video, when all the above the fold content is loaded, it shows it in gray.

Important: ATF = above the fold.

It took 6.4 seconds for whitehouse.gov to load the ATF content.

There's a book on WebPageTest, "Using WebPageTest".

By default, it'll run 3 tests.

Speed Index

Speed Index is a measurement. It measures the ATF content that isn't yet showing. It's a graph with time at the bottom and the percentage of ATF content that's showing on the Y axis. It measures how fast your website is being rendered. It's based on taking screenshots over time. Measure the area above the line (i.e. the stuff the user isn't seeing yet). It measures "how much nothing" the user is seeing.
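
To make the "area above the line" idea concrete, here's a minimal sketch of my own (not from the talk). The function name and the sample format are made up, and it treats visual completeness as a step function between samples; WebPageTest's real calculation works from video frames.

```javascript
// Hypothetical helper: samples are { time: ms since navigation, complete: 0..1 },
// sorted by time and starting at time 0.
function speedIndex(samples) {
  let area = 0;
  for (let i = 1; i < samples.length; i++) {
    const interval = samples[i].time - samples[i - 1].time;
    // Area above the curve: the fraction of ATF content still missing
    // during this interval, multiplied by how long the interval lasted.
    area += (1 - samples[i - 1].complete) * interval;
  }
  return area;
}

// Both pages finish rendering at 2000ms, but one shows most content early.
const fastStart = [
  { time: 0, complete: 0 },
  { time: 500, complete: 0.75 },
  { time: 2000, complete: 1 },
];
const slowStart = [
  { time: 0, complete: 0 },
  { time: 1500, complete: 0.25 },
  { time: 2000, complete: 1 },
];
console.log(speedIndex(fastStart)); // 875
console.log(speedIndex(slowStart)); // 1875
```

The page that paints most of its content early gets the lower (better) number even though both pages are fully rendered at the same moment.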

Chrome DevTools doesn't show Speed Index. However, WebPageTest does.

1000 = a good Speed Index
4156 = whitehouse.gov
20000 = a bad web site

On the bottom of the filmstrip on WebPageTest, it shows you the Speed Index.

We want a lower Speed Index, which means we're showing more of the ATF content sooner, which means the graph will look more like an "r" and less like a skateboard ramp.

Progressive JPEGs are apparently not very popular.

Remember, the Speed Index diagram is going to be different per device.

Lighthouse

This is from Google. It gives you a score. It's available as a Chrome extension or as a standalone npm command line tool.

Some other metrics

Page load time is important, but it's not the most important metric. It doesn't say much about the user's experience.

Time-to-first-byte is important. This might point to an issue on the server.

Start render is important because it tells you when the user is seeing something on the screen.

Above-the-fold time is important. It measures when all the ATF content is loaded.

The HAR format

He introduced the HAR format. HAR stands for HTTP Archive. He mentioned the Wayback Machine (archive.org). A HAR file records every request and response for a page load, including the HTTP headers and timings, and it can optionally include the content itself (HTML, images, etc.).

httparchive.org is from Steve Souders. It doesn't just store the content, but also how the content was served.

Today, a typical website is 2.4 MB.

To create a HAR file, use Chrome DevTools >> Network >> Save as HAR with Content. Now, you can analyze it with various different tools. There are a bunch of different HAR analysis tools, even online. It contains the waterfall, etc.

Remember, the HAR file is going to be completely different on a mobile phone.

Perception and Goals


Jakob Nielsen gave these numbers:
  • Immediate feedback: 100ms
  • Losing user's flow of thought: 1s

RAIL

Google came up with this approach and these goals about 1.5 years ago.

R = Response: Respond to the user within 100ms.

A = Animation: Each frame should be rendered within 16ms in order to hit 60 FPS. Having smooth scrolling is part of this.

I = Idle: If you're doing things behind the user's back, it should be in 50ms chunks.

L = Load: The user should perceive that the page is ready to use within 1s. It's easy to understand, but it's hard to achieve. Often, it's more perception than reality. Some big, important companies are actually able to achieve this.

Desktop browsers

Chrome, Firefox, Edge, IE

iOS and Android browsers

It's not just Safari and Chrome. A lot of times, the user is using the in-app browser. On iOS, 48% of the browsing is done via Facebook in-app Web views.

In iOS, Firefox and Chrome use iOS's own HTML renderer under the covers.

On Android, 34% of all browsing is done via Facebook in-app web views.

45% of mobile users are on Android, 45% are on iOS, and 10% are on other mobile platforms.

Samsung has its own browser, the Samsung Internet Browser, based on Chromium. It has its own implementation of Service Workers.

Users who browse the web on an Android device:
  • 38% Chrome
  • 32% Web view
  • 15% Some Chinese browser
  • 4% The old Android browser
  • 5% Opera
  • 5% Samsung's browser
  • < 1% Firefox

Simulators and emulators

A simulator just looks like the thing it simulates.

An emulator runs the actual code of the thing being emulated.

On Mac, there's an iOS simulator. When you run Safari in the iPhone simulator, it's still just a simulator. It's not exactly how a real device would perform.

Android has real emulators. Genymotion is a real virtual machine. It's based on VirtualBox.

Remember, Google Chrome is not part of Android. That means it's not in the virtual machine. And, you cannot install it on the emulator.

The browser in the emulator is the old browser; it's not what you'd see on a real phone.

Hence, measuring performance with an emulator is a problem.

In Chrome DevTools, there are simulators for various devices. It doesn't match how real devices would perform.

WebPageTest has real devices.

BrowserStack has real devices as well.

Samsung has a Remote Test Lab that you can use.

In Chrome DevTools, you can also throttle the network (including latency) and the CPU.

There are other connection simulators:
  • Network Link Conditioner for Mac/iOS
  • Charles Proxy
  • Clumsy for Windows
  • Net Limiter for Windows
  • SlowyApp for Mac
WPO = web performance optimization?

He posts his website on Facebook in a private post so that he can try his website from within the Facebook web view.

If you're browsing through the Facebook web view, there are no Service Workers. The cookies aren't shared between that and the real browser.

Web views on Android are a mess. And they're ancient. It's a problem.

What happens when you load www.whitehouse.gov?
  • DNS resolution: ~100ms
  • TCP handshake:
    • SSL negotiation (HTTPS)
  • HTTP request:
    • Headers: User Agent, more data about the request
  • Server receives the request
  • HTTP response:
    • Headers
    • Body
  • Parsing the response: HTML
  • List of additional resources that need to be downloaded
  • Starts rendering
Remember, you're probably using multiple domains, each of which requires DNS resolution and TCP/SSL negotiation.

SSL requires a different number of round trips depending on the version. Expect roughly 5.

HTTP/2

It was built for performance from scratch.

It has header compression. Consider that the headers (for the request and response) are about 1k in total, and this is per request.

It has TCP connection reuse. Over the same TCP connection, you can multiplex multiple requests. On HTTP/1.1, the browser opens about 6 concurrent connections. It downloads 6 things in parallel. In HTTP/2, you can download lots of things concurrently over the same connection.

It has push to cache. It's unclear exactly how we're going to use this. If the server pushes something to the client, and the client didn't request it, it just puts it in the cache. Later, if it needs to request it, it's already in the cache. We don't yet have a standard for telling the server which files to push.

Early tests show a 15-60% load time improvement.

It doesn't hurt to implement HTTP/2.

You have to use TLS.

It has pretty good compatibility among modern browsers.

You can upgrade your servers or use a CDN in order to use HTTP/2.

A connection is "upgraded" to HTTP/2.

Low Impact (I'm guessing I heard him wrong, and he was referring to Load Impact) is a service that can tell you how much of a performance gain you'll get from switching to HTTP/2.

Only 25% of the world is on 4G globally. 30% of the time 4G is not used even if you do have it.

When you think of 2G vs. 3G vs. 4G, don't just think about bandwidth. The real problem is latency.

RTT = round trip time
  • On 2G, the RTT can be up to a second
  • On 3G, it's a little less than 500ms
  • On 4G, it's a little over 250ms
  • At home, it's like 50ms
Important: On 3G, it might take up to a couple seconds just to start up the SSL connection.
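
To see why latency dominates, here's a back-of-the-envelope sketch of my own. The round-trip counts are assumptions (1 for the TCP handshake, 2 for a full TLS 1.2 handshake), and the RTT values are just the rough figures from the list above.

```javascript
// Simplified model: each setup step costs a whole number of round trips.
// The default counts are assumptions, not measurements.
function connectionSetupMs(rttMs, roundTrips = { tcp: 1, tls: 2 }) {
  return rttMs * (roundTrips.tcp + roundTrips.tls);
}

// Rough RTT figures per network type, in milliseconds.
const rttByNetwork = { "2G": 1000, "3G": 500, "4G": 250, home: 50 };
for (const [network, rtt] of Object.entries(rttByNetwork)) {
  console.log(`${network}: ${connectionSetupMs(rtt)}ms before the first request is sent`);
}
```

Under these assumptions, a new HTTPS connection on 3G costs about 1.5 seconds before any HTML is requested, and that's before counting the DNS lookup.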

Important HTTP headers

  • Connection: Keep-alive
  • Content-encoding
  • Cache headers
  • Cookies

The browser cache

    Request
    Response:
        Headers:
            Max-age: 2 days

    Request #2:
        Expired:
            Make a conditional request.
            The server can respond with:
                Not Modified
                200 with a new version
        Not expired:
            Take the file from the cache
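
The flow above can be sketched as a small decision function. This is my own illustration, not a real browser's logic; the cache entry shape is hypothetical, and it only covers max-age plus an ETag-based conditional request.

```javascript
// Hypothetical cache entry: { storedAt: ms, maxAgeSeconds, etag }.
function cacheDecision(entry, nowMs) {
  if (!entry) return { action: "fetch" };
  const ageSeconds = (nowMs - entry.storedAt) / 1000;
  if (ageSeconds < entry.maxAgeSeconds) {
    // Not expired: no network traffic at all.
    return { action: "use-cache" };
  }
  // Expired: ask the server whether our copy is still good.
  // A 304 Not Modified response means the cached body can be reused.
  return { action: "revalidate", headers: { "If-None-Match": entry.etag } };
}

const twoDays = 2 * 24 * 60 * 60;
const entry = { storedAt: 0, maxAgeSeconds: twoDays, etag: '"abc123"' };
console.log(cacheDecision(entry, 60 * 60 * 1000).action);       // "use-cache"
console.log(cacheDecision(entry, (twoDays + 1) * 1000).action); // "revalidate"
```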

Service Workers

What's a service worker?
  • It's an in-browser network proxy.
  • It's a process that can run in the background even if the tab is closed.
  • It can make requests and responses and take control of the cache.
It's not yet in all modern browsers.

A service worker can run even before DNS resolution if it has already been loaded. It can make up a response from scratch or return something from the cache if it wants.
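
Here's a minimal cache-first sketch of what "return something from the cache" can look like. It's my own example, not from the slides. The cache name "static-v1" is made up, and the lookup logic is written as a plain function that takes the cache and fetch as parameters so it can be exercised outside a browser.

```javascript
// Cache-first lookup. `cache` needs match() and put() like the browser's
// Cache object; `fetchFn` stands in for fetch().
async function cacheFirst(request, cache, fetchFn) {
  const cached = await cache.match(request);
  if (cached) return cached; // Served without touching the network.
  const response = await fetchFn(request);
  // Store a copy; a response body can only be read once.
  await cache.put(request, response.clone());
  return response;
}

// Wire it up only when actually running inside a service worker.
if (typeof ServiceWorkerGlobalScope !== "undefined") {
  self.addEventListener("fetch", (event) => {
    if (event.request.method !== "GET") return; // Only GET responses are cached here.
    event.respondWith(
      caches.open("static-v1").then((cache) => cacheFirst(event.request, cache, fetch))
    );
  });
}
```

A real service worker would also need a strategy for expiring or updating cached entries; this sketch never refreshes anything once it's cached.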

It can have a big impact on performance.

Network tips

He used the hashtag #PERFTIP.

Enable GZIP on text-based files. This can save 80% of the response size. Obama's version of whitehouse.gov didn't have GZIP enabled! About 95% of sites are now using GZIP. BTW, you should compress your JSON responses as well; JSON compresses very well. People commonly forget to compress JSON and SVG responses.

Reduce DNS queries. They're about 50-120ms each. The average is 100ms.

Enable HTTP Keep-alive. It's on by default.

Make static content expire far in the future. Take advantage of the cache. If you change the file, change the name (i.e. use versioned filenames). Or, you can request the file with a query parameter (e.g. logo.png?20170228). This is better than relying on ETags because ETags still require a request to the server. It's better if the browser just has a version cached without needing to make a conditional request.
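
A small sketch of the versioned-filename idea (my own; the helper names are hypothetical, and the version string could just as well be a content hash from your build tool):

```javascript
// Inserts a version before the file extension.
// Assumes the path has an extension, e.g. "/img/logo.png".
function versionedUrl(path, version) {
  const dot = path.lastIndexOf(".");
  return `${path.slice(0, dot)}.${version}${path.slice(dot)}`;
}

function farFutureHeaders() {
  const oneYear = 365 * 24 * 60 * 60;
  // A versioned URL never changes its content, so the browser may reuse it
  // for a year without making a conditional request.
  return { "Cache-Control": `public, max-age=${oneYear}, immutable` };
}

console.log(versionedUrl("/img/logo.png", "20170228")); // "/img/logo.20170228.png"
console.log(farFutureHeaders()["Cache-Control"]);      // "public, max-age=31536000, immutable"
```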

Domain sharding

This is a technique from the IE6 era. It's one of the oldest web performance tricks.

It's so that you can have more connections to the server. Think of it as providing "more lanes" to the server. Browsers provide 6 connections per net location (scheme, host, port) these days.

Set up multiple domains even if they're just aliases to the same server.

Research suggests that any more than 2 has diminishing returns.
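
If you do shard, you want a given file to always come from the same hostname; otherwise the browser caches it once per shard. Here's a sketch of one way to do that (mine, not from the talk), with made-up hostnames and a simple string hash:

```javascript
// Hypothetical shard hosts; both may be aliases for the same server.
const SHARDS = ["static1.example.com", "static2.example.com"];

// Pick a shard from a simple hash of the path, so a given file always maps
// to the same host and stays cacheable under one URL.
function shardUrl(path, shards = SHARDS) {
  let hash = 0;
  for (const ch of path) {
    hash = (hash * 31 + ch.codePointAt(0)) >>> 0; // keep it an unsigned 32-bit int
  }
  return `https://${shards[hash % shards.length]}${path}`;
}

console.log(shardUrl("/img/logo.png"));
console.log(shardUrl("/css/site.css"));
```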

You're also limited in terms of overall bandwidth.

There's also the cost of the DNS query to worry about.

You don't need domain sharding in HTTP/2. Multiple requests can go over the same connection simultaneously.

Putting JS and CSS on one server, and images on another will have an impact.

A lot of people are skipping this technique these days. It's less important these days.

Use cookie-less domains.

Reduce your cookie sizes. If your cookie size is 1k, you're adding 16ms per request.

Consider implementing HTTPS. This can help prevent proxies from messing with you. 20% of people are behind proxies that will freak out if you start doing non-standard things without using HTTPS. Security is another important reason to use HTTPS ;)

Reduce redirects

There's a header that you can use to tell the browser that from now on, all requests should be made over HTTPS, and the browser will respect that. That helps you avoid the redirect from HTTP to HTTPS.

Interesting: It takes 300ms to redirect from non-www to www. Is it really worth it? He said to stop doing that. Return some content. Redirect later when the user navigates to a new page.

In general, a redirect takes between 100ms and 1s.

When you click on a link on a social network, there's already a redirect built in so that they can track your click.

In the console, you can inspect performance.navigation to see how many redirects there were.
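
A small sketch for reading that value (my own). It prefers the newer Navigation Timing entries and falls back to performance.navigation. As far as I know, browsers report 0 when the redirects cross origins, so treat the number as a lower bound.

```javascript
// `perf` is window.performance or anything shaped like it.
function redirectCount(perf) {
  const entries = perf.getEntriesByType ? perf.getEntriesByType("navigation") : [];
  if (entries.length > 0) return entries[0].redirectCount;
  return perf.navigation ? perf.navigation.redirectCount : 0;
}

// Only runs in a browser.
if (typeof window !== "undefined") {
  console.log(`Redirects before this page: ${redirectCount(window.performance)}`);
}
```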

aa.com uses 8 redirects on a mobile phone!
  • aa.com
  • www.aa.com
  • www.aa.com/
  • www.aa.com/HomePage
  • www.aa.com/HomePage/Default
  • mobile.app.com
  • mobile.app.com/
  • mobile.app.com/HomePage
  • mobile.app.com/HomePage/Default
If you're using a redirect, use a permanent one so that the browser can cache it.

Remove "stop signs"


CSS stylesheets block rendering. The browser will wait until they're downloaded and parsed before continuing.

Synchronous JavaScript scripts also block rendering.

Important: Remove interstitial app banners (screens that stop the user and encourage the user to download the native app on mobile). They consume 1s to 4s. 70% of users will just abandon the web site. Google will flag you as not mobile friendly.

Avoid client-side rendering for the initial content. He cited Twitter. You can start rendering some stuff server-side for the initial page load. It's painful, but it's important.

Announce DNS queries ASAP:

<link href="https://newdomain.com" rel="dns-prefetch">

You can also do this via an HTTP header.

Use CSS as an appetizer, which means add style tags to the head. Avoid @import in CSS files since they require an extra network request while the rendering is blocked. Use HTTP/2 if possible.

Use JavaScript as dessert, which means add the script tag at the end of the document.

We no longer care about the DOMContentLoaded event. We used to. It doesn't matter much anymore.

Compress/obfuscate JavaScript and CSS, even if you're already using gzip.

Combine scripts and CSS files.

Khan Academy wrote a blog post, "Forgo JS packaging? Not so fast". It basically said that bundling still makes sense even in HTTP/2 because the compression is better.

If you use media queries on link tags, the browser downloads all the stylesheets even if they aren't going to be used:
<link href="p.css" media="(orientation: portrait)" rel="stylesheet">
<link href="l.css" media="(orientation: landscape)" rel="stylesheet">

Use non-blocking scripts. Use <script defer> or <script async>. Defer downloads in parallel, but executes in document order once parsing is done. Async downloads in parallel, but executes in whatever order the scripts finish arriving. You're likely to use async for things like tracking libraries.

Use on-demand code. You don't have to load all of your JavaScript and CSS up front.

Interesting: Stop using onload. It only fires once the entire page is ready, including iframes. Execute stuff sooner.

Interesting: Be careful with blocking images. If you use data URIs to embed images in HTML or CSS, these can make the HTML or CSS bigger. By default, images are not blocking, but if you use data URIs, they've basically become blocking.

Embrace responsive images.

Use CSS sprites instead of a bunch of separate, small images. It reduces the number of requests. You can save 1k per image. However, this is not appropriate for HTTP/2; it's probably an anti-pattern these days.

Icon fonts have advantages and disadvantages. They don't support colors. They're bad for accessibility. Usually SVG is better in terms of performance. They're better than images, though.

Embrace SVG. Of course, make sure SVG files are gzipped.

Compress images. 40% of the bytes being downloaded would be saved if we just did this properly.

Be careful with web fonts. Even if the text is loaded, if the font isn't downloaded, the user can't see it. You can load them asynchronously. You can remove characters that you aren't going to use.

Release the main thread ASAP:
function clickHandler() {
    setTimeout(doSomethingHeavy, 0);
}
Or use a web worker.

Avoid CPU repaints. Consider GPU vs. CPU repaints. These are important when scrolling, on transitions, and during animations. Avoid things with text shadows that are going to be animated because the text shadows are rendered by the CPU.

GPU: Transforms and opacity
CPU: Border-radius, gradients, shadows, filters

In Chrome DevTools, near the Console tab (hit escape), click on the Rendering tab. There's lots of useful stuff in there, such as the FPS meter. There's also an Animations tab near the Console tab.

There are also experiments you can turn on in the Chrome DevTools settings.

Your new enemy: it's not your users. It's your designers :-P If they add a 1px drop shadow to a bunch of stuff, it'll kill your performance. Make your designers part of the performance solution.

In general, reduce requests.

Extreme Optimizations :)


The goal is to load in 1s. On 3G, you're using 600ms just for the network. That only leaves you with 200ms for the server response and 200ms for rendering! However, we just need to optimize for perceived load time.

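Consider trying to load your ATF content in 14Kb. This basically fits the response into the first TCP round trip: the server's initial congestion window is typically about 10 packets, or roughly 14Kb.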

ATF in 1s = 1 RTT ~ < 14Kb (compressed)
       
> 14Kb will create another roundtrip.

This includes HTML + CSS (inline) + JavaScript (inline).
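
Here's a rough model of where the 14Kb number comes from and what going over it costs. This is my own simplification. It assumes an initial congestion window of 10 segments of 1460 bytes, a window that doubles every round trip, and no packet loss.

```javascript
// Simplified TCP slow-start model; real connections vary.
function roundTripsToDeliver(bytes, initialSegments = 10, segmentBytes = 1460) {
  let windowBytes = initialSegments * segmentBytes; // about 14 KB
  let delivered = 0;
  let roundTrips = 0;
  while (delivered < bytes) {
    delivered += windowBytes;
    windowBytes *= 2; // the window doubles each round trip
    roundTrips++;
  }
  return roundTrips;
}

console.log(roundTripsToDeliver(14 * 1024)); // 1
console.log(roundTripsToDeliver(33 * 1024)); // 2
console.log(roundTripsToDeliver(90 * 1024)); // 3
```

On a 3G connection with a 500ms RTT, each extra round trip in this model is another half second before the browser has the bytes it needs.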

Images might need to be separate, unless they're small enough to be inlined.

He said that bbc.co.uk was doing this for mobile, and they wrote a blog post about it. When I tried it, the first HTML file was 33Kb.

We need to separate ATF content.

He talked about the importance of embedding the CSS in the HTML file for ATF content.

Google is doing this for mobile browsers. They're even embedding images in the HTML (compressed, base64 encoded, etc.).

Interesting: Google's old logo was 14Kb. Their new logo is 305 bytes. They blogged about it. They're using SVG.

Facebook has 2G Tuesdays. If you opt in, for an hour a day, your connection will be throttled to 2G to remind you of what the rest of the world goes through.

He said we should read: https://amphtml.wordpress.com/

The average time it takes to fully load a mobile landing page is 22 seconds, according to a new analysis. Yet 53% of mobile site visitors will leave a page if it takes longer than three seconds to load. That’s a big problem.

Going from 1s to 3s increases your bounce rate by 32%.

Going from 1s to 10s increases it by 123%.

Responsive Web Design (RWD)

For most people, Responsive Web Design is only about playing with the width of the browser.

Responsive Web Design is a tool. It's a way of accomplishing mobile friendliness. It's not a goal in and of itself.

He wasn't really that much of a fan of RWD. He liked other approaches better. He thought that we should probably serve different code for mobile web users.

Users care if the site is fast.

He said that Google serves different HTML for mobile devices.

He says so does Twitter. They have m.twitter.com.

What really matters is being mobile friendly. Any of these 3 approaches work:
  • Responsive Web Design
  • Dynamic serving (different content for different browsers)
  • Separate URLs (m.example.com)
There are benefits and drawbacks to each of these approaches. Using the same URL is a good idea.

WURFL and DeviceAtlas are server-side libraries to figure out whether the client is a mobile device.

It's hard to hit the 1s response goal if you're serving the same HTML to everyone.

Interesting: A "frontend server" is a server that sits between the browser and the backend server and is managed by the frontend engineers. It's an idea that gives frontend engineers more control over the user experience. Not everyone is going to do it this way.

Not everyone needs to build a responsive web app.

If you're responsive, but your performance sucks, your site is still going to suck.

Interesting: Article from him: You May Be Losing Users If Responsive Web Design Is Your Only Mobile Strategy.

Navigation timing API

window.performance

It has lots of useful information.

You can use this information at runtime to change how your code behaves.
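
As an example of what's in there, here's a sketch (mine) that pulls a few of the milestones discussed earlier out of a navigation timing entry. The summary shape is made up; the field names are the ones on PerformanceNavigationTiming.

```javascript
// `entry` is a PerformanceNavigationTiming-like object; all times are in ms
// relative to the start of the navigation.
function summarizeNavigation(entry) {
  return {
    dns: entry.domainLookupEnd - entry.domainLookupStart,
    connect: entry.connectEnd - entry.connectStart,
    ttfb: entry.responseStart - entry.requestStart,
    domContentLoaded: entry.domContentLoadedEventEnd,
    load: entry.loadEventEnd,
  };
}

// Only runs in a browser.
if (typeof window !== "undefined" && window.performance.getEntriesByType) {
  window.addEventListener("load", () => {
    // loadEventEnd is only filled in after the load handlers finish.
    setTimeout(() => {
      const [entry] = performance.getEntriesByType("navigation");
      if (entry) console.table(summarizeNavigation(entry));
    }, 0);
  });
}
```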

Resource timing API

It has information per resource.
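
For example, here's a sketch (my own) that lists the slowest resources on the page from the resource timing entries:

```javascript
// Returns the `count` slowest resources from a list of
// PerformanceResourceTiming-like entries.
function slowestResources(entries, count = 5) {
  return [...entries]
    .sort((a, b) => b.duration - a.duration)
    .slice(0, count)
    .map((e) => ({ url: e.name, ms: Math.round(e.duration) }));
}

// Only runs in a browser.
if (typeof window !== "undefined" && window.performance.getEntriesByType) {
  console.table(slowestResources(performance.getEntriesByType("resource")));
}
```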

Network information API

It gives you information about the user's network connection.

Different browsers implement different versions of the spec.
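
Because support varies, any code using it has to cope with the API being missing. Here's a sketch of my own; the "low"/"high" tiers are a made-up policy, while effectiveType and saveData are properties from the spec that not every browser exposes.

```javascript
// Choose an image quality tier from a navigator.connection-like object.
// Returns "high" when the API is missing.
function imageTier(connection) {
  if (!connection) return "high";
  if (connection.saveData) return "low";
  const slowTypes = ["slow-2g", "2g", "3g"];
  return slowTypes.includes(connection.effectiveType) ? "low" : "high";
}

if (typeof navigator !== "undefined") {
  console.log(imageTier(navigator.connection));
}
```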

Silk is the browser on the Amazon Fire. It's Chromium based.

Service workers

  • Manage an offline cache
  • Background sync
  • Push notifications

Request idle callback

Do something when the browser is idle. It's only in Chrome (beta).
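
A sketch of how this can be used to drain a queue of small background tasks (mine, not from the slides). The fallback for browsers without requestIdleCallback is an assumption: it uses a timer and a fixed 50ms budget, matching the RAIL "Idle" guideline.

```javascript
// Run queued tasks until the browser reports no idle time left.
// `deadline` is the IdleDeadline object passed to the callback.
function drainQueue(queue, deadline) {
  while (queue.length > 0 && deadline.timeRemaining() > 0) {
    const task = queue.shift();
    task();
  }
  return queue.length; // tasks still waiting
}

function scheduleIdleWork(queue) {
  const schedule =
    typeof requestIdleCallback === "function"
      ? requestIdleCallback
      : (cb) =>
          setTimeout(() => {
            // Fallback: pretend we have a 50ms budget.
            const start = Date.now();
            cb({ timeRemaining: () => Math.max(0, 50 - (Date.now() - start)) });
          }, 0);
  schedule((deadline) => {
    // If work is left over, wait for the next idle period.
    if (drainQueue(queue, deadline) > 0) scheduleIdleWork(queue);
  });
}
```

Usage would be something like scheduleIdleWork([sendAnalytics, warmCache]), where both functions are hypothetical and each one is small enough to finish well within the budget.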

HTTP Client Hints

If you enable this, the client will give you headers containing the dpr (device pixel ratio), the viewport's width, etc.

It's in Chrome. It's coming to Edge and Safari. It's not coming to Firefox.

That way, the server can make intelligent decisions about how to serve the client.

There's a new header called save-data that is a hint from the browser to try to save data. You might use this, for instance, if the user is on a crappy internet connection that charges per byte.
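
On the server, using these hints might look something like the sketch below. It's my own example with a made-up policy. It assumes the server has opted in to the hints (for example with an Accept-CH response header) and that `headers` is a plain object with lower-cased names, which is how Node's http module exposes them.

```javascript
// Hypothetical server-side helper: pick an image variant from request headers.
function chooseImageVariant(headers) {
  const saveData = (headers["save-data"] || "").toLowerCase() === "on";
  const dpr = parseFloat(headers["dpr"]) || 1; // missing or invalid hint: assume 1x
  if (saveData) return { scale: 1, quality: "low" };
  // Cap at 2x: an assumption that higher densities aren't worth the bytes.
  return { scale: Math.min(Math.ceil(dpr), 2), quality: "high" };
}

console.log(chooseImageVariant({ dpr: "3", "save-data": "on" })); // { scale: 1, quality: 'low' }
console.log(chooseImageVariant({ dpr: "1.5" }));                  // { scale: 2, quality: 'high' }
```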

AMP (Accelerated Mobile Pages)

"For a faster, open mobile web."

It's for publishers.

He still has mixed feelings about this.

If you use AMP, Google will give you a bump in their search rankings.

Google will cache your pages. They'll serve your content from their servers.

It's based on Web Components.

You use <amp-img> instead of <img>, etc.

You cannot add your own JavaScript. If you want JavaScript, you need to put it in an iframe.

Tracking providers have to be AMP compatible. There's a tag for that, <amp-pixel>.

It started as a solution for publishers, not for apps.

People are playing around with mixing AMP with progressive web apps. Load the initial page immediately. Then install the service worker. Then, any new page will be served from the service worker.

In Google search results, it shows you an AMP icon if the website uses AMP.

Going from search results to your app is almost instant.

However, the page is served through google.com.

There is a way to do Twitter and YouTube embeds.

AMP is the result of the mobile web being too slow.

I think he said Facebook articles are a similar technology. However, they're rendered by the Facebook native app, not by a browser.

Google's carousel can drive a lot of hits to your site, but it's not always possible to get your site into that carousel.

He thinks AMP is interesting, but there are a lot of haters. It's very controversial.

Summary

Deliver ATF content in 14Kb.

Embed all the CSS and JavaScript needed for the ATF content.

If you have enough space, embed the logo and/or low-res images.

Avoid web fonts that block your text from showing.

Re-evaluate how you're using responsive web design.

Consider using AMP.

Consider implementing an SD vs. HD approach. I.e. use different versions of your website for different browsers, etc.

Create a custom cache using service workers.

Predict the near future

Start actions on mousedown. Don't wait for the click. That takes 100ms longer. However, you'll need to manage cancellations.

Start actions at hover. This can save 200ms. However you have to use this technique with caution.

Prefetch the resources the user is going to need:

<link rel="prefetch" href="">
<link rel="prerender" href="">

Provide immediate feedback for touch users:
<meta name="viewport" content="width=device-width,user-scalable=no">

html {
    -ms-touch-action: manipulation;
    touch-action: manipulation;
}

Lazy load

Use a multistage download strategy.

There's an older trick where you load below the fold content the first time the user scrolls.

Alternative compression methods

Zopfli is better than gzip or deflate. It takes 8x longer to compress the file, but it's 8-12% better. This might be appropriate if you don't have to do it real time.

He joked about https://github.com/philipl/pifs. Apparently, all sequences of digits are present somewhere in pi.

There's an even better compression algorithm: Brotli. It's available in Chrome 49+. It's coming to Edge and Safari. It's TLS only. It's about 15-20% better than gzip.

Alternative image formats

WebP, JPEG-HDR, BPG

These aren't supported in every modern browser. These are about 10-30% more efficient than JPG or PNG. To use these, you need to use the picture tag to let the browser decide which version to download.

There are JavaScript decoders that can let you decode these in the client ;) Consider doing this in a service worker.

FLIF is in the future, but it's better than all the others.

You can compress your PNGs even further using zopflipng without losing quality or compatibility.

About 80% of the download size is images. However, JavaScript costs not just in terms of download speed, but also in terms of execution cost.

Act like a magician. Use illusions.

Consider making your actions optimistic. For instance, when you click Like on Facebook, it doesn't actually go to the server before returning. It's lying. The same thing is true when sending email on GMail.
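
The catch with optimistic actions is that you have to undo the lie if the server says no. A minimal sketch of my own; `sendToServer` and the post object are stand-ins, and this is certainly not how Facebook actually implements it.

```javascript
// Optimistic update: change local state right away, then roll back if the
// server call fails. `sendToServer` is a stand-in for the real request.
async function optimisticLike(post, sendToServer) {
  const previous = post.liked;
  post.liked = true; // The UI can re-render immediately.
  try {
    await sendToServer(post.id);
  } catch (err) {
    post.liked = previous; // Undo the "lie" if the request failed.
    throw err;
  }
}
```

The caller still gets the rejected promise, so it can show an error message after the rollback.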

Render placeholders instead of progress indicators. Progress indicators are becoming an anti-pattern. If it's within 1s, don't bother using a progress indicator.

Final thoughts

Measure and profile in the real world with real devices.

Don't redirect. Reduce requests.

Try to deliver the ATF content within 1s. Defer the rest.

Be simple; be aggressive; be smart.

He answered a question about WebASM. It's for specific use cases. One of these is to compile a C++ app to run in the browser. It's really more for full screen apps.

He talked about the importance of creating your own metrics. For instance, Twitter has "Time to First Tweet".
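
One way to record a custom metric like that is the User Timing API. This is my own sketch, and the mark and measure names are made up. You'd call it at the moment the first meaningful item is on screen.

```javascript
// Records a custom "time to first item" metric.
// `perf` defaults to the global performance object.
function recordTimeToFirstItem(perf = performance) {
  perf.mark("first-item-rendered");
  // With no start mark, the measure starts at the beginning of the navigation.
  perf.measure("time-to-first-item", undefined, "first-item-rendered");
}
```

The resulting measure shows up alongside the built-in entries (e.g. via performance.getEntriesByName("time-to-first-item")), so it can be sent to whatever analytics you already use.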

Comments

Jeffrey Posnick said…
Hey JJ! Interesting writeup! Re:

Lighthouse

This is from Google. It gives you a score. It uses WebPageTest under the covers.


Lighthouse runs its own diagnostic suite using Chrome's Debugger API: https://developer.chrome.com/devtools/docs/debugger-protocol

It's pretty different from what WebPageTest does.
Shannon Behrens said…
Thanks for the correction, Jeffrey. What you said matches my understanding. I was just writing down what the speaker said. I'll fix it.