Saturday, October 16, 2010

Practical Web Performance Optimization

In web development nowadays, the importance of minimizing page load times is widely acknowledged.  (See, for instance, http://www.stevesouders.com/blog/2010/05/07/wpo-web-performance-optimization/.)  There are all manner of best practices for reducing load time.  However, many of these practices make day-to-day development more complicated.  In this post, I'll discuss some techniques we're using to reconcile performance with ease of development.

Consider the "primordial" approach to web serving: a site is a collection of files (static resources and/or dynamic templates) sitting in a directory.  JavaScript and CSS code is written in a modular fashion, grouping code into files in whatever organization makes it easiest to locate and manage.  Templating aside, files are served raw, always using the latest copy in the filesystem.  This is quite nice for development:
  • You can edit using whatever tools you like.
  • Changes are immediately visible in the browser.  No need to restart the server, or invoke any sort of recompilation tool.
  • Comments, formatting, and file organization are preserved, making it easy to inspect and debug your JavaScript, CSS, and HTML code.
However, the resulting site is likely to break many rules of web performance.  Notably:

1. Each script module, CSS module, and image requires a separate HTTP request.

2. No CDN (content distribution network) is used -- all requests are served directly from the application server.

3. Content is not cacheable.  (Or, if you do mark some files as cacheable, you'll run into trouble when deploying a new version.)

4. JavaScript, CSS, and HTML code are not minified.

Development mode vs. production mode

The usual approach to this convenience / efficiency tradeoff is to implement two serving modes.  In development mode, the "primordial" model is used.  In production mode, a preprocessing step is used to minify script code, combine scripts, push content to a CDN, and otherwise prepare the site for efficient serving.

This approach can get the job done, but it is a pain to work with.  It's hard to debug problems on the production site, because the code is minified and otherwise rearranged.  It's hard to test performance fixes, because the performance characteristics of development mode are very different from those of production mode.  Developers may have to manually invoke the preprocessing step, and problems arise if they fail to do so.  Web development is hard enough without this added complexity.

The preprocessing approach is also limited to static content.  For instance, it is difficult to minify dynamic pages with this approach.

Solution: optimize on the fly

In our site, optimizations are applied on the fly, in a servlet filter that postprocesses each response.  (Of course, the server is free to cache the optimized version of static files.)  Current optimizations include:
  • Minify JavaScript and CSS files.
  • Minify HTML pages, including embedded JavaScript and CSS.
  • Rewrite script/style/image URLs to support caching (described below).
Forthcoming extensions:
  • Asset bundling
  • Rule checking
  • Asset repository to avoid version skew (described below)
Because optimization is done on the fly, it is never "stale"; developers don't need to remember to re-run the optimizer.  At least as important, it can be enabled or disabled on the fly.  When investigating a problem on the production site, a developer can disable minification and debug the code directly.  Or they can turn on minification in a development build and observe the effects.

Optimization is enabled and disabled using a simple configuration system we've built.  Configuration flags can be specified at the server, session, or request level.  The session mechanism is particularly convenient: simply by invoking a special form, a developer can disable minification for themselves on the production server, without affecting the way content is served to other users.  And toggling minification in a development build doesn't require a server restart.
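
As a rough illustration of how such a three-level lookup could work, here is a minimal sketch.  The class and method names are hypothetical, and the precedence order (request overrides session, session overrides server) is an assumption rather than a description of our actual configuration system.  The two maps stand in for flags parsed from request parameters and session attributes.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical flag lookup: the most specific scope that sets a flag wins. */
class FlagResolver {
    private final Map<String, Boolean> serverFlags = new HashMap<>();

    void setServerFlag(String name, boolean value) {
        serverFlags.put(name, value);
    }

    /**
     * Checks request-level flags first, then session-level flags,
     * then server-level flags, and finally falls back to defaultValue.
     */
    boolean isEnabled(String name,
                      Map<String, Boolean> requestFlags,
                      Map<String, Boolean> sessionFlags,
                      boolean defaultValue) {
        if (requestFlags.containsKey(name)) {
            return requestFlags.get(name);
        }
        if (sessionFlags.containsKey(name)) {
            return sessionFlags.get(name);
        }
        return serverFlags.getOrDefault(name, defaultValue);
    }
}
```

With this shape, a developer's session can carry a flag such as "minify" set to false while the server-level flag stays true for everyone else.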

URL rewriting for cacheability and CDNs

For performance, it's critical to allow browsers to cache asset files (e.g. images and scripts).  However, this causes problems when those files are updated.  The best solution I'm aware of is to change the filename whenever an asset is modified.  This allows you to mark the files as permanently cacheable.  However, it's a hassle to implement: whenever you modify a file, you have to update all links referencing it.

Our servlet filter scans HTML content for asset references (e.g. <script src=...> or <img src=...>).  For any such reference, if the reference refers to a static file in our site, the filter rewrites the reference to include a fingerprint of the file contents.  Thus, references automatically adjust when an asset file is changed.
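
To make the idea concrete, here is a small sketch of fingerprint-based rewriting.  It is not our filter's actual code: the class name, the regular expression, the choice of SHA-256, the 16-hex-digit length, and the name.<fingerprint>.ext naming scheme are all assumptions made for illustration.  The map of static files stands in for a lookup against the site's directory.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AssetUrlRewriter {
    // Matches src="..." and href="..." attributes. A real filter would need
    // a more robust HTML scanner than a single regular expression.
    private static final Pattern ASSET_REF =
            Pattern.compile("\\b(src|href)=\"([^\"]+)\"");

    /** Hex fingerprint of the file contents: the first 8 bytes of a SHA-256 digest. */
    static String fingerprint(byte[] contents) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(contents);
            StringBuilder hex = new StringBuilder();
            for (int i = 0; i < 8; i++) {
                hex.append(String.format("%02x", digest[i]));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Inserts the fingerprint before the extension: /js/app.js becomes /js/app.<fp>.js */
    static String addFingerprint(String path, String fp) {
        int dot = path.lastIndexOf('.');
        return dot < 0 ? path + "." + fp
                       : path.substring(0, dot) + "." + fp + path.substring(dot);
    }

    /** Rewrites references to known static files; all other references are left as-is. */
    static String rewrite(String html, Map<String, byte[]> staticFiles) {
        Matcher m = ASSET_REF.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String path = m.group(2);
            byte[] contents = staticFiles.get(path);
            String newPath = contents == null
                    ? path
                    : addFingerprint(path, fingerprint(contents));
            String attr = m.group(1) + "=\"" + newPath + "\"";
            m.appendReplacement(out, Matcher.quoteReplacement(attr));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Because the fingerprint is derived from the file contents, editing a file changes its fingerprint, and every rewritten reference picks up the new name on the next response.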

When our server receives a request for an asset file, it looks for a fingerprint in the filename.  If the fingerprint is present, we set the response headers to allow indefinite caching.  Otherwise, we mark the response as noncacheable.
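
The header decision could look roughly like the sketch below.  The filename pattern (name.<16 hex digits>.ext), the class name, and the specific Cache-Control values are assumptions; in particular, a one-year max-age is used here as a conventional stand-in for "indefinite".

```java
import java.util.regex.Pattern;

class AssetCachePolicy {
    // Assumed naming scheme: name.<16 hex digits>.ext, e.g. app.0123456789abcdef.js
    private static final Pattern FINGERPRINTED =
            Pattern.compile(".*\\.[0-9a-f]{16}\\.[A-Za-z0-9]+$");

    static boolean hasFingerprint(String path) {
        return FINGERPRINTED.matcher(path).matches();
    }

    /** Returns the value to send in the Cache-Control response header. */
    static String cacheControlFor(String path) {
        // 31536000 seconds is one year.
        return hasFingerprint(path) ? "public, max-age=31536000" : "no-store";
    }
}
```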

When we adopt a CDN, we'll use this same mechanism to rewrite asset references to point to the CDN.

One issue we haven't yet tackled is rewriting asset references that appear in JavaScript code.  Fortunately, the mechanism is failsafe: if our filter doesn't rewrite a reference, that asset will be served as noncacheable.

File bundling

A page often references many asset files.  Performance can be improved by bundling these into a smaller number of larger files.  For JavaScript and CSS files, this is fairly straightforward; for images, it requires CSS sprites.  In both cases, we run into another tradeoff between convenience and performance -- development is easier with unbundled files.

Once again, on-the-fly rewriting comes to the rescue.  When the servlet filter sees multiple <script> or stylesheet <link> references in a row, it can substitute a single reference to a bundled file.  When it sees multiple CSS image references, it can substitute references to a single sprite image.  The configuration system can be used to toggle this on a server, session, or request level.
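
Since bundling is still on our to-do list, the following is only a sketch of how the script case might work.  The class name, the simplified tag pattern, and the /bundle?files=... URL are all hypothetical; the server side that concatenates the listed files is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ScriptBundler {
    // Simplified: only matches tags written exactly as <script src="..."></script>.
    private static final String TAG = "<script src=\"([^\"]+)\"></script>";
    private static final Pattern SCRIPT_TAG = Pattern.compile(TAG);
    // Two or more script tags in a row, separated only by whitespace.
    private static final Pattern SCRIPT_RUN =
            Pattern.compile(TAG + "(?:\\s*" + TAG + ")+");

    /** Replaces each run of adjacent script tags with one tag pointing at a bundle URL. */
    static String bundle(String html) {
        Matcher run = SCRIPT_RUN.matcher(html);
        StringBuffer out = new StringBuffer();
        while (run.find()) {
            // Collect the paths in document order, since script order matters.
            List<String> paths = new ArrayList<>();
            Matcher tag = SCRIPT_TAG.matcher(run.group());
            while (tag.find()) {
                paths.add(tag.group(1));
            }
            // Hypothetical bundle URL listing the files to concatenate.
            String bundled = "<script src=\"/bundle?files="
                    + String.join(",", paths) + "\"></script>";
            run.appendReplacement(out, Matcher.quoteReplacement(bundled));
        }
        run.appendTail(out);
        return out.toString();
    }
}
```

Only adjacent tags are merged, so a script that is separated from the others by markup stays where it is and execution order is preserved.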

Rule checking

Some web performance rules require nonlocal changes to a page or site.  For instance (from http://developer.yahoo.com/performance/rules.html):
  • Put stylesheets at the top
  • Put scripts at the bottom
  • Avoid CSS expressions
Over time, we might extend our servlet filter to implement some of these optimizations by rewriting the response.  But in the meantime, it is relatively straightforward for the filter to detect and report violations.  It is easier to perform this detection at runtime than in a static lint tool, because we can observe the final, post-template-engine version of the page.  And unlike with external tools such as YSlow, we don't have to manually test each release of the site.
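
A first cut at such a checker might look like the sketch below.  It is hypothetical (rule checking is not implemented yet), and the string matching is deliberately naive: it assumes double-quoted attributes and a single </head> tag, and it would need a real HTML parser to be reliable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class RuleChecker {
    /** Returns a description of each rule the rendered page appears to violate. */
    static List<String> check(String html) {
        String lower = html.toLowerCase(Locale.ROOT);
        List<String> violations = new ArrayList<>();
        int headEnd = lower.indexOf("</head>");

        // "Put stylesheets at the top": a stylesheet link after </head>.
        if (headEnd >= 0 && lower.indexOf("rel=\"stylesheet\"", headEnd) >= 0) {
            violations.add("stylesheet referenced outside <head>");
        }
        // "Put scripts at the bottom": an external script before </head>.
        int firstScript = lower.indexOf("<script src=");
        if (headEnd >= 0 && firstScript >= 0 && firstScript < headEnd) {
            violations.add("external script in <head>");
        }
        // "Avoid CSS expressions".
        if (lower.contains("expression(")) {
            violations.add("CSS expression used");
        }
        return violations;
    }
}
```

The filter could log the returned list, or surface it to developers who have enabled a reporting flag for their session.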

Consistency across server updates

When you push a new version of your site to the production server, a "version skew" problem can occur.  Suppose a user opens a page just before the new version is activated.  They might wind up receiving the old version of the page, but the new version of the scripts.  This can cause script errors.  If your site is distributed across multiple servers, the window of vulnerability can be many minutes (depending on your push process).  This is not a performance issue, but it involves references from a page to its assets, so it ties into the mechanisms involved in asset caching.

Happily, the URL rewriting technique described earlier offers a solution to version skew.  When a server receives a request for an asset file, it simply serves the file version that matches the fingerprint in the URL.  This ensures that all asset files come from the same site version as the main page.

This solution does assume that all servers have access to all versions of the site.  This can be addressed by copying asset files to a central repository, such as Amazon S3, during the push process.
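
In outline, the repository is just a store keyed by path and fingerprint.  The sketch below uses an in-memory map purely for illustration; the class name and key format are hypothetical, and a real deployment would back this with shared storage such as S3.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

class AssetRepository {
    // Key format (an assumption): path + "@" + fingerprint.
    private final Map<String, byte[]> store = new ConcurrentHashMap<>();

    /** Called during the push process for every asset in the new site version. */
    void publish(String path, String fingerprint, byte[] contents) {
        // Older versions are never overwritten or removed by a push.
        store.putIfAbsent(path + "@" + fingerprint, contents);
    }

    /** Returns the exact version requested, regardless of which site version is live. */
    Optional<byte[]> lookup(String path, String fingerprint) {
        return Optional.ofNullable(store.get(path + "@" + fingerprint));
    }
}
```

A page served by the old site version keeps requesting the old fingerprints, and any server can satisfy those requests from the repository even after it has been upgraded.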

Conclusion


A surprisingly large number of web performance optimization rules can be implemented with little or no impact on day-to-day site development.  Equally important, optimizations can be implemented with session-based disable flags, allowing problems to be debugged directly on the production site.  We've been using this approach in our site development, with good results so far.

2 comments:

  1. Are you running the JsCompiler as part of the servlet or using some other minification system?

  2. Currently I'm using the YUI Compressor (http://developer.yahoo.com/yui/compressor/). It doesn't support the kind of advanced optimizations that Google's JavaScript compiler (http://code.google.com/closure/compiler/) does, but it was almost embarrassingly easy to integrate. The actual compressor invocation is literally one line of Java code, and it runs in-process, so I don't feel shy about invoking it for embedded script blocks in a page.

    I've been trying to implement performance optimizations in a way that doesn't complicate app development. More advanced optimization would be interesting eventually, but I'm leery of the rules that the Closure compiler imposes. If/when I tackle this, it will probably yield another blog post. :)
