As an experiment, I spent part of last week writing a simple HTML5-based multiplayer video game (a clone of the original Maze War). It works surprisingly well; browsers have come a long way.
A critical parameter for this sort of game is communication latency. When I shoot at you, how soon does the incoming missile appear on your screen? If the latency is not very short, the game feels "laggy". In my playtesting, lag wasn't a major problem, but occasionally things seemed to hiccup a bit. I decided to do some more scientific testing. I wrote a simple web page that issues an XMLHttpRequest once per second and reports the latency back to my microbenchmark server. Here is the resulting histogram:
| Operation | # samples | Min | 10th %ile | Median | Mean | 90th %ile | 99th %ile | 99.9th %ile | Max | 
| XMLHttpRequest (OS X Chrome N. Calif. -> EC2 US East) | 13929 | 104.0 ms | 98.9 ms | 105.0 ms | 123.0 ms | 132.0 ms | 417.0 ms | 713.0 ms | 8.64 sec | 
(This reports latency for an XMLHttpRequest from Google Chrome 5.0.375.29 beta, on a MacBook Pro running OS X 10.6.3, over not-very-good 1500Kbps/500Kbps DSL in the S.F. Bay Area, to an Amazon EC2 server in Virginia. All benchmarks discussed in this post use the same setup, except as noted. All benchmarks also ran for the same time period, and so experienced the same network conditions, again except as noted.)
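For the curious, the benchmark page boils down to a timing loop plus percentile bookkeeping. Here is a minimal sketch; `sendProbe` and the percentile helper are illustrative stand-ins, not the code I actually ran:

```javascript
// Minimal sketch of the benchmark page's measurement loop.
// `sendProbe` is a hypothetical stand-in for whatever transport is being
// measured (an XMLHttpRequest here; Web Sockets and JSONP later in the post).
async function measureOnce(sendProbe) {
  const start = Date.now();
  await sendProbe();          // resolves when the response arrives
  return Date.now() - start;  // round-trip latency in milliseconds
}

// Percentile over a list of latency samples, as in the tables in this post.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const idx = Math.min(sorted.length - 1, Math.floor(p * sorted.length));
  return sorted[idx];
}

// In the browser, the page would do roughly:
//   setInterval(() => measureOnce(xhrProbe).then(recordSample), 1000);
```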
These figures aren't too shabby; 100ms or so strikes me as a reasonable response time. However, we have a fairly long tail, so players will see occasional lag. Here is the same data, graphed over time:
There was a spike around 9 PM, but if I exclude that time period, the histogram doesn't change much. The jittery latency seems to be present consistently over time.
Surprise! It's not the network
An obvious explanation for this latency tail is delays in the network. To verify that, I measured ping times between the same pair of machines. Surprisingly, the ping data shows a much shorter tail:
| Operation | # samples | Min | 10th %ile | Median | Mean | 90th %ile | 99th %ile | 99.9th %ile | Max | 
| Ping (N. Calif. -> EC2 US East) | 41717 | 91.9 ms | 89.6 ms | 93.5 ms | 97.2 ms | 101.0 ms | 140.0 ms | 414.0 ms | 907.0 ms | 
And most of the tail turns out to be localized to the 9 PM event:
Excluding that time period, most of the tail vanishes from the ping histogram: 99.9th percentile ping latency is only 154ms! Maybe my DSL isn't quite as bad as I'd thought.
It's partly the server
The tail in the XMLHttpRequest data must originate elsewhere. Perhaps on the server? My EC2 server is juggling quite a few tasks by now: it's running dozens of benchmarks, pulling additional benchmark data from App Engine, accumulating histograms, and serving the XMLHttpRequests for this latest benchmark. Overall CPU utilization is still fairly low, and CPU-sensitive benchmarks running on the server aren't showing much of a tail, but perhaps there is still some effect. So I fired up a second server, configured identically to the first but with no other workload, and ran the same test there. The results are faster than for the original server, but only slightly:
| Operation | # samples | Min | 10th %ile | Median | Mean | 90th %ile | 99th %ile | 99.9th %ile | Max | 
| XMLHttpRequest (OS X Chrome N. Calif. -> EC2 US East) | 13929 | 104.0 ms | 98.9 ms | 105.0 ms | 123.0 ms | 132.0 ms | 417.0 ms | 713.0 ms | 8.64 sec | 
| XMLHttpRequest (OS X Chrome N. Calif. -> EC2 US East) B | 10446 | 104.0 ms | 99.4 ms | 108.0 ms | 114.0 ms | 117.0 ms | 323.0 ms | 629.0 ms | 791.0 ms | 
(I've repeated the original data for comparison. The lightly-loaded "B" server is shown in the last row.)
It's partly HTTP
So, server CPU contention (or Java GC) may be playing a minor role, but to fully explain the latency tail we have to keep digging. Perhaps Web Sockets, by eliminating most of the overhead of the HTTP protocol, would help? Happily, Chrome now has Web Socket support. I decided to try several connection methods: XMLHttpRequest, Web Sockets, and JSONP (a trick wherein you issue a request by dynamically creating a <script src=...> tag, and the server returns a JavaScript file whose execution delivers the response). JSONP has the useful property of not being bound by the "same origin" security policy, enabling one additional connection method: "best of two". In this method, I have the browser issue a Web Socket request to one server and a simultaneous JSONP request to the other. The latency is measured as the elapsed time from when we begin sending the first of these two requests until either response is received. Here are the results:
| Operation | # samples | 10th %ile | Median | Mean | 90th %ile | 99th %ile | 99.9th %ile | 
| JSONP (OS X Chrome N. Calif. -> EC2 US East) | 13929 | 98.7 ms | 104.0 ms | 123.0 ms | 155.0 ms | 416.0 ms | 688.0 ms | 
| XMLHttpRequest (OS X Chrome N. Calif. -> EC2 US East) | 13929 | 98.9 ms | 105.0 ms | 123.0 ms | 132.0 ms | 417.0 ms | 713.0 ms | 
| Websocket (OS X Chrome N. Calif. -> EC2 US East) | 13930 | 89.6 ms | 93.8 ms | 108.0 ms | 122.0 ms | 375.0 ms | 582.0 ms | 
| JSONP (OS X Chrome N. Calif. -> EC2 US East) B | 10447 | 98.6 ms | 104.0 ms | 116.0 ms | 132.0 ms | 341.0 ms | 652.0 ms | 
| XMLHttpRequest (OS X Chrome N. Calif. -> EC2 US East) B | 10446 | 99.4 ms | 108.0 ms | 114.0 ms | 117.0 ms | 323.0 ms | 629.0 ms | 
| Websocket (OS X Chrome N. Calif. -> EC2 US East) B | 10447 | 89.5 ms | 93.1 ms | 96.3 ms | 96.8 ms | 119.0 ms | 556.0 ms | 
| Websocket best-of-2 (OS X Chrome N. Calif. -> EC2 US East) B | 10446 | 89.6 ms | 93.6 ms | 98.4 ms | 101.0 ms | 219.0 ms | 551.0 ms | 
| Ping (N. Calif. -> EC2 US East) | 41717 | 89.6 ms | 93.5 ms | 97.2 ms | 101.0 ms | 140.0 ms | 414.0 ms | 
| Ping (N. Calif. -> EC2 US East) B | 39146 | 89.5 ms | 93.1 ms | 94.8 ms | 96.9 ms | 107.0 ms | 424.0 ms | 
For each server, Web Sockets are clearly the fastest of the three connection techniques. On the lightly loaded "B" server, the Web Socket latencies are almost as good as the ping latencies. However, if I exclude the 9 PM spike, there is still a noticeable difference in the tails: 99th and 99.9th percentile latencies for Web Socket requests are then 127 and 396 milliseconds respectively, while the equivalent ping latencies are 104 and 113 milliseconds.
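For reference, the Web Socket probe amounts to timing a single echo round trip. A sketch; the endpoint URL is made up, and any object with a send() method and an onmessage hook will do in place of a real WebSocket:

```javascript
// One Web Socket round trip, timed. Works with a browser WebSocket or any
// object exposing send() and an onmessage property.
function wsPing(socket) {
  return new Promise((resolve) => {
    const start = Date.now();
    socket.onmessage = () => resolve(Date.now() - start);
    socket.send('ping');
  });
}

// Browser usage (hypothetical echo endpoint):
//   const s = new WebSocket('ws://server.example/echo');
//   s.onopen = () => setInterval(() => wsPing(s).then(recordSample), 1000);
```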
It's interesting that the best-of-2 technique does not perform well. To the extent that the latency tail is caused by server CPU contention, best-of-2 should be a big improvement. It's unclear how much it would help with network issues, and it definitely can't help with client issues. The poor performance suggests that client issues contribute significantly to the latency tail.
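To make the method concrete: best-of-2 is just a race between two transports, taking whichever response arrives first. A sketch, with hypothetical request functions; note that a channel that fails outright simply loses the race, which is why this method can also ride through a server outage:

```javascript
// Best-of-2: issue the same request over two channels and take the first
// response. Promise.any ignores a channel that rejects, so one dead server
// doesn't sink the request.
function bestOfTwo(sendA, sendB) {
  return Promise.any([sendA(), sendB()]);
}

// In the benchmark, sendA would be a Web Socket request to one server and
// sendB a JSONP request to the other; latency is timed from the first send.
```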
And it's partly client load
Now, we've more or less controlled for network latency and server CPU. What factor could explain the remaining difference between ping and Web Socket latencies? Perhaps it's something in the browser. During the preceding tests, I had at least a dozen Chrome tabs open, some to complex sites such as Gmail. I restarted Chrome with only one tab, open to the page that executes the benchmark:
| Operation | # samples | 10th %ile | Median | Mean | 90th %ile | 99th %ile | 99.9th %ile | 
| JSONP (OS X Chrome N. Calif. -> EC2 US East) B | 1038 | 98.6 ms | 103.0 ms | 115.0 ms | 134.0 ms | 248.0 ms | 492.0 ms | 
| XMLHttpRequest (OS X Chrome N. Calif. -> EC2 US East) B | 1038 | 106.0 ms | 112.0 ms | 112.0 ms | 117.0 ms | 211.0 ms | 446.0 ms | 
| Websocket (OS X Chrome N. Calif. -> EC2 US East) B | 1037 | 89.5 ms | 93.0 ms | 94.8 ms | 96.6 ms | 100.0 ms | 118.0 ms | 
| Websocket best-of-2 (OS X Chrome N. Calif. -> EC2 US East) B | 1037 | 89.5 ms | 93.1 ms | 95.9 ms | 96.9 ms | 107.0 ms | 369.0 ms | 
These results are from a short run (only about an hour), at a different time period than the others, and so should be taken with a grain of salt. However, the Web Socket figures look really good, comparable to raw ping times. The HTTP-based connection techniques still show a significant tail. (It's worth noting that this might not be entirely due to HTTP overhead; it could also reflect overhead in the browser implementation of the respective connection techniques. JSONP in particular is a rather baroque approach, requiring DOM manipulation and dynamic compilation.)
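The JSONP mechanics look roughly like this; the endpoint and the `callback` parameter name are hypothetical (real servers vary), and the DOM insertion is guarded so the callback bookkeeping can run anywhere:

```javascript
// JSONP sketch: register a uniquely named global callback, then load a
// <script> whose execution calls it, delivering the response.
let jsonpSeq = 0;
function jsonpRequest(url, onResponse) {
  const cbName = 'jsonp_cb_' + jsonpSeq++;
  globalThis[cbName] = (data) => {
    delete globalThis[cbName];  // one-shot: clean up the global namespace
    onResponse(data);
  };
  const sep = url.includes('?') ? '&' : '?';
  if (typeof document !== 'undefined') {  // browser only
    const script = document.createElement('script');
    script.src = url + sep + 'callback=' + cbName;
    document.head.appendChild(script);
  }
  return cbName;  // returned so a caller (or test) can find the callback
}
```

The dynamic script creation, global-namespace juggling, and script compilation are exactly the "baroque" overhead mentioned above.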
A convenient outage -- score one for best-of-two
During an earlier benchmark run (not reported here), my "B" server went offline. Amazon's explanation:
5:39 PM PDT We are investigating instance connectivity in the US-EAST-1 region.
5:55 PM PDT A subset of instances in a single Availability Zone became unavailable due to a localized power distribution failure. We are in the process of restoring power now.
No doubt this was inconvenient for some people, but for me it provided a handy test of the best-of-2 connection method. Indeed, all of the other "B" server benchmarks began failing, but the best-of-2 results were unperturbed.
This event also provided an interesting peek into Amazon's handling of outages. The event began at around 5:15, at least for my machine. Amazon's service dashboard was first updated at 5:39, roughly 24 minutes later. Hence, when I first detected the problem, I had no information as to whether it was something I'd done, an Amazon problem specific to my machine, or a more widespread Amazon problem. It would be nice if the AWS dashboard included up-to-the-minute service health metrics based on automated monitoring. It would also be nice if health information were propagated to the status pages for individual EC2 instances. The whole time the machine was offline, the dashboard page for that instance continued to indicate that it was up and running.
One other quibble: the status reports never actually indicate which Availability Zone was impacted (it was us-east-1b). It seems silly not to provide that information.
Stress test
Just for kicks, I decided to repeat the tests while saturating my DSL line (by uploading a bunch of photos to Picasa). The results are not pretty:
| Operation | # samples | 10th %ile | Median | Mean | 90th %ile | 99th %ile | 99.9th %ile | 
| JSONP (OS X Chrome N. Calif. -> EC2 US East) B | 613 | 101.0 ms | 419.0 ms | 559.0 ms | 1.35 sec | 1.75 sec | 2.35 sec | 
| XMLHttpRequest (OS X Chrome N. Calif.  -> EC2 US East) B | 612 | 110.0 ms | 388.0 ms | 543.0 ms | 1.35 sec | 1.69 sec | 5.03 sec | 
| Websocket (OS X Chrome N. Calif. ->  EC2 US East) B | 612 | 91.1 ms | 398.0 ms | 524.0 ms | 1.31 sec | 1.69 sec | 2.35 sec | 
| Websocket best-of-2 (OS X Chrome N.  Calif. -> EC2 US East) B | 612 | 91.2 ms | 372.0 ms | 524.0 ms | 1.46 sec | 1.69 sec | 2.14 sec | 
Here, all of the connection methods were equally bad. (It would have been interesting to run ping tests at the same time, to see whether TCP congestion control was part of the problem, but I neglected to do so.)
Conclusion: good luck, you'll need it
With Web Sockets, a lightly loaded client, a lightly loaded server, a lightly loaded DSL line, and favorable winds, it's possible to get uniformly fast communication between a browser and server. Disturb any of those conditions, and a latency tail rears up. Where does that leave us?
In principle, you can control server load. Of the major browsers, as far as I know only Chrome implements Web Sockets so far. (Flash provides similar functionality, which I may benchmark for a later post.) Client load and DSL load are entirely out of our control; the best one might manage is to give feedback to the user. And, while Google is rumored to have some influence over the weather, when it comes to favorable winds the rest of us just have to hope for the best.