Thursday, April 8, 2010

Beginnings of a robust data collector

Since my last posting, I've completely rewritten the data collection system. The first implementation was a quick hack, serving primarily to get my feet wet; the new version is built along more robust lines. I'll spare you the details, but briefly:
  • Data is summarized into hourly histogram buckets, enabling efficient reporting over long time periods. (Eventually I'll add coarser buckets, to support very long time periods.)
  • There are the beginnings of a reporting engine.
  • It's easy to add microbenchmarks.
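For concreteness, here's a rough sketch of how hourly histogram bucketing can work. This is my own illustration, not the actual collector's code; the class name and the log-scale bin layout are invented for the example. The point is that a report spanning N hours only has to touch N small count arrays, never the raw samples.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of hourly histogram bucketing -- not the actual
// collector's code. Latencies are binned on a log2 scale, and one
// histogram is kept per clock hour, so reporting over a long period
// touches one small array per hour rather than every raw sample.
public class HourlyHistogram {
    private static final int BINS = 32; // log2 bins comfortably cover the observed range
    private final Map<Long, int[]> histogramsByHour = new HashMap<>();

    // Record one latency sample, in milliseconds, at the given wall-clock time.
    public void record(long timestampMillis, long latencyMillis) {
        long hour = timestampMillis / 3_600_000L; // bucket key: epoch hour
        // Bin index = position of the highest set bit (latency 0 maps to bin 1).
        int bin = 64 - Long.numberOfLeadingZeros(Math.max(latencyMillis, 1));
        int[] hist = histogramsByHour.computeIfAbsent(hour, h -> new int[BINS]);
        hist[Math.min(bin, BINS - 1)]++;
    }

    // Total sample count recorded during the hour containing the given time.
    public int samplesInHour(long timestampMillis) {
        int[] hist = histogramsByHour.get(timestampMillis / 3_600_000L);
        if (hist == null) return 0;
        int total = 0;
        for (int count : hist) total += count;
        return total;
    }
}
```

Coarser buckets (daily, say) would simply aggregate these hourly arrays.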
A rudimentary dashboard can be seen at (updated -- the original post had a clumsier URL, which no longer functions).  It shows a latency summary for each microbenchmark, along with a link to each benchmark's complete histogram.  It's been collecting data for a few hours now, sampling each operation every 10 seconds.  Here are the operations I'm benchmarking at the moment:

  • {read, write} a randomly selected entry from an int[] of size {16K, 16MB, 256MB}.  One thing I hope to probe here is the lifetime of data in the processor cache.  If performance varies over time, that may suggest cache pollution from other VMs sharing a physical machine with us.  (On reflection, the parameters I'm using probably need to be tweaked.  Microbenchmarking is of course tricky, and I'm not an expert.  RAM benchmarks might form a topic for a later post.)
  • {read, write} 4K bytes from an int[] of size 16KB.
  • Invoke Math.sin() one million times.  (This was the "CPU" test from the original prototype.)
  • A simple multiply-and-add loop.
  • Read one small entry from a SimpleDB database, with or without consistency.  (Same as original prototype.)
  • Write one small entry to a SimpleDB database.  (Again, same as original prototype.)
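To give a flavor of what these look like, here is a sketch of the first benchmark in the list: reading randomly selected entries from an int[]. The class name, parameters, and structure are my guesses, not the actual harness. One detail worth noting: the accumulated sum is returned along with the elapsed time so the JIT cannot dead-code the loop away.

```java
import java.util.Random;

// Illustrative sketch of the "read 4 bytes from a buffer" benchmark --
// parameter choices and structure are guesses, not the actual harness.
public class BufferReadBenchmark {
    // Reads `reps` randomly selected entries from `buffer`. Returns
    // {elapsed milliseconds, accumulated sum}; returning the sum keeps
    // the JIT from eliminating the loop as dead code.
    static long[] run(int[] buffer, int reps, long seed) {
        Random random = new Random(seed);
        int mask = buffer.length - 1; // assumes length is a power of two
        long sum = 0;
        long start = System.nanoTime();
        for (int i = 0; i < reps; i++) {
            sum += buffer[random.nextInt() & mask]; // masking also clears the sign bit
        }
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000L;
        return new long[] { elapsedMillis, sum };
    }

    public static void main(String[] args) {
        int[] buffer = new int[16 * 1024 / 4]; // 16KB of ints
        long[] result = run(buffer, 10_000_000, 42L);
        System.out.println("elapsed ms: " + result[0]);
    }
}
```

The 16MB and 256MB variants would differ only in the buffer size, and the write variant would store to the random index instead of loading from it.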
Here's a snapshot as of this writing (all times in milliseconds):

Operation | # samples | Min | 10th %ile | Median | Mean | 90th %ile | 99th %ile | 99.9th %ile
--------- | --------- | --- | --------- | ------ | ---- | --------- | --------- | -----------
Read 4 bytes from a 16KB buffer (10,000,000 times) | 6050 | 21.9 | 54.9 | 56 | 76.1 | 126.9 | 667.5 | 707
Write 4 bytes to a 16KB buffer (10,000,000 times) | 6050 | 20 | 53 | 51.7 | 69.2 | 115.5 | 414.5 | 423.3
Read 4K bytes from a 16KB buffer (100,000 times) | 6050 | 479 | 594.5 | 823.6 | 1589.9 | 1985.5 | 2095 | 2032.8
Write 4K bytes to a 16KB buffer (100,000 times) | 6050 | 260 | 317.3 | 449.6 | 843.5 | 1877.3 | 2003.1 | 2005.6
Read 4 bytes from a 16MB buffer (1,000,000 times) | 6050 | 25.9 | 64.7 | 62.5 | 84.6 | 169 | 311.4 | 311.7
Write 4 bytes to a 16MB buffer (1,000,000 times) | 6050 | 59.1 | 123.3 | 119.5 | 152.3 | 398.2 | 1182.6 | 1245.8
Read 4 bytes from a 256MB buffer (1,000,000 times) | 6050 | 32.3 | 79.3 | 80.5 | 99.2 | 271.3 | 1075.1 | 1051.8
Write 4 bytes to a 256MB buffer (1,000,000 times) | 6050 | 93 | 135.9 | 149.9 | 164.8 | 774.2 | 1574 | 1562.6
1000000 repetitions of Math.sin | 6050 | 341 | 395.8 | 597.3 | 1143.8 | 1954 | 2011.2 | 1987.9
10000000 repetitions of integer multiplication | 6050 | 11 | 22.8 | 26.4 | 44.5 | 78.6 | 376.8 | 365.6
Read (inconsistent) | 6050 | 21.6 | 36.2 | 43 | 71.9 | 112.8 | 376.8 | 367.8
Read (consistent) | 6050 | 22.9 | 37.5 | 45.1 | 73.5 | 134.5 | 606.9 | 622.4

[Update: removed broken links from the table above.  If you click on the dashboard link above, you'll see a table similar to this one, but with histogram links included.]
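One caveat when reading the tail columns of the table above: percentiles computed from histogram buckets are estimates (samples within a bucket are smeared across its range), so neighboring tail values can come out slightly inconsistent. Here is a sketch of such an estimator; it is my own illustration, not the reporting engine's code, and it assumes the invented log-scale bin layout from the bucketing sketch earlier (bin b holds latencies in [2^(b-1), 2^b)).

```java
// Hypothetical percentile estimation over log-scale histogram bins --
// a sketch, not the reporting engine's actual code. Bin b holds counts
// for latencies in [2^(b-1), 2^b); we interpolate linearly within the
// bin that contains the requested rank.
public class HistogramPercentile {
    // Returns the estimated p-th percentile (0 <= p <= 100) of the
    // samples summarized in `bins`, or 0 if the histogram is empty.
    static double percentile(int[] bins, double p) {
        long total = 0;
        for (int count : bins) total += count;
        double rank = p / 100.0 * total; // fractional rank of the target sample
        long seen = 0;
        for (int b = 0; b < bins.length; b++) {
            if (seen + bins[b] >= rank && bins[b] > 0) {
                double lo = b == 0 ? 0 : 1L << (b - 1);
                double hi = 1L << b;
                double fraction = (rank - seen) / bins[b];
                return lo + fraction * (hi - lo); // interpolate within the bin
            }
            seen += bins[b];
        }
        return 0;
    }
}
```

The interpolation step is where the imprecision comes from: the true sample values inside a bin are unknown, so the estimate can land anywhere within the bin's range.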

I'll wait for more data, and a better reporting tool (in particular, the ability to graph changes over time), before discussing these results.  I plan to add the following microbenchmarks in the near future:
  • Disk performance: {read, write} {small, large} blocks of data at random offsets in a file.  A very large file tests cache-miss performance; a smaller file could test cross-VM cache pollution.
  • Network: ping another EC2 instance with {small, large} requests.
  • Simple tests of Amazon's RDS (hosted MySQL) service, similar to the SimpleDB tests.
  • AppEngine tests -- as many of the AWS tests as are applicable.  (Local-disk tests are not applicable under AppEngine.  A form of network test is possible, but it would not be directly comparable to the EC2 test, as I don't believe AppEngine supports socket-level network access.)
  • Tests for AppEngine's memcache service.
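As a preview of the planned disk test, here is a minimal sketch of reading small blocks at random offsets in a file. The method name, block size, and overall structure are assumptions on my part, not a committed design.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Random;

// Rough sketch of the planned disk benchmark: read fixed-size blocks
// at random offsets within a file. Names and sizes are assumptions.
public class DiskReadBenchmark {
    // Reads `reps` blocks of `blockSize` bytes at random offsets in
    // `file` and returns the elapsed time in milliseconds.
    static long readRandomBlocks(File file, int blockSize, int reps)
            throws IOException {
        byte[] block = new byte[blockSize];
        Random random = new Random(42);
        long start = System.nanoTime();
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            long maxOffset = raf.length() - blockSize;
            for (int i = 0; i < reps; i++) {
                raf.seek((long) (random.nextDouble() * maxOffset));
                raf.readFully(block);
            }
        }
        return (System.nanoTime() - start) / 1_000_000L;
    }
}
```

A file much larger than RAM would force actual disk seeks; a small file would mostly measure the OS page cache, which is where cross-VM cache pollution might show up.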
Suggestions for additional microbenchmarks are welcomed.
