Thursday, January 6, 2011

Rate Limiting in 150 lines, or, Simplicity is Contagious

I recently had occasion to write a rate limiter -- a library that tracks the rate at which some operation is performed, so that excessive bursts of activity can be throttled. I was worried that it would be a big job, but it turned out to be quite simple, largely because I had some good tools to build on. That led to the topic for this post: how good building blocks enable simple designs.

Really, the title should be Complexity is Contagious. If you've ever worked on a large system, you've had the experience of needing a week to implement something that "should have been" trivial. Here's a fictitious, but plausible, example:

You work at a photo sharing service. You want to tweak the search results page to display an author name next to each photo. Simple enough: each photo record includes an author ID, so you just need to look up the ID in the users table and fetch the name column. But... your service supports three login mechanisms (your own native accounts, Facebook login, and Twitter login), so the author ID might refer to any of three different tables. One of those tables has separate "first name" and "last name" fields, which you need to concatenate. Unless they're both blank, in which case you use the "e-mail address" field. But you need to check a settings record to see whether the e-mail address is public, and that's in a separate table.

Oh, and each lookup in the users table requires an RPC, so you need to batch the lookups, and run them asynchronously. Sometimes an RPC gets dropped, so you need a timeout. And sometimes a query matches thousands of photos -- you can't fetch all those user records without overloading the database server, so you have to implement a cap. (Only 50 photos are actually shown on the page, but that winnowing happens downstream from your code.)  And the cap needs to be dynamically configurable, so that you can tweak it according to system health.

Bleah. Suddenly, a week for this trivial feature starts to look optimistic.

The key point is that none of the complexity was intrinsic to the feature. It all arises from weaknesses in underlying services. The database abstraction could automatically implement batching and asynchrony, hiding them behind a Future. The same abstraction could provide timeouts and caps. A user-management library could present a uniform interface for the three login mechanisms. And so forth.

Of course, even with a strong foundation, if you choose a bad design for your feature you might still need a week to implement it. But at least a good design would have been possible. Clean infrastructure doesn't ensure simple designs, but it's an enabler. Hence the title of this post. In what follows, I'll outline the task I faced, my solution, and describe the building blocks that allowed the solution to remain simple.

Rate limiting

By rate limiting, I mean a mechanism that rejects operations which occur more frequently than a specified threshold. Rate limiting has many uses; for instance, to prevent a user from occupying an unreasonable share of some server's capacity.

A popular approach to rate limiting is the Leaky Bucket model. In brief, this model allows events to occur at a specified rate, with short-term bursts up to a specified threshold. For instance, "one request per second, with bursts of up to 10 requests". Here's a complete implementation:
public class RateLimiter {
  private final double fillRatePerMs;
  private final double maxBudget;
  private       double currentBudget;
  private       long   lastUpdateTime;

  public RateLimiter(double maxBudget, double fillRatePerMs) {
    this.fillRatePerMs  = fillRatePerMs;
    this.maxBudget      = maxBudget;
    this.currentBudget  = maxBudget;
    this.lastUpdateTime = System.currentTimeMillis();
  }

  /**
   * Attempt to consume the specified amount of resources.  If the resources
   * are available, consume them and return true; otherwise, consume nothing
   * and return false.
   */
  public boolean consume(double amount) {
    long msSinceLastUpdate = System.currentTimeMillis() - lastUpdateTime;
    currentBudget = Math.min(maxBudget,
        currentBudget + msSinceLastUpdate * fillRatePerMs);
    lastUpdateTime += msSinceLastUpdate;
    if (currentBudget >= amount) {
      currentBudget -= amount;
      return true;
    } else {
      return false;
    }
  }
}
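To see the refill arithmetic in action, here's a variant of the same algorithm with the clock passed in as a parameter -- a testing convenience I've added for illustration (the class and its signature are mine, not from the original):

```java
// The same leaky-bucket algorithm, with the current time passed in rather
// than read from System.currentTimeMillis(), so refill behavior can be
// exercised deterministically. (This variant is illustrative, not the
// original implementation.)
public class TestableRateLimiter {
  private final double fillRatePerMs;
  private final double maxBudget;
  private double currentBudget;
  private long lastUpdateTime;

  public TestableRateLimiter(double maxBudget, double fillRatePerMs, long nowMs) {
    this.fillRatePerMs = fillRatePerMs;
    this.maxBudget = maxBudget;
    this.currentBudget = maxBudget;
    this.lastUpdateTime = nowMs;
  }

  public boolean consume(double amount, long nowMs) {
    // Refill the budget for the time elapsed since the last call,
    // capped at maxBudget.
    currentBudget = Math.min(maxBudget,
        currentBudget + (nowMs - lastUpdateTime) * fillRatePerMs);
    lastUpdateTime = nowMs;
    if (currentBudget >= amount) {
      currentBudget -= amount;
      return true;
    }
    return false;
  }
}
```

With maxBudget = 10 and fillRatePerMs = 0.001 (one request per second), ten calls at time zero succeed, the eleventh fails, and one more slot opens up after 1000 ms.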
That was easy enough, but it's only a start. There are a lot of practical considerations to be addressed:
  • Multiple buckets. Suppose you want to limit the request rate from each user. You need a bucket per user.
  • Sparse buckets. You have many users, but only a fraction are active at a time. Tracking inactive users would waste memory.
  • Multiple request types. Your server handles several different operations, and you need different rate limits for each.
  • Configuration. You need a way to specify the request rate and burst size for each request type.
  • Dynamic configuration. When dealing with servers under stress, you may need to adjust rate limits without restarting.
  • Distributed configuration. Configuration settings may need to be distributed to multiple servers.
  • Special cases. Perhaps your rate limit is one request / user / second, but you have a few special accounts which represent internal systems, and they need a higher limit. Or perhaps a particular server is behaving oddly, and you need to tweak one of the limits for that server without affecting other servers.
  • Monitoring and introspection. To diagnose production problems, you need a way to peek into the state of the rate limiter.
This could easily turn into a major project. In the next section, I'll show how a careful design, and some good building blocks, kept the project simple -- a couple of hours, and about 150 lines of code, including all the details of configuration and introspection. (There are at least two common real-world requirements for rate limiting that I don't discuss here, because they weren't requirements for my application. One is aggregation across servers. The other is persistence -- preserving the state of the rate limiter across server restarts.)

Configuration
I've previously implemented a library, DynamicConfig, for managing configuration parameters. Most usage goes through a single method:
  public static String getString(String name, String defaultValue);
If the configuration file has an entry with the specified name, this method returns its value. Otherwise, it returns defaultValue. This looks simple, but it hides a lot of functionality. If a configuration file is changed, running servers detect the change. The configuration may come from a local file, or a global repository implemented on Amazon S3. Or both -- there's an inheritance mechanism which allows per-server and/or global configuration files to be layered on top of one another.

Using DynamicConfig as a building block, I developed the following design for configuring rate limits. The entry point into the rate limiter looks like this:
  public static boolean consume(String key, double amount);
The "key" parameter identifies a bucket, and the result is true if the specified resources were available in that bucket. The key corresponds to the name of a configuration variable, which is expected to hold the rate limit specification. However, if no such configuration variable exists, we split the key at slash characters, and look for a configuration variable corresponding to a partial key. For instance, if the key was "foo/bar/baz", we'd look for a configuration variable "foo/bar/baz", then for "foo/bar", and finally for "foo".  The configuration variable gives us the parameters (maxBudget and fillRatePerMs) for the leaky bucket algorithm.
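The fallback lookup is just a loop that trims the key at its last slash until a configuration entry is found. A sketch, with a plain Map standing in for the real DynamicConfig calls (the method and map here are illustrative, not the actual API):

```java
import java.util.Map;

public class KeyFallback {
  // Probe the configuration for "foo/bar/baz", then "foo/bar", then "foo",
  // returning the first value found, or null if no prefix of the key has
  // a configuration entry. (The Map stands in for DynamicConfig.)
  public static String lookupSpec(Map<String, String> config, String key) {
    String candidate = key;
    while (true) {
      String value = config.get(candidate);
      if (value != null) {
        return value;
      }
      int lastSlash = candidate.lastIndexOf('/');
      if (lastSlash < 0) {
        return null;
      }
      candidate = candidate.substring(0, lastSlash);
    }
  }
}
```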

By incorporating a user identifier into the key, this gives us per-user buckets, as well as support for custom rate limits by user. Suppose you want to allow 1 request/second from any IP address (with bursts of up to 10 requests), but 100 requests/second (with bursts to 500) from one particular trusted address -- say, 10.1.2.3. Your configuration file might look like this:
  rate_limit: "10, 1/sec"
  rate_limit/10.1.2.3: "500, 100/sec"
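Those spec strings need to be parsed into leaky-bucket parameters. The format details here are inferred from the examples (a burst size, then a per-second rate), not taken from the actual implementation, but the parsing might look like:

```java
public class SpecParser {
  // Parse a spec like "10, 1/sec" into {maxBudget, fillRatePerMs}.
  // Assumes "/sec" is the only supported unit; the format is a guess
  // based on the configuration examples, not the real parser.
  public static double[] parseSpec(String spec) {
    String[] parts = spec.split(",");
    double maxBudget = Double.parseDouble(parts[0].trim());
    String rate = parts[1].trim();  // e.g. "1/sec"
    double perSecond = Double.parseDouble(rate.substring(0, rate.indexOf('/')));
    return new double[] { maxBudget, perSecond / 1000.0 };  // per-ms fill rate
  }
}
```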
For each request, you'd do the following:
  boolean acceptRequest = RateLimiter.consume("rate_limit/" + client_ip, 1.0);
Observe how simple this is: the keyed consume() method just needs a hashtable to track buckets, and a short loop to probe DynamicConfig with a series of names. This simplicity is enabled by the simplicity of the DynamicConfig API. Some things RateLimiter doesn't have to do:
  • Define a configuration schema
  • Know where to find configuration files
  • Care whether the configuration is local, global, or a hybrid
  • Manage a "connection" to the configuration service
  • Handle errors accessing the configuration service (these are captured, and reported, by DynamicConfig)
One thing we don't get completely for free is sparse buckets. For long-running servers, we'll need a background task that periodically sweeps the hashtable and discards "full" buckets (those for which currentBudget equals maxBudget). But that complexity, at least, is intrinsic to the problem we're addressing.
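Because budgets are only updated lazily inside consume(), the sweep has to account for refill that has accrued since each bucket's last use. A sketch of that background task's core (the Bucket class and its fields are my stand-ins, mirroring RateLimiter's state):

```java
import java.util.Map;

public class BucketSweeper {
  // Minimal stand-in for a per-key leaky bucket (fields mirror RateLimiter).
  public static class Bucket {
    double currentBudget;
    double maxBudget;
    double fillRatePerMs;
    long lastUpdateTime;

    public Bucket(double currentBudget, double maxBudget,
                  double fillRatePerMs, long lastUpdateTime) {
      this.currentBudget = currentBudget;
      this.maxBudget = maxBudget;
      this.fillRatePerMs = fillRatePerMs;
      this.lastUpdateTime = lastUpdateTime;
    }
  }

  // Discard any bucket that would be full if its budget were brought up to
  // date -- a full bucket carries no information beyond what its
  // configuration entry already implies.
  public static void sweep(Map<String, Bucket> buckets, long nowMs) {
    buckets.entrySet().removeIf(entry -> {
      Bucket b = entry.getValue();
      double refilled = Math.min(b.maxBudget,
          b.currentBudget + (nowMs - b.lastUpdateTime) * b.fillRatePerMs);
      return refilled >= b.maxBudget;
    });
  }
}
```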

Introspection
Another building block I used is called Datasets: a mechanism for exposing server state via a web UI. Any object implementing the following simple interface can be exposed:
public interface Introspectable {
  /**
   * Return a unique, stable identifier for this object.
   */
  public String getId();

  /**
   * Return all introspectable fields for this object (possibly excluding
   * fields for which the object has no value).
   */
  public Collection<String> getFieldNames();

  /**
   * Return this object's value for the field with the specified name, or
   * null if the field has no value in this object.
   */
  public Object get(String field);
}
In this API, a "field" is anything we want to expose in the introspection UI. For the rate limiter, each bucket is an introspectable object, and a reasonable set of fields is: bucket key, fill rate, current budget, maximum budget, time since last use, and budget fraction (ratio of current budget to maximum budget).
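For concreteness, here's how a bucket might implement that interface. The field names and the budget-fraction arithmetic are my guesses from the description above, not the actual code:

```java
import java.util.Arrays;
import java.util.Collection;

// Sketch of a rate-limiter bucket exposed through the Introspectable
// interface. The interface is reproduced inline so the sketch is
// self-contained; field names are illustrative.
public class BucketIntrospection {
  interface Introspectable {
    String getId();
    Collection<String> getFieldNames();
    Object get(String field);
  }

  public static class Bucket implements Introspectable {
    final String key;
    final double fillRatePerMs;
    final double maxBudget;
    double currentBudget;
    long lastUpdateTime;

    public Bucket(String key, double maxBudget, double fillRatePerMs) {
      this.key = key;
      this.maxBudget = maxBudget;
      this.fillRatePerMs = fillRatePerMs;
      this.currentBudget = maxBudget;  // a new bucket starts full
      this.lastUpdateTime = System.currentTimeMillis();
    }

    public String getId() { return "bucket/" + key; }

    public Collection<String> getFieldNames() {
      return Arrays.asList("key", "fill_rate_per_ms", "current_budget",
          "max_budget", "ms_since_last_use", "budget_fraction");
    }

    public Object get(String field) {
      switch (field) {
        case "key":               return key;
        case "fill_rate_per_ms":  return fillRatePerMs;
        case "current_budget":    return currentBudget;
        case "max_budget":        return maxBudget;
        case "ms_since_last_use": return System.currentTimeMillis() - lastUpdateTime;
        case "budget_fraction":   return currentBudget / maxBudget;
        default:                  return null;  // unknown field
      }
    }
  }
}
```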

Built on top of this (plus another simple interface to iterate over the set of all objects) is a web UI that allows viewing, filtering, and sorting objects. So, for example, to identify users who are in danger of exceeding the rate limit for message sends, we could say (in the UI): "show all RateLimiter buckets whose key begins with 'message_send_rate_limit', where the budget fraction is less than 50%". Without this building block, introspection support for the rate limiter would be a major subproject in its own right.


The next time you find a simple-seeming task turning complex, ask yourself: where is the complexity coming from?  Is it intrinsic to the feature, a flaw in your design, or is it coming from your building blocks?  If the answer is "building blocks", consider investing time in improving the building blocks, rather than working around them.

The latest xkcd went live as I was writing this -- it seems a fitting place to wrap up.