Stupid caching tricks #804
Suppose you have a number of endpoints in your app, such as /user/private_data and /user/secret_stuff, where logged-in users see information that’s only meant for them. Suppose further that you have a number of other endpoints, like /user/1/quotes and /user/1/book_collection, that are accessible to all users of the app, logged in or otherwise. You want to cache the responses to all of these endpoints, but you have to be careful not to serve one user some other user’s cached private data. You also don’t want to do your caching in each controller action, because there are a lot of them, and we like our applications to be DRY.
Going a step further, wouldn’t it be nice to support ETags, so that if the document is still in the cache we can consider it unmodified and return an HTTP 304 (Not Modified) instead of a full response? Even when the response is sitting in memcache, it’s faster still to skip pulling it out and spending the bandwidth to return it when we can just tell the browser “It’s the same as it was the last time you saw it”.
My approach, inspired by Michael Koziarski:
around_filter :cache_for_user, :only => [:private_data, :secret_stuff]
around_filter :cache_public, :only => [:quotes, :book_collection]
# Cache something that's specific to a logged-in user
def cache_for_user
  cache_response(request.request_uri, @user.auth_token) { yield }
end
# Cache something for the general public
def cache_public
  cache_response(request.request_uri) { yield }
end
private
# Cache a response, using the passed-in values to create
# the key.
def cache_response(*keys)
  # If the client's ETag is still in the cache, return a 304.
  # NEVER let a user retrieve something by Etag!!!
  if client_etag = request.env['HTTP_IF_NONE_MATCH']
    if CACHE.exists?(client_etag)
      headers["X-Cache"] = "HTTP"
      head :not_modified
      return
    end
  end
  # Build the key; its MD5 digest is both the memcache key and the ETag.
  # (Needs require 'digest/md5' if it isn't already loaded.)
  key = keys * ':'
  etag = Digest::MD5.hexdigest(key)
  # Set the etag for next time.
  response.headers["ETag"] = etag
  # Check memcache
  if data = CACHE.get(etag)
    # render from the cached values
    headers["Content-Type"] = data[:content_type]
    headers["X-Cache"] = "HIT"
    render :text=>data[:content], :status=>data[:status]
  else
    # Yield, note the cache miss, then cache the response
    headers["X-Cache"] = "MISS"
    yield
    CACHE.set(etag, {:content=>response.body, :status=>headers["Status"].to_i, :content_type=> (response.content_type || "text/html")})
  end
end
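To make the key-building step concrete, here is a minimal standalone sketch. The helper name cache_key_for and the sample tokens are hypothetical, made up for illustration; the only thing taken from the code above is the idea of joining the key parts with ':' and taking an MD5 digest.

```ruby
require 'digest/md5'

# Hypothetical helper mirroring the key-building step in cache_response.
def cache_key_for(*keys)
  # Array#* with a String argument behaves like Array#join.
  Digest::MD5.hexdigest(keys * ':')
end

# Sample URIs and tokens are invented for this sketch.
public_key = cache_key_for('/user/1/quotes')
alice_key  = cache_key_for('/user/private_data', 'alice-token')
bob_key    = cache_key_for('/user/private_data', 'bob-token')

puts public_key.length      # 32 hex characters
puts alice_key == bob_key   # false: same URI, different user, different key
```

Because the auth token is part of the digested string, two users requesting the same private URI end up with different cache entries, while public endpoints share one entry per URI.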
Note the stern warning in the comment near the top of cache_response. Although we’re using ETags as keys into our memcache store, never retrieve something from memcache using an ETag that arrived in an HTTP request header. Doing so would let users retrieve one another’s data by learning the ETags used to validate it. Use the ETag from the request only to check whether the document is in the cache, and return a 304 if it is. The ETags we generate for private endpoints, however, are derived on the server from the request URI and the logged-in user’s auth token, and should be safe so long as your security model is sound in general.
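The rule can be illustrated with a small simulation outside Rails. This is only a sketch under stated assumptions: SKETCH_STORE is a plain Hash standing in for memcache, and respond is a hypothetical function playing the role of the filter, with its block acting as the controller action.

```ruby
require 'digest/md5'

# Hypothetical stand-in for the memcache-backed CACHE: a plain Hash.
SKETCH_STORE = {}

# Returns [status, body]. `if_none_match` plays the role of the
# If-None-Match request header; the block plays the controller action.
def respond(uri, auth_token, if_none_match = nil)
  # The client's ETag is only ever used for an existence check.
  return [304, nil] if if_none_match && SKETCH_STORE.key?(if_none_match)

  # Lookups always use a key derived on the server side.
  etag = Digest::MD5.hexdigest([uri, auth_token].compact * ':')
  SKETCH_STORE[etag] ||= yield
  [200, SKETCH_STORE[etag]]
end

# Alice's response gets cached under her server-derived ETag.
alice_etag = Digest::MD5.hexdigest('/user/private_data:alice-token')
respond('/user/private_data', 'alice-token') { "alice's secrets" }

# Bob replays Alice's ETag: he gets a bodiless 304, never her data.
stolen = respond('/user/private_data', 'bob-token', alice_etag) { "bob's secrets" }
p stolen   # [304, nil]
```

The worst a guessed or leaked ETag can do here is confirm that some entry exists; the body is only ever read back through a key built from the requester's own URI and token.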
You might also have spotted an unfamiliar method: in the If-None-Match check I’m calling exists? on a MemCache object from the memcache-client Ruby gem. Out of the box, that class defines no such method, so I put this in my memcached initializer (the same one that creates the CACHE constant in the first place):
# Attempt to add an object at the given key:
# just the number 0, expiring in the past (-1),
# with false as the final (raw) argument. Memcached
# returns "NOT_STORED" if there's already
# something in that slot.
class MemCache
  def exists?(key)
    # Call add on this instance rather than on the global CACHE.
    add(key, 0, -1, false) =~ /NOT_STORED/
  end
end
Credit goes to David McCormick for that idea.
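To see why the add trick works without a running memcached, here is a rough sketch using an in-memory stand-in. FakeMemCache is hypothetical and imitates only the behavior relied on above (add refuses to overwrite an existing key, and a negative expiry means the value is never kept); it is not the memcache-client gem.

```ruby
# Hypothetical in-memory stand-in that mimics only the memcached `add`
# semantics relied on above; it is not the memcache-client gem.
class FakeMemCache
  def initialize
    @data = {}
  end

  def set(key, value)
    @data[key] = value
    "STORED\r\n"
  end

  # `add` refuses to overwrite an existing key. A negative expiry means the
  # value expires immediately, so a successful add leaves nothing behind.
  def add(key, value, expiry = 0, raw = false)
    return "NOT_STORED\r\n" if @data.key?(key)
    @data[key] = value unless expiry < 0
    "STORED\r\n"
  end

  # Same probe as in the initializer, coerced to a strict boolean.
  def exists?(key)
    !!(add(key, 0, -1, false) =~ /NOT_STORED/)
  end
end

cache = FakeMemCache.new
cache.set('abc', 'cached page')
puts cache.exists?('abc')      # true
puts cache.exists?('missing')  # false
puts cache.exists?('missing')  # false again: the probe stored nothing
```

The point of the expired dummy value is that probing a missing key has no lasting side effect, so a later exists? call on the same key still reports it as absent.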