Since I started working at bigger startups, like MinbilDinbil, I have had to face the problem of serving pages to a larger number of users every day. For this reason, it is really important to use a CDN service like CloudFlare to provide a good user experience on our website: we don’t want our clients to wait too long for the homepage to load! How do we do that? The author of GNU grep once said:

The key to making programs fast is to make them do practically nothing

Mike Haertel, Sat Aug 21 2010. Source

So I decided to integrate our Django servers with CloudFlare to improve caching. According to their documentation, to enable the cache the backend needs to provide specific headers in the HTTP response, such as Cache-Control, Expires and others: with these headers, both the browsers and the CloudFlare servers will cache the file for a specific amount of time.

For this purpose, I created an open source, customizable middleware that provides cache headers in a smart way. It is called django-smartcc (standing for “smart cache control”). Once installed, it treats unauthenticated requests as public and disables the cache when the user is logged in. Installation instructions are available too; the package is installed with pip:

pip install -U django-smartcc
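The behaviour described above (public for anonymous visitors, no caching for logged-in users) can be sketched in plain Python. This is only an illustration under my own assumptions, not the actual django-smartcc code: the function name `cache_headers`, its parameters and the exact header values for the authenticated case are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def cache_headers(is_authenticated, max_age=300, now=None):
    """Return cache-related response headers for one request.

    Hypothetical helper: anonymous requests get a public, cacheable
    response; authenticated ones are marked private and expire at once.
    """
    now = now or datetime.now(timezone.utc)
    if is_authenticated:
        # Shared caches (the CDN) must not store a private response,
        # and max-age=0 makes the browser's copy stale immediately.
        return {
            "Cache-Control": "private, max-age=0",
            "Expires": format_datetime(now, usegmt=True),
        }
    # Anonymous request: both the browser and the CDN may keep it.
    expires = now + timedelta(seconds=max_age)
    return {
        "Cache-Control": f"public, max-age={max_age}",
        "Expires": format_datetime(expires, usegmt=True),
    }
```

In a real Django middleware the `is_authenticated` flag would come from the request's user object, and the returned values would be copied onto the response.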

I also considered that not all URLs should be cached in the same way. For example, the MinbilDinbil API calls are cached differently from some static files and pages, which are cached heavily. For this reason the middleware can be customized with URL-based rules, as described here:

    (r'/login/$', 'private', 0),
    (r'/api/search.json$', 'public', 300),
    (r'/downloads/contract.pdf', 'public', 900),
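To make the rule format concrete, here is a rough sketch of how such (regex, cache type, max-age) tuples could be resolved for a request path. The function `match_rule`, the use of `re.search` and the first-match-wins order are assumptions made for illustration; the middleware itself may resolve rules differently.

```python
import re

# The three rules from the text: (URL regex, cache type, max-age in seconds).
RULES = (
    (r'/login/$', 'private', 0),
    (r'/api/search.json$', 'public', 300),
    (r'/downloads/contract.pdf', 'public', 900),
)


def match_rule(path, rules=RULES):
    """Return (cache_type, max_age) of the first matching rule, else None."""
    for pattern, cache_type, max_age in rules:
        # re.search lets a pattern match anywhere in the path.
        if re.search(pattern, path):
            return cache_type, max_age
    # No custom rule applies: the caller falls back to its defaults.
    return None
```

Under these assumptions, `match_rule('/downloads/contract.pdf')` yields `('public', 900)`, while a path with no matching rule yields `None`.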

With these rules, the view at /downloads/contract.pdf returns the following HTTP headers:

$ curl -I

Providing these results:

HTTP/1.1 200 OK
Server: cloudflare-nginx
Date: Sun, 01 Feb 2015 19:34:08 GMT
Content-Type: application/pdf
Connection: keep-alive
Vary: User-Agent, Accept-Language, Cookie
Expires: Sun 01 Feb 2015 19:49:08 GMT
Cache-Control: public, max-age=900

This basic example tells both the browser and the CloudFlare CDN to cache the file contract.pdf for 15 minutes (900 seconds).

Conclusion: Results and more suggestions.

The only way to measure whether the cache headers are working properly is to compare the CloudFlare statistics on bandwidth and requests saved. When we started using CloudFlare on MinbilDinbil, the default settings saved only a small number of requests:

CloudFlare statistics

Then I enabled django-smartcc and configured it properly for HTML pages and files. Over the same number of days, we saved about one third of the bandwidth to our servers:

CloudFlare statistics when using django-smartcc to automatically set cache control HTTP headers.

Thanks to this little trick our user experience has improved a little, as have stability and speed: basically, some pages are no longer served by our Django servers, but are instead downloaded from a fast content delivery network distributed all around the world. In other words, to make our server fast, we make it do nothing, as Mike Haertel said!

Remember that, according to CloudFlare, their servers do not cache HTML pages by default; if you need this feature you can enable it in the page rules… but for most static content it works perfectly. For this reason it is important to set these headers on your static files too, and you probably really want to do that if you are using Amazon S3.

This is not the only thing that changed on the MinbilDinbil website, and we are still making the service better every day. We also had to improve and rewrite specific parts of the code that were too slow or RAM-hungry, and we are not going to stop doing that. For this reason my suggestion is not to treat the cache as the big solution to speed problems, but to think about rewriting and improving the code when possible: otherwise the cache will be generated from a slow process anyway :-)