How We Improved the Performance of the Imaginea Website

This post explores how we improved the performance of the www.imaginea.com website through a series of optimizations, including caching and a content delivery network (CDN). We ran tests to measure the performance from different global locations, different browsers and different internet speeds using www.webpagetest.org.

The following table compares the results before and after the optimization. The table also shows the nearest edge location for the CDN from which the content is being delivered to the browser. Except for the HTML, which was served via WordPress/Apache, everything else was served from a CDN after the improvements.

Test Location | Load Time Before Changes | Load Time After Changes | Edge Location
--------------|--------------------------|-------------------------|--------------
California    | 12.928s                  | 7.603s                  | Los Angeles
New York      | 4.436s                   | 3.331s                  | New York
London        | 11.169s                  | 4.841s                  | London
Japan         | 13.780s                  | 5.934s                  | Tokyo
Australia     | 15.058s                  | 7.022s                  | Sydney


The site is a WordPress-based website hosted on AWS (Amazon Web Services). After identifying the webpagetest.org ratings that were not “A”, we set out to attack them one by one.

Optimizations

Reducing Time To First Byte

Time To First Byte (TTFB) measures the duration from the client making an HTTP request to the first byte of the response being received by the client’s browser. This includes the DNS lookup, network latency, and the time the backend spends querying the database and generating the dynamic HTML.
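
You can approximate TTFB yourself with a short script. Here is a minimal Python sketch (the URL is just an example); it times the gap between issuing a request and reading the first byte of the response:

    import time
    import http.client

    # The connection is opened lazily, so DNS lookup, TCP setup and
    # server-side page generation are all included in the measurement.
    conn = http.client.HTTPSConnection("www.imaginea.com")
    start = time.time()
    conn.request("GET", "/")
    response = conn.getresponse()
    response.read(1)  # block until the first byte arrives
    print("TTFB: %.3fs" % (time.time() - start))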

We optimized this with the help of the W3 Total Cache plugin (for WordPress). When there is a request to access your site, WordPress runs PHP scripts and MySQL queries against the database to find the requested page; PHP then parses the data and generates the page. This process consumes server resources. Turning page caching on allows you to skip all that server load and show a cached copy of the page when a user requests it.

The Page Cache option within the W3 Total Cache settings serves this purpose.

Note:

Make sure that other caching options like Database Caching and Object Caching are not checked. Enabling Object Caching increases the number of writes to the disk, which becomes a burden when you are running a busy, large site.

Enable Compression

Large pages are bulky and slow to download. The best way to speed up their load time is to compress them. Compression reduces the number of bytes sent over the wire, thereby shrinking the HTTP response. On Apache, this is done with the mod_deflate module.
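
A minimal sketch of the relevant Apache configuration (the exact list of MIME types is an assumption; adjust it to the content your site serves):

    <IfModule mod_deflate.c>
        # Compress text-based responses before sending them to the browser
        AddOutputFilterByType DEFLATE text/html text/plain text/css \
            text/javascript application/javascript application/json image/svg+xml
    </IfModule>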


Leverage Browser Caching

When you visit a website, the elements on the page you visit are stored on your hard drive in a cache, or temporary storage, so the next time you visit the site your browser can load the page without having to send another HTTP request to the server. This caching can be fine-tuned with various parameters.

1. You could set up browser caching from the W3 Total Cache plugin.

2. These can also be set on the server side, using HTTP headers such as Expires, Cache-Control and ETag (see the Apache sketch after this list).

3. Cache expiration timings can be set in the CloudFront distribution settings as well. You can specify Minimum and Maximum TTL to tell CloudFront how long it should cache the static content. Caching on CloudFront minimizes requests to the server, because you are telling CloudFront not to contact the origin until the minimum TTL expires.
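
For option 2, here is a minimal sketch of the server-side approach on Apache using mod_expires (the lifetimes shown are assumptions, not necessarily the values we used):

    <IfModule mod_expires.c>
        ExpiresActive On
        # Let browsers keep static assets for a while before re-requesting them
        ExpiresByType image/png  "access plus 1 month"
        ExpiresByType image/jpeg "access plus 1 month"
        ExpiresByType text/css   "access plus 1 week"
        ExpiresByType application/javascript "access plus 1 week"
    </IfModule>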

Purging the CloudFront cache whenever there is a change in static content is done through invalidation. Invalidations are discussed later in the article.

Use a CDN (Content Delivery Network)

Essentially, a CDN, or content delivery network, takes all the static files you’ve got on your site (CSS, JavaScript, images and font files) and lets visitors download them as fast as possible by serving the files from servers close to them. CDNs usually have a worldwide network of servers on which the static content is replicated and from which it is served. This not only shortens the distance the content travels, but also reduces the number of hops a data packet must make. The result is less packet loss, better bandwidth utilization and faster performance, which minimizes time-outs and latency.

The process of bouncing through a CDN is nearly transparent to the user. The only way a user would know a CDN has been accessed is by inspecting the browser requests and responses with a tool like Firebug.

The CDN will also communicate with the originating server to deliver any content that has not been previously cached.

We configured CloudFront for our use case. Since W3 Total Cache already supported multiple CDNs, including CloudFront, we leveraged that.

We also had to allow cross-origin requests by adding a directive to our virtual host configuration; CloudFront forwards these headers while serving the content. This is required for the browser to load the font files on the website. Though not part of the performance improvement itself, it arose because the domain of the font files changed from www.imaginea.com to cdn.imaginea.com.
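
A minimal sketch of such a directive, assuming Apache’s mod_headers is enabled (the font-file pattern and allowed origin are assumptions based on our setup):

    <IfModule mod_headers.c>
        # Allow pages on www.imaginea.com to load fonts served via cdn.imaginea.com
        <FilesMatch "\.(ttf|otf|eot|woff2?)$">
            Header set Access-Control-Allow-Origin "https://www.imaginea.com"
        </FilesMatch>
    </IfModule>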

Cache Invalidation on CloudFront

Why do we need cache invalidation?

Let’s say you want to change an image, or your client wants the image replaced with a different one or taken off completely. You would have to tell CloudFront to remove the image or the content from its cache. By default, each object automatically expires after 24 hours; CloudFront makes a request to your origin for fresh content when the Minimum/Maximum/Default TTL expires.

Invalidation can be triggered in two different ways:

1) Manually, through the AWS Console

2) Programmatically, by invoking the invalidation API (a sketch follows)
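
Here is a minimal sketch of the API route using the boto library mentioned below (the distribution ID and object path are placeholders):

    import boto

    # Credentials are read from $HOME/.boto or environment variables,
    # or can be passed explicitly to connect_cloudfront().
    cf = boto.connect_cloudfront()

    # Ask CloudFront to purge specific objects from its edge caches.
    cf.create_invalidation_request('DISTRIBUTION_ID', ['/images/logo.png'])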

The challenge we faced was that changes to the website were done by different teams who may not be able to purge the cache, so we had to ensure that it was done automatically. We had to detect when changes were made to the static content in the web server.

We achieved automatic invalidation of the CloudFront cache using Boto (a Python module for AWS) and inotify.

Using inotify, the directories that contain the website’s static content are monitored for changes, and a cache-invalidation Python script is triggered whenever their contents are modified. The inotify events used are MODIFY, MOVE and CREATE. Expect some extra invalidation API calls: the MODIFY and MOVE events also fire whenever someone renames a file or changes its properties within these folders.

For boto to work, you could store your AWS credentials in $HOME/.boto or pass them as parameters to your script. A wrapper script contains the inotify command, which constantly monitors your static-content directory and triggers the Python script to perform the invalidation (make sure the wrapper is running in the background so that the Python script gets triggered whenever inotify reports an event).

It is a good idea to track changes by redirecting the output and errors of the inotify events and invalidation requests to a log file through the “ts” command, which prepends a timestamp to each line.
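
Putting it together, a minimal sketch of the wrapper (the watched directory, script name and log path are placeholders; invalidate.py would wrap the boto call shown earlier):

    #!/bin/sh
    # Watch the static-content directory forever and invalidate changed paths.
    inotifywait -m -r -e modify,move,create --format '%w%f' /var/www/static \
      | while read changed_file; do
            # invalidate.py maps the file path to its URL path and calls boto
            python invalidate.py "$changed_file"
        done 2>&1 | ts '%Y-%m-%d %H:%M:%S' >> /var/log/cf-invalidation.log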

Note: inotifywait is provided by the inotify-tools package; ts comes with moreutils.