Case Study: SitePoint.com
Posted: 03.10.04
Before you resort to the drastic and usually unnecessary measures that closed Part III, we conclude with a brief example of how the low-cost ideas we have presented can improve page load times, bandwidth usage, and server load for a real site. Our example is the home page of SitePoint, which originally sponsored this article.
Applied Acceleration
Let's grab the SitePoint homepage using Internet Explorer 6.0. When we request SitePoint.com for the first time, we see that the homepage is actually made up of about 37 distinct objects. But we also notice several interesting things right away that SitePoint's developers are doing to accelerate their home page:
1. On the main or containing part of the page, the HTML content is compressed using HTTP content encoding (specifically gzip). Without gzip, the original file size would be 28,218 bytes. With it, the transferred size is only 7,774 bytes - a savings of about 72 percent. SitePoint's dial-up users will definitely thank them for that. A closer look at the response headers tells us that this same page is also sent using chunked transfer-encoding, which can help mitigate the time to first byte penalty associated with HTTP compression. This means that even broadband users might experience a faster page load.
2. Something else we learn from the response headers is that, while the home page is dynamically built using PHP, the developers have minimized the performance impact of this by using a third-party tool that pre-interprets the PHP script and caches the interpreted instructions in memory. (The giveaway is the response header X-Accelerated-By: PHPA/1.3.3r2.) With a script cache like this one, the page is still dynamic (in the sense that its HTML output is not cached), but the server-side overhead of loading up the script and interpreting it into executable instructions every time it is requested is avoided - a nice compromise between full pre-generation and a purely dynamic page.
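The compression savings quoted in item 1 are simple arithmetic, and the same measurement can be repeated for any page. The following is a minimal Python sketch, not SitePoint's tooling: the helper `percent_saved` and the sample HTML fragment are assumptions made for illustration, and only the two byte counts come from the measurements above.

```python
import gzip


def percent_saved(original_bytes: int, transferred_bytes: int) -> float:
    """Return the percentage of bytes saved relative to the original size."""
    return 100.0 * (original_bytes - transferred_bytes) / original_bytes


# Byte counts reported above for the SitePoint home page HTML.
homepage_saving = percent_saved(28218, 7774)

# Illustrative only: a made-up, repetitive HTML fragment standing in for a page.
sample_html = ("<tr><td class='item'>Article title</td></tr>\n" * 200).encode("utf-8")
sample_gzipped = gzip.compress(sample_html)
sample_saving = percent_saved(len(sample_html), len(sample_gzipped))

print(f"home page: {homepage_saving:.1f} percent saved")
print(f"sample: {len(sample_html)} -> {len(sample_gzipped)} bytes "
      f"({sample_saving:.1f} percent saved)")
```

Real pages are less repetitive than this sample, so they will usually compress less dramatically; the point is only how the percentage is derived.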
Is there anything else that SitePoint could do, on the cheap, to optimize things even more? There are several possibilities, which we have summarized in the following table:
This table shows the effects of file size reduction in two stages - first the application of source code optimization (the byte count in the "Optimized" column), and then the application of HTTP compression (the byte count in the "Gzipped" column). We've given the compressed (gzipped) size for the main page (index.php) just as we got it in IE, but we haven't applied source code optimization to it, since it is dynamic. There are also eight static external text files (5 CSS and 3 JavaScript) that are not being compressed or code optimized at all, but that could definitely benefit from both. We've shown the results here, and as you can see in the percentage saved columns, the resulting overall savings are substantial - a 13 percent savings for the whole home page (all dependencies included) just with source code optimization, and a 50 percent savings with source code optimization plus HTTP compression. These percentages could go a little higher still if index.php were source code optimized as well (which would probably yield an additional 1000 bytes or so).
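To make the two stages concrete, here is a rough Python sketch that first applies a naive source code optimization to a stylesheet and then gzips the result. Everything in it is an assumption for illustration: `optimize_css` is a hypothetical, deliberately simplistic stand-in for a real optimizer (it can break CSS in which whitespace is significant), and the sample stylesheet is made up rather than taken from SitePoint.

```python
import gzip
import re


def optimize_css(source: str) -> str:
    """Naively shrink CSS by dropping comments and unneeded whitespace."""
    without_comments = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)
    collapsed = re.sub(r"\s+", " ", without_comments)
    # Remove spaces around punctuation; a real optimizer is far more careful.
    tightened = re.sub(r"\s*([{};:,])\s*", r"\1", collapsed)
    return tightened.strip()


def percent_saved(before: int, after: int) -> float:
    """Return the percentage of bytes saved relative to the starting size."""
    return 100.0 * (before - after) / before


# Made-up stylesheet, repeated so that it is large enough to compress sensibly.
SAMPLE_CSS = """
/* Navigation bar (made-up rule for illustration) */
.nav {
    margin: 0;
    padding: 4px 8px;
    color: #333333;
}
""" * 20

original = SAMPLE_CSS.encode("utf-8")
optimized = optimize_css(SAMPLE_CSS).encode("utf-8")   # stage 1: "Optimized"
gzipped = gzip.compress(optimized)                     # stage 2: "Gzipped"

print(f"original:  {len(original)} bytes")
print(f"optimized: {len(optimized)} bytes "
      f"({percent_saved(len(original), len(optimized)):.0f} percent saved)")
print(f"gzipped:   {len(gzipped)} bytes "
      f"({percent_saved(len(original), len(gzipped)):.0f} percent saved)")
```

Both percentages are measured against the original size, which is how the cumulative savings in the table are expressed.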
The last two columns focus on the topic of the second part of the article: cache control. Expiration-based cache control isn't being used on the SitePoint.com home page except for a few externally hosted or highly specialized resources. The "Reval?" column shows which of the files, as a consequence, need to be revalidated upon a return visit, even though they are present in the browser's cache. SitePoint could consider making much wider use of explicit expiration times, especially for the plethora of relatively invariant images that make up the home page, as well as for the static external text files that contain CSS and JavaScript. Authoring and implementing a good set of cache control policies for such objects would help to avoid some or all of the 31 separate client-server round trips required to revalidate already cached objects when a user returns to the home page in a new browser session. That is 31 fewer 304 Not Modified messages in the server logs for every return page view, but more importantly, a reduction in return visitor page load times. The last column shows that these revalidation round trips take between one and seven tenths of a second each. Even though a number of them are going on simultaneously when a user returns to the home page (and taking up a bunch of TCP resources on the server side in the process!), that is still a substantial wait compared to essentially instantaneous serving of those same objects out of the browser's cache - which is where they wound up coming from in any case.
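An expiration-based policy of the kind suggested here boils down to sending `Cache-Control: max-age` and `Expires` response headers with static objects. The Python sketch below shows one way such a policy might be expressed. The function `expiration_headers`, the file extensions, and the lifetimes are all hypothetical choices for illustration, not SitePoint's configuration, and in practice these headers are usually set in the Web server's configuration rather than in application code.

```python
import os
import time
from email.utils import formatdate

DAY = 24 * 3600

# Hypothetical policy: freshness lifetime in seconds, keyed by file extension.
CACHE_LIFETIMES = {
    ".gif": 30 * DAY,
    ".jpg": 30 * DAY,
    ".png": 30 * DAY,
    ".css": 7 * DAY,
    ".js": 7 * DAY,
}


def expiration_headers(path, now=None):
    """Return explicit-expiration headers for a static object.

    Paths with no entry in the policy (such as a dynamic index.php) get no
    expiration headers, so the browser keeps revalidating them.
    """
    extension = os.path.splitext(path)[1].lower()
    lifetime = CACHE_LIFETIMES.get(extension)
    if lifetime is None:
        return {}
    if now is None:
        now = time.time()
    return {
        "Cache-Control": f"max-age={lifetime}",
        # HTTP dates are expressed in GMT.
        "Expires": formatdate(now + lifetime, usegmt=True),
    }


for example_path in ("/images/logo.gif", "/css/main.css", "/index.php"):
    print(example_path, expiration_headers(example_path))
```

While a cached object is still fresh under such a policy, the browser can reuse it on a return visit without sending a conditional request, which is what eliminates the 304 Not Modified round trips described above.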
Conclusion
Our simple case study clearly demonstrates that, when used together, the techniques presented in this article can significantly improve the server load, bandwidth usage, and page load times of almost any Web site. All are relatively simple to implement and none of them require large expenditures on new hardware or software. Granted, no acceleration technique is without its downsides - code optimization requires a time investment from developers, caching requires increased coordination between development and administration, and compression can tax server resources in order to save bandwidth. In conclusion, we hope we have provided you with a better understanding of the principles behind Web site performance and a number of specific enhancement techniques that will help you to carefully put your foot on the gas.
Originally published on sitepoint.com on March 10, 2004.