Continuous integration testing for Django sites
At work we're busy trying to get a Django site out the door. This time around, we've been enjoying the payoff from the modest time invested in setting up a Hudson continuous integration server - see Chris Shenton's presentation here for why you should and how quickly you can - and one of the areas we've really expanded is the use of automated testing. I've already described the test runner we're using but wanted to describe the overall process as well.
First, some background notes:
- Create a script which does all of the hard work and manage that as part of your project - some of the examples show people dumping 20+ line shell scripts into a Hudson config but if you're serious, this should be versioned like everything else. If you're careful, some of the setup tasks can even be shared with other scripts you use to set up new developers or create RPMs.
- Our process relies on virtualenv and pip. If you're not familiar with these, all you need to know in order to follow along is that virtualenv creates a virtual Python instance which allows us to keep this project separate from everything else and avoids the need for pip to have privileged access to install software.
Roughly in order, this is what our automated job does:
- If our virtualenv doesn't exist or requirements.pip has changed since we initialized the virtualenv, remove it and recreate it. In Bash this is roughly:
      if [ ! -d .virtualenv -o requirements.pip -nt .virtualenv ]; then
          rm -rf .virtualenv
          virtualenv .virtualenv
          pip install -r requirements.pip --environment=.virtualenv --download-cache=.pip-download-cache
      fi
  One important note: using the download cache makes your installs a lot faster and avoids wasting other people's resources on the distribution servers.
- To avoid issues with a failure leaving the database in an inconsistent state, we drop and recreate the database before every run and clear the Solr full-text search index.
- Start Solr as a background task:
      java -DSTOP.PORT=<arbitrary high port> -DSTOP.KEY=<arbitrary key> -jar start.jar
- django-admin.py syncdb
- django-admin.py loaddata clean_site (on our projects, we name fixtures clean_site rather than initial_data to avoid overwriting changes when syncdb runs)
- At this point, we're ready to actually run the tests. Our custom test runner runs the full Django test suite, saving the output and coverage.py's report to a directory which is available through a local Apache instance for convenience. This also generates coverage.py's XML report so the Hudson Cobertura plugin can generate pretty charts showing our progress over time.
  We save the return code from the test suite (i.e. TEST_RC=$?) so we can report failures after running our cleanup code (see below).
- Assuming that the test suite ran correctly, we then launch some additional tests using Eric Holscher's excellent django-test-utils:
      django-admin.py crawlurls -v0 > logdir/crawler.log
  This also allows us to collect some basic performance numbers - I want to start visualizing per-page performance using something like dygraphs but we haven't had time to set that up yet.
- Shut down Solr:
      java -DSTOP.PORT=<arbitrary high port> -DSTOP.KEY=<arbitrary key> -jar start.jar --stop
- Exit with the value returned by the Django test suite.
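Two of the steps above are worth sketching: rebuilding the virtualenv only when it is stale, and reporting the test suite's exit status only after cleanup has run. Our real script is written in Bash, so the Python below is just a minimal illustration of the same logic; the function names needs_rebuild and run_with_cleanup are hypothetical.

```python
import os
import subprocess
import sys


def needs_rebuild(env_dir, requirements):
    """Return True if env_dir is missing or older than the requirements file.

    Roughly mirrors the Bash test `[ ! -d ENV -o REQS -nt ENV ]`.
    Unlike Bash, a missing requirements file raises OSError here.
    """
    if not os.path.isdir(env_dir):
        return True
    return os.path.getmtime(requirements) > os.path.getmtime(env_dir)


def run_with_cleanup(test_cmd, cleanup_cmds):
    """Run the tests, then the cleanup commands, and return the tests' exit code."""
    try:
        test_rc = subprocess.call(test_cmd)
    finally:
        for cmd in cleanup_cmds:
            # Cleanup exit codes are ignored so they cannot mask test_rc.
            subprocess.call(cmd)
    return test_rc
```

A job script would finish with something like sys.exit(run_with_cleanup(test_cmd, [stop_solr_cmd])), where both command lists are whatever your project uses, so that Hudson sees the test result rather than the result of the last cleanup command.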
That might sound like a lot of work but on our test system it currently takes well under 5 minutes. In addition to helping us stay on top of test coverage, it's been really helpful for flushing out obsolete fixture data (e.g. crawlurls will show links which return a 404) and has alerted us to several upstream version changes - we use pip freeze to track version numbers so we've found out quickly when the version of something we're using has been removed from PyPI. Most importantly, we know that our install instructions actually work because we're testing them on a regular basis. When something changes, it breaks quickly and the failure is linked directly to a commit, making it easy to update the instructions and the deployment script. When the time comes to put the code into production, there's no question that the script is accurate because it's been run hundreds of times.
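Tracking versions this way only helps if every requirement is actually pinned. As a small illustration (this is not part of our job, and the helper name is hypothetical), a check for unpinned lines in a requirements file might look like this:

```python
def unpinned_requirements(lines):
    """Return requirement lines that are not pinned with '=='.

    Blank lines, comments and pip options (lines starting with '-',
    such as '-e' editable installs) are skipped. This is a deliberately
    simple check, not a full requirements parser.
    """
    unpinned = []
    for raw in lines:
        # Drop trailing comments, then surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned
```

Calling it with open("requirements.pip") as the argument would list anything which pip is free to upgrade behind your back.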
Quickly testing your sites using webtoolbox
As of a few minutes ago, this site is running the bleeding-edge django-mingus. A fair number of things have changed since the last release and it's handy to be able to exercise the entire site quickly to make sure everything's working correctly through the entire stack, from Webfaction's front-end proxy down to the actual Django application. This provided a good excuse to plug one of the newest utilities in my webtoolbox:
check_site is a simple spider, based on an easily-extensible Spider class, which will walk an entire site and report any errors it finds. The entire process would look something like this, assuming that you have virtualenv, virtualenvwrapper and pip available:
    chris@Saturn:~/Development/webtoolbox $ git clone http://github.com/acdha/webtoolbox.git
    Initialized empty Git repository in /private/tmp/webtoolbox/.git/
    chris@Saturn:~/Development/webtoolbox $ mkvirtualenv webtoolbox
    New python executable in webtoolbox/bin/python
    Installing setuptools............done.
    (webtoolbox)chris@Saturn:~/Development/webtoolbox $ cd webtoolbox/
    (webtoolbox)chris@Saturn:~/Development/webtoolbox [git master] $ add2virtualenv .
    (webtoolbox)chris@Saturn:~/Development/webtoolbox [git master] $ pip install -r requirements.pip
    … time passes …
    (webtoolbox)chris@Saturn:~/Development/webtoolbox [git master] $ ./bin/check_site.py http://chris.improbable.org/ --max-connections=2
    [QASpider] [WARNING]: http://chris.improbable.org/2008/07/12/iphone-os-20-the-good-bad-and-very-ugly/: stripped 1 non-printable control characters
    [QASpider] [WARNING]: http://chris.improbable.org/2009/02/3/in-which-the-gop-surrenders-any-pretense-of/: stripped 3 non-printable control characters
    [QASpider] [WARNING]: http://chris.improbable.org/2008/04/17/dinosaur-meet-tar-pit/: stripped 1 non-printable control characters
    [QASpider] [WARNING]: http://chris.improbable.org/2007/10/19/textmate-and-php-automatic-syntax-checking-when/: stripped 4 non-printable control characters
    [QASpider] [WARNING]: http://chris.improbable.org/2007/07/4/efficiency/: stripped 2 non-printable control characters
    [QASpider] [WARNING]: http://chris.improbable.org/2007/07/18/in-praise-of-simple-solutions/: stripped 4 non-printable control characters
    Site Report for chris.improbable.org
    Retrieved 271 URLs in 28.31 seconds with 0 errors
That's pretty easy and HTML validation is also available. If you need custom checks, the core spider is pretty simple and can easily be extended with whatever logic you might want. In the meantime, it looks like I have to clean up some control codes which I imported from the legacy PHP code which used to run this site…
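To give a flavour of the kind of check behind those warnings, here is a standalone Python sketch. It does not use webtoolbox's Spider class or reproduce its implementation; strip_control_chars is a hypothetical helper, and treating tab, newline and carriage return as acceptable is my assumption.

```python
import re

# C0 control characters except tab (\x09), newline (\x0a) and carriage
# return (\x0d), plus DEL (\x7f). Which characters count is an assumption.
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def strip_control_chars(text):
    """Return (cleaned_text, number_of_characters_removed)."""
    cleaned, count = CONTROL_CHARS.subn("", text)
    return cleaned, count
```

A custom check could call this on each page body and log a warning whenever the count is non-zero.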
Improbable.org is now open source
This is only of interest to fellow web nerds but, as of a few minutes ago and after a quick git filter-branch, the source code for this site is available under the MIT license. If you want to play with it, head over to http://github.com/acdha/improbable.org and fork it.
A few billion lines of code later…
Parsing is considered a solved problem. Unfortunately, this view is naïve, rooted in the widely believed myth that programming languages exist.
— Al Bessey, Ken Block, Ben Chelf, Andy Chou, Bryan Fulton, Seth Hallem, Charles Henri-Gros, Asya Kamsky, Scott McPeak, Dawson Engler
Django site test coverage
At work we're using Hudson for continuous integration on our Django projects. Every time someone commits to SVN, Hudson runs our entire test suite in a virtualenv, reports failures and generates various test reports.
Overall it's been a win but there are some rough edges:
- The Django test runner's environment differs enough from the normal one that it can break various things we use: the dynamic model extension FeinCMS performs can break if the test runner initializes apps in a different order (which cost us some quality debugging time), as can South database migrations and some of the built-in caching middleware (see #5176).
- We don't want to test every application installed - just our site and the other code which we developed. One convenience you'll want to customize in the code below is the logic which includes apps with a common prefix along with the site's app if you didn't specify the apps on the command line. We also include a couple of open source apps which are maintained by people on our team since there's a very clear path for reporting failures: shouting across the room.
- We want to use Ned Batchelder's awesome coverage.py. Unfortunately, we can't load it in the stock test runner because things like our models have already been processed by the time a management command runs and we'd like those to be included in our test reports.
The solution was to write a custom test runner which works like the standard Django command but customizes the environment: http://gist.github.com/288810
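The app-selection convenience mentioned above is easy to sketch on its own. This is not the code from the gist; the function name, the "mysite" site app and the "myorg_" prefix are placeholders which you would replace with your own.

```python
def select_test_apps(installed_apps, requested=None,
                     site_app="mysite", prefix="myorg_"):
    """Choose which app labels to test.

    If labels were given on the command line, use them unchanged.
    Otherwise pick the site's own app plus every installed app whose
    label (the last dotted component) starts with the shared prefix.
    """
    if requested:
        return list(requested)
    selected = []
    for app in installed_apps:
        label = app.split(".")[-1]
        if label == site_app or label.startswith(prefix):
            selected.append(label)
    return selected
```

In a real runner, installed_apps would come from settings.INSTALLED_APPS and requested from the management command's arguments.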
Reading for the modern web
JavaScript: The Good Parts is a classic book, if you're into dead trees and want to learn why some people love JavaScript. You can get a lot of the philosophy of modern JavaScript wizardry for free from Douglas Crockford (YUI) and John Resig (jQuery), and there's a ton of good stuff covering everything from philosophy to in-the-trenches coding available from YUI Theater.
DailyJS.com is a good resource for staying abreast of what's going on (I'm not a huge fan of ajaxian.com's style); the 24ways.org web advent calendar was a good survey of where modern web development's going.
Finally, Google's Closure Library has some interesting tools (Closure Compiler is great) and there's a lot to learn from the articles explaining their philosophy even if you don't use the Closure Library. There's also a lot of modern performance wisdom summarized in Let's make the web faster, some of it directly JavaScript-related and most of the rest of interest to any front-end developer.

