Feb 08

Post-snowpocalyptic-DC

I've been posting post-storm pictures on Flickr. Some favorites:

  • Washington Monument
  • 14th Street Heights
  • Church of Christ
  • Note to cyclists: these guys are more hardcore than you
  • Frozen Fountain

Feb 06

Continuous integration testing for Django sites

How we're using Hudson to check our projects at work

At work we're busy trying to get a Django site out the door. This time around, we've been enjoying the payoff from the modest time invested in setting up a Hudson continuous integration server - see Chris Shenton's presentation here for why you should and how quickly you can - and one area we've really expanded is our use of automated testing. I've already described the test runner we're using but wanted to describe the overall process as well.

First, some background notes:

  • Create a script which does all of the hard work and manage that as part of your project - some of the examples show people dumping 20+ line shell scripts into a Hudson config but if you're serious, this should be versioned like everything else. If you're careful, some of the setup tasks can even be shared with other scripts you use to set up new developers or create RPMs.
  • Our process relies on virtualenv and pip. If you're not familiar with these, all you need to know in order to follow along is that virtualenv creates a virtual Python instance which allows us to keep this project separate from everything else and avoids the need for pip to have privileged access to install software.

Roughly in order, this is what our automated job does:

  1. If our virtualenv doesn't exist or requirements.pip has changed since we initialized the virtualenv, remove it and recreate it. In Bash this is roughly:
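    # Rebuild when the virtualenv is missing or older than requirements.pip
    if [ ! -d .virtualenv ] || [ requirements.pip -nt .virtualenv ]; then
      rm -rf .virtualenv  # -f: don't fail when the directory doesn't exist yet
      virtualenv .virtualenv
      pip install -r requirements.pip --environment=.virtualenv --download-cache=.pip-download-cache
    fi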
    
    One important note: using the download cache makes your installs a lot faster and avoids wasting other people's resources on the distribution servers.
  2. To avoid issues with a failure leaving the database in an inconsistent state, we drop and recreate the database before every run and clear the Solr full-text search index.
  3. Start Solr as a background task:
    java -DSTOP.PORT=<arbitrary high port> -DSTOP.KEY=<arbitrary key> -jar start.jar
  4. django-admin.py syncdb
  5. django-admin.py loaddata clean_site (on our projects, we name fixtures clean_site rather than initial_data to avoid overwriting changes when syncdb runs)
  6. At this point, we're ready to actually run the tests. Our custom test runner runs the full Django test suite and saves the output and coverage.py's report to a directory which is available through a local Apache instance for convenience. It also generates coverage.py's XML report so the Hudson Cobertura plugin can generate pretty charts showing our progress over time.
    We save the return code from the test suite (i.e. TEST_RC=$?) so we can report failures after running our cleanup code (see below) 
  7. Assuming that the test suite ran correctly, we then launch some additional tests using Eric Holscher's excellent django-test-utils:
    django-admin.py crawlurls -v0 > logdir/crawler.log
    This also allows us to collect some basic performance numbers - I want to start visualizing per-page performance using something like dygraphs but we haven't had time to set that up yet.
  8. Shut down Solr:
    java -DSTOP.PORT=<arbitrary high port> -DSTOP.KEY=<arbitrary key> -jar start.jar --stop
  9. Exit with the value returned by the Django test suite
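
Steps 6 through 9 boil down to one pattern: remember the test suite's exit status, always run the cleanup, then hand that status back to Hudson. Here's a minimal sketch of that pattern rather than our actual script; run_tests, run_crawler, stop_solr and the FAKE_TEST_RC variable are hypothetical stand-ins for the real commands shown above.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins: the real job would call the test runner,
# django-admin.py crawlurls and the Solr stop command here.
run_tests()   { echo "running tests"; return "${FAKE_TEST_RC:-0}"; }
run_crawler() { echo "crawling"; }
stop_solr()   { echo "stopping solr"; }

run_job() {
    local test_rc=0

    # Capture the test suite's exit status without aborting the job.
    run_tests || test_rc=$?

    # Only launch the crawler when the test suite passed.
    if [ "$test_rc" -eq 0 ]; then
        run_crawler
    fi

    # Cleanup always runs, whether the tests passed or failed.
    stop_solr

    # Report the test suite's status to the caller.
    return "$test_rc"
}
```

A real script would end with something like run_job; exit $? so Hudson marks the build as failed whenever the test suite did, even though Solr was shut down cleanly afterwards.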

That might sound like a lot of work but on our test system it currently takes well under 5 minutes. In addition to helping us stay on top of test coverage, it's been really helpful for flushing out obsolete fixture data (i.e. crawlurls will show links which return 404) and has alerted us to several upstream version changes - we use pip freeze to track version numbers so we've found out quickly when the version of something we're using has been removed from PyPI. Most importantly, we know that our install instructions actually work because we're testing them on a regular basis: when something changes, it breaks quickly and the failure is linked directly to a commit, making it easy to update the instructions and the deployment script. When the time comes to put the code into production there's no question that the script is accurate because it's been run hundreds of times.

Jan 30

Quickly testing your sites using webtoolbox

Quickly testing websites using the check_site spider

As of a few minutes ago, this site is running the bleeding-edge django-mingus. A fair number of things have changed since the last release and it's handy to be able to exercise the entire site quickly to make sure everything's working correctly through the entire stack from Webfaction's front-end proxy down to the actual Django application. This provided a good excuse to plug one of the newest utilities in my webtoolbox:

check_site is a simple spider, based on an easily-extensible Spider class, which will walk an entire site and report any errors it finds. The entire process would look something like this, assuming that you have virtualenv, virtualenvwrapper and pip available:

chris@Saturn:~/Development/webtoolbox $ git clone http://github.com/acdha/webtoolbox.git
Initialized empty Git repository in /private/tmp/webtoolbox/.git/
chris@Saturn:~/Development/webtoolbox $ mkvirtualenv webtoolbox
New python executable in webtoolbox/bin/python
Installing setuptools............done.
(webtoolbox)chris@Saturn:~/Development/webtoolbox $ cd webtoolbox/
(webtoolbox)chris@Saturn:~/Development/webtoolbox [git master] $ add2virtualenv .
(webtoolbox)chris@Saturn:~/Development/webtoolbox [git master] $ pip install -r requirements.pip 
… time passes …
(webtoolbox)chris@Saturn:~/Development/webtoolbox [git master] $ ./bin/check_site.py http://chris.improbable.org/ --max-connections=2
[QASpider] [WARNING]: http://chris.improbable.org/2008/07/12/iphone-os-20-the-good-bad-and-very-ugly/: stripped 1 non-printable control characters
[QASpider] [WARNING]: http://chris.improbable.org/2009/02/3/in-which-the-gop-surrenders-any-pretense-of/: stripped 3 non-printable control characters
[QASpider] [WARNING]: http://chris.improbable.org/2008/04/17/dinosaur-meet-tar-pit/: stripped 1 non-printable control characters
[QASpider] [WARNING]: http://chris.improbable.org/2007/10/19/textmate-and-php-automatic-syntax-checking-when/: stripped 4 non-printable control characters
[QASpider] [WARNING]: http://chris.improbable.org/2007/07/4/efficiency/: stripped 2 non-printable control characters
[QASpider] [WARNING]: http://chris.improbable.org/2007/07/18/in-praise-of-simple-solutions/: stripped 4 non-printable control characters
Site Report for chris.improbable.org
Retrieved 271 URLs in 28.31 seconds with 0 errors

That's pretty easy and HTML validation is also available. If you need custom checks, the core spider is pretty simple and can easily be extended with whatever logic you might want. In the meantime, it looks like I have to clean up some control codes which I imported from the legacy PHP code which used to run this site…
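
For the cleanup itself, something like the following would probably do. It's only a sketch: the function names are made up for illustration, and I'm assuming "non-printable control characters" means the ASCII control codes other than tab, newline and carriage return, which may not match the spider's exact definition.

```shell
#!/usr/bin/env bash
# Assumption: treat every ASCII control code except tab (011),
# newline (012) and carriage return (015) as debris.
CONTROL_CHARS='\000-\010\013\014\016-\037\177'

# Count the control characters arriving on stdin.
count_control_chars() {
    # -c complements the set, so -cd deletes everything *except*
    # the control characters; wc -c then counts what is left.
    LC_ALL=C tr -cd "$CONTROL_CHARS" | wc -c | tr -d ' '
}

# Copy stdin to stdout with the control characters removed.
strip_control_chars() {
    LC_ALL=C tr -d "$CONTROL_CHARS"
}
```

Piping an exported post through count_control_chars before and after strip_control_chars is a quick way to confirm that the cleanup caught everything.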

 

Jan 30

Improbable.org is now open source

This site's source code is now open and on github.com

This is only of interest to fellow web nerds but as of a few minutes ago, and after a quick git filter-branch, the source code for this site is available under the MIT license. If you want to play with it, head over to http://github.com/acdha/improbable.org and fork it.

    

Jan 28

Django site test coverage

A custom Django test runner with support for coverage.py and graceful handling for app selection and various testing gotchas

At work we're using Hudson for continuous integration on our Django projects. Every time someone checks a commit in to SVN, Hudson runs our entire test suite in a virtualenv and reports failures as well as generating various test reports.

Overall it's been a win but there are some rough edges:

  1. The Django test runner's environment differs enough from the normal one that it can break various things we use: the dynamic model extension which FeinCMS performs can fail if the test runner initializes apps in a different order (which cost us some quality debugging time), as can South database migrations and some of the built-in caching middleware (see #5176).
  2. We don't want to test every application installed - just our site and the other code which we developed. One convenience you'll want to customize in the code below is the logic which includes apps with a common prefix along with the site's app if you didn't specify the apps on the command-line. We also include a couple of open source apps which are maintained by people on our team since there's a very clear path for reporting failures: shouting across the room.
  3. We want to use Ned Batchelder's awesome coverage.py. Unfortunately, we can't load it in the stock test runner because things like our models have already been processed by the time a management command runs and we'd like those to be included in our test reports.

The solution was to write a custom test runner which works like the standard Django command but customizes the environment: http://gist.github.com/288810
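
The prefix-based app selection from point 2 is easy to picture outside of Django. This is a shell sketch of the idea only, not the logic from the gist: select_apps and the app names in the usage note are hypothetical.

```shell
#!/usr/bin/env bash
# select_apps PREFIX APP...
# Print, one per line, the apps whose names start with PREFIX.
select_apps() {
    local prefix=$1
    shift

    local app
    for app in "$@"; do
        case "$app" in
            # Quoting the prefix makes it match literally;
            # the trailing * accepts any suffix.
            "$prefix"*) printf '%s\n' "$app" ;;
        esac
    done
}
```

Calling select_apps myproject_ django.contrib.auth myproject_blog south myproject_search prints myproject_blog and myproject_search, which could then be passed as the app labels to the test command. The gist applies the same kind of filter inside the test runner itself whenever no apps are named on the command-line.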

Jan 28

Reading for the modern web

A coworker recently asked for some JavaScript pointers and I figured I'd pass them along to everyone else

JavaScript: The Good Parts is a classic book, if you're into dead trees and want to learn why some people love JavaScript. You can get a lot of the philosophy of modern JavaScript wizardry for free from Douglas Crockford (YUI) and John Resig (jQuery), and there's a ton of good stuff covering everything from philosophy to in-the-trenches coding available from YUI Theater.

DailyJS.com is a good resource for staying abreast of what's going on (I'm not a huge fan of ajaxian.com's style); the 24ways.org web advent calendar was a good survey of where modern web development's going.

Finally, Google's Closure Library has some interesting tools (Closure Compiler is great) and there's a lot to learn from the articles explaining their philosophy even if you don't use the Closure Library; there's also a lot of modern performance wisdom summarized in Let's make the web faster, some directly JavaScript related and most of the rest of interest to any front-end developer.