Jul 23

Easy JavaScript error collection with Arecibo

How to monitor your visitors' JavaScript errors in less than 15 minutes

If you work on sites with complex JavaScript you've probably wanted a way to know about the errors reported by users' browsers: even with a rigorous test process it's likely that there's some permutation of browser version and settings which you don't test, particularly when you consider external factors like JavaScript from third-party sources or the many ways in which anti-virus software, corporate proxies and policies can interact in darkly obscure ways.

There's now a really easy way to collect JavaScript errors thanks to Andy McKay and Clearwind Consulting: Arecibo. It's available as a commercial service for people who need support but it's also a completely open-source project on GitHub. Recently I've been working on an improved JavaScript client which has now been merged into the official codebase, making it really easy to set up a personal error aggregation service for all of your projects:

  1. Set up an Arecibo service on AppEngine following the installation guide
  2. Add this JavaScript fragment to your HTML templates:
    <script src="http://your-arecibo.appspot.com/lib/error.js" type="text/javascript" charset="utf-8"></script>
    <script type="text/javascript" charset="utf-8">
        arecibo.account = 'YOUR PUBLIC API KEY';
        arecibo.registerGlobalHandler();
    </script>
            

    It's often desirable to defer loading Arecibo until after the rest of your page has displayed, which can be done using something like this jQuery example:

    <script type="text/javascript" charset="utf-8">
        jQuery(function($) {
            $.getScript("http://your-arecibo.appspot.com/lib/error.js", function () {
                arecibo.account = 'YOUR PUBLIC API KEY';
                arecibo.registerGlobalHandler();
            });
        });
    </script>
            

The reporting interface looks like this:

[Screenshot of the Arecibo reporting interface]

There are two general caveats here: this service can't collect data when JavaScript is completely disabled or when the problem is caused by internet connectivity issues. Unfortunately browser error handling is also not standardized, and WebKit browsers like Safari and Chrome currently don't have a way to capture unhandled exceptions. Similarly, the ability to collect detailed stack traces varies from browser to browser, so you'll find richer error reports from Firefox than from Internet Explorer, but in most cases simply getting the report is enough to start working on a fix or at least a more exhaustive test.

Jul 15

How to work in Git and push changes to Mercurial using hg-git

I often need to work with both Git and Mercurial repositories. I've previously used hg-git to work in Hg and push changes to Git but have found Mercurial to be less comfortable than Git (no flames, please: this is the newer vi-Emacs debate. Use the one you like) and was hoping to work in the opposite manner: local changes in Git pushed to an Hg repo on BitBucket or Google Code.

Travis Cline posted some instructions for working in Git and pushing changes to Hg which I've updated with a few stylistic tweaks:

  • Install hg-git (e.g. ``pip install hg-git``)
  • Make sure you've enabled the Hg bookmarks extension in your ``.hgrc``
  • Add this to your ``.hgrc``::
        [git]
        intree=1
    
  • Clone your Mercurial repo::
        $ hg clone https://acdha@bitbucket.org/ned/coveragepy
    
  • Change into the repo::
        $ cd coveragepy
    
  • Create a local bookmark tracking your Mercurial default branch - this is what will be exported to Git::
       
        $ hg bookmark hg/default -r default
    
  • Export to the git repo::
        $ hg gexport
    
  • Configure Hg to ignore the Git repo::
        $ echo ".git" >> .hg/hgignore
    
  • Configure Git to ignore the Hg repo::
        $ echo ".hg*" >> .git/info/exclude
    
  • Configure Git to ignore the same things as Mercurial::
        $ git config core.excludesfile `pwd`/.hg/hgignore
    
  • Have your master branch track the exported Hg default branch::
        $ git branch --track master hg/default
        $ git reset --hard
    
  • Do stuff in Git and make commits
  • Export your changes to Hg::
        $ hg gimport
        
    
  • Push them out to the world::
        $ hg push
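
Once the repository is set up, the last two steps are the ones repeated day to day. A minimal sketch of wrapping them in a shell function follows. The function name ``publish_to_hg`` and the clean-working-tree check are assumptions added for illustration; neither is part of hg-git or of Travis Cline's instructions:

```shell
# Hypothetical helper (not part of hg-git): publish local Git commits via Mercurial.
publish_to_hg() {
    # Assumption: refuse to run with uncommitted Git changes so that what gets
    # pushed matches what has actually been committed.
    if [ -n "$(git status --porcelain)" ]; then
        echo "Uncommitted changes in Git; commit or stash them first" >&2
        return 1
    fi

    # Import the new Git commits into the Hg repository, then push them.
    hg gimport && hg push
}
```

Run ``publish_to_hg`` from the top of the working copy after committing in Git. Because the two commands are joined with ``&&``, nothing is pushed if the import fails.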
    
Mar 29

Why should we fear a national ID card?

I see the usual concerns about the government collecting data about citizens are making the rounds again, with a recent Wired post about national ID cards and a concerned editorial in the CS Monitor about Census data. This sort of thing comes up regularly in both the generally left-leaning privacy circles and the right-leaning small-government crowd and I find it interesting that the discussion almost inevitably revolves around strategies for preventing the government from collecting data.

What I find surprising is that the discussion always seems to be based on the misunderstanding that this is a future threat, as opposed to something which has been routine for decades. I'm not referring to just SSNs, although they're close enough to a national ID to make these discussions obsolete, but simply to the fact that there are already many large databases containing various bits of personal data. Since the invention of the digital computer (or at least the relational database) there's been no privacy advantage to be gained by avoiding a number, because anyone can trivially cluster data on characteristics like name, age, address, etc. Academic research has demonstrated that much of our sense of privacy is illusory - something as simple as the name, ZIP code and birthdate or approximate home and work locations suffices to uniquely identify most people. At this point, campaigning against a national ID card devolves into the case that abuse will be stopped because the suspect agencies aren't capable of basic database use.

Since we know that large-scale, effective data-mining is already going on - not every agency shares TSA's reliance on WWI-era technology in Excel and not all of the data mining is being done by public agencies - the better question is what a reasonable expectation of privacy is in the post-Google era and how we can deter abuse while taking advantage of the benefits offered by modern computing[1]. Since data is so easy to collect and mine, we really should be discussing acceptable use of data and the penalties for abuse, covering both government and corporate use. If we had European-style privacy laws, a national ID card could be discussed as the obvious good idea it is rather than the proxy for a litany of tangentially-related fears.

  1. Does anyone really want to make the case that government shouldn't use effective tools? Instead of fighting the Census we should be trying to figure out how it could be updated yearly or better so policy can be adjusted for trends on less than a decadal timescale.

Feb 12

I wonder what incorrect conclusions I may be drawing…

In my own experience I suspect that it is not a coincidence that the last major advance in CPU microarchitecture, the transition to out-of-order execution with the Intel P6 (and AMD K7, etc.) occurred just prior to the transition to PowerPoint.

— Andy Glew

Feb 08

Post-snowpocalyptic-DC

I've been posting post-storm pictures on Flickr. Some favorites:

  • Washington Monument
  • 14th Street Heights
  • Church of Christ
  • Note to cyclists: these guys are more hardcore than you
  • Frozen Fountain

Feb 06

Continuous integration testing for Django sites

How we're using Hudson to check our projects at work

At work we're busy trying to get a Django site out the door. This time around, we've been enjoying the modest time invested in setting up a Hudson continuous integration server - see Chris Shenton's presentation here for why you should and how quickly you can - and one of the areas we've really expanded was the use of automated testing. I've already described the test runner we're using but wanted to describe the overall process which we're using.

First, some background notes:

  • Create a script which does all of the hard work and manage that as part of your project - some of the examples show people dumping 20+ line shell scripts into a Hudson config but if you're serious, this should be versioned like everything else. If you're careful, some of the setup tasks can even be shared with other scripts you use to set up new developers or create RPMs.
  • Our process relies on virtualenv and pip. If you're not familiar with these, all you need to know in order to follow along is that virtualenv creates a virtual Python instance which allows us to keep this project separate from everything else and avoids the need for pip to have privileged access to install software.

Roughly in order, this is what our automated job does:

  1. If our virtualenv doesn't exist or requirements.pip has changed since we initialized the virtualenv, remove it and recreate it. In Bash this is roughly:
    # Rebuild when the virtualenv is missing or older than requirements.pip
    if [ ! -d .virtualenv -o requirements.pip -nt .virtualenv ]; then
      rm -rf .virtualenv;
      virtualenv .virtualenv;
      pip install -r requirements.pip --environment=.virtualenv --download-cache=.pip-download-cache;
    fi
    
    One important note: using the download cache makes your installs a lot faster and avoids wasting other people's resources on the distribution servers.
  2. To avoid issues with a failure leaving the database in an inconsistent state, we drop and recreate the database before every run and clear the Solr full-text search index.
  3. Start Solr as a background task:
    java -DSTOP.PORT=<arbitrary high port> -DSTOP.KEY=<arbitrary key> -jar start.jar
  4. django-admin.py syncdb
  5. django-admin.py loaddata clean_site (on our projects, we name fixtures clean_site rather than initial_data to avoid overwriting changes when syncdb runs)
  6. At this point, we're ready to actually run the tests, which we do using our custom test runner which runs our full Django test suite, saving the output and coverage.py's report to a directory which is available through a local Apache instance for convenience. This also generates coverage.py's XML report so the Hudson Cobertura plugin can generate pretty charts showing our progress over time.
    We save the return code from the test suite (i.e. TEST_RC=$?) so we can report failures after running our cleanup code (see below) 
  7. Assuming that the test suite ran correctly, we then launch some additional tests using Eric Holscher's excellent django-test-utils:
    django-admin.py crawlurls -v0 > logdir/crawler.log
    This also allows us to collect some basic performance numbers - I want to start visualizing per-page performance using something like dygraphs but we haven't had time to set that up yet.
  8. Shut down Solr:
    java -DSTOP.PORT=<arbitrary high port> -DSTOP.KEY=<arbitrary key> -jar start.jar --stop
  9. Exit with the value returned by the Django test suite
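
The control flow in steps 3 and 6 through 9 (save the test result, run the crawler only after a clean run, always clean up, then report the saved result) might be sketched as follows. This is only an outline under stated assumptions: the four helper functions are hypothetical placeholders for the real commands listed above, and ``run_tests`` takes a simulated exit status so the flow can be tried without a Django project:

```shell
#!/bin/bash
# Sketch of the job's control flow only. The helper functions below are
# hypothetical placeholders for the real commands listed in the steps above.

start_solr()  { echo "start Solr"; }
stop_solr()   { echo "stop Solr"; }
# Takes a simulated exit status so the flow can be exercised without Django.
run_tests()   { echo "run test suite"; return "${1:-0}"; }
run_crawler() { echo "run crawler"; }

ci_run() {
    start_solr

    # Save the suite's status right away, before another command replaces $?.
    # Written with || so it also works when the script uses `set -e`.
    TEST_RC=0
    run_tests "$1" || TEST_RC=$?

    # The crawler only runs when the main suite passed.
    if [ "$TEST_RC" -eq 0 ]; then
        run_crawler
    fi

    # Cleanup runs regardless of the test result.
    stop_solr

    return "$TEST_RC"
}

ci_run 0 && echo "job would succeed"
ci_run 3 || echo "job would fail with status $?"
```

In a real job the script would finish with ``exit $TEST_RC`` so that Hudson marks the build as failed whenever the test suite does, even though the cleanup commands themselves succeeded.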

That might sound like a lot of work but on our test system it currently takes well under 5 minutes. In addition to helping us stay on top of test coverage it's been really helpful for flushing out obsolete fixture data (e.g. crawlurls will show links which return 404) and has alerted us to several upstream version changes - we use pip freeze to track version numbers so we've found out quickly when the version of something we're using has been removed from PyPI. Most importantly, we know that our install instructions actually work because we're testing them on a regular basis. When something changes, it breaks quickly and is linked directly to a commit, making it easy to update the instructions and the deployment script, and when the time comes to put the code into production there's no question that the script is accurate because it's been run hundreds of times.