Accessing FitBit Intraday Data

For Christmas my wife and I bought each other a new FitBit One device (Amazon affiliate link included). These are small fitness tracking devices that monitor the number of steps you take, how high you climb and how well you sleep. They’re great for providing motivation to walk that extra bit further, or to take the stairs rather than the lift.

I’ve had the device for less than a week, but already I’m feeling the benefit of the gamification. As well as monitoring your fitness it also provides you with goals, achievements and competitions against your friends. The big advantage of the FitBit One over the previous models is that it syncs to recent iPhones and iPads, as well as some Android phones. This means that your computer doesn’t need to be on, and often it will sync without you having to do anything. In the worst case you just have to open the FitBit app to update your stats on the website. Battery life seems good, at about a week.

The FitBit apps sync your data directly to the FitBit website, which is great for seeing your progress quickly. They also provide an API that lets developers build interesting ways to process the data captured by the FitBit device. One glaring omission from the API is any way to get access to the minute-by-minute data. For a fee of $50 per year you can become a Premium member, which allows you to do a CSV export of the raw data. Holding data collected by a user hostage is deeply suspect, and FitBit should be ashamed of themselves for making this a paid-for feature. I have no problem with the rest of the features in the Premium subscription being paid for, but your own raw data should be freely available.

The FitBit API does have the ability to give you the intraday data, but this is not part of the open API and instead is part of the ‘Partner API’. This does not require payment, but you do need to explain to FitBit why you need access to this API call and what you intend to do with it. I do not believe that they would give you access if your goal was to provide a free alternative to the Premium export function.

So, has the free software community provided a solution? A quick search revealed that the GitHub user Wadey had created a library that uses the URLs used by the graphs on the FitBit website to extract the intraday data. Unfortunately the library hadn’t been updated in the last three years and a change to the FitBit website had broken it.

Fortunately the changes required to make it work are relatively straightforward, so a fixed version of the library is now available as andrewjw/python-fitbit. The old version of the library relied on you logging in to the FitBit website and extracting some values from the cookies. Instead I take your email address and password and fake a request to the log in page. This captures all of the cookies that are set, and will only break if the log in form elements change.

Another change I made was to extend the example script. The previous version just dumped the previous day’s values, which is not useful if you want to extract your entire history. In my new version it exports data for every day that you’ve been using your FitBit. It also incrementally updates your data dump if you run it irregularly.
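As a rough illustration of the incremental update, the sketch below works out which days still need to be fetched. It is not the code from the repository: the `missing_days` helper and the one-file-per-day `YYYY-MM-DD.csv` naming are assumptions made purely for this example.

```python
import datetime
import os


def missing_days(dump_dir, first_day, today):
    """Return the dates from first_day up to (but not including) today
    that have no dump file yet.

    Assumes, hypothetically, one file per day named YYYY-MM-DD.csv.
    """
    existing = set(os.listdir(dump_dir)) if os.path.isdir(dump_dir) else set()
    days = []
    day = first_day
    while day < today:
        if day.isoformat() + ".csv" not in existing:
            days.append(day)
        day += datetime.timedelta(days=1)
    return days
```

Today is skipped in this sketch because its data is still incomplete; running the script again tomorrow would pick it up.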

If you’re using Windows you’ll need both Python and Git installed. Once you’ve done that, check out my repository, andrewjw/python-fitbit. Lastly, in the newly checked out directory run python examples/ <email> <password> <dump directory>.

Photo of Jogging by Glenn Euloth.


Losing Games

I’m not a quick game player. I don’t rush out and buy the latest games and complete them on the same weekend. Currently I’m most of the way through both Alan Wake and L.A. Noire.

Alan Wake is a survival horror game where you’re fighting off hordes of people possessed by darkness. L.A. Noire is a detective story that has you solving crimes in 1940s Los Angeles. Both feature an over the shoulder third person camera, and both have excellent graphics. They also both have a film like quality to the story. In Alan Wake the action is divided up into six TV-style “episodes”, with a title sequence between each one. It also has a number of cut scenes and narration by the title character sprinkled throughout the game, which help to drive the story forward.

In L.A. Noire you are a detective trying to solve crimes and rise up the ranks of the police force. The game features cut scenes to introduce and close each case. During each case you head from location to location, interviewing suspects and witnesses. The big breakthrough in L.A. Noire is the facial animation. Rather than being animated by hand, the faces of characters were recorded directly from actors’ faces. This gives the faces a lifelike quality that has not been seen in games before.

Despite the extensive similarities between the games, my opinion of the two could hardly be more different. Alan Wake is one of the best games I’ve ever played, while L.A. Noire is really quite boring. I was trying to work out why I felt so differently about them when I read the following quote in Making Isometric Social Real-Time Games with HTML5, CSS3, and JavaScript by Mario Andres Pagella.

This recent surge in isometric real-time games was caused partly by Zynga’s incredible ability to “keep the positive things and get rid of the negative things” in this particular genre of games, and partly by a shift in consumer interests. They took away the frustration of figuring out why no one was “moving to your city” (in the case of SimCity) and replaced it with adding friends to be your growing neighbours.

The need for the faces of characters in L.A. Noire to be recorded from real actors limits one of the best things about games: their dynamic nature. Even if you get every question wrong you still solve the case and make progress. Initially you don’t really notice this, but I quickly found it meant that the questioning, the key game mechanic, became superfluous.

Alan Wake is a fairly standard game in that there’s really only one way to progress. This is well disguised though so you don’t notice. The atmosphere in the game forces you to keep moving and the story progresses at quite a pace.

Ultimately it’s not for me to criticise what games people want to play. FarmVille and the rest of Zynga’s games are enormously popular. What disappoints me most about L.A. Noire is that it is such a technically advanced game, but falls down on such a simple piece of game mechanics. Alan Wake on the other hand succeeds mostly based on story and atmosphere, and that’s the way it should be.

Photo of Alan Wake by jit.
Photo of LA Noire Screenshot 4 by The GameWay.

Scalable Collaborative Filtering With MongoDB

Many websites have some form of recommendation system. While it’s simple to create a recommendation system for small amounts of data, how do you create a system that scales to huge amounts of data?

How to actually calculate the similarity of two items is a complicated topic with many possible solutions. Which one is appropriate depends on your particular application. If you want to find out more I suggest reading the excellent Programming Collective Intelligence (Amazon affiliate link) by Toby Segaran.

We’ll take the simplest method for calculating similarity and just calculate the percentage of users who have visited both pages compared to the total number who have visited either. If we have Page 1 that was visited by users A, B and C and Page 2 that was visited by A, C and D, then A and C visited both, but A, B, C and D visited either one, so the similarity is 50%.
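This measure is commonly known as the Jaccard index. A minimal Python sketch of the worked example follows; the function name `jaccard_similarity` and the choice to return 0.0 when neither page has any views are assumptions for illustration.

```python
def jaccard_similarity(viewers_a, viewers_b):
    """Share of users who viewed both pages, out of those who viewed either."""
    viewers_a, viewers_b = set(viewers_a), set(viewers_b)
    either = viewers_a | viewers_b
    if not either:
        return 0.0  # assumption: no views at all counts as no similarity
    return len(viewers_a & viewers_b) / len(either)


page_1 = {"A", "B", "C"}
page_2 = {"A", "C", "D"}
print(jaccard_similarity(page_1, page_2))  # 0.5
```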

With thousands or millions of items and millions or billions of views calculating the similarity between items becomes a difficult problem. Fortunately MongoDB’s sharding and replication allow us to scale the calculations to cope with these large datasets.

First let’s create a set of views across a number of items. A view is stored as a single document in MongoDB. You would probably want to include extra information such as the time of the view, but for our purposes this is all that is required.

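# `db` is assumed to be a pymongo database handle that is already connected.
views = [
        { "user": "0", "item": "0" },
        { "user": "1", "item": "0" },
        { "user": "1", "item": "0" },
        { "user": "1", "item": "1" },
        { "user": "2", "item": "0" },
        { "user": "2", "item": "1" },
        { "user": "2", "item": "1" },
        { "user": "3", "item": "1" },
        { "user": "3", "item": "2" },
        { "user": "4", "item": "2" },
    ]

# Store each view as its own document in the views collection.
for view in views:
    db.views.insert_one(view)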

The first step is to process this list of view events so we can take a single item and get a list of all the users that have viewed it. To make sure this scales over a large number of views we’ll use MongoDB’s map/reduce functionality.

from bson.code import Code

def article_user_view_count():
    map_func = """
function () {
    var view = {};
    view[this.user] = 1;
    emit(this.item, view);
}
"""

We’ll build a JavaScript object where the keys are the user ids and the values are the number of times that user has viewed this item. In the map function we build an object that represents a single view and emit it using the item id as the key. MongoDB will group all the objects emitted with the same key and run the reduce function, shown below.

    reduce_func = """
function (key, values) {
    var view = values[0];

    // Add the per-user counts from every other value to the first one.
    for (var i = 1; i < values.length; i++) {
        for (var user in values[i]) {
            if (!view.hasOwnProperty(user)) { view[user] = 0; }

            view[user] = view[user] + values[i][user];
        }
    }
    return view;
}
"""

A reduce function takes two parameters, the key and a list of values. The values that are passed in can either be those emitted by the map function, or values returned from the reduce function. To help it scale not all of the original values will be processed at once, and the reduce function must be able to handle input from the map function or its own output. Here we output a value in the same format as the input so we don’t need to do anything special.

    db.views.map_reduce(Code(map_func), Code(reduce_func), out="item_user_view_count")

The final step is to run the functions we’ve just created and output the data into a new collection. Here we’re recalculating all the data each time this function is run. To scale properly you should filter the input based on the date the view occurred and merge it with the output collection, rather than replacing it as we are doing here.
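To make the data flow easier to follow without a running MongoDB instance, here is a plain Python imitation of the same map and reduce steps. It is only a sketch: `map_view`, `reduce_counts` and `item_user_view_count` are names made up for this example, and MongoDB’s actual grouping and batching are more involved.

```python
from collections import defaultdict


def map_view(view):
    """Mirror of the map function: emit (item, {user: 1})."""
    return view["item"], {view["user"]: 1}


def reduce_counts(values):
    """Mirror of the reduce function: sum the per-user counts."""
    total = {}
    for counts in values:
        for user, n in counts.items():
            total[user] = total.get(user, 0) + n
    return total


def item_user_view_count(views):
    """Group the mapped values by item, then reduce each group."""
    grouped = defaultdict(list)
    for view in views:
        key, value = map_view(view)
        grouped[key].append(value)
    return {key: reduce_counts(values) for key, values in grouped.items()}


views = [
    {"user": "0", "item": "0"},
    {"user": "1", "item": "0"},
    {"user": "1", "item": "0"},
    {"user": "2", "item": "1"},
]
print(item_user_view_count(views))  # {'0': {'0': 1, '1': 2}, '1': {'2': 1}}
```

Because `reduce_counts` returns the same shape it accepts, reducing two partial results gives the same answer as reducing all of the original values in one go, which is the property the reduce function needs.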

Now we need to calculate a matrix of similarity values, linking each item with every other item. First let’s see how we can calculate the similarity of all items to one single item. Again we’ll use map/reduce to help spread the load of running this calculation. Here we’ll just use the map part of map/reduce because each input document will be represented by a single output document.

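import json
from bson.son import SON

def similarity(item):
    map_func = """
function () {
    if (this._id == "%s") { return; }

    var target = %s;  // users who viewed the item we are comparing against
    var viewed_both = 0;
    var viewed_any = 0;

    for (var user in target) {
        viewed_any++;
        if (this.value.hasOwnProperty(user)) { viewed_both++; }
    }
    // Users who only viewed the current document also count towards "any".
    for (var user in this.value) {
        if (!target.hasOwnProperty(user)) { viewed_any++; }
    }

    emit("%s" + "_" + this._id, viewed_both / viewed_any);
}
""" % (int(item["_id"]), json.dumps(item["value"]), int(item["_id"]))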

The input to our Python function is a document that was output by our previous map/reduce call. We build a new JavaScript function by interpolating some data from this document into a template. We loop through all the users who viewed the document we’re comparing against and work out whether they have also viewed the current document. At the end of the function we emit the fraction of users who viewed both, out of those who viewed either.

    reduce_func = """
function (key, values) {
    return values[0];
}
"""

Because we output unique ids in the map function this reduce function will only be called with a single value so we just return that.

    db.item_user_view_count.map_reduce(Code(map_func), Code(reduce_func), out=SON([("merge", "item_similarity")]))

The last step in this function is to run the map reduce. Here as we’re running the map/reduce multiple times we need to merge the output rather than replacing it as we did before.

The final step is to loop through the output from our first map/reduce and call our second function for each item.

for doc in db.item_user_view_count.find():
    similarity(doc)

A key thing to realise is that you don’t need to calculate live similarity data. Once you have even a few hundred views per item then the similarity will remain fairly consistent. In this example we step through each item in turn and calculate the similarity for it with every other item. For a million item database where each iteration of this loop takes one second the similarity data will be updated once every 11 days.

I’m not claiming that you can take the code provided here and immediately have a massively scalable system. MongoDB provides easy-to-use replication and sharding systems, which are plugged in to its map/reduce framework. What you should take away is that by using map/reduce with sharding and replication to calculate the similarity between two items we can quickly get a system that scales well with an increasing number of items and of views.

Photo of Book Addiction by Emily Carlin.

Steve Jobs and the Lean Startup

On my 25 minute train journey to work each morning I like to pass the time by reading. The two most recent books I’ve read are The Lean Startup: How Constant Innovation Creates Radically Successful Businesses by Eric Ries and Steve Jobs by Walter Isaacson (both links contain an affiliate id). Although one is a biography and the other is a book on project management they actually cover similar ground, and both are books that people working in technology should read.

Walter Isaacson’s book has been extensively reviewed and dissected so I’m not going to go into detail on it. The book is roughly divided into two halves. The first section is on the founding of Apple, Pixar and NeXT. This section serves as an inspirational guide to setting up your own company. The joy of building a great product and defying the odds against a company succeeding comes across very strongly. The later section, following Jobs’s return to Apple, is much more about the nuts and bolts of running a huge corporation. While it’s an interesting guide to how Apple got to where it is today, it lacks the excitement of the earlier chapters.

The Lean Startup could, rather unkindly, be described as a managerial technique book. It’s much more than that though, as it’s more of a philosophy for how to run a company or a project. The book is very readable and engaging with plenty of useful case studies to illustrate the point being made. The key message of the book is to get your product out to customers as soon as possible, to measure as much as you can and learn from what your customers are doing and saying. As you learn you need to make a decision on whether to persevere or to pivot and change strategy.

There are many reasons why Steve Jobs was a great leader, a visionary and a terrible boss. One aspect was his unshakable belief that he knew what the customer wanted, even before they knew themselves. This is the antithesis of the Lean Startup methodology, which focuses on measurement and learning. Eric Ries stresses that a startup is not necessarily two guys working out of a garage. Huge multinational corporations can have speculative teams or projects inside them, that act much like start ups, so it wouldn’t be impossible for the Apple of today to act like a start up. Apple weren’t always huge though, and back in the 1970s they really were a start up.

One Apple trait the Lean Startup methodology doesn’t allow for is dramatic product launches. The Lean Startup is a way of working that relies on quick iteration and gradually building up your customer base. It’s hard to quickly iterate when building hardware, but early in Apple’s life they were struggling to find a market for their computers. The Apple I followed the trend of the time for build-it-yourself computers. Just a year later Apple released the Apple ][, which came with a case and was much more suitable for the average consumer. This represents a pivot on the part of Apple. They could have continued to focus on hobbyists but instead they decided to change and aim for a bigger, but less technical, market.

Reading is a key part of becoming a better programmer. Whether it’s reading about the latest technology on a blog, the latest project management techniques or the history of computers, reading will help you become better at your job. I’m not sure I recommend anyone tries to recreate Steve Jobs’s management style, but as a history of Apple Walter Isaacson’s book is inspirational and informative. The Lean Startup is considerably more practical, even if it won’t inspire you to set up a company in the first place.

Photo of Steve Jobs by Ben Stanfield.
Photo of Eric Ries – The Lean Startup, London Edition by Betsy Weber.

Django ImportError Hiding

A little while ago I was asked what my biggest gripe with Django was. At the time I couldn’t think of a good answer because since I started using Django in the pre-1.0 days most of the rough edges have been smoothed. Yesterday though, I encountered an error that made me wish I had thought of it at the time.

The code that produced the error looked like this:

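from django.db import models

class MyModel(models.Model):

    def save(self):
        # A reference to the models module like this one is what failed,
        # because models was None by the time it ran.
        models.Model.save(self)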

The error that was raised was AttributeError: 'NoneType' object has no attribute 'Model'. This means that rather than containing a module object, models was None. Clearly this is impossible as the class could not have been created if that was the case. Impossible or not, it was clearly happening.

Adding a print statement to the module showed that when it was imported the models variable did contain the expected module object. What that also showed was that the module was being imported more than once, something that should also be impossible.

After a wild goose chase investigating reasons why the module might be imported twice I tracked it down to the load_app method in django/db/models/. The code there looks something like this:

    def load_app(self, app_name, can_postpone=False):
        try:
            models = import_module('.models', app_name)
        except ImportError:
            # Ignore exception
            pass

Now I’m being a little harsh here, and the exception handler does contain a comment about working out if it should reraise the exception. The issue here is that it wasn’t raising the exception, and it’s really not clear why. It turns out that I had a misspelt module name in an import statement in a different module. This raised an ImportError which was caught and hidden, and then Django repeatedly attempted to import the models as they were referenced in the models of other apps. The strange exception that was originally encountered is probably an artefact of Python’s garbage collection, although how exactly it occurred is still not clear to me.

There are a number of tickets (#6379, #14130 and probably others) on this topic. A common refrain in Python is that it’s easier to ask for forgiveness than to ask for permission, and I certainly agree with Django and follow that most of the time.

I always follow the rule that try/except clauses should cover as little code as possible. Consider the following piece of code.


try:
    var.method1()
    var.member
    var.method2()
except AttributeError:
    pass  # handle error

Which of the three attribute accesses are we actually trying to catch here? Handling exceptions like this is a useful way of implementing duck typing while following the easier to ask forgiveness principle. What this code doesn’t make clear is which member or method is actually optional. A better way to write this would be:


var.method1()

try:
    member = var.member
except AttributeError:
    pass  # handle error

var.method2()

Now the code is very clear that the var variable may or may not have a member attribute. If method1 or method2 do not exist then the exception is not masked and is passed on. Now let’s consider that we want to allow the method1 attribute to be optional.

try:
    var.method1()
except AttributeError:
    pass  # handle error

At first glance it’s obvious that method1 is optional, but actually we’re catching too much here. If there is a bug in method1 that causes an AttributeError to be raised then this will be masked and the code will treat it as if method1 didn’t exist. A better piece of code would be:

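try:
    method = var.method1
except AttributeError:
    pass  # handle error
else:
    # Call outside the try block so errors raised inside method1 are not masked.
    method()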

ImportErrors are similar because code can be executed, but then when an error occurs you can’t tell whether the original import failed or whether an import inside that failed. Unlike with an AttributeError there is no easy way to rewrite the code to only catch the error you’re interested in. Python does provide some tools to divide the import process into steps, so you can tell whether the module exists before attempting to import it. In particular the imp.find_module function would be useful.
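As a hedged sketch of that two-step idea in current Python, the example below uses importlib.util.find_spec, the modern counterpart of imp.find_module, to locate a module before importing it. The helper name `import_if_present` is made up for this example, and this is not how Django itself resolves the problem.

```python
import importlib
import importlib.util


def import_if_present(name):
    """Import `name` if it can be located, otherwise return None.

    An ImportError raised while the module body runs (for example a
    misspelt import inside it) is deliberately left to propagate.
    """
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError as exc:
        # find_spec imports parent packages of a dotted name. Only swallow
        # the error when it is `name` or one of its parents that is missing.
        if exc.name is not None and (
            name == exc.name or name.startswith(exc.name + ".")
        ):
            return None
        raise
    if spec is None:
        return None
    return importlib.import_module(name)
```

With this shape a missing models module yields None, while a models module that exists but contains a broken import still fails loudly.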

Changing Django to avoid catching the wrong ImportErrors will greatly complicate the code. It would also introduce the danger that the algorithm used would not match the one used by Python. So, what’s the moral of this story? Never catch more exceptions than you intended to, and if you get some really odd errors in your Django site watch out for ImportErrors.

Photo of Hidden Cat by Craig Grahford.

Back Garden Weather in CouchDB (Part 5)

After a two week gap the recent snow in the UK has inspired me to get back to my series of posts on my weather station website. In this post I’ll discuss the records page, which shows details such as the highest and lowest temperatures, and the heaviest periods of rain.

From a previous post in this series you’ll remember that the website is implemented as a CouchApp. These are Javascript functions that run inside the CouchDB database, and while they provide quite a lot of flexibility you do need to tailor your code to them.

On previous pages we have used CouchDB’s map/reduce framework to summarise data then used a list function to display the results. The records page could take a similar approach, but there are some drawbacks to that. Unlike the rest of the pages the information on the records page consists of a number of unrelated numbers. While we could create a single map/reduce function to process all of them at once, that function would quickly grow and become unmanageable, so instead we’ll calculate the statistics individually and use AJAX to load them dynamically into the page.

To calculate the minimum indoor temperature we first need to create a simple view to calculate the value. As with all CouchDB views this starts with a map function that outputs the parts of the document we are interested in.

function(doc) {
    emit(doc._id, { "temp_in": doc.temp_in, "timestamp": doc.timestamp });
}

Next we create a reduce function to find the lowest temperature. To do this we simply loop through all the values and select the smallest temperature, recording the timestamp that temperature occurred.

function(keys, values, rereduce) {
    var min = values[0].temp_in;
    var min_on = values[0].timestamp;

    for(var i=0; i<values.length; i++) {
        if(values[i].temp_in < min) {
            min = values[i].temp_in;
            min_on = values[i].timestamp;
        }
    }

    return { "temp_in": min, "timestamp": min_on };
}
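The reduce function returns a value in the same shape as its inputs, which is why it does not need to inspect the rereduce flag. The Python sketch below imitates this with made-up readings; `reduce_min` and the sample data are hypothetical and only illustrate the property.

```python
def reduce_min(values):
    """Pick the reading with the lowest indoor temperature.

    The result has the same shape as each input, so the function can be
    fed its own output, as happens when CouchDB re-reduces.
    """
    best = values[0]
    for reading in values[1:]:
        if reading["temp_in"] < best["temp_in"]:
            best = reading
    return {"temp_in": best["temp_in"], "timestamp": best["timestamp"]}


# Hypothetical readings, reduced in two batches and then re-reduced.
readings = [
    {"temp_in": 18.5, "timestamp": 100},
    {"temp_in": 16.0, "timestamp": 200},
    {"temp_in": 17.2, "timestamp": 300},
    {"temp_in": 15.1, "timestamp": 400},
]
partial = [reduce_min(readings[:2]), reduce_min(readings[2:])]
print(reduce_min(partial))  # {'temp_in': 15.1, 'timestamp': 400}
```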

The website actually points to the Couch rewrite document. To make the view available we add a rewrite to expose it to the world. As we want to reduce all documents to a single point we just need to pass reduce=true as the query.

{
    "from": "/records/temperature/in/min",
    "to": "/_view/records_temp_in_min",
    "query": { "reduce": "true" }
}

Lastly we can use jQuery to load the data and place the values into the DOM at the appropriate place. As CouchDB automatically sends the correct mime type jQuery will automatically decode the JSON data making this function very straightforward.

$.getJSON("records/temperature/in/min", function (data, textStatus, jqXHR) {
    var row = data.rows[0].value;
    var date = new Date(row.timestamp*1000);
    // row.temp_in and date are then written into the page.
});

This approach works well for most of the records that I want to calculate. Where it falls down is when calculating the wettest days and heaviest rain, as the data needs to be aggregated before being reduced to a single value. Unfortunately CouchDB does not support this. The issue is that you cannot guarantee that the original documents will be passed to your view in order. In fact it is more likely than not that they won’t be. So, to calculate the heaviest periods of rain you would need to build a data structure containing each hour or day and the amount of rain in that period. As the documents are processed the structure would need to be updated and the period with the highest rain found.

Calculating a complicated structure as the result of your reduce function is disallowed by CouchDB, for good reason. An alternative way to find the heaviest periods of rain would be to put the output of the aggregation function into a new database and run another map/reduce function over that to find the heaviest period. Unfortunately CouchDB doesn’t support the chaining of views, so this is impossible without using an external program.

To solve this problem I do the aggregation in CouchDB, then transfer the whole result to the web browser and calculate the heaviest period in JavaScript. The code to do this is given below. It’s very similar to that given above, but includes a loop to cycle over the results and pick the largest value.

$.getJSON("records/rain/wettest", function (data, textStatus, jqXHR) {
    var max_on = data.rows[0].key;
    var max_rain = data.rows[0].value;
    for(var i=0; i<data.rows.length; i++) {
        if(data.rows[i].value > max_rain) {
            max_on = data.rows[i].key;
            max_rain = data.rows[i].value;
        }
    }
    var date = new Date(max_on*1000);
    // max_rain and date are then written into the page.
});
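The two stages, aggregating rain per period and then picking the largest total, can also be sketched outside CouchDB. The Python below is an illustration only: `wettest_day`, the (timestamp, rain) input pairs and bucketing days in UTC are all assumptions rather than part of the site’s code.

```python
import datetime
from collections import defaultdict


def wettest_day(readings):
    """Return (date, total_rain) for the day with the most rain.

    `readings` is an iterable of (unix_timestamp, rain) pairs in any order,
    so it copes with documents arriving unsorted.
    """
    per_day = defaultdict(float)
    for timestamp, rain in readings:
        day = datetime.datetime.fromtimestamp(
            timestamp, datetime.timezone.utc
        ).date()
        per_day[day] += rain
    # Second stage: reduce the per-day totals to the single largest one.
    return max(per_day.items(), key=lambda pair: pair[1])
```

The per-day dictionary is exactly the intermediate structure that cannot be built inside a single CouchDB reduce, which is why the second stage has to run somewhere else.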

This solution works ok, but as time goes on the dataset gets bigger and bigger and the amount of data that is transferred to the browser will grow and grow. Hopefully in future I’ll be able to write another post about changing this to use chained views.

CouchDB is a great document store that is at home on the web. The ability to run simple sites right from your database is extremely useful and makes deployment a snap. As with all technology you need to be aware of the limitations of CouchDB and allow for them in your designs. In my case the inability to chain views together is really the only wart in the code. Don’t forget you can replicate the database to get the data and use the couchapp command to clone a copy of the site. See the first post in this series for instructions on how to do this. Please let me know in the comment section below if you find the site useful or have any questions or comments on the code.

Photo of Snow falling by Dave Gunn.

Hackathons, and why your company needs one

I could wax lyrical about how programming is an art form and requires a great deal of creativity. However, it’s easy to lose focus on this in the middle of creating project specs and servicing your technical debt. Like many companies we recently held a hackathon event where we split up into teams and worked on projects suggested by the team members.

Different teams took different approaches to the challenge: one team set about integrating an open source code review site in our development environment, while others investigated how some commercial technologies could be useful to us. My team built a collaborative filtering system using MongoDB. I’ll post about that project in the future, but in this post I wanted to focus on what we learnt about running a company hackathon event.

If you’re lucky you’ll work in a company that’s focused on technology and you’ll always be creating new and interesting things. In the majority of companies technology is a means to an end, rather than the goal. In that case it’s easy to become so engrossed in the day to day work that you forget to innovate or to experiment with new technologies. A hackathon is a great way to take a step back and try something new for a few days.

Running a hackathon event should be divided into three stages: preparation, the event and the post event. Before the event you need to take some time to collect ideas and do some preliminary research. The event itself should be a whirlwind of pumping out code and building something exciting. Afterwards you need to take some time to demonstrate what you’ve built, and share what you’ve learnt.

Typical IT departments will be given a set of requirements and will need to work out how to achieve them. What a hackathon event should allow is for the department to produce their own set of requirements, free from any external influences. In your day to day work what projects you actually tackle will be decided by a range of factors, but a hackathon is designed to let programmers take something where they’ve thought “we could do something interesting with that data” or “that technology might help us make things better” and run with it. The first stage is to collect all these ideas from the department and then to divide the team up into groups to tackle the most popular projects, perhaps with a voting stage to whittle the ideas down. To keep things fun and quick small teams are key here; any more than five people and you’ll start to get bogged down in process and design by committee.

Once you’ve got the projects and teams sorted you can prepare for the event. People who are working on each project need to think about what they want to have built by the end of the event and should be dreaming up ways to tackle the problem. Coding beforehand is banned, but things will go much quicker if you’ve come up with a plan to attack the problem.

For the event you need to remove as many distractions as possible. Certainly you need to tell others in the company that you will not be available for a few days. Whether other possibilities such as not reading your email are doable depends on how often you need to deal with crises. Hopefully with no-one fiddling with your servers fewer things will go wrong than on an average day. Moving location, either to meeting rooms or to an external space, is a good way of getting space to focus on the work.

Once the time has expired you need to wrap the event up, and the key thing is to demonstrate what you’ve built to the rest of the team. Not everyone in IT is happy with standing and presenting, but a few minutes to people they know well should not be a problem. It’s tempting to invite people from outside the department to the presentations, and before my first hackathon I was very keen on bringing management, who often won’t understand what IT do, into the meeting. Hopefully what is built will show how innovative and full of ideas programmers can be. In reality though the best you can really hope for is a set of tech demos that require quite a lot of understanding about the limitations inherent in building things quickly, which those outside IT will struggle to understand.

The presentations should focus on two things: a demonstration of what you’ve built and a discussion of the technology used and decisions made. The aim should be to excite the team about what you’ve built and to impart some of what you’ve learnt. A good way to spark a discussion is to talk about what problems you encountered. How long the presentations should be depends a lot on the complexity of the problems being tackled, but ten to twenty minutes per project is a good length of time.

In the longer term after the event you’ll need to decide which of the projects to keep, which to improve on or rewrite and which to abandon completely. A more structured and prepared presentation showing the projects to management may be a good idea. The presentations immediately following the hackathon are likely to be a fairly ramshackle affair and won’t make the most ‘professional’ impression.

Traditional ‘training’ sessions don’t work too well in IT; it’s easier to learn by doing. Most people are also quite sceptical of ‘build a raft’ style team building exercises. Compared to those, hackathons are the perfect mix of learning and fun. They’re also a great way to work on the problems that you’ve wanted to solve for ages but have never found the time for. Even if you don’t get anything you can use out of the event the process of getting there will be worthwhile. And who knows, you might build a million dollar product for your company.

Photo of Code by Lindsey Bieda.