Back Garden Weather in CouchDB (Part 5)

Snow fallingAfter a two week gap the recent snow in the UK has inspired me to get back to my series of posts on my weather station website, WelwynWeather.co.uk. In this post I’ll discuss the records page, which shows details such as the highest and lowest temperatures, and the heaviest periods of rain.

From a previous post in this series you’ll remember that the website is implemented as a CouchApp. These are Javascript functions that run inside the CouchDB database, and while they provide quite a lot of flexibility you do need to tailor your code to them.

On previous pages we have use CouchDB’s map/reduce framework to summarise data then used a list function to display the results. The records page could take a similar approach, but there are some drawbacks to that. Unlike the rest of the pages the information on the records page consists of a number of unrelated numbers. While we could create a single map/reduce function to process all of them at once. That function will quickly grow and become unmanageable, so instead we’ll calculate the statistics individually and use AJAX to load them dynamically into the page.

To calculate the minimum indoor temperature we first need to create a simple view to calculate the value. As with all CouchDB views this starts with map function that outputs the parts of the document we are interested in.

function(doc) {
    emit(doc._id, { "temp_in": doc.temp_in, "timestamp": doc.timestamp });
}

Next we create a reduce function to find the lowest temperature. To do this we simply loop through all the values and select the smallest temperature, recording the timestamp that temperature occurred.

function(keys, values, rereduce) {
    var min = values[0].temp_in;
    var min_on = values[0].timestamp;

    for(var i=0; i<values.length; i++) {
        if(values[i].temp_in < min) {
            min = values[i].temp_in;
            min_on = values[i].timestamp;
        }
    }

    return { "temp_in": min, "timestamp": min_on }
}

The website welwynweather.co.uk actually points to the Couch rewrite document. To make the view available we add a rewrite to expose it to the world. As we want to reduce all documents to a single point we just need to pass reduce=true as the query.

{
    "from": "/records/temperature/in/min",
    "to": "/_view/records_temp_in_min",
    "query": { "reduce": "true" }
},

Lastly we can use jQuery to load the data and place the values into the DOM at the appropriate place. As CouchDB automatically sends the correct mime type jQuery will automatically decode the JSON data making this function very straightforward.

$.getJSON("records/temperature/in/min", function (data, textStatus, jqXHR) {
    var row = data.rows[0].value;
    var date = new Date(row.timestamp*1000);
    $("#min_temp_in").html(row.temp_in);
    $("#min_temp_in_date").html(date.toUTCString());
  });

This approach works well for most of the records that I want to calculate. Where it falls down is when calculating the wettest days and heaviest rain as the data needs to be aggregated before being reduced to a single value. Unfortunately CouchDB does not support this. The issue is that you cannot guarantee that the original documents will be passed to your view in order. In fact it is more likely than not than they won’t be. So, to calculate the heaviest periods of rain you would need to build a data structure containing each hour or day and the amount of rain in that period. As the documents are processed the structure would need to be updated and the period with the highest rain found.

Calculating a complicated structure as the result of your reduce function is disallowed by CouchDB, for good reason. An alternative way to find the heaviest periods of rain would be to put the output of the aggregation function into a new database and run another map/reduce function over that to find the heaviest period. Unfortunately CouchDB doesn’t support the chaining of views, so this is impossible without using an external program.

To solve this problem I do the aggregation in CouchDB and the transfer the whole result to the webbrowser and calculate the heaviest period in Javascript. The code to do this is given below. It’s very similar to that given above, but includes a loop to cycle over the results and pick the largest value.

$.getJSON("records/rain/wettest", function (data, textStatus, jqXHR) {
        var max_on = data.rows[0].key;
        var max_rain = data.rows[0].value;
        for(var i=0; i<data.rows.length; i++) {
            if(data.rows[i].value > max_rain) {
                max_on = data.rows[i].key;
                max_rain = data.rows[i].value;
            }
        }
        var date = new Date(max_on*1000);
    $("#wettest_day").html(max_rain);
        $("#wettest_day_date").html(date.toDateString());
    });

This solution works ok, but as time goes on the dataset gets bigger and bigger and the amount of data that is transferred to the browser will grow and grow. Hopefully in future I’ll be able to write another post about changing this to use chained viewed.

CouchDB is a great document store that is at home the web. The ability to run simple sites right from your database is extremely useful and makes deployment a snap. As with all technology you need to be aware of the limitations of CouchDB and allow for them in your designs. In my case the inability to chain views together is really the only wart in the code. Don’t forget you can replicate the database to get the data and use the couchapp command to clone a copy of site. See the first post in this series for instructions on how to do this. Please let me know in the comment section below if you find the site useful or have any questions or comments on the code.


Photo of Snow falling by Dave Gunn.

Advertisements

Back Garden Weather in CouchDB (Part 4)

Weather frontIn this series of posts I’m describing how I created a CouchDB CouchApp to display the weather data collected by the weather station in my back garden. In the previous post I showed you how to display a single day’s weather data. In this post we will look at processing the data to display it by month.

The data my weather station collects consists of a record every five minutes. This means that a 31 day month will consist of 8,928 records. Unless you have space to draw a graph almost nine thousand pixels wide then there is no point in wasting valuable rending time processing that much data. Reducing the data to one point per hour gives us a much more manageable 744 data points for a month. A full years worth of weather data consists of 105,120 records, even reducing it to one point per hour gives us 8760 points. When rendering a year’s worth of data it is clearly worth reducing the data even further, this time to one point per day.

How do we use CouchDB to reduce the data to one point per hour? Fortunately CouchDB’s map/reduce architecture is perfect for this type of processing. CouchDB will also cache the results of the processing automatically so it only needs to be run once rather than requiring an expensive denormalisation process each time some new data is uploaded.

First we need to group the five minute weather records together into groups for each hour. We could do this by taking the unix timestamp of record and rounding to the nearest hour. The problem with this approach is that the keys are included in the urls. If you can calculate unix timestamps in your head then your maths is better than mine! To make the urls more friendly we’ll use a Javascript implementation of sprintf to build a human-friendly representation of date and time, excluding the minute component.

function(doc) {
    // !code vendor/sprintf-0.6.js

    emit(sprintf("%4d-%02d-%02d %02d", doc.year, doc.month, doc.day, doc.hour), doc);
}

CouchDB will helpfully group documents with the same key, so all the records from the same hour will be passed to the reduce function. What you cannot guarantee though is that all the records will be passed in one go, instead you must ensure that your reduce function can operate on its own output. You can tell whether you are ‘rereducing’ the output of the reduce function by checking the third parameter to the function.

function(keys, values, rereduce) {
    var count = 0;

    var timestamp = values[0].timestamp;
    var temp_in = 0;
    var temp_out = 0;
    var abs_pressure = 0;
    var rain = 0;

    var wind_dir = [];
    for(var i=0; i<8; i++) { wind_dir.push({ value: 0}); }

To combine the multiple records it makes sense to average most of the values. The exceptions to this are the amount of rain, which should be summed; the wind direction, which should be a count of the gusts in each direction, and the wind gust speed which should be the maximum value. Because your reduced function may be called more than once calculating the average value is not straightforward. If you simply calculate the average of the values passed in then you will be calculating the average of averages, which is not the same the average of the full original data. To work around this we calculate the average of the values and store that with the number of values. Then, when we rereduce, we multiply the average by the number of values and then average the multiplied value.

In the previous, simplified, code snippet we set up the variables that will hold the averages.

    for(var i=0; i<values.length; i++) {
        var vcount;
        if(rereduce) { vcount = values[i].count } else { vcount = 1 }

We now loop through each of the values and work out how many weather records the value we’re processing represents. The initial pass will just represent a single record, but in the rereduce step it will be more.

        temp_in = temp_in + values[i].temp_in * vcount;
        temp_out = temp_out + values[i].temp_out * vcount;
        abs_pressure = abs_pressure + values[i].abs_pressure * vcount;

Here we build up the total values for temperature and pressure. Later we’ll divide these by the number of records to get the average. The next section adds the rain count up and selects the maximum wind gust.

        rain = rain + values[i].rain;

        wind_ave = wind_ave + values[i].wind_ave * vcount;
        if(values[i].wind_gust > wind_gust) { wind_gust = values[i].wind_gust; }

So far we’ve not really had to worry about the possibility of a rereduce, but for wind direction we need to take it into account. An individual record has a single window direction but for a hourly records we want to store the count of the number of times each direction was recorded. If we’re rereducing we need to loop through all the directions and combine them.

        if(rereduce) {
            for(var j=0; j<8; j++) {
                wind_dir[j]["value"] += values[i].wind_dir[j]["value"];
            }
        } else if(values[i].wind_ave > 0 && values[i].wind_dir >= 0 && values[i].wind_dir < 16) {
            wind_dir[Math.floor(values[i].wind_dir/2)]["value"] += 1;
        }

        if(values[i].timestamp < timestamp) { timestamp = values[i].timestamp; }
        count = count + vcount;
    }

The final stage is to build the object that we’re going to return. This stage is very straightforward, we just need to divide the numbers we calculated before by the count of the number of records. This gives us the correct average for these values.

    return {
            "count": count,
            "status": status,
            "timestamp": timestamp,
            "temp_in": temp_in / count,
            "temp_out": temp_out / count,
            "abs_pressure": abs_pressure / count,
            "rain": rain,
            "wind_ave": wind_ave / count,
            "wind_gust": wind_gust,
            "wind_dir": wind_dir,
        };
}

Now we have averaged the weather data into hourly chunks we can use a list, like the one described in the previous post, to display the data.

In the next and final post in this series I’ll discuss the records page on the weather site.


Photo of Weather front by Paul Wordingham.

Back Garden Weather in CouchDB (Part 3)

almost mayIn this series I’m describing how I used a CouchDB CouchApp to display the weather data collected by a weather station in my back garden. In the first post I described CouchApps and how to get a copy of the site. In the next post we looked at how to import the data collected by PyWWS and how to render a basic page in a CouchApp. In the post we’ll extend the basic page to display real weather data.

Each document in the database is a record of the weather data at a particular point in time. As we want to display the data over a whole day we need to use a list function. list functions work similarly to the show function we saw in the previous post. Unlike show functions list functions don’t have the document passed in, they can call a getRow function which returns the next row to process. When there are no rows left it returns null.

show functions process an individual document and return a single object containing the processed data and any HTTP headers. Because a list function can process a potentially huge number of rows they return data in a different way. Rather than returning a single object containing the whole response list functions must return their response in chunks. First you need to call the start function, passing in any headers that you want to return. Then you call send one or more times to return parts of your response. A typical list function will look like the code below.

function (head, req) {
    start({ "headers": { "Content-Type": "text/html" }});

    send(header);
    while(row = getRow()) {
        data = /* process row */;
        send(row);
    }
    send(footer);
}

To process the weather data we can’t follow this simple format because we need to split each document up and display the different measurements separately. Let’s look at the code for creating the day page. The complete code is a bit too long to include in a blog post so checkout the first post in this series to find out how to get a complete copy of the code.

To start the function we load the templates and code that we need using the CouchApp macros. Next we return the appropriate Content-Type header, and then we create the object that we’ll pass to Mustache when we’ve processed everything.

function(head, req) {
    // !json templates.day
    // !json templates.head
    // !json templates.foot
    // !code vendor/couchapp/lib/mustache.js
    // !code vendor/sprintf-0.6.js
    // !code vendor/date_utils.js

    start({ "headers": { "Content-Type": "text/html" }});

    var stash = {
        head: templates.head,
        foot: templates.foot,
        date: req.query.startkey,
    };

Next we build a list of the documents that we’re processing so we can loop over the documents multiple times.

    var rows = [];
    while (row = getRow()) {
        rows.push(row.doc);
    }

To calculate maximum and minimum values we need to choose the first value and then run through each piece of data and see whether it is higher or lower than the current record. As the data collector of the weather station is separate to the outside sensors occasionally they lose their connection. This means that we can just pick the value in the first document as our starting value, instead we must choose the first document where the connection with the outside sensors was made.

    if(rows.length &gt; 0) {
        for(var i=0; i<rows.length; i++) {
            if((rows[i].status &amp; 64) == 0) {
                max_temp_out = rows[i].temp_out;
                min_temp_out = rows[i].temp_out;
                max_hum_out = rows[i].hum_out;
                min_hum_out = rows[i].hum_out;

                break;
            }
        }

Now we come to the meat of the function. We loop through all of the documents and process them into a series of arrays, one for each graph that we’ll draw on the final page.

        for(var i=0; i<rows.length; i++) {
            var temp_out = null;
            var hum_out = null;
            if((rows[i].status & 64) == 0) {
                temp_out = rows[i].temp_out;
                hum_out = rows[i].hum_out;

                total_rain = total_rain + rows[i].rain;
                rainfall.push({ "time": time_text, "rain": rows[i].rain });

                wind.push({ "time": time_text, "wind_ave": rows[i].wind_ave, "wind_gust": rows[i].wind_gust });

            }

            pressure.push({ "time": time_text, "pressure": rows[i].abs_pressure });

            temps.push({ "time": time_text, "temp_out": temp_out, "temp_in": rows[i].temp_in });

            humidity.push({ "time": time_text, "hum_in": rows[i].hum_in, ";hum_out": hum_out });
        }
    }

Lastly we take the stash, which in a bit of code I’ve not included here has the data arrays added to it, and use it to render the day template.

    send(Mustache.to_html(templates.day, stash));

    return &quot;&quot;;
}

Let’s look at a part of the day template. The page is a fairly standard use of the Google Chart Tools library. In this first snippet we render the maximum and minimum temperature values, and a blank div that we’ll fill with the chart.

<h3>Temperature</h3>

<p>Outside: <b>Maximum:</b> {{ max_temp_out }}<sup>o</sup>C <b>Minimum:</b> {{ min_temp_out }}<sup>o</sup>C</p>
<p>Inside: <b>Maximum:</b> {{ max_temp_in }}<sup>o</sup>C <b>Minimum:</b> {{ min_temp_in }}<sup>o</sup>C</p>

<div id="tempchart_div"></div>

In the following Javascript function we build a DataTable object that we pass to the library to draw a line chart. The {{#temps}} and {{/temps}} construction is the Mustache way of looping through the temps array. We use it to dynamically write out Javascript code containing the data we want to render.

function drawTempChart() {
    var data = new google.visualization.DataTable();
    data.addColumn('string', 'Time');
    data.addColumn('number', 'Outside');
    data.addColumn('number', 'Inside');

    data.addRows([
    {{#temps}}
        ['{{ time }}', {{ temp_out }}, {{ temp_in }}],
    {{/temps}}
        null]);

    var chart = new google.visualization.LineChart(document.getElementById('tempchart_div'));
    chart.draw(data, {width: 950, height: 240, title: 'Temperature'});
}
google.setOnLoadCallback(drawTempChart);

We now have a page that displays all the collected weather data for a single day. In the next post in this series we’ll look at how to use CouchDB’s map/reduce functions to process the data so we can display it by month and by year.


Photo of almost may by paul bica.

Back Garden Weather in CouchDB (Part 2)

its raining..its pouringIn my last post I described the new CouchDB-based website I have built to display the weather data collected from the weather station in my back garden. In this post I’ll describe to import the data into CouchDB and the basics of rendering a page with a CouchApp.

PyWWS writes out the raw data it collected into a series of CSV files, one per day. These are stored in two nested directory, the first being the year, the second being year-month. To collect the data I use PyWWS’s live logging mode, which consists of a process constantly running, talking to the data collector. Every five minutes it writes a new row into today’s CSV file. Another process then runs every five minutes to read the new row, and import it into the database.

Because CouchDB stores its data using an append only format you should aim to avoid unnecessary updates. The simplest way to write the import script would be to import each day’s data every five minutes. This would cause the database to balloon in size, so instead we query the database to find the last update time and import everything after than. Each update is stored as a separate document in the database, with the timestamp attribute containing the unix timestamp of the update.

The map code to get the most recent update is quite simple, we just need to emit the timestamp for each update. The reason the timestamp is emitted as the key is so we can filter the range of updates. It is also emitted as the value so we can use the timestamp in the reduce function.

function(doc) {
    emit(doc.timestamp, doc.timestamp);
}

The reduce function is a fairly simple way to calculate the maximum value of the keys. I’ve mostly included it here for completeness.

function(keys, values, rereduce) {
    if(values.length == 0) {
        return 0;
    }

    var m = values[0];

    for(var i=0; i<values.length; i++) {
        if(values[i] > m) { m = values[i]; }
    }

    return m;
}

You’ll find the import script that I use in the directory you cloned in the previous post, when you got a copy of the website.

So, we’ve got some data in our database. How do we display it on a webpage? First, let’s consider the basics of rendering a webpage.

CouchDB has two ways to display formatted data, show and list functions. Show functions allow you to format a single documents, for example a blog post. List functions allow you to format a group of documents, such as a the comments on a post. Because viewing a single piece of weather data is not interesting the weather site only uses list functions. To get started let’s create a simple Show function, as these are simpler.

CouchApp doesn’t come with a templating library, but a common one to use is Mustache. The syntax is superficially like Django templates, but in reality it is far less powerful. For a simple website like this, Mustache is perfect.

In the show directory of your CouchApp create a new file, test.js. As with the map/reduce functions this file contains an anonymous function. In this case the function takes two parameters, the document and the request obejct, and returns an object containing the response body and any headers.

function (doc, req) {
    // !json templates.records
    // !json templates.head
    // !json templates.foot
    // !code vendor/couchapp/lib/mustache.js

The function begins with some magic comments. These are commands to CouchDB which includes the referenced code or data in the function. This allows you to keep shared data separate from the functions that uses it.

The first !json command causes the compiler to load the file templates/records.* and add it to a templates objects, under the records attribute.

The !code command works similarly, but in loads the specified file and includes the code in your function. Here we load the Mustache library, but I have also used the function to load a javascript implementation of sprintf. You might want to load some of your own common code using this method.

    var stash = {
        head: templates.head,
        foot: templates.foot
    };

    return { body: Mustache.to_html(templates.records, stash), headers: { "Content-Type": "text/html" } };
}

Firstly we build an object containing the data we want to use in our template. As Mustache doesn’t allow you to extend templates we need to pass the header and footer HTML code in as data.

As mentioned the return type of a show function is a object containing the HTML and any HTTP headers. We only want to include the content type of the page, but you could return any HTTP header in a similar fashion. To generate the HTML we call the to_html function provided by Mustache, passing the template and the data object we prepared earlier.

Now we have data in our database and can create simple pages using a CouchApp we can move on to showing real data. In the next post I will describe the list functions use to show summarized day and month weather information.


Photo of its raining..its pouring by samantha celera.

Back Garden Weather in CouchDB (Part 1)

RainWhen she was younger my wife wanted to be a meteorologist. That didn’t pan out, but our recent move found us with a garden, which we’ve not had before. This gave me the opportunity to buy her a weather station. I didn’t just choose any old station though, I wanted one that did wind and rain as well as the usual temperature, pressure and humidity. And, the deciding factor, a USB interface with Linux support. Fortunately the excellent PyWWS supports a range of weather stations, including the one I brought.

I’m not going to go into how I mounted the system, or configured PyWWS. That’s all covered in the documentation. PyWWS can produce a static website, but as someone who earns his living building websites I wanted something a bit better. Continuing my experiments with CouchDB I decided to build the website as a CouchApp.

As well as allowing you to query your data with Javascript, CouchDB lets you display webpages directly out of your database. If you visit welwynweather.co.uk you’ll notice that you’re redirected to a url that contains url arguments that look a lot like those used to query a view. That’s because that’s exactly what’s going on. Things become clearer when you discover that that http://www.welwynweather.co.uk is an alias for http://db.welwynweather.co.uk/_design/weather/_rewrite/. Now you can see a more complete CouchDB URL, albeit without the database name. db.welwynweather.co.uk points to an Apache reverse proxy that routes requests through to CouchDB.

Over the next few posts I’ll detail how the CouchApp works, but to get started you can clone my app and poke it yourself. Once you’ve installed the couchapp command line client simply run couchapp clone http://db.welwynweather.co.uk/_design/weather. This will give you a directory, weather, that contains a number of subdirectories including templates and views which contain the complete website.

To deploy the site to your own server you need to create a database and then run couchapp push weather http://localhost:5984/welwynweather. Visiting http://localhost:5984/welwynweather/_design/weather/_rewrite/ should show you the site. You’ll need some data though, and you can use CouchDB replication to pull my data to your server. Using Futon simply set http://db.welwynweather.co.uk/ as the replication source and your database as the destination and you’ll quickly get a complete copy of the database.

When replicating my data you currently cannot use continuous replication. When it completes replication CouchDB calls POST /_ensure_full_commit, but obviously I’ve disabled POST, PUT and DELETE on my server. This causes replication to fail and to restart from the beginning. The data will already have been copied, but CouchDB will copy it again. If you have any ideas on how to avoid this, please answer my StackOverflow question.

The website consists of four main pages. When you visit you are redirected to a page that shows the weather for the current day. Clicking on the date at the top of the page lets you also view the weather by month and by year. The daily weather pages show as much detail as is recorded by the station, in my case this is an update every five minutes. The monthly page is much the same except that the values are averaged across an hour. The yearly page is a bit different as it shows a single point for each day. An average temperature for each day is not that useful so we calculate the high and low for each day and display that.

The final page is the records page. This displays information like the highest and lowest temperature ever recorded and the heaviest rain by hour and by day. The previous three pages are all fully generated by the server. The records page is a bit different though as calculating the records in one step is a bit complicated, instead we use AJAX to load each record individually. This means we can focus on each record keeping the code simple.

In the next post I’ll discuss how I import data into CouchDB and the basics of rendering a page in a CouchApp.


If you visit the site you may find that there is no recent weather data. This is because I run PyWWS on my MythTV box. Rather than running the PC all the time the weather data only updates when a programme is being recorded, or I’m watching TV.


Photo of Rain by Moyan Brenn.