Naming Screen Sessions

I develop a number of Django-powered websites at work, and usually I want to leave them running when I’m not working on them so others can check out my progress and give me suggestions. The Django development server is incredibly useful when developing, but it’s not detached from the terminal so as soon as you log out the server gets switched off. One alternative is to run the website under Apache, as you would deploy it normally. This solves the problem of leaving the website running, but makes it much harder to develop with.

A third option is the GNU program Screen. When run without arguments screen puts you into a new bash session. Pressing Ctrl+d drops you back out to where you were. The magic occurs when you press Ctrl+a d. This drops you back out, but the bash session is still running! By typing screen -r you’ll reattach to the session and can carry on working as before. You can leave it as long as you like between detaching and reattaching to a session, as long as the computer is still running.

It is possible to run multiple screen sessions at once, perhaps with a different Django development server running in each. Unfortunately screen will only reattach automatically when there is just one detached session. If you have more than one then you’ll be confronted by a cryptic series of numbers that uniquely identifies each session. To reattach to a specific session you can type screen -r <pid>.

To make it easier to reattach to the session that I’m working on I give these sessions names, so rather than a cryptic series of numbers I see a useful set of names. To do this you just need to type Ctrl+a :sessionname <name>.
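A session can also be named when it is started rather than afterwards. The sketch below only builds the command lines and does not run them; the helper names and the session name "blog" are made up for illustration, while -d -m (start detached), -S (name the session) and -r (reattach) are standard screen options.

```python
import shlex


def start_named_session(name, command):
    """Build the argv that starts `command` in a detached, named screen session."""
    # -d -m starts the session detached, -S gives it a name
    return ["screen", "-dmS", name] + shlex.split(command)


def reattach(name):
    """Build the argv that reattaches to the session called `name`."""
    return ["screen", "-r", name]


# Hypothetical example: a Django development server in a session named "blog"
print(" ".join(start_named_session("blog", "python manage.py runserver")))
print(" ".join(reattach("blog")))
```

Passing either list to subprocess.call would run the command; building the list first keeps the example safe to try on a machine without screen installed.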

There are plenty of other useful things that screen can do, but named sessions is by far and away the most common one that I use.

Where GitHub (Possibly) Went Wrong

While on my delayed train this morning I was listening to episode 80 of the excellent Stack Overflow podcast. In this episode Jeff Atwood was complaining to Joel Spolsky about his problems with GitHub.

GitHub is a social coding site, along the same lines as Sourceforge or Google Code, but focused entirely on the distributed version control system Git. Where GitHub differs from the other project hosting sites, and where I think Jeff’s confusion comes from, is that the primary structure on the site is that of the developer, not of the project. They treat every developer as a rock star, who is bigger than the projects that they work on.

GitHub makes it incredibly easy to take a codebase, make your own changes and publish them to the world. What GitHub fails to do is to encourage people to collaborate to push one code base forward. I’m not suggesting that branching is a bad idea. Branching code is a useful coding technique which can be used to separate in-development features from other changes until the code has stabilised again. What GitHub focuses on is the changes that an individual developer makes, not the changes required for a particular feature.

When a developer creates a copy of some code on GitHub they get a wiki and an issue tracker as well. This further confuses matters because not only do you have trouble knowing which git tree is the correct one to pull from, but you also don’t know where to report bugs or go to for documentation.

Google Code seems to be in a better position for combining distributed version control with project management. They have an excellent wiki and issue tracker, and give each project a straightforward and simple homepage. You can also use Mercurial, which is similar to git, as your version control system. All that they need to do is allow developers to publish their own changes, but in a markedly separate section to the core code of the project.

I can see how GitHub is nice for developers, but in any mildly successful open source project the number of users vastly outweighs the number of developers. It seems crazy to me to make your primary web presence suited only for the minority of people who are involved with the project.


Photo of 8 Forks by bitzcelt.

Photo of dewy branch by calliope.

Searching Stemmed Fields With Whoosh

Whoosh is quite a nice pure-Python full text search engine. While it is still being actively developed and is suitable for production usage there are still some rough edges. One problem that stumped me for a while was searching stemmed fields.

Stemming is where you take the endings off words, such as ‘ings’ on the word endings. This reduces the accuracy of searches but greatly increases the chances of users finding something related to what they were looking for.

To create a stemmed field you need to tell Whoosh to use the StemmingAnalyzer, as shown in the schema definition below.

from whoosh.analysis import StemmingAnalyzer
from whoosh.fields import Schema, TEXT, ID

schema = Schema(id=ID(stored=True, unique=True),
                       text=TEXT(analyzer=StemmingAnalyzer()))

Using the StemmingAnalyzer will cause Whoosh to stem every word before it is added to the index. If you use the shortcut search function to search with a word that should be stemmed it will return no results, as that word does not exist in the index, even though it was included in the data that was indexed.

To correctly search a stemmed index you must parse the query and tell the parser to use the Variations term class. This causes each word in the query to be matched against its morphological variations, so the query correctly matches words in the stemmed index.

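from whoosh.qparser import QueryParser
from whoosh.query import Variations

# ix is the open index and query is the user's search string
searcher = ix.searcher()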
qp = QueryParser("text", schema=schema, termclass=Variations)
parsed = qp.parse(query)
docs = searcher.search(parsed)
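To see why the query has to be processed as well as the documents, here is a toy model that does not use Whoosh at all. The stem function is a deliberately naive suffix stripper (not the algorithm Whoosh uses), and the two documents and the function names are invented for the example.

```python
def stem(word):
    """Toy stemmer: lower-case the word and strip a few common endings."""
    word = word.lower()
    for suffix in ("ings", "ing", "ed", "s"):
        # Keep at least three characters so short words are left alone
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


documents = {1: "Searching stemmed fields", 2: "A field guide to birds"}

# Build an inverted index that stores only the stemmed form of each word
index = {}
for doc_id, text in documents.items():
    for word in text.split():
        index.setdefault(stem(word), set()).add(doc_id)


def search_raw(word):
    """Look the word up exactly as typed."""
    return index.get(word.lower(), set())


def search_stemmed(word):
    """Stem the word first, as was done for the documents."""
    return index.get(stem(word), set())


print(search_raw("fields"))      # no match: only "field" is in the index
print(search_stemmed("fields"))  # matches both documents
```

The word "fields" appears in the first document, yet looking it up unprocessed finds nothing because the index only ever saw "field".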

Photo of words by feuilllu.

Custom Podcasts With MythTV

I love listening to both BBC Radio 4 and BBC 6 Music. Like the rest of the BBC radio stations a significant proportion of the shows are available as a podcast. Unfortunately this is not true of all the shows, and for those that feature music such as Adam & Joe or Steve Lamacq the podcasts are talking only.

I watch almost all of my TV through MythTV, which records all of my favourite shows automatically, while on my way to work I like to listen to podcasts that are downloaded automatically by iTunes. Would it be possible to automatically record shows with MythTV that aren’t available as podcasts and sync them to my iPhone automatically?

Recording a radio show with MythTV is no different to recording a TV show so that’s not a problem. MythTV also provides the ability to run a script after certain shows have been recorded. All that is required is a script that converts the recording into an mp3 file and builds an RSS feed which can be read by iTunes.

First we need to convert the recorded file into an mp3, which is easy to do with the ffmpeg program.

#!/usr/bin/python
# -*- coding: utf-8 -*-

from datetime import date, datetime
import glob
import MySQLdb
import os
import sys

input = sys.argv[1]
input_filename = input.split("/")[-1]
output_filename = input_filename.split(".")[0] + ".mp3"

os.system("ffmpeg -y -i %s -acodec libmp3lame -ab 128k /var/www/localhost/htdocs/podcasts/%s > /dev/null" % (input, output_filename))
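As an optional variation, separate from the script itself, the same ffmpeg call can be made with an argument list instead of a shell string, so that a recording path containing spaces or shell characters is passed through untouched. The function names here are invented for the sketch, the example recording path in the comment is hypothetical, and the ffmpeg options are the same ones used above.

```python
import os
import subprocess

PODCAST_DIR = "/var/www/localhost/htdocs/podcasts"


def build_ffmpeg_command(recording, podcast_dir=PODCAST_DIR):
    """Return the ffmpeg argv that converts `recording` to an mp3 in podcast_dir."""
    base = os.path.splitext(os.path.basename(recording))[0]
    output = os.path.join(podcast_dir, base + ".mp3")
    return ["ffmpeg", "-y", "-i", recording,
            "-acodec", "libmp3lame", "-ab", "128k", output]


def convert(recording):
    """Run ffmpeg; check_call raises CalledProcessError if the conversion fails."""
    subprocess.check_call(build_ffmpeg_command(recording))


# Hypothetical recording path, printed rather than executed
print(build_ffmpeg_command("/var/lib/mythtv/1001_20090601120000.mpg"))
```

One difference to be aware of: os.path.splitext only removes the final extension, whereas split(".")[0] cuts at the first dot, so the two agree only for file names containing a single dot.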

Next up we need to write out the RSS feed that iTunes will read. We start off by opening the file and writing out some boilerplate code.

fp = open("/var/www/localhost/htdocs/podcasts/feed.rss", "w")
fp.write("""<?xml version="1.0" encoding="UTF-8"?>

<rss xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" version="2.0">
  <channel>
       <title>MythTV Recorded Radio</title>
       <description>Radio Recorded By MythTV</description>
       <link>http://192.168.0.8/podcasts/</link>
       <language>en-us</language>
       <lastBuildDate>%(datetime)s</lastBuildDate>
       <pubDate>%(datetime)s</pubDate>
       <webMaster>andrewjwilkinson@gmail.com</webMaster>

       <itunes:image href="http://192.168.0.8/podcasts/stevelamacq.jpg"/>

       <itunes:category text="Technology">
           <itunes:category text="Podcasting"/>
       </itunes:category>
""" % { "datetime": datetime.now().ctime() })

Finally we need to write out a small bit of XML for each file that’s in our directory waiting to be downloaded. We do this by looking at each mp3 file in the podcasts directory and looking for the appropriate entry in MythTV’s recorded table. If an entry doesn’t exist then the recording has been deleted and we delete the mp3 file.

db = MySQLdb.connect(user="mythtv", passwd="mythtv", db="mythconverg")

for radio_file in glob.glob("/var/www/localhost/htdocs/podcasts/*.mp3"):
    output = radio_file.split("/")[-1]
    size = os.path.getsize(radio_file)  # size in bytes, without reading the whole file into memory

    c = db.cursor()
    c.execute("SELECT title, description, starttime FROM recorded WHERE basename=%s", (output.split(".")[0] + ".mpg", ))
    row = c.fetchone()
    if row is None:
        os.remove(radio_file)
        continue

    title, description, starttime = row

    fp.write("""       <item>
           <title>%(title)s - %(datetime)s</title>
           <link>http://192.168.0.8/podcasts/%(output)s</link>
           <guid>http://192.168.0.8/podcasts/%(output)s</guid>
           <description>%(description)s</description>
           <enclosure url="http://192.168.0.8/podcasts/%(output)s" length="%(output_size)s" type="audio/mpeg"/>
           <category>Podcasts</category>
           <pubDate>%(datetime)s</pubDate>
       </item>""" % { "title": title, "description": description, "datetime": starttime, "output": output, "output_size": size })

fp.write("""
    </channel>
</rss>
""")
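One thing the string formatting above does not handle is a title or description containing characters such as &, which would make the feed invalid XML (Adam & Joe is an obvious candidate). The following is a separate sketch, not part of the script, showing one way to escape the text and format the date in the RFC 822 style that RSS expects, using only the standard library. The rss_item function, the example.mp3 file name and the sample values are made up for illustration.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from xml.sax.saxutils import escape, quoteattr


def rss_item(title, description, url, size, start):
    """Return one <item> element with escaped text and an RFC 822 date."""
    return (
        "<item>\n"
        "  <title>%s</title>\n"
        "  <description>%s</description>\n"
        "  <enclosure url=%s length=\"%d\" type=\"audio/mpeg\"/>\n"
        "  <pubDate>%s</pubDate>\n"
        "</item>"
    ) % (escape(title), escape(description), quoteattr(url), size,
         format_datetime(start))


# Hypothetical values standing in for a row from the recorded table
item = rss_item("Adam & Joe", "Saturday morning show",
                "http://192.168.0.8/podcasts/example.mp3", 12345,
                datetime(2009, 6, 6, 9, 0, tzinfo=timezone.utc))
print(item)
```

quoteattr adds the surrounding quotes itself, which is why the url attribute has none in the template.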

To use this put all three bits of code into one file, save it somewhere and mark it as executable. Next set up Apache to serve the directory /var/www/localhost/htdocs/podcasts/ as /podcasts. Finally you need to set up the script to run automatically after a program you want to create a podcast from has been recorded. To do this run mythtv-setup and select the ‘general’ menu option. Move through the screens until you reach ‘Job Queue (Job Commands)’. Add a brief description of the script in the ‘description’ field then enter the <path to script> %s. Then use the normal MythTV frontend and edit the recording schedules to make the correct User Script run.

Point iTunes at http://you.ip.address/podcasts/feed.rss and it’ll automatically download any new recordings.

CouchDB Document Cache

It’s well known that one of the best things you can do to speed up CouchDB is to use bulk inserts to add or update many documents at one time.

Bulk updates are easy to use if you’re just blindly inserting documents into the database because you can just maintain a list of documents. However, a common scheme that I often use is to call a view to determine whether a document representing an object exists, update it if it does, and add a new document if it doesn’t. To help make this easier I use the DocCache class given below.

The cache contains two interesting methods, get and update. Rather than writing directly to CouchDB when you want to add or update a document just pass the document to update. This will cache the document, and the cached documents are periodically saved in a bulk update.

It is possible that you will retrieve a document from CouchDB for which an updated version exists in the cache. To avoid the possibility that changes get lost you should pass the retrieved document to get. This will either return the document you passed in or the document that’s waiting to be saved if it exists in the cache. Because there is a gap between when you ask for a document to be saved and when it actually is saved any views you use may be out of date, but that’s the cost of faster updates with CouchDB.

One complicating factor in the code is that the updating process updates the documents you passed in with _id and _rev from the newly saved documents. This means you can cache documents in your own data structure and should you decide to save the document again you won’t get a conflict error because it will have been updated for you.

class DocCache:
    def __init__(self, db, limit=1000):
        self._db = db
        self._cache = {}
        self._new = []
        self._limit = limit
        self.inserted = 0

    def __del__(self):
        self.save()

    def get(self, doc):
        if "_id" in doc and doc["_id"] in self._cache:
            return self._cache[doc["_id"]]
        else:
            return doc

    def update(self, doc, force_save=False):
        if "_id" in doc:
            self._cache[doc["_id"]] = doc
        else:
            self._new.append(doc)

        if force_save or len(self._cache) + len(self._new) > self._limit:
            self.save()

    def save(self):
        # list() keeps this working on Python 3, where values() returns a view
        docs = list(self._cache.values()) + self._new
        if len(docs) > 0:
            inserted_docs = self._db.update(docs)
            for doc, newdoc in zip(docs, inserted_docs):
                if newdoc[0]:
                    doc["_id"], doc["_rev"] = newdoc[1], newdoc[2]
                    self.inserted += 1
            self._cache = {}
            self._new = []
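The in-place update of _id and _rev described above can be seen in isolation in the sketch below. It assumes, as the save method does, that the bulk update returns one (success, id, rev) tuple per document; the apply_bulk_results function and the sample results are invented so that the example runs without a CouchDB server.

```python
def apply_bulk_results(docs, results):
    """Copy the saved _id and _rev back onto the original documents.

    Returns the number of documents that were saved successfully.
    """
    saved = 0
    for doc, (ok, doc_id, rev) in zip(docs, results):
        if ok:
            doc["_id"], doc["_rev"] = doc_id, rev
            saved += 1
    return saved


new_doc = {"type": "page"}                                # never saved before
existing = {"_id": "abc", "_rev": "1-x", "type": "page"}  # already in the database

# Made-up results, shaped like the tuples the save method indexes into
results = [(True, "generated-id", "1-aaa"), (True, "abc", "2-bbb")]
saved = apply_bulk_results([new_doc, existing], results)

print(saved, new_doc, existing)
```

Because the dictionaries are modified rather than replaced, any other structure still holding a reference to them sees the new revision and can save again without a conflict.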

Photo of a red couch by daveaustria.

Charming Roulette

Recently I went to a wedding which had a casino theme. To keep the guests entertained they gave every guest $100 from the Bank Of Fun to spend on the roulette and blackjack tables. I decided to play roulette and I knew that the best way to maximise my chances of winning was to bet only on odd or even and to double my bet whenever I lost. At one point I was 2.6x up on my initial stake, but unfortunately, as you’d expect, I eventually lost the lot.

I wanted to see what I could have done to increase my peak winnings, and to try my best to leave the table with a positive cash flow. To do this we’ll simulate a roulette table using Python and try out various betting strategies. The roulette wheel that was used at the wedding was an American wheel and featured the numbers 1 to 36 as well as 0 and 00. A bet on odd or even wins if a number from 1 to 36 comes up and it matches your choice. 0 or 00 will lose you your money. If you win, your stake is doubled. This means that by betting on odd or even you stand a 47% chance of winning, because 18 of the 38 pockets pay out.
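The 47% figure, and the average cost of each bet, follow directly from counting pockets. This is a small worked calculation rather than part of the simulator, and the variable names are my own.

```python
from fractions import Fraction

pockets = 38   # 1 to 36, plus 0 and 00
winning = 18   # the odd (or even) numbers

p_win = Fraction(winning, pockets)

# Expected result of a $1 bet: win $1 with p_win, lose $1 otherwise
expected = p_win * 1 + (1 - p_win) * (-1)

print("chance of winning: %.4f" % float(p_win))
print("expected result of a $1 bet: %.4f" % float(expected))
```

The expected result is negative for any bet size, which is the edge that every strategy below is fighting against.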

To help work out the best strategy we need to build a roulette wheel simulator. To do this we use the Python function given below. It takes four parameters and returns the amount of money left at the end of the run. The first parameter is the amount of money to start with, the second is a function which takes the current amount of money and returns the bet. The next two parameters determine when to give up – either a limit on the number of rounds, or the amount of money to stop at. The variable wheel is a list containing 18 “odd” strings, 18 “even” strings as well as one “0” and one “00” string.

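from random import choice

# 18 odd pockets, 18 even pockets, plus 0 and 00 (American wheel)
wheel = ["odd"] * 18 + ["even"] * 18 + ["0", "00"]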

def roulette(stake, bet_func, go_limit=None, walk_away=None):
    go = 0
    while stake > 0 and (go_limit is None or go < go_limit) and (walk_away is None or stake < walk_away):
        go += 1
        bet = bet_func(stake)
        if bet > stake: bet = stake
        if choice(wheel) == "odd":
            stake += bet
        else:
            stake -= bet
    return stake

So, with the simulation in place let’s start working out some odds. The simplest betting strategy is to bet $1 each round. To do this we use this simple betting function.

def flat_bet(stake):
    return 1

The graph below shows how likely you are to win when following this strategy for the given target. As you can see if you only want to increase your money from $100 to $101 then you’ve a 90% chance of doing this betting $1 each go. However, if you set your sights higher then your chances quickly diminish and you’ve almost no chance of making even a $40 profit.

Constant $1 bet with an increasing target
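The flat $1 strategy is simple enough that the simulated figures can be cross-checked against the classic gambler’s ruin formula. The sketch below assumes you keep playing until you either reach the target or run out of money (no limit on the number of rounds); the function name is my own.

```python
def chance_of_reaching(target, start=100, p=18 / 38):
    """Chance of getting from `start` to `target` dollars betting $1 a round.

    Gambler's ruin formula for an unfair game, where r = q / p:
        (1 - r ** start) / (1 - r ** target)
    """
    q = 1 - p
    r = q / p
    return (1 - r ** start) / (1 - r ** target)


for target in (101, 110, 120, 140):
    print("$%d: %.1f%%" % (target, 100 * chance_of_reaching(target)))
```

Under these assumptions the formula gives very nearly 90% for a $101 target and under 2% for $140, which agrees with the shape of the graph.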

The strategy I used was to double my bet every time I lost and reset to a $1 bet when I won. This means that on average you only stand to win $1 per round, but because your bet is doubled each win wipes out any previous losses. The code for this bet function is more complicated and we need to use a callable class to store the state of our bet.

import math

class scale_bet:
    def __init__(self, scale):
        self.bet = 1
        self.scale = scale
        self.prev_stake = None
    def __call__(self, stake):
        if self.prev_stake is None or stake > self.prev_stake:
            self.bet = 1
        else:
            self.bet *= self.scale
        self.prev_stake = stake
        return math.floor(self.bet)
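The arithmetic behind the doubling scheme can be checked without any randomness. This sketch (the function names are my own) shows that a win after any losing streak leaves you exactly one starting bet ahead, and how many doubled bets in a row a given stake can cover in full.

```python
def streak_outcome(losses, first_bet=1):
    """Net result of `losses` straight losses followed by one win, doubling each time."""
    bet = first_bet
    lost = 0
    for _ in range(losses):
        lost += bet
        bet *= 2
    # The winning bet pays out `bet`, which covers everything lost so far
    return bet - lost


def max_losing_streak(stake, first_bet=1):
    """Number of consecutive doubled bets that `stake` can cover in full."""
    bet = first_bet
    count = 0
    while stake >= bet:
        stake -= bet
        bet *= 2
        count += 1
    return count


print(streak_outcome(5))       # 1 + 2 + 4 + 8 + 16 lost, then 32 won
print(max_losing_streak(100))  # bets of 1, 2, 4, 8, 16 and 32
```

Starting from $100 and a $1 bet, six losses in a row leave only $37, which is not enough for the seventh bet of $64. That is the point at which the simulator caps the bet at the remaining stake and the scheme can no longer win everything back in one go.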

The probability of winning is much better with the doubling strategy, and if you’re aiming to increase your cash pile to $250 then you have a 25% chance of doing that.

Doubling bet with a $1 reset and an increasing target

The chances of winning are much better if you double your bet, but why stop at doubling? In the next test I aimed for a target of $200 and increased the scaling factor of the bet from 0.1 to 50. You can see from the graph below that increasing the scaling factor doesn’t change your chances of winning, instead it remains at about 47%.

Chances of reaching $200 with an increasing scaled bet and a $1 reset

The final chart shows the chance of reaching $200 with a bet which doubles when you lose. In this test the starting bet is set so that you have at least x goes remaining. We begin with having only one possible other bet, and go up to twenty. Despite what you might think, the chances of winning do not really change much.

Chances of reaching $200 with a doubling bet and an increasing reset

So, what’s the outcome of all this? Whatever you do, you’ve got a less than 50/50 chance of winning, but doubling your bet each time you lose will give a longer run before you lose your house.


Photo of a Roulette Wheel by John Wardell (Netinho).
Charts generated with Google Charts.

Testing A Facebook Connect Site

I’ve been developing a website in my spare time. Because I want to add plenty of social features it makes sense to let users log in using Facebook Connect. The Facebook platform is by far the most successful social platform with many developers having created applications and websites that use it. I expected that the experience for developers would be a good one. Unfortunately, I was disappointed.

Facebook makes it easy to register an application and provides links to libraries that wrap their API and make it easy to get started. What Facebook don’t provide however is a downloadable version of their API to test with. Facebook have made some effort to support test users, but you have to open ports in your firewall and use your real Facebook account to test with. Testing a new user signing up for your app is really quite a chore. Automating this sort of test is essentially impossible.

In an ideal world Facebook would produce a downloadable program that you can use to automatically create users, programmatically log in users and generally automatically test all the parts of your code. The danger is that they’d have to give you a downloadable copy of their website code. Google App Engine give a similar downloadable environment, and you can’t say that Google don’t have a load of code that they don’t want to give away!

The Facebook API is pretty simple to get started with, and within a couple of minutes you’ll have the code written to log a user in. Checking that it all works though, is a much tougher challenge…

Interestingness

As an amateur photographer I upload all my photographs to Flickr. Most of them are mediocre, but one or two are good enough that I think they can stand alongside the photos from more professional users of Flickr.

For the same reason that I blog, I put my photos on Flickr because I feel that I have something useful or interesting to offer and because I want to interact with new and interesting people. My blog gets between twenty and thirty visits a day – not much, but roughly the same as the number of visits I get to my photos on Flickr. The difference is that I only have twenty posts on my blog, whereas I have 2,000 photos on Flickr!

Plenty has been written about search engine optimisation for blogs, but not much has been written about SEO for Flickr. The majority of my photos have five or so tags, a title and are geotagged. Flickr does allow you to write a description and this would increase the amount of text, thereby giving search engines much more to go on. The key to gaining exposure on Flickr though, is to appear on Explore.

Flickr are not explicit about whether photos that appear in Explore are influenced by humans or not. They certainly imply that it’s chosen algorithmically though. If it’s chosen by computer then it should be possible to help your photos gain more exposure, beyond just taking nice photographs. If you look at the people who have their photos on Explore two things jump out at you. Firstly, they have a lot of contacts, and secondly, all their photos have lots of comments. You’d expect photos that appear on Explore to have a lot of comments, but typically all their photos have lots of comments. This implies two things: that you need to be active in the Flickr community, and that your contacts need to be active in looking at and commenting on your photos.

It appears that Flickr’s definition of Interestingness rewards not only excellent photos but also active community members. This is a really excellent design decision on Flickr’s part because it almost completely removes the ability to ‘spam’ Explore – you do have to be active and to be producing great photos to get featured.

So, how do you get your photo featured on Explore? Well, you need to be taking great photos, submitting them to groups and interacting with other users. Like the best photos, it’s hard work, with a touch of luck.

WWDC 09

It was with some trepidation that I listened in to this Monday’s Apple developer event, the WWDC keynote address. I have a 16GB iPhone 3G, a current top-of-the-range model. With all the speculation before the event it was clear that Apple were going to release a new model. But what were they going to include? Were they going to include the kitchen sink as some had been suggesting?

Fortunately, as the change in name would suggest, the new iPhone 3GS is an evolution rather than a revolution. Apple claim it has twice the magic which should equate to much faster application loading and probably better games too. In reality it’ll mean twice the CPU speed or twice the memory, or more likely both. It appears that the biggest change is that the iPhone 3GS contains a new graphics chip which gives it seven times the graphics throughput. Seven times!

The extra disk space that comes with a 32GB 3GS is nice, but is unlikely to be a reason to pay the extra for a 3GS. The same goes for voice dialling. The new phone does contain a compass, which will certainly make using the mapping application easier, and will allow for some really nice apps. When Google change Google Earth to use the compass it’ll be really nice to use.

I’m not going to pay the extra to upgrade before my contract is up, but I’ll certainly be a bit jealous of those with a new 3GS.

FlightControl Review

On Friday I downloaded a fun little puzzle game for my iPhone, FlightControl.

The premise of the game is that you’re running air traffic approach control for a small airport and you need to arrange for the two types of passenger jets, light aircraft and helicopters to land in the appropriate places without crashing into each other. A simple concept with even simpler controls. You tap on the plane you want to direct and then drag the plane to the runway. It will then follow the path you dragged out. It’s incredibly easy to use and really lets you focus on the goal of stopping those planes from crashing.

The graphics and sounds are excellent. The game has a great cartoon feel and although the menu and UI are minimal it has a very consistent look that clearly didn’t happen by accident. The map and airport look good and there are plans to add more airports to the game which I hope will be done to a similarly high standard.

The game starts off very easy to let you get the feel for the controls but the difficulty level ramps up pretty quickly and you’ll soon have to deal with five or more planes at once. When you’ve got two planes flying at different speeds trying to land on the same runway your brain will start to melt, but in a good way.

The game features online leaderboards which is a nice touch, but as with most online stats the leaders are way out of most users’ reach. The current all-time top score is almost 15,000. My best is 53.

I have only a few criticisms. The airport is perhaps a little large, which means you don’t have much room to sort your planes into stacks as you wait for them to land. The game also has an annoying habit of letting new planes enter when an existing plane is right by the edge so they crash before you can do anything. A warning icon does appear to give you time to move a plane out of the way, but it’s frustrating to lose a game in what seems like such an unfair manner. Finally I think the game could be improved by putting ticks on the planes’ paths so you can see more easily when they will get to a certain point on the map. A small marker every five seconds of flying time would be very useful.

The game is a great pick-up-and-play title, and you won’t be able to play it just the once. With the game currently selling for a greatly reduced price it should be on every casual gamer’s iPhone.
