CouchDB Document Cache
It’s well known that one of the best things you can do to speed up CouchDB is to use bulk inserts to add or update many documents at one time.
Bulk updates are easy to use if you’re just blindly inserting documents into the database, because you can simply maintain a list of documents. However, a pattern I often use is to call a view to determine whether a document representing an object exists, then update it if it does or add a new document if it doesn’t. To make this easier I use the DocCache class given below.
The cache contains two interesting methods, get and update. Rather than writing directly to CouchDB when you want to add or update a document, just pass the document to update. This caches the document and periodically saves all the cached documents in a single bulk update.
It is possible to retrieve a document from CouchDB when an updated version of it is already waiting in the cache. To avoid losing changes you should pass the retrieved document to get. This will return either the document you passed in or, if one exists in the cache, the document that’s waiting to be saved. Because there is a gap between when you ask for a document to be saved and when it actually is saved, any views you use may be out of date, but that’s the cost of faster updates with CouchDB.
One complicating factor in the code is that the save process updates the documents you passed in with the _id and _rev of the newly saved documents. This means you can keep documents in your own data structure and, should you decide to save a document again, you won’t get a conflict error because its revision will have been updated for you.
class DocCache:
    def __init__(self, db, limit=1000):
        self._db = db
        self._cache = {}    # documents that already have an _id, keyed by _id
        self._new = []      # documents that have no _id yet
        self._limit = limit
        self.inserted = 0

    def __del__(self):
        # Flush anything still waiting when the cache is discarded.
        self.save()

    def get(self, doc):
        # Prefer the pending version of a document over the one passed in.
        if "_id" in doc and doc["_id"] in self._cache:
            return self._cache[doc["_id"]]
        else:
            return doc

    def update(self, doc, force_save=False):
        if "_id" in doc:
            self._cache[doc["_id"]] = doc
        else:
            self._new.append(doc)
        if force_save or len(self._cache) + len(self._new) > self._limit:
            self.save()

    def save(self):
        # values() is a view on Python 3, so convert it before concatenating.
        docs = list(self._cache.values()) + self._new
        if len(docs) > 0:
            inserted_docs = self._db.update(docs)
            # Each result is a (success, id, rev) tuple.
            for doc, newdoc in zip(docs, inserted_docs):
                if newdoc[0]:
                    doc["_id"], doc["_rev"] = newdoc[1], newdoc[2]
                    self.inserted += 1
        self._cache = {}
        self._new = []
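To show the lookup-then-update pattern in action, here is a minimal sketch that runs without a CouchDB server. FakeDB is a hypothetical in-memory stand-in that I have made up for illustration; its update method only mimics the (success, id, rev) tuples that the class expects. The compact DocCache restates the class above for Python 3 so that the example is self-contained.

```python
import itertools

class FakeDB:
    """Hypothetical in-memory stand-in for a CouchDB database object."""
    def __init__(self):
        self.docs = {}
        self._ids = itertools.count(1)
        self.bulk_calls = 0

    def update(self, docs):
        # Mimic a bulk update: return one (success, id, rev) tuple per doc.
        self.bulk_calls += 1
        results = []
        for doc in docs:
            doc_id = doc.get("_id") or "doc-%d" % next(self._ids)
            rev = "%d-fake" % (int(doc.get("_rev", "0-").split("-")[0]) + 1)
            self.docs[doc_id] = dict(doc, _id=doc_id, _rev=rev)
            results.append((True, doc_id, rev))
        return results

class DocCache:
    """Compact Python 3 restatement of the cache described in the post."""
    def __init__(self, db, limit=1000):
        self._db, self._limit = db, limit
        self._cache, self._new = {}, []

    def get(self, doc):
        return self._cache.get(doc.get("_id"), doc)

    def update(self, doc, force_save=False):
        if "_id" in doc:
            self._cache[doc["_id"]] = doc
        else:
            self._new.append(doc)
        if force_save or len(self._cache) + len(self._new) > self._limit:
            self.save()

    def save(self):
        docs = list(self._cache.values()) + self._new
        if docs:
            for doc, (ok, doc_id, rev) in zip(docs, self._db.update(docs)):
                if ok:
                    doc["_id"], doc["_rev"] = doc_id, rev
        self._cache, self._new = {}, []

db = FakeDB()
cache = DocCache(db, limit=100)

# A new object has no _id yet, so it queues up until the next bulk save.
album = {"type": "album", "title": "Blue"}
cache.update(album)
cache.save()                          # one bulk call; album now has _id and _rev

# Later: a stale copy arrives from a view while a newer one is still cached.
album["plays"] = 1
cache.update(album)
stale = dict(db.docs[album["_id"]])   # what a view would return right now
current = cache.get(stale)            # the pending version wins
current["plays"] += 1
cache.update(current)
cache.save()
```

Because save writes the new _rev back into the dictionary you passed in, the second save goes through without the caller ever touching the revision.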
Photo of a red couch by daveaustria.
Charming Roulette
Recently I went to a wedding which had a casino theme. To keep the guests entertained they gave every guest $100 from the Bank Of Fun to spend on the roulette and blackjack tables. I decided to play roulette, and I knew that the best way to maximise my chances of winning was to bet only on odd or even and to double my bet whenever I lost. At one point I was 2.6x up on my initial stake but unfortunately, as you’d expect, I eventually lost the lot.
I want to see what I could have done to increase my peak winnings, and to try my best to leave the table with a positive cash flow. To do this we’ll simulate a roulette table using Python and try out various betting strategies. The roulette wheel used at the wedding was an American wheel, featuring the numbers 1 to 36 as well as 0 and 00. A bet on odd or even wins if a number from 1 to 36 comes up and matches your choice; 0 or 00 loses you your money. If you win, you get your bet back doubled. Since 18 of the 38 pockets win, betting on odd or even gives you a 47% chance of winning.
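The 47% figure is easy to check by hand. The short sketch below works it out with exact fractions and also shows the expected return on each dollar bet; the variable names are mine.

```python
from fractions import Fraction

POCKETS = 38    # 1 to 36 plus 0 and 00 on an American wheel
WINNING = 18    # the odd numbers (or, equally, the even ones)

p_win = Fraction(WINNING, POCKETS)
p_lose = 1 - p_win

# Expected change in your money for every $1 bet at even money.
expected_per_dollar = p_win * 1 + p_lose * (-1)

print("P(win) = %.4f" % float(p_win))                                    # 0.4737
print("Expected return per $1 bet = %.4f" % float(expected_per_dollar))  # -0.0526
```

So on average every dollar bet costs you a little over five cents, whichever betting strategy decides the size of the bet.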
To help work out the best strategy we need to build a roulette wheel simulator. To do this we use the Python function given below. It takes four parameters and returns the amount of money left at the end of the run. The first parameter is the amount of money to start with, and the second is a function which takes the current amount of money and returns the bet. The next two parameters determine when to give up: either a limit on the number of rounds, or the amount of money to stop at. The variable wheel is a list containing 18 “odd” strings, 18 “even” strings as well as one “0” and one “00” string.
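from random import choice

# 18 odd and 18 even pockets plus 0 and 00, as on an American wheel.
wheel = ["odd"] * 18 + ["even"] * 18 + ["0", "00"]

def roulette(stake, bet_func, go_limit=None, walk_away=None):
    go = 0
    while stake > 0 and (go_limit is None or go < go_limit) and (walk_away is None or stake < walk_away):
        go += 1
        bet = bet_func(stake)
        if bet > stake:
            bet = stake    # you can't bet more than you have left
        if choice(wheel) == "odd":    # we always bet on odd
            stake += bet
        else:
            stake -= bet
    return stake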
So, with the simulation in place let’s start working out some odds. The simplest betting strategy is to bet $1 each round. To do this we use this simple betting function.
def flat_bet(stake):
    return 1
The graph below shows how likely you are to win when following this strategy for the given target. As you can see, if you only want to increase your money from $100 to $101 then you have a 90% chance of doing so by betting $1 each go. However, if you set your sights higher your chances quickly diminish, and you have almost no chance of making even a $40 profit.
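The flat-bet numbers can be cross-checked without running a simulation. For a fixed $1 bet, the classic gambler’s ruin formula gives the chance of reaching a target before going broke. This is a sketch rather than the method used to draw the chart: it assumes you keep playing until you hit either the target or zero, and the function name p_reach is my own.

```python
def p_reach(start, target, p_win=18 / 38):
    """Chance of growing `start` dollars to `target` with $1 bets before hitting 0.

    Gambler's ruin formula for an unfair game (only valid when p_win != 0.5).
    """
    r = (1 - p_win) / p_win    # ratio of losing to winning probability per round
    return (1 - r ** start) / (1 - r ** target)

for target in (101, 110, 140):
    print(target, round(p_reach(100, target), 3))
```

For a $101 target this gives about 0.90, in line with the 90% above, and for a $140 target it comes out at under 2%.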
The strategy I used was to double my bet every time I lost and reset to a $1 bet when I won. This means that each winning run only nets you $1, but because your bet is doubled after every loss, each win wipes out all the previous losses. The code for this bet function is more complicated, and we need to use a callable class to store the state of our bet.
import math

class scale_bet:
    def __init__(self, scale):
        self.bet = 1
        self.scale = scale
        self.prev_stake = None

    def __call__(self, stake):
        if self.prev_stake is None or stake > self.prev_stake:
            # First round, or we won the last one: go back to a $1 bet.
            self.bet = 1
        else:
            # We lost the last round: scale the bet up.
            self.bet *= self.scale
        self.prev_stake = stake
        return math.floor(self.bet)
The probability of winning is much better with the doubling strategy: if you’re aiming to increase your cash pile to $250 then you have a 25% chance of doing that.
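Figures like this come from playing many simulated sessions and counting how many reach the target. The sketch below is one way to do that; it restates the wheel and the doubling bet so that it runs on its own. The names ScaleBet, play and estimate, the trial count and the seed are all my own choices, and the result is a noisy estimate rather than an exact probability. It also assumes a scale of at least 1 so that the bet never drops to zero.

```python
import math
import random

WHEEL = ["odd"] * 18 + ["even"] * 18 + ["0", "00"]

class ScaleBet:
    """Bet $1 after a win, multiply the previous bet by `scale` after a loss."""
    def __init__(self, scale):
        self.bet, self.scale, self.prev_stake = 1, scale, None

    def __call__(self, stake):
        if self.prev_stake is None or stake > self.prev_stake:
            self.bet = 1
        else:
            self.bet *= self.scale
        self.prev_stake = stake
        return math.floor(self.bet)

def play(stake, bet_func, walk_away, rng):
    # Play until broke or until the walk-away target is reached.
    while 0 < stake < walk_away:
        bet = min(bet_func(stake), stake)
        stake += bet if rng.choice(WHEEL) == "odd" else -bet
    return stake

def estimate(target, make_bet, trials=2000, start=100, seed=1):
    """Fraction of simulated sessions that reach `target` from `start`."""
    rng = random.Random(seed)
    wins = sum(play(start, make_bet(), target, rng) >= target for _ in range(trials))
    return wins / trials

p_double = estimate(250, lambda: ScaleBet(2))
print("Estimated chance of reaching $250 when doubling: %.2f" % p_double)
```

A fresh ScaleBet is created for every session so that no bet size leaks from one session into the next.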
The chances of winning are much better if you double your bet, but why stop at doubling? In the next test I aimed for a target of $200 and increased the scaling factor of the bet from 0.1 to 50. You can see from the graph below that increasing the scaling factor doesn’t change your chances of winning; they remain at about 47%.
The final chart shows the chance of reaching $200 with a bet which doubles when you lose. In this test the starting bet is set so that you have at least x goes remaining. We begin with only one further bet possible, and go up to twenty. Despite what you might think, the chances of winning do not really change much.
So, what’s the outcome of all this? Whatever you do, you’ve got a less than 50/50 chance of winning, but doubling your bet each time you lose will give you a longer run before you lose your house.
Photo of a Roulette Wheel by John Wardell (Netinho).
Charts generated with Google Charts.
Interestingness
As an amateur photographer I upload all my photographs to Flickr. Most of them are mediocre, but one or two are good enough that I think they can stand alongside the photos from more professional users of Flickr.
For the same reason that I blog, I put my photos on Flickr because I feel that I have something useful or interesting to offer, and because I want to interact with new and interesting people. My blog gets between twenty and thirty visits a day, which is not much, but roughly the same as the number of visits I get to my photos on Flickr. The difference is that I only have twenty posts on my blog, whereas I have 2,000 photos on Flickr!
Plenty has been written about search engine optimisation for blogs, but not much has been written about SEO for Flickr. The majority of my photos have five or so tags and a title, and are geotagged. Flickr does allow you to write a description, and this would increase the amount of text, thereby giving search engines much more to go on. The key to gaining exposure on Flickr, though, is to appear on Explore.
Flickr are not explicit about whether the photos that appear in Explore are chosen by humans or not, but they certainly imply that they are chosen algorithmically. If they are chosen by computer then it should be possible to help your photos gain more exposure, beyond just taking nice photographs. If you look at the people who have their photos on Explore, two things jump out at you. Firstly, they have a lot of contacts, and secondly, all their photos have lots of comments. You’d expect the photos that appear on Explore to have a lot of comments, but typically all of these users’ photos do. This implies two things: that you need to be active in the Flickr community, and that your contacts need to be active in looking at and commenting on your photos.
It appears that Flickr’s definition of Interestingness rewards not only excellent photos but also active community members. This is a really excellent design decision on Flickr’s part because it almost completely removes the ability to ‘spam’ Explore: you have to be both active and producing great photos to get featured.
So, how do you get your photo featured on Explore? Well, you need to be taking great photos, submitting them to groups and interacting with other users. Like the best photos, it’s hard work, with a touch of luck.
WWDC 09
It was with some trepidation that I listened in to this Monday’s Apple developer event, the WWDC keynote address. I have a 16GB iPhone 3G, a current top-of-the-range model. With all the speculation before the event it was clear that Apple were going to release a new model. But what were they going to include? Were they going to include the kitchen sink, as some had been suggesting?
Fortunately, as the change in name would suggest, the new iPhone 3GS is an evolution rather than a revolution. Apple claim it has twice the magic, which should equate to much faster application loading and probably better games too. In reality it’ll mean twice the CPU speed or twice the memory, or more likely both. It appears that the biggest change is that the iPhone 3GS contains a new graphics chip which gives it seven times the graphics throughput. Seven times!
The extra disk space that comes with a 32GB 3GS is nice, but is unlikely to be a reason to pay the extra for a 3GS. The same goes for voice dialling. The new phone does contain a compass, which will certainly make the mapping application easier to use, and will allow for some really nice apps. When Google change Google Earth to use the compass it’ll be really nice to use.
I’m not going to pay the extra to upgrade before my contract is up, but I’ll certainly be a bit jealous of those with a new 3GS.
FlightControl Review
On Friday I downloaded a fun little puzzle game for my iPhone, FlightControl.
The premise of the game is that you’re running air traffic approach control for a small airport and you need to arrange for the two types of passenger jets, light aircraft and helicopters to land in the appropriate places without crashing into each other. A simple concept with even simpler controls. You tap on the plane you want to direct and then drag the plane to the runway. It will then follow the path you dragged out. It’s incredibly easy to use and really lets you focus on the goal of stopping those planes from crashing.
The graphics and sounds are excellent. The game has a great cartoon feel, and although the menu and UI are minimal it has a very consistent look that clearly didn’t happen by accident. The map and airport look good, and there are plans to add more airports to the game which I hope will be done to a similarly high standard.
The game starts off very easy to let you get the feel for the controls but the difficulty level ramps up pretty quickly and you’ll soon have to deal with five or more planes at once. When you’ve got two planes flying at different speeds trying to land on the same runway your brain will start to melt, but in a good way.
The game features online leaderboards, which is a nice touch, but as with most online stats the leaders are way out of most users’ reach. The current all-time top score is almost 15,000. My best is 53.
I have a few criticisms. The airport is perhaps a little large, which means you don’t have much room to sort your planes into stacks as you wait for them to land. The game also has an annoying habit of letting new planes enter when an existing plane is right by the edge, so they crash before you can do anything. A warning icon does appear to give you time to move a plane out of the way, but it’s frustrating to lose a game in what seems like such an unfair manner. Finally, I think the game could be improved by putting ticks on the planes’ paths so you can see more easily when they will get to a certain point on the map. A small marker every five seconds of flying time would be very useful.
The game is a great pick-up-and-play title, and you won’t be able to play it just the once. With the game currently selling for a greatly reduced price it should be on every casual gamer’s iPhone.
Last.fm Chart Changes
For several years I’ve written and maintained a Greasemonkey script which adds chart change information to your music charts. The biggest problem with a Greasemonkey script is that you don’t control the page you’re modifying. Last week, for the umpteenth time, Last.fm changed their page again and broke the script.
Fortunately, I’ve fixed the script and have taken the opportunity to improve the web service that it uses. The charts should now be more cacheable, improving performance for you and reducing bandwidth usage for me. I’ve also added support for weekly charts so they’ll now have chart change information, as they used to before Last.fm’s most recent redesign removed it.
Finally, because my host Linode.com recently increased the disk space on all their plans by a third, I’m able to increase the length of time all charts are stored from 30 to 120. Unfortunately, as I had to delete all the chart change information, you won’t see a change initially, but gradually you’ll see your charts are available for longer and longer.
You can download the updated script here.
Enjoy!
Strict Development
While working on my new open source project, CouchQL, I’m being very strict with my development process and following both issue driven development, and test driven development.
Issue driven development requires that every commit refers to an issue that has been logged in the bug tracking software. This means that every change must be described, accepted and then logged. This works better if your repository is connected to your bug tracking software such that any commit message with an issue number is automatically logged. In Subversion this can be achieved with a post-commit hook, such as this script for Trac.
The connection between your commit messages and bug tracking software means that when changes are merged between branches new messages will be added to the issue, informing everyone which version of the software the issue has been fixed in. As well as just adding comments to issues, it is also possible to mark bugs as fixed with commit messages such as “Fixes issue #43.”, which should speed up your workflow. While Google Code does add hyperlinks between commit messages and issues, it doesn’t automatically add comments, which is a pain.
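To give a feel for what such a hook has to do, here is a minimal sketch of pulling issue numbers out of a commit message. It is not the Trac script mentioned above; the function name and the exact set of keywords it recognises are assumptions made for illustration.

```python
import re

# Assumed convention: "Fixes issue #43", "fix #7" or "closes issue #12" close
# an issue; any other "#N" is treated as a plain reference to that issue.
CLOSES = re.compile(r"\b(?:fix(?:es|ed)?|close[sd]?)\s+(?:issue\s+)?#(\d+)", re.IGNORECASE)
MENTIONS = re.compile(r"#(\d+)")

def parse_commit_message(message):
    """Return (issues to close, issues merely referenced) as sorted lists of ints."""
    closed = {int(n) for n in CLOSES.findall(message)}
    mentioned = {int(n) for n in MENTIONS.findall(message)} - closed
    return sorted(closed), sorted(mentioned)

print(parse_commit_message("Fixes issue #43. See also #12."))   # ([43], [12])
```

A real hook would then pass these numbers to the bug tracker, adding the commit message as a comment on each issue and closing those in the first list.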
Enforcing a development practice like this requires you to think about the changes you are making to your software, and focuses your mind on a particular goal. Bug tracking software will have the ability to assign priorities to changes as well as to group them into milestones. This helps you to build up a feature list for each version of the software, and to know when you’ve achieved your goals and it’s time to release!
Test driven development is related in that before any changes to code are made a test must be written. This test must be designed to check the result of the change as closely and completely as possible. When the test passes (and all other tests still pass) the change that you were making is complete.
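As a small illustration of the order of work, the sketch below uses Python’s unittest module. The median function is a made-up example rather than anything from CouchQL: the three tests were written first and failed, and the function body was then filled in until they passed.

```python
import unittest

def median(values):
    """Written only after the tests below existed and were failing."""
    if not values:
        raise ValueError("median() of an empty sequence")
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

class MedianTest(unittest.TestCase):
    def test_odd_length(self):
        self.assertEqual(median([3, 1, 2]), 2)

    def test_even_length(self):
        self.assertEqual(median([4, 1, 3, 2]), 2.5)

    def test_empty_input_is_rejected(self):
        with self.assertRaises(ValueError):
            median([])
```

Saved as a module, the tests can be run with python -m unittest, and the change counts as finished once the whole suite passes.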
The benefits of this style of development are twofold. Firstly, it should be easy to end up with test coverage of close to 100%. Secondly, it forces you to think about the point of the change that you’re making. Combine this with the issue that you logged before you started, and you’ll have a really good idea of the scope and aim of your change before you ever touch the keyboard.
It can sometimes be hard to do this for every change you make, but the more often you do, the better tested and more maintainable your code will be. Tools can really help you. If your version control system is linked to your bug tracking software then all you need to do is remember to log a bug and mention it in every commit message. A continuous integration testing tool such as buildbot makes keeping your tests complete very desirable, as you’ll be notified of any breakages very quickly.
You can’t be made to follow development processes such as these, but if you understand the benefits and want to use them, they become second nature, and hopefully you’ll be a better programmer as a result.
CouchQL 0.1 released
I’ve just uploaded the first release of CouchQL. It can be installed from PyPI by typing “easy_install couchql” or you can download a tarball from Google Code.
It’s a very early release, but please play with it, break it and email me your results!
CouchQL development progressing
As I mentioned in a previous post, I have been working on a library to ease the creation of map/reduce views in CouchDB.
The code is being hosted on Google Code and can be checked out and used now. The development is currently at a very early stage, but the fundamentals are sound.
Code such as that given below will work. In this example it will return all the documents with a member ‘x’ whose value is greater than one.
c = db.cursor()
c.execute("SELECT * FROM _ WHERE x > %s", (1, ))
for doc in c.fetchall():
    pass  # process doc
The code is executed as a temporary view, but very high on my list is to use permanent views for much higher performance. This will be added before a first release, as will the ability to have multiple expressions ANDed together in the where clause.
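To make the meaning of that query concrete, here is a plain Python reading of what it should return. This only illustrates the expected result and says nothing about how CouchQL builds its view; the function name and the sample documents are invented.

```python
def select_where_x_greater_than(docs, threshold):
    """Plain-Python reading of: SELECT * FROM _ WHERE x > threshold."""
    # Documents without a member 'x' never match.
    return [doc for doc in docs if "x" in doc and doc["x"] > threshold]

sample = [
    {"_id": "a", "x": 0},
    {"_id": "b", "x": 5},
    {"_id": "c", "y": 9},
    {"_id": "d", "x": 2},
]

for doc in select_where_x_greater_than(sample, 1):
    print(doc["_id"])
```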



