Archive for the ‘python’ Category
For Christmas my wife and I bought each other a new FitBit One device (Amazon affiliate link included). These are small fitness tracking devices that monitor the number of steps you take, how high you climb and how well you sleep. They’re great for providing motivation to walk that extra bit further, or to take the stairs rather than the lift.
I’ve had the device for less than a week, but already I’m feeling the benefit of the gamification on FitBit.com. As well as monitoring your fitness it also provides you with goals, achievements and competitions against your friends. The big advantage of the FitBit One over the previous models is that it syncs to recent iPhones and iPads, as well as some Android phones. This means that your computer doesn’t need to be on, and often it will sync without you having to do anything. In the worst case you just have to open the FitBit app to update your stats on the website. Battery life seems good, at about a week.
The FitBit apps sync your data directly to FitBit.com, which is great for seeing your progress quickly. They also provide an API for developers to provide interesting ways to process the data captured by the FitBit device. One glaring omission from the API is any way to get access to the minute by minute data. For a fee of $50 per year you can become a Premium member, which allows you to do a CSV export of the raw data. Holding the data collected by a user hostage is deeply suspect and FitBit should be ashamed of themselves for making this a paid-for feature. I have no problem with the rest of the features in the Premium subscription being paid for, but your own raw data should be freely available.
The FitBit API does have the ability to give you the intraday data, but this is not part of the open API and instead is part of the ‘Partner API’. This does not require payment, but you do need to explain to FitBit why you need access to this API call and what you intend to do with it. I do not believe that they would give you access if your goal was to provide a free alternative to the Premium export function.
So, has the free software community provided a solution? A quick search revealed that the GitHub user Wadey had created a library that uses the URLs used by the graphs on the FitBit website to extract the intraday data. Unfortunately the library hadn’t been updated in the last three years and a change to the FitBit website had broken it.
Fortunately the changes required to make it work are relatively straightforward, so a fixed version of the library is now available as andrewjw/python-fitbit. The old version of the library relied on you logging in to FitBit.com and extracting some values from the cookies. Instead I take your email address and password and fake a request to the log in page. This captures all of the cookies that are set, and will only break if the log in form elements change.
Another change I made was to extend the example dump.py script. The previous version just dumped the previous day’s values, which is not useful if you want to extract your entire history. In my new version it exports data for every day that you’ve been using your FitBit. It also incrementally updates your data dump if you run it irregularly.
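The incremental update can be pictured roughly as follows. This is only an illustration and not the code in dump.py: the function name days_to_dump, the one-file-per-day layout and the YYYY-MM-DD.csv naming are all assumptions made for the example.

```python
import datetime
import os


def days_to_dump(dump_dir, first_day, today):
    """Return the dates from first_day up to yesterday that have no dump file.

    Hypothetical layout: one file per day, named YYYY-MM-DD.csv, in dump_dir.
    Today is skipped on the assumption that its data is still incomplete.
    """
    missing = []
    day = first_day
    while day < today:
        filename = os.path.join(dump_dir, day.isoformat() + ".csv")
        if not os.path.exists(filename):
            missing.append(day)
        day += datetime.timedelta(days=1)
    return missing
```

Run again a few days later, a check like this only returns the days added since the last run, which is what makes irregular runs cheap.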
If you’re using Windows you’ll need both Python and Git installed. Once you’ve done that check out my repository at github.com/andrewjw/python-fitbit. Lastly, in the newly checked out directory run python examples/dump.py <email> <password> <dump directory>.
Although PyV8 has a wiki page entitled How To Build, it’s not simple to get the project built. They recommend using prebuilt packages, but there are none for recent versions of Ubuntu. In this post I’ll describe how to build it on Ubuntu 11.10 and give a simple example of it in action.
The first step is to make sure you have the appropriate packages. There may be others that are required and not part of the default install, but these are what I had to install.
sudo aptitude install scons libboost-python-dev
Next you need to checkout both the V8 and PyV8 projects using the commands below.
svn checkout http://v8.googlecode.com/svn/trunk/ v8
svn checkout http://pyv8.googlecode.com/svn/trunk/ pyv8
The key step before building PyV8 is to set the V8_HOME environment variable to the directory where you checked out the V8 code. This allows PyV8 to patch V8 and build it as a static library rather than the default dynamic library. Once you’ve set that you can use the standard Python setup.py commands to build and install the library.
cd v8
export V8_HOME=`pwd`
cd ../pyv8
python setup.py build
sudo python setup.py install
>>> import PyV8
>>> ctxt = PyV8.JSContext()
Recently I was taking part in a review of some Python code. One aspect of the code really stuck out to me. It’s not a structural issue, but a minor change in programming style that can greatly improve the maintainability of the code.
The code in general was quite good, but a code snippet similar to that given below jumped right to the top of my list of things to be fixed. Why is this so bad? Let us first consider what exceptions are and why you might use them in Python.
try:
    # code
except Exception as e:
    # error handling code
Exceptions are a way of breaking out of the normal program flow when an ‘exceptional’ condition arises. Typically this is used when errors occur, but exceptions can also be used as an easy way to break out of normal flow during normal but unusual conditions. In a limited set of situations it can make program flow clearer.
What does this code do though? It catches all exceptions, runs the error handling code and continues like nothing has happened. In all probability it’s only one or two errors that are expected and should be handled. Any other errors should be passed on and cause the program to actually crash so it can be debugged properly.
Let’s consider the following code:
analysis_type = 1
try:
    do_analysis(analysis_typ)
except Exception as e:
    cleanup()
This code has a bug: the missing e in the do_analysis call. This will raise a NameError that will be immediately captured and hidden. Other, more complicated errors could also occur and be hidden in the same way. This sort of masking will make tracking down problems like this very difficult.
To improve this code we need to consider what errors we expect the do_analysis function to raise and what we want to handle. In the ideal case it would raise an AnalysisError and then we would catch that.
analysis_type = 1
try:
    do_analysis(analysis_typ)
except AnalysisError as e:
    cleanup()
In the improved code the NameError will pass through and be picked up immediately. It is likely that the cleanup function needs to be run whether or not an error has occurred. To do that we can move the call into a finally block.
analysis_type = 1
try:
    do_analysis(analysis_typ)
except AnalysisError as e:
    # display error message
finally:
    cleanup()
This allows us to handle a very specific error and ensure that we clean up whatever error happens. Sometimes cleaning up whatever the exception (or in the event of no exception) is required, and in this case the finally block, which is always run, is the right place for this code.
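To see the three cases side by side, here is a small self-contained sketch. AnalysisError and do_analysis are stand-ins invented for the example, since the reviewed code isn't shown, and the log list is just a convenient way of recording what ran.

```python
class AnalysisError(Exception):
    """Hypothetical error raised by do_analysis for failures we expect."""


def do_analysis(analysis_type):
    # Stand-in for the real analysis: negative types are an expected failure
    if analysis_type < 0:
        raise AnalysisError("negative analysis type %d" % analysis_type)
    return "analysis %d done" % analysis_type


def run(analysis_type, log):
    try:
        log.append(do_analysis(analysis_type))
    except AnalysisError as e:
        # Only the error we expect is handled here
        log.append("error: %s" % e)
    finally:
        # Runs on success, on AnalysisError and on any other exception
        log.append("cleanup")
    return log
```

Calling run(1, []) records the result and then the cleanup, and run(-1, []) records the error message and then the cleanup. Calling run(None, []) on Python 3 fails the comparison inside do_analysis with a TypeError, which is not caught, so the program stops with a proper traceback, but only after the cleanup has been recorded.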
Let’s now consider a different piece of code.
try:
    do_analysis(analysis_types[index])
except KeyError:
    # display error message
We’re looking up the parameter to do_analysis in a dictionary and catching the case where index doesn’t exist. This code is also capturing too much. Not because the exception is too general, but because there is too much code in the try block.
The issue with this code is what happens if do_analysis raises a KeyError? To capture only the exceptions that we’re expecting we need to wrap just the dictionary lookup in the try block and not catch anything from the analysis call. The analysis itself goes in an else block, which only runs if the lookup succeeded.
try:
    analysis_type = analysis_types[index]
except KeyError:
    # display error message
else:
    do_analysis(analysis_type)
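One way to keep the try block this small is the else clause of a try statement, which runs only when no exception was raised. The sketch below uses invented stand-ins (the contents of analysis_types and a deliberately buggy do_analysis) to show that a KeyError raised inside the analysis is no longer mistaken for a bad index.

```python
analysis_types = {"quick": 1, "full": 2}  # hypothetical lookup table


def do_analysis(analysis_type):
    # Stand-in with a bug of its own: an internal lookup that can fail
    internal = {1: "quick result"}
    return internal[analysis_type]  # raises KeyError for type 2


def run(index):
    try:
        analysis_type = analysis_types[index]
    except KeyError:
        # Only a missing index ends up here
        return "unknown analysis: %s" % index
    else:
        # Exceptions raised in here are not caught by the except above
        return do_analysis(analysis_type)
```

Here run("missing") returns the friendly message, while run("full") lets the KeyError from inside do_analysis escape, so the real bug is visible.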
So, if I’m reviewing your code don’t be afraid to write a few extra lines in order to catch the smallest, but correct, set of exceptions.
There are a number of tools for checking whether your Python code meets a coding standard. These include pep8.py, PyChecker and PyLint. Of these, PyLint is the most comprehensive and is the tool which I prefer to use as part of my buildbot checks that run on every commit.
PyLint works by parsing the Python source code itself and checking things like using variables that aren’t defined, missing doc strings and a large array of other checks. A downside of PyLint’s comprehensiveness is that it runs the risk of generating false positives. As it parses the source code itself it struggles with some of Python’s more dynamic features, in particular metaclasses, which, unfortunately, are a key part of Django. In this post I’ll go through the changes I make to the standard PyLint settings to make it more compatible with Django.
This line disables a few problems that are picked up entirely. W0403 stops relative imports from generating a warning. Whether you want to disable these or not is really a matter of personal preference. Although I appreciate why there is a check for this, I think this is a bit too picky. W0232 stops a warning appearing when a class has no __init__ method. Django models will produce this warning, but because they’re built using metaclasses there is nothing wrong with them. Finally, E1101 is generated if you access a member variable that doesn’t exist. Accessing members such as id or objects on a model will trigger this, so it’s simplest just to disable the check.
These make the output of PyLint easier for Buildbot to parse. If you’re not using Buildbot then you probably don’t need to include these lines.
Apart from a limited number of names PyLint tries to enforce a minimum size of three characters in a variable name. As qs is such a useful variable name for a QuerySet I force this to be allowed as a good name.
The last change I make is to allow much longer lines. By default PyLint only allows 80 character long lines, but how many people have screens that narrow anymore? Even the argument that it allows you to have two files side by side doesn’t hold water in this age where multiple monitors for developers are the norm.
PyLint uses the exit code to indicate what errors occurred during the run. This confuses Buildbot, which assumes that a non-zero return code means the program failed to run, even when using the PyLint buildstep. To work around this I use a simple management command that duplicates the pylint program’s functionality but doesn’t let the return code propagate back to Buildbot.
from django.core.management.base import BaseCommand

from pylint import lint

class Command(BaseCommand):
    def handle(self, *args, **options):
        lint.Run(list(args + ("--rcfile=../pylint.cfg", )), exit=False)
If you’re writing a Python program that doesn’t have a text-based user interface (either it’s a GUI or runs as part of another program, e.g. a webserver) then you should avoid using the print statement. It’s tempting to use print to fill the console with information about what your program is up to. For code of any size though, this quickly devolves into a hard to navigate mess.
Python’s standard library contains a module, logging, that lets you write code to log as much information as you like and configure which bits you are interested in at runtime.
There are two concepts that you need to understand with logging. Firstly there is the logging level. This is how you determine how important the message is. The levels range from debug as the least important, through info, warning and error, to critical, the most important. There is also an exception function, which logs at the error level and includes the current traceback. Secondly there is the logger. This allows you to divide your messages into groups depending on the part of your code they relate to. For example, you might have a gui logger and a data logger.
The logging module comes with a series of module-level functions, one named after each of the logging levels. These make it quick and easy to log a message using the default logger.
logging.debug("Debug message")
logging.error("Error retrieving %s", url)
The second of these two lines has more than one argument. In this case the logging module will treat the first argument as a format string and the rest as arguments to the format, so that line is equivalent to this one.
logging.error("Error retrieving %s" % (url, ))
If you try to treat the logging code like you would a print statement and write logging.error("Error retrieving", url) then you’ll get the following, very unhelpful, error message.
Traceback (most recent call last):
  File "/usr/lib/python2.6/logging/__init__.py", line 776, in emit
    msg = self.format(record)
  File "/usr/lib/python2.6/logging/__init__.py", line 654, in format
    return fmt.format(record)
  File "/usr/lib/python2.6/logging/__init__.py", line 436, in format
    record.message = record.getMessage()
  File "/usr/lib/python2.6/logging/__init__.py", line 306, in getMessage
    msg = msg % self.args
TypeError: not all arguments converted during string formatting
Notice how this exception doesn’t tell you where the offending logging statement is in your code! Knowing the type of error that causes this will help in tracking the problem down, but there is more that can be done to help you find it. The logging library allows you to specify a global error handler, which combined with the print stack trace function will give you a much better error message.
import logging
import traceback

def handleError(self, record):
    traceback.print_stack()

logging.Handler.handleError = handleError
Loggers are created by calling logging.getLogger('loggername'). This returns an object with the same set of log level functions as the module, but which can be controlled independently. For example:
gui_log = logging.getLogger('gui')
gui_log.debug("created window")

data_log = logging.getLogger('data')
data_log.debug("loaded file")
Where this comes in really handy is when you set the level of messages that you want to see independently for each logger. In the next code block we set the logging module so we’ll see lots of debugging messages from the GUI and only errors from the data layer. Although here we’re setting the levels directly in code, it’s not a big jump to make them configurable using a command line option.
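A minimal sketch of that configuration, assuming the gui and data loggers from the earlier example and the standard setLevel method on each logger:

```python
import logging

logging.basicConfig()  # send messages to the console via the root logger

gui_log = logging.getLogger('gui')
data_log = logging.getLogger('data')

gui_log.setLevel(logging.DEBUG)   # show everything from the GUI
data_log.setLevel(logging.ERROR)  # only errors and above from the data layer

gui_log.debug("created window")        # shown
data_log.debug("loaded file")          # suppressed
data_log.error("could not load file")  # shown
```

To make this configurable you could, for example, map a command line option onto the constants logging.DEBUG, logging.INFO and so on before calling setLevel.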
The logging module also lets you configure how your messages are formatted, and to direct them to files rather than the console. Hopefully this short guide is useful, let me know in the comments!
Recently I’ve been reading the classic book by Richard Dawkins, The Blind Watchmaker. In it he begins by discussing how evolution can produce complex systems from only a few simple rules. He demonstrates this using a simple tree drawing algorithm in which a few ‘genes’ control aspects such as the number of branches and the branch angle. The trees are evolved solely through mutation of an initial tree, rather than by combining the ‘genes’ of two trees to produce a child and introducing mutations in those children.
In reality evolution is driven by pressures from the environment on the genes and those that produce the fittest host will survive. As this is early in the book though Dawkins uses himself as the environment and manually picks the most visually appealing trees.
The book is essentially timeless: although new evidence is continually being found in favour of evolution, the general thrust remains true. The passages where he talks about his computer, however, have dated horribly (which is not surprising given it was first published in 1986!). In this post I’ll describe how to recreate in Python the section where he describes evolving trees, so you can create your own trees on your PC.
As with the book our trees will be controlled by nine genes, each of which is an integer. Dawkins doesn’t state what the nine genes do as for his purposes that would confuse matters, but for us it’s vital. Fortunately figure three allows us to work out for ourselves what genes one, five, seven and nine do. The nine genes, in order, are:
- Horizontal scaling
- Number of branches per level
- Length of first branch
- Scaling factor for length of subsequent branches
- Vertical scaling
- Angle of first branch
- Angle of branching
- Scaling factor for angle of branching
- Levels of branching
The descriptions of genes one, five, seven and nine (horizontal scaling, vertical scaling, angle of branching and levels of branching) are those that I deduced from the book; the others are ones I decided on myself. The first step in writing a program like this is to decide exactly how these genes will affect the drawn tree. To do this we create a series of functions, one for each gene, that converts the gene value into a value that can be used by the drawing code. These functions are given below.
horiz_scaling = lambda dna: (dna[0]+10.0)/10.0
branches = lambda dna: dna[1]
initial_length = lambda dna: dna[2] + 10
length_scaling = lambda dna: (dna[3]+10.0)/10.0
vert_scaling = lambda dna: (dna[4]+10.0)/10.0
initial_angle = lambda dna: dna[5]/10.0
initial_angle_of_branching = lambda dna: 1.0+dna[6]/5.0
change_in_angle_between_branches = lambda dna: dna[7]/5.0
max_levels = lambda dna: dna[8]
These functions are used by the draw_branch function which renders a single line, and recursively calls itself to draw the next level of branches.
def draw_branch(img, dna, level, start, angle, length, angle_between_all_branches):
    # start and end are (x, y) tuples; start + end gives the four line coordinates
    end = (start[0] + math.sin(angle) * length * horiz_scaling(dna),
           start[1] - math.cos(angle) * length * vert_scaling(dna))
    img.line(start + end, (0, 0, 0))
    if level >= max_levels(dna):
        return
    else:
        branch_angle = angle - angle_between_all_branches/2.0
        angle_between_branches = 0 if branches(dna) == 0 else angle_between_all_branches/branches(dna)
        for i in range(branches(dna)+1):
            draw_branch(img, dna, level+1, end,
                        branch_angle + angle_between_branches*i,
                        length*length_scaling(dna),
                        angle_between_all_branches + change_in_angle_between_branches(dna))
To start the drawing process off we need a function, draw_tree, which calls the branch drawing function with the initial values for the length of branch and angle between the subbranches.
def draw_tree(img, dna):
    draw_branch(img, dna, 0, (50, 70), initial_angle(dna), initial_length(dna), initial_angle_of_branching(dna))
Now that we can draw a tree we need to be able to generate the children of a tree, which we do by picking a gene and either incrementing or decrementing it. A couple of genes make no sense if they are negative, so the code prevents these from going below zero.
def evolve(dna):
    gene = random.choice(range(9))
    if (gene in [1, 8] and dna[gene] == 0) or random.random() < 0.5:
        dna[gene] += 1
    else:
        dna[gene] -= 1
    return dna
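Because evolve changes the list it is given, a set of children has to be produced from copies of the parent. The short sketch below repeats evolve so that it runs on its own; make_children is a helper name chosen for the example, and the parent values are just a plausible starting tree.

```python
import random


def evolve(dna):
    """Mutate one randomly chosen gene by +1 or -1 (same rule as above)."""
    gene = random.choice(range(9))
    if (gene in [1, 8] and dna[gene] == 0) or random.random() < 0.5:
        dna[gene] += 1
    else:
        dna[gene] -= 1
    return dna


def make_children(parent, count=9):
    # parent[:] is a shallow copy, so the parent itself is left untouched
    return [evolve(parent[:]) for _ in range(count)]


parent = [0, 1, 0, 0, 0, 0, 0, 0, 1]
children = make_children(parent)
```

Each child differs from the parent in exactly one gene, by exactly one step, and genes two and nine (indexes 1 and 8) never drop below zero.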
If we combine these functions with a simple TK-based interface, as shown below, we get exactly the abilities described in The Blind Watchmaker. Nine possible trees are displayed, and when the user clicks on one, nine new children are created and displayed.
Without more details about the original program it’s hard to recreate it exactly, but this program is a decent starting point. Happy evolving!
To run this program yourself you’ll need to download and install Python from python.org and the PIL image library. Next copy the source code below into a file called “dawkins_trees.py” and double click on it.
#!/usr/bin/env python
# Copyright <year> <copyright holder>. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification, are
# permitted provided that the following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright notice, this list of
# conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright notice, this list
# of conditions and the following disclaimer in the documentation and/or other materials
# provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY <COPYRIGHT HOLDER> ``AS IS'' AND ANY EXPRESS OR IMPLIED
# WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
# FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL <COPYRIGHT HOLDER> OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
# ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
# ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
# The views and conclusions contained in the software and documentation are those of the
# authors and should not be interpreted as representing official policies, either expressed
# or implied, of Andrew Wilkinson.
import math, os, random, sys

from Tkinter import *
import Image, ImageTk, ImageDraw

image_size = (100, 100)
samples = 9

# Each function reads one gene from its position in the nine-element dna list
horiz_scaling = lambda dna: (dna[0]+10.0)/10.0
branches = lambda dna: dna[1]
initial_length = lambda dna: dna[2] + 10
length_scaling = lambda dna: (dna[3]+10.0)/10.0
vert_scaling = lambda dna: (dna[4]+10.0)/10.0
initial_angle = lambda dna: dna[5]/10.0
initial_angle_of_branching = lambda dna: 1.0+dna[6]/5.0
change_in_angle_between_branches = lambda dna: dna[7]/5.0
max_levels = lambda dna: dna[8]

def draw_branch(img, dna, level, start, angle, length, angle_between_all_branches):
    # start and end are (x, y) tuples; start + end gives the four line coordinates
    end = (start[0] + math.sin(angle) * length * horiz_scaling(dna),
           start[1] - math.cos(angle) * length * vert_scaling(dna))
    img.line(start + end, (0, 0, 0))
    if level >= max_levels(dna):
        return
    else:
        branch_angle = angle - angle_between_all_branches/2.0
        angle_between_branches = 0 if branches(dna) == 0 else angle_between_all_branches/branches(dna)
        for i in range(branches(dna)+1):
            draw_branch(img, dna, level+1, end,
                        branch_angle + angle_between_branches*i,
                        length*length_scaling(dna),
                        angle_between_all_branches + change_in_angle_between_branches(dna))

def draw_tree(img, dna):
    draw_branch(img, dna, 0, (50, 70), initial_angle(dna), initial_length(dna), initial_angle_of_branching(dna))

def evolve(dna):
    gene = random.choice(range(9))
    if (gene in [1, 8] and dna[gene] == 0) or random.random() < 0.5:
        dna[gene] += 1
    else:
        dna[gene] -= 1
    return dna

class Application(Frame):
    def __init__(self, master=None):
        Frame.__init__(self, master)
        self.dna = [0, 1, 0, 0, 0, 0, 0, 0, 1]
        self.grid()
        self.create_widgets()
        self.create_choices()

    def create_widgets(self):
        self.buttons = []
        for i in range(samples):
            button = Button(self)
            button["command"] = self.choose_tree(i)
            # Lay the buttons out in a 3x3 grid
            button.grid(row=i // 3, column=i % 3)
            self.buttons.append(button)

    def create_choices(self):
        # Mutate copies so that the current tree itself is left unchanged
        self.choices = [evolve(self.dna[:]) for i in range(samples)]
        self.images = [Image.new("RGB", image_size, (255, 255, 255)) for _ in range(samples)]
        for i in range(samples):
            draw_tree(ImageDraw.Draw(self.images[i]), self.choices[i])
        self.tkimages = [ImageTk.PhotoImage(image) for image in self.images]
        for i in range(samples):
            self.buttons[i]["image"] = self.tkimages[i]

    def choose_tree(self, i):
        def func():
            self.dna = self.choices[i]
            self.create_choices()
        return func

if __name__ == "__main__":
    root = Tk()
    app = Application(master=root)
    app.mainloop()
    root.destroy()
Recently I went to a wedding which had a casino theme. To keep the guests entertained they gave every guest $100 from the Bank Of Fun to spend on the roulette and blackjack tables. I decided to play roulette and I knew that the best way to maximise my chances of winning was to bet only on odd or even and to double my bet whenever I lost. At one point I was 2.6x up on my initial stake, but unfortunately, as you’d expect, I eventually lost the lot.
I wanted to see what I could have done to increase my peak winnings, and to try my best to leave the table with a positive cash flow. To do this we’ll simulate a roulette table using Python and try out various betting strategies. The roulette wheel that was used at the wedding was an American wheel and featured the numbers 1 to 36 as well as 0 and 00. Betting on odd or even will win if a number 1 to 36 comes up and it is odd or even. 0 or 00 will lose you your money. If you win, your stake is doubled. This means that by betting on odd or even you have 18 winning pockets out of 38, so you stand a 47% chance of winning.
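The 47% figure, and the price you pay for each bet, can be checked with a few lines of arithmetic. The variable names are just labels chosen for this example.

```python
from fractions import Fraction

pockets = 38          # the numbers 1 to 36, plus 0 and 00
winning_pockets = 18  # the odd numbers (or equally the even ones)

p_win = Fraction(winning_pockets, pockets)

# A win pays +1 for each $1 bet and a loss costs -1
expected_return = p_win * 1 + (1 - p_win) * -1

print(float(p_win))            # about 0.474
print(float(expected_return))  # about -0.053
```

In other words, on average every $1 bet on odd or even costs a little over 5 cents, whatever betting strategy is layered on top.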
To help work out the best strategy we need to build a roulette wheel simulator. To do this we use the Python function given below. It takes four parameters and returns the amount of money left at the end of the run. The first parameter is the amount of money to start with, the second is a function which takes the current amount of money and returns the bet. The next two determine when to give up: either a limit on the number of rounds, or the amount of money to stop at. The variable wheel is a list containing 18 “odd” strings, 18 “even” strings as well as one “0” and one “00” string.
from random import choice

# 18 odd and 18 even numbers, plus 0 and 00, as described above
wheel = ["odd"] * 18 + ["even"] * 18 + ["0", "00"]

def roulette(stake, bet_func, go_limit=None, walk_away=None):
    go = 0
    while stake > 0 and (go_limit is None or go < go_limit) and (walk_away is None or stake < walk_away):
        go += 1
        bet = bet_func(stake)
        # You can never bet more than you have left
        if bet > stake:
            bet = stake
        if choice(wheel) == "odd":
            stake += bet
        else:
            stake -= bet
    return stake
So, with the simulation in place let’s start working out some odds. The simplest betting strategy is to bet $1 each round. To do this we use this simple betting function.
def flat_bet(stake):
    return 1
The graph below shows how likely you are to win when following this strategy for the given target. As you can see, if you only want to increase your money from $100 to $101 then you’ve a 90% chance of doing this betting $1 each go. However, if you set your sights higher then your chances quickly diminish and you’ve almost no chance of making even a $40 profit.
The strategy I used was to double my bet every time I lost and reset to a $1 bet when I won. This means that on average you only stand to win $1 per round, but because your bet is doubled each win wipes out any previous losses. The code for this bet function is more complicated and we need to use a callable class to store the state of our bet.
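Figures like these can be estimated by playing the game many times and counting how often the target is reached. The sketch below is not the code used to produce the original chart: it is a self-contained variant with hypothetical names (play_flat, estimate_win_chance, exact_win_chance), and it adds the standard gambler's ruin formula for $1 bets as a cross-check.

```python
import random

WHEEL = ["odd"] * 18 + ["even"] * 18 + ["0", "00"]


def play_flat(stake, target, rng):
    """Bet $1 on odd each round until the stake hits 0 or the target."""
    while 0 < stake < target:
        if rng.choice(WHEEL) == "odd":
            stake += 1
        else:
            stake -= 1
    return stake


def estimate_win_chance(start, target, trials=1000, seed=1):
    """Fraction of simulated games that reach the target before going bust."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    wins = sum(play_flat(start, target, rng) >= target for _ in range(trials))
    return wins / float(trials)


def exact_win_chance(start, target, p=18 / 38.0):
    """Gambler's ruin formula for $1 bets with win probability p != 0.5."""
    r = (1 - p) / p
    return (1 - r ** start) / (1 - r ** target)
```

For a target of $101 both the simulation and the formula come out at roughly 0.9, and for a target of $140 the formula gives under 2%, in line with the chart described above.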
import math

class scale_bet:
    def __init__(self, scale):
        self.bet = 1
        self.scale = scale
        self.prev_stake = None

    def __call__(self, stake):
        # Reset to $1 on the first go or after a win, otherwise scale the bet up
        if self.prev_stake is None or stake > self.prev_stake:
            self.bet = 1
        else:
            self.bet *= self.scale
        self.prev_stake = stake
        return math.floor(self.bet)
The probability of winning is much better with the doubling strategy, and if you’re aiming to increase your cash pile to $250 then you have a 25% chance of doing that.
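The claim that a single win wipes out the preceding losses is easy to check by hand for the doubling case. The helper below, doubling_run, is a name made up for this example and assumes a starting bet of $1 and a scale of 2.

```python
def doubling_run(losses_before_win):
    """Bets placed, and the net result, when a win follows a run of losses."""
    bets = [2 ** i for i in range(losses_before_win + 1)]
    total_lost = sum(bets[:-1])  # every bet except the last one was lost
    net = bets[-1] - total_lost  # the last bet wins an equal amount
    return bets, net
```

For example, doubling_run(3) gives the bets 1, 2, 4 and 8 and a net gain of $1. The catch is the size of the stake: starting with $100, six losses in a row cost $63, which leaves only $37 towards the $64 bet that would be needed next.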
The chances of winning are much better if you double your bet, but why stop at doubling? In the next test I aimed for a target of $200 and increased the scaling factor of the bet from 0.1 to 50. You can see from the graph below that increasing the scaling factor doesn’t change your chances of winning, instead it remains at about 47%.
The final chart shows the chance of reaching $200 with a bet which doubles when you lose. In this test the starting bet is set so that you have at least x goes remaining. We begin with having only one possible other bet, and go up to twenty. Despite what you might think, the chances of winning do not really change much.
So, what’s the outcome of all this? Whatever you do, you’ve got a less than 50/50 chance of winning, but doubling your bet each time you lose will give a longer run before you lose your house.
Photo of a Roulette Wheel by John Wardell (Netinho).
Charts generated with Google Charts.