Posts Tagged ‘code’
I could wax lyrical about how programming is an art form and requires a great deal of creativity. However, it’s easy to loose focus on this in the middle of creating project specs and servicing your technical debt. Like many companies we recently held a hackathon event where we split up into teams and worked on projects suggested by the team members.
Different teams took different approaches to the challenge, one team set about integrating an open source code review site in our development environment, others investigated how some commercial technologies could be useful to us. My team built a collaborative filtering system using MongoDB. I’ll post about that project in the future, but in this post I wanted to focus on what we learnt about running a company Hackathon event.
If you’re lucky you’ll work in a company that’s focused on technology and you’ll always be creating new and interesting things. In the majority of companies technology is a means to a end, rather than the goal. In that case it’s easy to become so engrossed in the day to day work that you forget to innovate or to experiment with new technologies. A hackathon is a great way to take a step back and try something new for a few days.
Running a hackathon event should be divided into three stages, preparation, the event and the post event. Before the event you need to take some time to collect ideas and do some preliminary research. The event itself should be a whirlwind of pumping out code and building something exciting. Afterwards you need to take some time to demonstrate what you’ve built, and share what you’ve learnt.
Typical IT departments will been given a set of requirements and will need work out how to achieve them. What a hackathon event should allow is for the department to produce their own set of requirements, free from any external influences. In your day to day work what projects you actually tackle will be decided by a range of factors, but a hackathon is designed to let programmers take something where they’ve thought “we could do something interesting with that data” or “that technology might help us make things better” and run with it. The first stage is to collect all these ideas from the department and then to divide the team up into groups to tackle the most popular projects, perhaps with a voting staging to whittle the ideas down. To keep things fun and quick small teams are key here, any more than five people and you’ll start to get bogged down in process and design by committee.
Once you’ve got the projects and teams sorted you can prepare for the event. People who are working on each project need to think about what they want to have built by the end of event and should be dreaming up ways to tackle the problem. Coding before is banned, but things will go much quicker if you’ve come up with a plan to attack the problem.
For the event you need to remove as many distractions as possible. Certainly you need tell others in the company that you will not be available for a few days. Whether other possibilities such as not reading your email are doable depends on how often you need to deal with crises. Hopefully with no-one fiddling with your servers fewer things will go wrong than on an average day. Moving location, either to meeting rooms or to an external space are both good ways of getting space to focus on the work.
Once the time has expired you need to wrap the event up, and the key thing is to demonstrate what you’ve built to the rest of team. Not everyone in IT is happy with standing and presenting, but a few minutes to people they know well should not be a problem. It’s tempting to invite people from outside the department to the presentations, and before my first hackathon I was very keen on bringing management, who often won’t understand what IT do, into the meeting. Hopefully what is built will show how innovative and full of ideas programmers can be. In reality though the best you can really hope for is a set of tech demos that require quite a lot of understanding about the limitations inherent in building things quickly, which those outside IT will struggle to understand.
The presentations should focus on two things, a demonstration of what you’ve built and a discussion on the technology used and decisions made. The aim should be to excite the team about what you’ve built and to impart some of what you’ve learnt. A good way to spark a discussion is to talk about what problems you encountered. How long the presentations should be depends a lot on the complexity of the problems being tackled, but ten to twenty minutes per project is a good length of time.
In the longer term after the event you’ll need to decide which of the projects to keep, which to improve on or rewrite and which to abandon completely. In a more structured and prepared presentation showing the projects to management maybe a good idea. The presentations immediately following the hackathon are likely to be a fairly ramshackle affair and won’t make the most ‘professional’ impression.
Traditional ‘training’ session don’t work to well in IT, it’s a easier to learn by doing. Most people are also quite sceptical of ‘build a raft’ style team building exercises, compared to those hackathons are the perfect mix of learning and fun. They’re also a great way to work on the problems that you’ve wanted to solve for ages but have never found the time. Even if you don’t get anything you can use out of the event the process of getting there will be worthwhile. And who knows, you might build a million dollar product for your company.
Recently I’ve been working a couple of open source projects and as part of them I’ve been using some libraries. In order to use a library though, you need to understand how it is designed, what function calls are available and what those functions do. The two libraries I’ve been using are Qt and libavformat, which is part of FFmpeg and they show two ends of the documentation spectrum.
Now, it’s important to note that Qt is a massive framework owned by Nokia, with hundreds of people working on it full-time including a number of people dedicated to documentation. FFmpeg on the other hand is a purely volunteer effort with only a small number of developers working on it. Given the complicated nature of video encoding to have a very stable and feature-full library such as FFmpeg available as open source software is almost a miracle. Comparing the levels of documentation between these two projects is very unfair, but it serve as a useful example of where documentation can sometimes be lacking across all types of projects, both open and closed source.
So, lets look at what documentation it is important to write by considering how you might approach using a new library.
When you start using some code that you’ve not interacted with before the first thing that you need is to get a grasp on the concepts underlying the library. Some libraries are functional, some object orientated. Some use callbacks, others signals and slots. You also need to know the top level groupings of the elements in the library so you can narrow your focus that parts of the library you actually want to use.
Qt’s document starts in a typical fashion, with a tutorial. This gives you a very short piece of code that gets you up and running quickly. It then proceeds to describe, line by line, how the code works and so introduces you to the fundamental concepts used in the library. FFmpeg takes a similar approach, and links to a tutorial. However, the tutorial begins with a big message about it being out of date. How much do you trust the out of date tutorial?
Once you’ve a grasp of the fundamental design decisions that were taken while building the library, you’ll need to find out what functions you need to call or objects you need to create to accomplished your goal. Right at the top of the menu the QT documentation has links to class index, function index and modules. These let you easily browse the contents of the library and delve into deeper documentation. Doxygen is often used to generate documentation for an API, and it seem to be the way FFmpeg is documented. Their frontpage contains… nothing. It’s empty.
Finally, you’ll need to know what arguments to pass to a function and what to expect in return. This is probably the most common form of documentation to write so you probably (hopefully?) already write it. Despite my earlier criticisms, FFmpeg does get this right and most of the function describe what you’re supposed to pass into the function. With this sort of documentation it’s important to strike a balance. You need to write enough documentation such that people can call your function and have it work first time, but you don’t want to write too much so that it takes ages to get to grips with or replicates what you could find out by reading the code.
Some people hold on to the belief that the code is the ultimate documentation. Certainly writing self-documenting code is a worthy goal, but there are other levels of documentation that are needed before someone could read and the code and understand it well enough for it to be self-documenting. So, next time you’re writing a library make sure you consider:
- How do people get started?
- How do people navigate through the code?
- and, how do people work out how to call my functions?
There are a number of tools for checking whether your Python code meets a coding standard. These include pep8.py, PyChecker and PyLint. Of these, PyLint is the most comprehensive and is the tool which I prefer to use as part of my buildbot checks that run on every commit.
PyLint works by parsing the Python source code itself and checking things like using variables that aren’t defined, missing doc strings and a large array of other checks. A downside of PyLint’s comprehensiveness is that it runs the risk of generating false positives. As it parses the source code itself it struggles with some of Python’s more dynamic features, in particular metaclasses, which, unfortunately, are a key part of Django. In this post I’ll go through the changes I make to the standard PyLint settings to make it more compatible with Django.
This line disables a few problems that are picked up entirely. W0403 stops relative imports from generating a warning, whether you want to disable these or not is really a matter of personal preference. Although I appreciate why there is a check for this, I think this is a bit too picky. W0232 stops a warning appearing when a class has no __init__ method. Django models will produce this warning, but because they’re metaclasses there is nothing wrong with them. Finally, E1101 is generated if you access a member variable that doesn’t exist. Accessing members such as id or objects on a model will trigger this, so it’s simplest just to disable the check.
These makes the output of PyLint easier to parse by Buildbot, if you’re not using it then you probably don’t need to include these lines.
Apart from a limited number of names PyLint tries to enforce a minimum size of three characters in a variable name. As qs is such a useful variable name for a QuerySet I force this be allowed as a good name.
The last change I make is to allow much longer lines. By default PyLint only allows 80 character long lines, but how many people have screens that narrow anymore? Even the argument that it allows you to have two files side by side doesn’t hold water in this age where multiple monitors for developers are the norm.
PyLint uses the exit code to indicate what errors occurred during the run. This confuses Buildbot which assumes that a non-zero return code means the program failed to run, even when using the PyLint buildstep. To work around this I use a simple management command to duplicate the pylint program’s functionality but that doesn’t let the return code propagate back to Builtbot.
from django.core.management.base import BaseCommand from pylint import lint class Command(BaseCommand): def handle(self, *args, **options): lint.Run(list(args + ("--rcfile=../pylint.cfg", )), exit=False)
Yesterday Google announced a new feature for Google Code’s Project Hosting. You can now edit files directly in your browser and commit them straight into the repository, or, if you don’t have commit privileges, attach your changes as a patch in the issue tracker.
If you’re trying to run a successful open source project then the key thing you want is more contributors. The more people adding to your project the better and more useful it will become, and the more likely it is to rise out of the swamp of forgotten, unused projects to become something that is well known and respected.
It’s often been said that to encourage interaction you need to lower the barrier so that people can contribute with little or no effort on their part. Initially open source projects are run by people who are scratching their own itches, and producing something that is useful to themselves. Google’s intention with this feature is clearly to allow someone to think “Project X” has a bug, I’ll just modified the code and send the developers a patch. The edit feature is very easy to find, with a prominent “Edit File” link at the top of the screen when you’re browsing the source code so Google have clearly succeeded in that respect.
My big concern here is that committing untested code to your repository is right up there at top of the list of things that programmers should never, ever, do. I like to think of myself as an expert Python programmer, but I’ll occasionally make simple mistakes like missing a comma or a bracket. It’s rare that anything beyond a trivially small change will work perfectly first time. Only by running the code do you pick up these and ensure that your code is at least partially working.
I’m all for making it easy to contribute, but does contributing a large number of untested changes really help anyone? I’m not so sure. Certainly this feature is brilliant for making changes to documentation where all you need to do is to read the file to know that the change is correct, but it seems a long way from best-practice for making code changes.
Perhaps I should be thinking about this as a useful tool for sketching out possible changes to code. If you treat it as the ability to make ‘pseudo-code’ changes to a file to demonstrate how you might tackle a problem it seems to make more sense, but open source has always lived by the mantra ‘if you want it fixed, fix it yourself’.
I suppose I should worry about getting my pet open source project to a state where people want to contribute changes of any quality, and then I can worry about making the changes better!
If you’re writing a Python program that doesn’t have a text-based user interface (either it’s a GUI or runs as part of another program, e.g. a webserver) then you should avoid using the print statement. It’s tempting to use print to fill the console with information about what your program is up to. For code of any size though, this quickly devolves into a hard to navigate mess.
Python’s standard library contains a module, logging, that lets you write code to log as much information as you like and configure what you bits you are interested in at runtime.
There are two concepts that you need to understand with logging. Firstly there is the logging level. This is how you determine how important the message is. The levels range from debug as the least important, through info, warning, error, critical to exception, the most important. Secondly there is the logger. This allows you divide your messages into groups depending on the part of your code they relate to. For example, you might have a gui logger and a data logger.
The logging comes with a series of module level functions by each of the names of the logging levels. These make it quick and easy to log a message using the default logger.
logging.debug("Debug message") logging.error("Error retrieving %s", url)
The second of these two lines has more than one argument. In this case the logging module will treat the first argument as a format string and the rest as arguments to the format, so that line equivalent to this one.
logging.error("Error retrieving %s" % (url, ))
If you try to treat the logging code like you would a print statement and write logging.error("Error retrieving", url) then you’ll get the following, very unhelpfui, error message.
Traceback (most recent call last): File "/usr/lib/python2.6/logging/__init__.py", line 776, in emit msg = self.format(record) File "/usr/lib/python2.6/logging/__init__.py", line 654, in format return fmt.format(record) File "/usr/lib/python2.6/logging/__init__.py", line 436, in format record.message = record.getMessage() File "/usr/lib/python2.6/logging/__init__.py", line 306, in getMessage msg = msg % self.args TypeError: not all arguments converted during string formatting
Notice how this exception doesn’t tell you where the offending logging statement is in your code! Now you know the type of error that will cause this that will help in tracking the problem down, but there is more than can be done to help you find it. The logging library allows you to specify a global error handle, which combined with the print stack trace function will give you a much better error message.
import logging import traceback def handleError(self, record): traceback.print_stack() logging.Handler.handleError = handleError
Loggers are created by calling logging.getLogger('loggername'). This returns an object with the same set of log level functions as the module, but which can be controlled independently. For example:
gui_log = logging.getLogger('gui') gui_log.debug("created window") data_log = logging.getLogger('gui') data_log.debug("loaded file")
Where this comes in really handy is when you set the level of messages that you want to see independently for each logger. In the next code block we set the logging module so we’ll see lots of debugging messages from the GUI and only errors from the data layer. Although here we’re setting the levels directly in code, it’s not a big jump to make them configurable using a command line option.
The logging module also lets you configure how your messages are formatted, and to direct them to files rather than the console. Hopefully this short guide is useful, let me know in the comments!