The Importance Of Documentation

Recently I’ve been working on a couple of open source projects and, as part of them, I’ve been using some libraries. In order to use a library, though, you need to understand how it is designed, what function calls are available and what those functions do. The two libraries I’ve been using are Qt and libavformat (part of FFmpeg), and they show two ends of the documentation spectrum.

Now, it’s important to note that Qt is a massive framework owned by Nokia, with hundreds of people working on it full-time, including a number of people dedicated to documentation. FFmpeg, on the other hand, is a purely volunteer effort with only a small number of developers working on it. Given the complicated nature of video encoding, having a very stable and feature-rich library such as FFmpeg available as open source software is almost a miracle. Comparing the levels of documentation between these two projects is very unfair, but it serves as a useful example of where documentation can sometimes be lacking across all types of projects, both open and closed source.

So, let’s look at what documentation is important to write by considering how you might approach using a new library.

When you start using some code that you’ve not interacted with before, the first thing you need is to get a grasp of the concepts underlying the library. Some libraries are functional, some object orientated. Some use callbacks, others signals and slots. You also need to know the top level groupings of the elements in the library so you can narrow your focus to the parts of the library you actually want to use.

Qt’s documentation starts in a typical fashion, with a tutorial. This gives you a very short piece of code that gets you up and running quickly. It then proceeds to describe, line by line, how the code works and so introduces you to the fundamental concepts used in the library. FFmpeg takes a similar approach, and links to a tutorial. However, the tutorial begins with a big message about it being out of date. How much do you trust an out of date tutorial?

Once you’ve a grasp of the fundamental design decisions that were taken while building the library, you’ll need to find out what functions you need to call or objects you need to create to accomplish your goal. Right at the top of the menu the Qt documentation has links to a class index, a function index and modules. These let you easily browse the contents of the library and delve into deeper documentation. Doxygen is often used to generate documentation for an API, and it seems to be the way FFmpeg is documented. Their front page contains… nothing. It’s empty.

Finally, you’ll need to know what arguments to pass to a function and what to expect in return. This is probably the most common form of documentation to write, so you probably (hopefully?) already write it. Despite my earlier criticisms, FFmpeg does get this right and most of the functions describe what you’re supposed to pass into them. With this sort of documentation it’s important to strike a balance. You need to write enough documentation that people can call your function and have it work first time, but not so much that it takes ages to get to grips with or replicates what you could find out by reading the code.
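As a rough illustration of that balance, here is a minimal sketch in Python. The function `clamp` and its parameters are hypothetical and not taken from Qt or FFmpeg. The docstring says what to pass in, what comes back and what can go wrong, and leaves the rest to the code.

```python
def clamp(value, lower, upper):
    """Restrict ``value`` to the inclusive range [lower, upper].

    Args:
        value: The number to restrict.
        lower: The smallest value that may be returned.
        upper: The largest value that may be returned.

    Returns:
        ``lower`` if ``value`` is below it, ``upper`` if ``value`` is
        above it, otherwise ``value`` unchanged.

    Raises:
        ValueError: If ``lower`` is greater than ``upper``.
    """
    if lower > upper:
        raise ValueError("lower must not be greater than upper")
    return max(lower, min(value, upper))
```

Someone calling `clamp(5, 0, 3)` can tell from the docstring alone what they will get back, without having to read the body.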

Some people hold on to the belief that the code is the ultimate documentation. Certainly writing self-documenting code is a worthy goal, but there are other levels of documentation that are needed before someone can read the code and understand it well enough for it to be self-documenting. So, next time you’re writing a library make sure you consider:

  • How do people get started?
  • How do people navigate through the code?
  • and, how do people work out how to call my functions?

Photo of Book Pile by Paul Watson.

Communicating With Stakeholders

Last week I had a discussion with my boss about the best way to communicate with my stakeholders about the progress of any work that they have asked my team to do. The question basically became how much information should be communicated during a project. The requirements and delivery phases obviously require close communication, but what is appropriate while the developers are hard at work, churning out code? If, like me, you are required to work with a number of disparate departments then the people in those departments may want to know what work is currently on your plate before they ask you to do something. What’s the best way to keep a status board up to date?

Traditionally we have had a Bugzilla installation which was used to store a record of almost every change we made. A Subversion post-commit hook allows us to link every commit back to a piece of work in Bugzilla. This works well for coding in an issue-driven-development style, but does result in Bugzilla sending a lot of emails, many of which are completely irrelevant to people outside of IT. Indeed, even people inside IT who aren’t directly linked to that piece of work don’t need to be informed by email of every checkin.

Recently we have begun to experiment with FogBugz. While similar to Bugzilla, it has a number of subtle differences. Firstly, FogBugz is designed to be used in a helpdesk environment, so it provides the ability to communicate both within the team and with external stakeholders from the same interface. This gives you the ability to communicate on two different levels, with all the communication still being tracked. The second difference is that FogBugz is not known amongst the people outside of IT. Not all stakeholders know about Bugzilla, but some do, and some can even search and reply using the web interface. With FogBugz we’ll take this ability away as, at least initially, only a limited number of IT people will have access to the cases.

FogBugz has other project management advantages over Bugzilla. It allows you to create subcases to break down a piece of work into more easily implemented chunks. It also has advanced estimation capabilities to allow you to project how long a milestone will take to complete.

The crux of the discussion is this: which is better, a known tool that provides complete access to status to those that want it, or a tool that enables those doing the work to plan better but prevents those outside from seeing how the work is progressing for themselves?

In my view developers should use the tools that they feel help them work the best. Looking at a list of tasks is probably not going to help someone who is not doing the work to understand whether the project is on track or not. However, it is important that stakeholders are kept informed of progress regularly, so switching to a less open development model should not be used as an excuse to become more insular. Quite the opposite, in fact: it should force you to be more explicit and include status updates as part of your regular working schedule.

Photo of Communication by pshanks.

Can the entrance barrier ever be too low?

Yesterday Google announced a new feature for Google Code’s Project Hosting. You can now edit files directly in your browser and commit them straight into the repository, or, if you don’t have commit privileges, attach your changes as a patch in the issue tracker.

If you’re trying to run a successful open source project then the key thing you want is more contributors. The more people adding to your project the better and more useful it will become, and the more likely it is to rise out of the swamp of forgotten, unused projects to become something that is well known and respected.

It’s often been said that to encourage interaction you need to lower the barrier so that people can contribute with little or no effort on their part. Initially open source projects are run by people who are scratching their own itches, and producing something that is useful to themselves. Google’s intention with this feature is clearly to allow someone to think “Project X has a bug, I’ll just modify the code and send the developers a patch.” The edit feature is very easy to find, with a prominent “Edit File” link at the top of the screen when you’re browsing the source code, so Google have clearly succeeded in that respect.

Editing a file on Google Code

My big concern here is that committing untested code to your repository is right up there at the top of the list of things that programmers should never, ever, do. I like to think of myself as an expert Python programmer, but I’ll occasionally make simple mistakes like missing a comma or a bracket. It’s rare that anything beyond a trivially small change will work perfectly first time. Only by running the code do you pick up these mistakes and ensure that your code is at least partially working.
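To show how easily such a mistake slips past a read-through, here is a small hypothetical Python sketch (the list and function names are invented for illustration). The comma after `".avi"` is missing. Python joins adjacent string literals, so the file is still valid and nothing complains until the code is actually run.

```python
# Hypothetical list of supported extensions. The comma after ".avi" is
# missing, so Python joins the two adjacent literals into ".avi.mkv".
SUPPORTED_EXTENSIONS = [
    ".mp4",
    ".avi"
    ".mkv",
]


def is_supported(filename):
    """Return True if filename ends with one of the supported extensions."""
    return any(filename.endswith(ext) for ext in SUPPORTED_EXTENSIONS)
```

The list ends up with two entries rather than three, and `is_supported("clip.avi")` returns `False`. A quick run or a one-line test would catch this, but an edit made in a browser gets neither.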

I’m all for making it easy to contribute, but does contributing a large number of untested changes really help anyone? I’m not so sure. Certainly this feature is brilliant for making changes to documentation, where all you need to do is read the file to know that the change is correct, but it seems a long way from best practice for making code changes.

Perhaps I should be thinking about this as a useful tool for sketching out possible changes to code. If you treat it as the ability to make ‘pseudo-code’ changes to a file to demonstrate how you might tackle a problem it seems to make more sense, but open source has always lived by the mantra ‘if you want it fixed, fix it yourself’.

I suppose I should worry about getting my pet open source project to a state where people want to contribute changes of any quality, and then I can worry about making the changes better!

Photo of Stop Sign by thecrazyfilmgirl.

Strict Development

While working on my new open source project, CouchQL, I’m being very strict with my development process and following both issue driven development, and test driven development.

Issue driven development requires that every commit refers to an issue that has been logged in the bug tracking software. This means that every change must be described, accepted and then logged. This works better if your repository is connected to your bug tracking software such that any commit message with an issue number is automatically logged. In Subversion this can be achieved with a post-commit hook, such as this script for Trac.

The connection between your commit messages and bug tracking software means that when changes are merged between branches new messages will be added to the issue, informing everyone what version of the software the issue has been fixed in. As well as just adding comments to issues it is also possible to mark bugs as fixed with commit messages such as “Fixes issue #43.” which should speed up your workflow. While Google Code does add hyperlinks between commit messages and issues, it doesn’t automatically add comments, which is a pain.
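The core of such a hook is just picking issue references out of the commit message. The following is a minimal sketch of that step only, not the Trac script itself. The pattern and the function name `parse_commit_message` are assumptions, and a real hook would go on to call the bug tracker with the results.

```python
import re

# Assumed convention: "issue #43" refers to an issue, and a preceding
# "Fix", "Fixes" or "Fixed" means the commit also closes it.
ISSUE_PATTERN = re.compile(
    r"\b(?:(fix(?:es|ed)?)\s+)?issue\s+#(\d+)", re.IGNORECASE
)


def parse_commit_message(message):
    """Return a list of (issue_number, closes_issue) pairs found in message."""
    references = []
    for match in ISSUE_PATTERN.finditer(message):
        closes_issue = match.group(1) is not None
        references.append((int(match.group(2)), closes_issue))
    return references
```

With this convention, `parse_commit_message("Fixes issue #43.")` yields `[(43, True)]`, while a message that only mentions `issue #12` yields `[(12, False)]`, so the hook can tell a comment apart from a fix.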

Enforcing a development practice like this requires you to think about the changes you are making to your software, and focuses your mind on a particular goal. Bug tracking software will have the ability to assign priorities to changes as well as to group them into milestones. This helps you to build up a feature list for each version of the software, and to know when you’ve achieved your goals and it’s time to release!

Test driven development is related in that before any changes to code are made a test must be written. This test must be designed to check the result of the change as closely and completely as possible. When the test passes (and all other tests still pass) the change that you were making is complete.
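As a small sketch of that cycle using Python’s standard `unittest` module: the function `count_words` is a made-up example, not part of CouchQL. The test class is written first and fails until the function below it exists and behaves as described.

```python
import unittest


# Step 1: the tests, written before count_words existed. They describe
# the intended result of the change as closely as possible.
class CountWordsTests(unittest.TestCase):
    def test_counts_whitespace_separated_words(self):
        self.assertEqual(count_words("test driven development"), 3)

    def test_empty_string_has_no_words(self):
        self.assertEqual(count_words(""), 0)

    def test_ignores_repeated_spaces(self):
        self.assertEqual(count_words("  red   green  "), 2)


# Step 2: the smallest change that makes every test pass.
def count_words(text):
    """Return the number of whitespace-separated words in text."""
    return len(text.split())
```

Saved in a file such as `test_count_words.py` (a hypothetical name), the tests can be run with `python -m unittest test_count_words`. Once they all pass, the change is done.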

The benefits of this style of development are twofold. Firstly, it should be easy to end up with test coverage of close to 100%. Secondly, it forces you to think about the point of the change that you’re making. Combine this with the issue that you logged before you started, and you’ll really have a good idea of the scope and aim of your change before you ever touch the keyboard.

It can sometimes be hard to do this on every change you make, but the more often you do, the better tested and more maintainable your code will be. Tools can really help you. If your version control system is linked to your bug tracking software then all you need to do is remember to log a bug and mention it in every commit message. A continuous integration testing tool such as Buildbot makes keeping your tests complete very desirable, as you’ll be notified of any breakages very quickly.

You can’t be made to follow development processes such as these, but if you understand the benefits and want to use them, they become second nature, and hopefully you’ll be a better programmer as a result.

Rude Code

In this second part of my “Etiquette of Programming” series I’m going to talk about making sure your code fits in with the style of existing code, while helping to bring it up to best practice standards.

When you’re working on an existing codebase, either fixing bugs or enhancing features, you’ll be going into the code and adding more code. When someone has to come back to this section and improve the feature or fix more bugs (not that you would introduce any, would you?) they need to read both the original code and your code. If your code sticks out, or causes the reader to say “Whoa, what’s going on here?” then your code is rude, and no-one likes rude people, do they?

Just like writing prose, everyone has a different style of writing code. People prefer different programming paradigms, design patterns and variable naming schemes, and have different aesthetic preferences for spacing their code out with white space. There are more differences, but these four are the main aspects that make up your style of code. I’m going to talk about all four aspects in turn, from rudest to nicest.

Most people code in an object orientated style, but most languages allow you to pick and choose between procedural and functional styles as you wish. Each style has its own benefits and drawbacks, and each should be used in the right place. What you should avoid is switching styles in the same section of code. The mental switch required for someone who is reading code written in multiple styles is too great for it to be an easy or enjoyable experience.

If you choose to write your code in an object orientated style then you could choose to use any number of design patterns. From factories and singletons to full blown class hierarchies, design patterns can be incredibly complicated. They all solve essentially the same problem though. They aim to help you structure your code better to give you a more reliable and more maintainable program. An existing code base will usually have evidence of design patterns being in use. Whatever you think of the particular patterns in use, if you want your code to be maintainable then it’s important to use the same patterns.

The final two points are not as important as the first two, but combined they can really cause great difficulty for those reading and extending your code. Most of the time you’ll be using APIs that you don’t know too well. You’ll probably know that there is a function or variable that does what you want, but you’ll need to guess at its name, or look it up. Consistent naming schemes greatly aid in remembering all the nooks and crannies of an API and reduce the mental effort needed to code using it. Typically you’ll be using several different libraries which will have different naming schemes, but if you can be as consistent as possible in the code that you control you’ll have more room in your head for more interesting things.

Lastly we have whitespace. In most languages whitespace is unimportant, and even in languages such as Python where it is, they are pretty forgiving about how you format your code. Four or two space indentation. Braces on the same or next line. Spaces around operators. The variations in style are enormous, but when you’re making a bug fix resist the temptation to change the style to fit your view because it will obscure the change that you’re actually making. If you need to reformat code it should be done in a single, dedicated revision which is clearly marked as only affecting white space.

For code to be maintainable and extendable it ideally looks as if it was written by just one person. None of these points are meant as hard and fast rules. Rather they are something to bear in mind to try and help you rein in your natural desire to make a mark on the code. Best practice dogma changes over time and code should evolve with it. However, it should evolve in a dedicated refactoring step, not piece by piece as new features are added.

Next time I’ll talk about the etiquette of source control and how it should be used to make your colleagues’ lives easier.

The Etiquette Of Programming

You might be a smart developer. You might be someone who gets things done. Unless you’re a lone programmer working in your bedroom, though, that’s not enough. Collaborating and cooperating with your teammates is vital to keeping your project moving forwards and to ensuring it’s extendable and maintainable.

In this series I’m going to talk about how small changes to your coding style, small changes to your communication and small changes to your working practices can make a huge difference to those you’re working with. A lot of these tips are common sense, some of them you’ll already know, but hopefully some of them will make you think and might make your life easier.

If you’re fresh out of university and joining a team of programmers with 30 years’ experience then you will probably get there and think “WTF!” about some aspect of their working practices. Most of what you read about programming is of the “here’s a shiny new toy” variety. If you immediately started using everything you read about, not only would you probably go insane but you’d also annoy your coworkers immensely.

In a project with a reasonably sized team, especially one with people of different abilities, consistency is key. If you’re starting a project in a new company which has never written software before then you’ll have the opportunity to do things right. If you’re rewriting an old project from scratch then you can improve not only the code but the working practices too. The first of those possibilities is unlikely, while the second is rare. What is much more likely is that you’re working either to maintain or to incrementally improve an existing code base.

Improving code and practices is an incremental process. Try to change too much too quickly and your coworkers will find it hard to keep up with the changes given the mixed state of the code base. How to stage the transition from old and broken working style to best practices is something I’ll discuss later.

Vim or Emacs, Windows or Linux, Mac or PC. Computers are full of ideological wars that will probably never reach a ceasefire. This is a good thing. Arguments spur people on to ensure that their chosen side is unquestionably better than their foe, and we all win as a result. When you’re working in a team though, arguments of this nature are trivial and can undermine the success of your team. In a future post I’ll talk about how to control these arguments so they remain productive.

In my next post, I’ll talk about rude code, how to deal with it and how to avoid writing it.