Introducing A New Language

At work, there is a discussion going on at the moment about introducing Kotlin into our tech stack. We’re a JVM-based team, with the majority of our code written in Java and a few apps in Scala. I don’t intend to discuss the pros and cons of any particular language in this post, as I don’t have enough experience of them to decide yet (more on that to come as the discussion evolves). Instead, I wanted to talk about how you can decide when to introduce a new language.

Programmers, myself included, have a habit of being attracted to anything new and shiny. That might be a new library, a new framework or a new language. Whatever it is, the hype will suggest that you can do more, with less code and fewer bugs. The reality often turns out to be a little different: by the time you have implemented a substantial production system you’ve probably pushed up against the limits, and found areas where it’s hard to do what you want, or where there are bugs or reliability problems. It’s only natural to look for better tools that can make your life easier.

If you maintain a large, long-lived code base then introducing anything new is something that has to be considered carefully. This is particularly true of a new language. While a new library or framework can have its own learning curve, a new language means the team has to relearn how to do the fundamentals from scratch. A new language brings with it a new set of idioms, styles and best practices. That kind of knowledge is built up by a team over many years, and is very expensive both in time and mistakes to relearn.

Clearly, if you need to start writing code in a radically different environment then you’ll need to pick a new language. If, like us, you mostly write Java server applications and you want to start writing modern web-based frontends to your applications, then you need to add JavaScript, or one of the many JavaScript-based languages, to your tech stack.

The discussion that we’re having about Java, Scala and Kotlin is nowhere near as clear-cut, however. Fundamentally, choosing one over the other wouldn’t let us write a new type of app that we couldn’t write before, because they all run in the same environment. Scala is functional, which is a substantial change in idiom, while Kotlin is a more traditional object-orientated language, albeit considerably more concise than Java.

To help decide, it makes sense to write a new application in the potential new language, or perhaps rewrite an existing application. Only with some personal experience can you hope to make a decision that’s not just based on hype, or other people’s experiences. The key is to treat this code as a throw-away exercise. If you commit to putting the new app into production, then you’re not investigating the language; you’re committing to adding it to your tech stack before you’ve investigated it.

As well as the technical merits, you should also look into the training requirements for the team. Hopefully there are good online tutorials or training courses available for your potential technology, but these will need to be collated and shared, and everyone given time to complete them. If you’re switching languages then you can’t afford to leave anyone behind, so training for the entire team is essential.

Whatever you feel is the best language to choose, you need to be bold and decisive in your decision making. If you decide to use a new language for an existing environment then you need to commit not only to writing all new code in it, but also to fairly quickly porting all your existing code over as well. Having multiple solutions to the same problem (be it the language you write your server-side or browser-side apps in, or a library or framework) creates massive amounts of duplicated code, duplicated effort and expensive context switching for developers.

Time and again I’ve seen introducing the new shiny solution create a mountain of technical debt because old code is not ported to the new solution, but instead gets left behind in the vague hope that one day it will get updated. New technology and ways of working can have a huge benefit, but never underestimate the cost, and importance, of going all the way.


Photo of code.close() by Ruiwen Chua.


Accessing FitBit Intraday Data

For Christmas my wife and I bought each other a new FitBit One device (Amazon affiliate link included). These are small fitness tracking devices that monitor the number of steps you take, how high you climb and how well you sleep. They’re great for providing motivation to walk that extra bit further, or to take the stairs rather than the lift.

I’ve had the device for less than a week, but already I’m feeling the benefit of the gamification on FitBit.com. As well as monitoring your fitness it also provides you with goals, achievements and competitions against your friends. The big advantage of the FitBit One over the previous models is that it syncs to recent iPhones and iPads, as well as some Android phones. This means that your computer doesn’t need to be on, and often it will sync without you having to do anything. In the worst case you just have to open the FitBit app to update your stats on the website. Battery life seems good, at about a week.

The FitBit apps sync your data directly to FitBit.com, which is great for seeing your progress quickly. They also provide an API for developers to build interesting ways to process the data captured by the FitBit device. One glaring omission from the API is any way to get access to the minute-by-minute data. For a fee of $50 per year you can become a Premium member, which allows you to do a CSV export of the raw data. Holding data collected by a user hostage is deeply suspect, and FitBit should be ashamed of themselves for making this a paid-for feature. I have no problem with the rest of the features in the Premium subscription being paid for, but your own raw data should be freely available.

The FitBit API does have the ability to give you the intraday data, but this is not part of the open API and instead is part of the ‘Partner API’. This does not require payment, but you do need to explain to FitBit why you need access to this API call and what you intend to do with it. I do not believe that they would give you access if your goal was to provide a free alternative to the Premium export function.

So, has the free software community provided a solution? A quick search revealed that the GitHub user Wadey had created a library that uses the URLs used by the graphs on the FitBit website to extract the intraday data. Unfortunately the library hadn’t been updated in the last three years, and a change to the FitBit website had broken it.

Fortunately the changes required to make it work are relatively straightforward, so a fixed version of the library is now available as andrewjw/python-fitbit. The old version of the library relied on you logging in to FitBit.com and extracting some values from the cookies. Instead, I take your email address and password and fake a request to the login page. This captures all of the cookies that are set, and will only break if the login form elements change.
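To illustrate the approach, here is a minimal sketch of a login that captures the cookies (the login URL and form field names are assumptions for illustration, not necessarily what the library uses):

import requests

def fitbit_login(email, password):
    """Return a requests session holding the FitBit.com login cookies."""
    session = requests.Session()
    # Posting the credentials sets the authentication cookies on the
    # session; later requests for the graph data reuse them automatically.
    # The URL and field names below are assumed for illustration.
    session.post("https://www.fitbit.com/login",
                 data={"email": email, "password": password})
    return session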

Another change I made was to extend the example dump.py script. The previous version just dumped the previous day’s values, which is not useful if you want to extract your entire history. In my new version it exports data for every day that you’ve been using your FitBit. It also incrementally updates your data dump if you run it irregularly.

If you’re using Windows you’ll need both Python and Git installed. Once you’ve done that, check out my repository at github.com/andrewjw/python-fitbit. Lastly, in the newly checked out directory run:

python examples/dump.py <email> <password> <dump directory>


Photo of Jogging by Glenn Euloth.

Hackathons, and why your company needs one

I could wax lyrical about how programming is an art form and requires a great deal of creativity. However, it’s easy to lose focus on this in the middle of creating project specs and servicing your technical debt. Like many companies we recently held a hackathon event where we split up into teams and worked on projects suggested by the team members.

Different teams took different approaches to the challenge: one team set about integrating an open source code review site into our development environment, while others investigated how some commercial technologies could be useful to us. My team built a collaborative filtering system using MongoDB. I’ll post about that project in the future, but in this post I wanted to focus on what we learnt about running a company hackathon event.

If you’re lucky you’ll work in a company that’s focused on technology and you’ll always be creating new and interesting things. In the majority of companies, though, technology is a means to an end, rather than the goal. In that case it’s easy to become so engrossed in the day-to-day work that you forget to innovate or to experiment with new technologies. A hackathon is a great way to take a step back and try something new for a few days.

Running a hackathon event should be divided into three stages: preparation, the event and the post-event follow-up. Before the event you need to take some time to collect ideas and do some preliminary research. The event itself should be a whirlwind of pumping out code and building something exciting. Afterwards you need to take some time to demonstrate what you’ve built, and share what you’ve learnt.

A typical IT department is given a set of requirements and has to work out how to achieve them. What a hackathon event should allow is for the department to produce its own set of requirements, free from any external influences. In your day-to-day work the projects you actually tackle are decided by a range of factors, but a hackathon is designed to let programmers take something where they’ve thought “we could do something interesting with that data” or “that technology might help us make things better” and run with it. The first stage is to collect all these ideas from the department and then divide the team into groups to tackle the most popular projects, perhaps with a voting stage to whittle the ideas down. To keep things fun and quick, small teams are key; any more than five people and you’ll start to get bogged down in process and design by committee.

Once you’ve got the projects and teams sorted you can prepare for the event. People who are working on each project need to think about what they want to have built by the end of the event, and should be dreaming up ways to tackle the problem. Coding beforehand is banned, but things will go much quicker if you’ve come up with a plan to attack the problem.

For the event you need to remove as many distractions as possible. Certainly you need to tell others in the company that you will not be available for a few days. Whether other measures, such as not reading your email, are doable depends on how often you need to deal with crises. Hopefully, with no-one fiddling with your servers, fewer things will go wrong than on an average day. Moving location, either to meeting rooms or to an external space, is a good way of getting the space to focus on the work.

Once the time has expired you need to wrap the event up, and the key thing is to demonstrate what you’ve built to the rest of the team. Not everyone in IT is happy standing up and presenting, but a few minutes in front of people they know well should not be a problem. It’s tempting to invite people from outside the department to the presentations, and before my first hackathon I was very keen on bringing management, who often won’t understand what IT does, into the meeting. Hopefully what is built will show how innovative and full of ideas programmers can be. In reality, though, the best you can hope for is a set of tech demos that can only be appreciated with an understanding of the limitations inherent in building things quickly, which those outside IT are unlikely to have.

The presentations should focus on two things: a demonstration of what you’ve built and a discussion of the technology used and the decisions made. The aim should be to excite the team about what you’ve built and to impart some of what you’ve learnt. A good way to spark a discussion is to talk about what problems you encountered. How long the presentations should be depends a lot on the complexity of the problems being tackled, but ten to twenty minutes per project is a good length of time.

In the longer term after the event you’ll need to decide which of the projects to keep, which to improve on or rewrite and which to abandon completely. A more structured and prepared presentation showing the projects to management may be a good idea, as the presentations immediately following the hackathon are likely to be fairly ramshackle affairs and won’t make the most ‘professional’ impression.

Traditional ‘training’ sessions don’t work too well in IT; it’s easier to learn by doing. Most people are also quite sceptical of ‘build a raft’ style team-building exercises. Compared to those, hackathons are the perfect mix of learning and fun. They’re also a great way to work on the problems that you’ve wanted to solve for ages but have never found the time for. Even if you don’t get anything you can use out of the event, the process of getting there will be worthwhile. And who knows, you might build a million dollar product for your company.


Photo of Code by Lindsey Bieda.

Programming Documentary

I’m a huge science and engineering documentary geek. I prefer watching documentaries over all other forms of television. It doesn’t really matter what the documentary is about, I’ll usually watch it. After getting ready for my wedding I had a bit of time before I had to walk down the aisle so I watched a documentary about pilots learning to land Marine One at the White House. There probably aren’t many people who would choose to spend that time that way.

Science documentaries have experienced a renaissance over the last few years, particularly on the BBC. The long-running Horizon series has been joined by a raft of other mini-series presented by Brian Cox, Alice Roberts, Marcus du Sautoy, Jim Al-Khalili and Michael Mosley. These cover a large part of the sciences, including chemistry, biology and physics. Physics in particular is regularly on our screens. Whether it’s talking about quantum mechanics or astronomy or something else, it seems that physics has never been more popular.

As someone who writes computer programs for a living, this makes me worry that the average man on the street may end up with a better understanding of quantum mechanics than of the computer on their desk, or in their pocket.

It wasn’t always like this. Back in 1981 the BBC ran the BBC Computer Literacy Project, which attempted to teach the public to program using the BBC Micro through a ten-part television series.

Clearly, if a project like this was to be attempted today there would be no need for the BBC to partner with hardware manufacturers. People have access to many different programmable devices; they just don’t know how to program them.

Recent programmes that have focused on computers were Rory Cellan-Jones’ Secret History of Social Networking and Virtual Revolution by Aleks Krotoski. Neither of these were technical documentaries; instead they focused on the business, cultural and sociological impacts of computers and the internet.

It’s not that the more technical aspects of computers don’t appear as part of other documentaries; recently Marcus du Sautoy announced that he is filming an episode of Horizon on Artificial Intelligence. It won’t air until next spring, so it’s hard to comment, but I suspect it will focus on the outcome of the software rather than the process of how computers can be made to appear intelligent.

Jim Al-Khalili’s recent series on the history of electricity, Shock and Awe, ends with a section on the development of the transistor. During it, and over a picture of the LHC, he says something rather interesting.

Our computers have become so powerful that they are helping us to understand the universe in all its complexity.

If you don’t understand computers it’s impossible to understand how almost all modern science is done. Researchers in all disciplines need to be proficient at programming in order to analyse their own data. Business is run on software, which is often customised to the individual requirements of the company. It boggles my mind that people can be so reliant on computers yet have so little idea of how they work.

So, what would my ideal programming documentary cover? The most obvious thing is the internet. A history of computer networking could begin with the development of the first computer networks, then describe how TCP/IP divides data into packets and routes them between computers. It could move on to HTTP and HTML, both of which are fundamentally simple yet central to our everyday lives. To bring things up to date it could focus on Google and Facebook and show people the inside of a data centre. I suspect that most people have no idea where their Google search results are coming from.

I doubt that there is much demand for an updated series as long as the ten-part original, but the soon-to-be-released Raspberry Pi machine would be an ideal way to recreate the tinkering appeal of the original BBC Micro. There’s something magical about seeing a program you’ve written appearing on the TV in your living room, rather than on the screen of your main PC. An alternative would be to provide an interpreter as part of a website, so you can just type in a URL and start programming.

A documentary focusing on programming would have a difficulty that the original series never had: the fact that computing power is commonplace means that people are used to software created by large teams with dedicated designers. An individual with no experience can’t hope to come close to something like that. Fortunately computers are so much more powerful today that much of the complexity you used to have to cope with can be abstracted away. Libraries like VPython make it very simple to produce complicated and impressive 3D graphics.
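To give a feel for just how little code that takes, here is a minimal bouncing-ball sketch using the vpython package (a modern incarnation of VPython; the constants are arbitrary):

from vpython import sphere, box, vector, color, rate

floor = box(pos=vector(0, 0, 0), size=vector(10, 0.2, 10))
ball = sphere(pos=vector(0, 4, 0), radius=0.5, color=color.red)
velocity = vector(0, 0, 0)
dt = 1.0 / 60

while True:
    rate(60)                # limit the animation to 60 frames per second
    velocity.y -= 9.8 * dt  # apply gravity
    ball.pos += velocity * dt
    if ball.pos.y < 0.6:    # bounce when the ball meets the floor
        velocity.y = abs(velocity.y)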

I’m certainly not the only person who wants to help teach the masses to program, but realistically you need an organisation like the BBC to do something that might actually make a difference. Do I think that you could create a compelling and informative documentary that might inspire people to program, and give them a very basic understanding of how to do it? Definitely.


Photo of TV Camera man by Chris Waits.

Photo of The Large Hadron Collider/ATLAS at CERN by Image Editor.

Photo of Raspberry PI by Paul Downey.

The Importance Of Documentation

Recently I’ve been working on a couple of open source projects, and as part of them I’ve been using some libraries. In order to use a library, though, you need to understand how it is designed, what function calls are available and what those functions do. The two libraries I’ve been using are Qt and libavformat, which is part of FFmpeg, and they show two ends of the documentation spectrum.

Now, it’s important to note that Qt is a massive framework owned by Nokia, with hundreds of people working on it full-time, including a number dedicated to documentation. FFmpeg, on the other hand, is a purely volunteer effort with only a small number of developers working on it. Given the complicated nature of video encoding, having a very stable and feature-full library such as FFmpeg available as open source software is almost a miracle. Comparing the levels of documentation between these two projects is very unfair, but it serves as a useful example of where documentation can sometimes be lacking across all types of projects, both open and closed source.

So, let’s look at what documentation is important to write by considering how you might approach using a new library.

When you start using some code that you’ve not interacted with before, the first thing you need is to get a grasp of the concepts underlying the library. Some libraries are functional, some object orientated. Some use callbacks, others signals and slots. You also need to know the top-level groupings of the elements in the library, so you can narrow your focus to the parts of the library you actually want to use.

Qt’s documentation starts in a typical fashion, with a tutorial. This gives you a very short piece of code that gets you up and running quickly. It then proceeds to describe, line by line, how the code works, and so introduces you to the fundamental concepts used in the library. FFmpeg takes a similar approach, and links to a tutorial. However, the tutorial begins with a big message about it being out of date. How much do you trust an out-of-date tutorial?

Once you’ve a grasp of the fundamental design decisions that were taken while building the library, you’ll need to find out what functions you need to call or objects you need to create to accomplish your goal. Right at the top of the menu the Qt documentation has links to the class index, the function index and the modules. These let you easily browse the contents of the library and delve into deeper documentation. Doxygen is often used to generate documentation for an API, and it seems to be the way FFmpeg is documented. Their front page contains… nothing. It’s empty.

Finally, you’ll need to know what arguments to pass to a function and what to expect in return. This is probably the most common form of documentation to write, so you probably (hopefully?) already write it. Despite my earlier criticisms, FFmpeg does get this right, and most of the functions describe what you’re supposed to pass in. With this sort of documentation it’s important to strike a balance. You need to write enough documentation that people can call your function and have it work first time, but not so much that it takes ages to get to grips with, or that it replicates what you could find out by reading the code.
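To illustrate that balance, a short docstring like this (a made-up example of my own, not taken from either library) tells a caller everything needed to get a first call working, without replicating the implementation:

def open_stream(path, timeout=5.0):
    """Open the media file at path and return a Stream object.

    timeout is the number of seconds to wait for the file's header to
    be parsed. Raises IOError if the file cannot be opened, and
    ValueError if the header is malformed.
    """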

Some people hold on to the belief that the code is the ultimate documentation. Certainly writing self-documenting code is a worthy goal, but there are other levels of documentation that are needed before someone can read the code and understand it well enough for it to be self-documenting. So, next time you’re writing a library make sure you consider:

  • How do people get started?
  • How do people navigate through the code?
  • and, how do people work out how to call my functions?

Photo of Book Pile by Paul Watson.

Cleaning Your Django Project With PyLint And Buildbot

There are a number of tools for checking whether your Python code meets a coding standard. These include pep8.py, PyChecker and PyLint. Of these, PyLint is the most comprehensive, and is the tool I prefer to use as part of my Buildbot checks that run on every commit.

PyLint works by parsing the Python source code itself and checking for things like the use of undefined variables and missing docstrings, along with a large array of other checks. A downside of PyLint’s comprehensiveness is that it runs the risk of generating false positives. Because it parses the source code itself it struggles with some of Python’s more dynamic features, in particular metaclasses, which, unfortunately, are a key part of Django. In this post I’ll go through the changes I make to the standard PyLint settings to make it more compatible with Django.

disable=W0403,W0232,E1101

This line entirely disables a few checks. W0403 stops relative imports from generating a warning; whether you want to disable this is really a matter of personal preference. Although I appreciate why there is a check for this, I think it is a bit too picky. W0232 stops a warning appearing when a class has no __init__ method. Django models will produce this warning, but because they’re constructed by a metaclass there is nothing wrong with them. Finally, E1101 is generated if you access a member variable that doesn’t exist. Accessing members such as id or objects on a model will trigger this, so it’s simplest just to disable the check.
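For example, this perfectly ordinary model code (a contrived example of my own) trips E1101, because the objects manager and the id field are only attached by the model metaclass at runtime:

from django.db import models

class Post(models.Model):
    title = models.CharField(max_length=100)

def latest_post_id():
    # PyLint reports E1101 on both lines, even though they are valid Django.
    post = Post.objects.latest("id")
    return post.id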

output-format=parseable
include-ids=yes

These make the output of PyLint easier for Buildbot to parse; if you’re not using Buildbot then you probably don’t need to include these lines.

good-names= ...,qs

Apart from a limited list of allowed names, PyLint tries to enforce a minimum length of three characters for variable names. As qs is such a useful variable name for a QuerySet, I force it to be allowed as a good name.

max-line-length=160

The last change I make is to allow much longer lines. By default PyLint only allows 80 character long lines, but how many people have screens that narrow anymore? Even the argument that it allows you to have two files side by side doesn’t hold water in this age where multiple monitors for developers are the norm.

PyLint uses the exit code to indicate what errors occurred during the run. This confuses Buildbot, which assumes that a non-zero return code means the program failed to run, even when using the PyLint buildstep. To work around this I use a simple management command that duplicates the pylint program’s functionality but doesn’t let the return code propagate back to Buildbot.

from django.core.management.base import BaseCommand

from pylint import lint

class Command(BaseCommand):
    """Run PyLint over the modules given on the command line."""

    def handle(self, *args, **options):
        # exit=False stops PyLint from calling sys.exit() with its
        # error-count status code, which Buildbot would otherwise
        # treat as a failure to run.
        lint.Run(list(args + ("--rcfile=../pylint.cfg", )), exit=False)
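Assuming the file is saved as, say, lint.py in one of your apps’ management/commands directories (the name is yours to choose), Buildbot can then invoke it with python manage.py lint as an ordinary build step.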

Photo of Cleaning by inf3ktion.

Unittesting QSyntaxHighlighter

I’m using test driven development while building my pet project, DjangoDE. A key part of an IDE is the syntax highlighting of the code in the editor, so that’s one area where I’ve been trying to build up the test suite.

To test the syntax highlighter, the obvious approach is to send the right events to write some code into the editor, then check the colour of the text. Although the Qt documentation is usually excellent, it doesn’t go into enough detail on the implementation of the syntax highlighting framework to enable you to query the colour of the text. In this post I’ll explain how the colour of text is stored, and how you can query it.

A syntax highlighting editor is normally implemented using a QPlainTextEdit widget. This object provides the user interface to the editor and manages the display of the text. The widget contains a QTextDocument instance, which stores the text. To add syntax highlighting you derive a class from QSyntaxHighlighter, then instantiate it, passing the document instance as the parameter to the constructor. This is explained in detail in the syntax highlighter example.
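As a minimal sketch of that wiring (PyQt4-era APIs; the class and the keyword list are made up for illustration):

from PyQt4 import QtGui

class KeywordHighlighter(QtGui.QSyntaxHighlighter):
    """Colour a few Python keywords blue."""
    def highlightBlock(self, text):
        text = str(text)  # PyQt4 passes a QString here
        fmt = QtGui.QTextCharFormat()
        fmt.setForeground(QtGui.QColor("blue"))
        for keyword in ("def", "class", "import"):
            start = text.find(keyword)
            while start >= 0:
                self.setFormat(start, len(keyword), fmt)
                start = text.find(keyword, start + 1)

app = QtGui.QApplication([])
editor = QtGui.QPlainTextEdit()
highlighter = KeywordHighlighter(editor.document())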

The document stores the text as a sequence of QTextBlock objects. These store the text as well as the formatting information used to display it. You might think that you can just call QTextBlock::charFormat to get the colour of the text. Unfortunately it’s not that simple: the format returned by that call is the colour that you’ve explicitly set, not the syntax highlight colour.

Each QTextBlock is associated with a QTextLayout object that controls how the block is rendered. Each layout has a list of FormatRange objects, accessible using the additionalFormats method. It is this list that the QSyntaxHighlighter sets to specify the colour of the text.

Now we know where the colour information is stored, we can find out what colour a particular character will be. Firstly, you need to find out which QTextBlock contains the text you want. In a plain text document each line is represented by a separate block, so this is quite straightforward. You then get the list of FormatRanges and iterate through them, checking whether the character you want is between format_range.start and format_range.start + format_range.length.
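Putting that together, a helper along these lines (the function name is my own; PyQt4-era APIs again) returns the highlight colour of a single character, or None if it has no highlighting:

def char_colour(document, line, column):
    """Return the syntax highlight colour of one character, or None."""
    block = document.findBlockByNumber(line)
    for format_range in block.layout().additionalFormats():
        start, length = format_range.start, format_range.length
        if start <= column < start + length:
            return format_range.format.foreground().color()
    return None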

For an example of this you can check out the test file in the DjangoDE repository.


Photo of Testing 1, 2, 3 by alisdair.