linux programming

Caring For Your Environment

I’ve been thinking recently about where all my spare time goes, because I just don’t have the time to do the things I want to do, development-wise. In my life I have a couple of obvious (non-family-related) time sinks:

  • Xbox 360: Bioshock, Assassin’s Creed, Gears of War, …
  • Physical Exercise

Now clearly physical exercise is unnecessary for a desk-jockey like me, right? But there’s evidence to suggest that physical exercise might make me smarter. I need all the smarts I can get, so I guess I’ll be continuing with it for now.
What about gaming? Well I’ve been gaming since Jet Set Willy so I don’t think this is really a time luxury anymore. It’s now a defining character trait.

That didn’t give me very much wiggle room, so I started to look more closely at what I actually do when I’m the big fat “H” in HCI. I discovered that I was spending some time on Project Euler, which is definitely not time wasted but is perhaps a little frivolous, so I stopped. But after that still no development work was getting done. Then I found I would spend a fair amount of time tending my development-environment garden.

Gardening Again

Recent projects have included:

  • Switching my development machine from Gentoo to Ubuntu, ooo-ooo-ooo
  • Setting up SVN over SSH
  • Getting Emacs to provide true type font support
  • Upgrading Ubuntu to Gutsy for Compiz-Fusion support
  • Trying to get Gutsy to work with my f*cking ATI Radeon 9600 so I can actually use Compiz-Fusion
  • Trialling Lisp based tiling X window managers

And so it goes on. I can always think of something to do, and it’s very much like gardening I think. I admit that I haven’t really done much real gardening, but when I did have a garden I found I could spend hours in it mowing, pruning, removing stones, painting fences, … you get the idea. The only difference is that with real gardening, the garden and its aesthetics are the objective. The objectives of my development environment “gardening” are less clear. I’m clearly not getting much productivity benefit from trying to get Compiz-Fusion to work, other than that it makes me feel very powerful to be able to make an ATI graphics card work as expected with Linux.

What’s in your garden? Are the weeds ten feet high but you just can’t see them or could you submit it for a competition and win? Is this sort of OCD the preserve of the programmer or have I really lost it this time?


That Funny Nose Thing Again

In one of my previous jobs we had an unwritten rule that if you wanted to introduce a new programming language into the organisation you had to have a pretty good reason for it. When I joined, around 1999, the company was using C/C++, Tcl and a little Java. By the time I left they were using a lot of Java, a lot of Python (thanks, at least in part, to me), a bunch more C++ and a steadily growing amount of C#. I wasn’t exactly responsible for the addition of the other languages, but I think I contributed code to all of them.

Back then I decided that a small company should not adopt and keep that many technologies simultaneously within a single team without retiring some of the older code. From a company’s point of view it is in its interests to stop this in-house proliferation of tools & programming languages because it makes the code base both harder to support and harder to integrate. But it seems, to me at least, that tools and programming languages are part of the programmer condition. I just can’t get enough of them. They look shiny and new and full of promise. They are, quite simply, bewitching to me.

No one nose what I feel for you

Unlike Elizabeth Montgomery, programming tools have little sex appeal and don’t do that funny nose thing. You know, the nose thing that makes everything better when it all turns to shit at the end of the show.

It is therefore in my company’s interests to pick languages & tools that are general purpose because it will reduce the possibility of tools proliferation later. But I know the drill. Hey, I practically wrote the drill. Find something I want to use, find a reason why I want to use it or why what we have now is deficient. Then bitch and moan until I get my way.

Sometimes, though, the benefits of a switch to a different tool or programming language can be compelling. Steve Yegge claimed last month that his MUD Wyvern has so many lines of code in Java that it is simply unsupportable by one person, and so he’s going to use Mozilla’s Rhino to reduce the LoC count. Yeah, that does sound like a good plan, but I think I’ll check back with Steve in 2010 to see how he’s getting on.

As I already mentioned in the “Towers of blub”, I have been on a personal quest for about 1.5 years now to find a more powerful programming language. At the moment I am learning Common Lisp. So it was that this week my Tabitha nose picked up the strong scent of a new programming language gaining ground. The new player is Scala. I read a couple of blog posts about it, had a look at a tutorial and a reference manual or two and was, as you Americans say, pretty stoked. I was thinking about when I was going to download it to see what it could do for me.

But then I was hit by a ten-foot wall of apathy. Whilst it’s interesting to see as much of the language & tools landscape as is humanly possible, I’m starting to wonder if it’s a very worthwhile use of my time. Perhaps I should stop evaluating all these different tools and languages and actually write some code. In fact, a list of all the technologies I’ve learned, instead of coding, and subsequently forgotten over the years would probably be quite long.

So I think I’ll do what Dan Weinreb’s going to do and just keep an eye on Scala to see what happens next. Now, since he is way smarter than me I reckon this is a pretty safe bet. BTW, I’ve tried it before, people, and this technique really does work. Pick someone whose opinion you respect (this is obviously never going to be a politician, a teacher or a member of law enforcement) and simply base your opinion on theirs. You don’t really need a great deal of rhetoric to back your arguments up; just remember whose opinion you copied and come back to it later. So I’ll keep an eye on Scala, remember that I might need it one day, and save the rest of my time for some more serious keyboard intercourse and beer.

Then I had the other, slightly larger, epiphany. More important, at least in hindsight, are not the tools and languages I use but the things that I do with those tools & languages. I’ll be more specific. The important things about what I do can be broken into technical and non-technical. The technical things like distributed computing, defensive coding, testing, multi-threading, relational databases and networking are important knowledge and experience that I draw on all the time, and they are language and tool independent. Those are the things I really need to know in order to program. But the things that make me really effective (or would, if I was any good at them!) are the non-technical things like communication, interview and planning skills. I need to spend time developing all these other skills rather than finding the next new programming language or tool.

Bewitched first aired in the US on 17 September, 1964. A time when COBOL was pretty shiny and new. In 1991 I had to learn COBOL for my undergraduate degree. I have not used COBOL since the last programming assignment we did and I remember almost nothing about it. But I did learn something valuable from that assignment because it was the first time I had ever tried to produce a piece of software in a team.

So it seems, then, that I really shouldn’t care very much about what I have to create my solutions ‘in’, as long as I don’t have to use too many. I think there are ways round most programming language deficiencies, unless you use COBOL of course.

article finance programming python

Calculating peak-to-trough drawdown

Ok, so this is a little bit technical, but it’s an intriguing puzzle that got me thinking quite hard. So here’s the problem: sometimes investors want to be able to judge what the absolute worst case scenario would have been if they’d invested in something. Look at the following random graph of pretend asset prices:


You’ll see that there are two points on the graph (marked in red) where, if you had invested at the first point and pulled out at the second, you would have suffered the worst-case loss. This is the point of the analysis: it’s a way for investors in the asset to see how bad ‘bad’ has really been in the past. Clearly past prices are not an indicator of future losses. 🙂

The upper one is the ‘peak’ and the lower one is the ‘trough’. Finding these two babies by eye is trivial. Doing it reliably (and quickly) on a computer is not that straightforward. Part of the problem is coming up with a consistent natural-language description of what you want your peak and trough to be. This took me some time. I believe what I really want is: the largest positive difference of high minus low where the low occurs after the high in time-order. This was the best I could do, and it led to the first solution (in Python):

def drawdown(prices):
    maxi = 0
    mini = 0
    for i in range(len(prices) - 1):
        maxj = 0
        diff = 0
        for j in range(i + 1, len(prices)):
            if prices[i] - prices[j] > diff:
                maxj = j
                diff = prices[i] - prices[j]
        if diff > prices[maxi] - prices[mini]:
            maxi = i
            mini = maxj
    return (prices[maxi], prices[mini])

Now this solution is easy to explain. It’s what I have come to know as a ‘between’ analysis. I don’t know if that’s the proper term but it harks back to the days when I used to be a number-cruncher for some statisticians. The deal is relatively straightforward: compare each item against every item after it in the list and store the largest positive difference. If this difference is also the largest seen in the data-set so far then make it the largest positive difference of all points. At the end you just return the two points you found. This is a natural way to solve the problem because it looks at all possible start points and assesses what the worst outcome would be.

The problem with this solution is that it has quadratic complexity: for any data-series of size N it will perform roughly N * (N-1) / 2 comparisons, in shorthand O(N^2). For small N this doesn’t really matter, but for any decently sized data-series this baby will be slow-as-molasses. The challenge then is to find an O(N) solution to the problem and save those much-needed cycles for something really important:

def drawdown(prices):
    prevmaxi = 0
    prevmini = 0
    maxi = 0

    for i in range(1, len(prices)):
        if prices[i] >= prices[maxi]:
            maxi = i
        else:
            # You can only determine the largest drawdown on a downward price!
            if (prices[maxi] - prices[i]) > (prices[prevmaxi] - prices[prevmini]):
                prevmaxi = maxi
                prevmini = i
    return (prices[prevmaxi], prices[prevmini])

This solution is a bit harder to explain. We move through the prices and the ‘if’ branch finds the highest peak so far. The ‘else’ branch is where the magic happens. If the next value is less than the maximum then we see if that difference is larger than any previously encountered difference; if it is then this is our new peak-to-trough.

The purist in me likes the fact that the O(N) solution looks like easier code to understand than the O(N^2) solution. Although the O(N^2) solution is, I think, an easier concept to grapple with, when it’s translated into code it just doesn’t grok.
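When I’m paranoid about an optimisation like this, I cross-check the two approaches against each other on random data. Here’s a sketch of that idea; the function names and test data are my own, and the bodies are fresh restatements of the two algorithms above rather than the exact listings:

```python
import random

def drawdown_bruteforce(prices):
    # O(N^2): try every ordered (peak, trough) pair, keep the worst drop.
    best = (prices[0], prices[0])
    for i in range(len(prices) - 1):
        for j in range(i + 1, len(prices)):
            if prices[i] - prices[j] > best[0] - best[1]:
                best = (prices[i], prices[j])
    return best

def drawdown_onepass(prices):
    # O(N): track the running peak; measure each drop against the best so far.
    peak = best_peak = best_trough = prices[0]
    for p in prices[1:]:
        if p >= peak:
            peak = p
        elif peak - p > best_peak - best_trough:
            best_peak, best_trough = peak, p
    return (best_peak, best_trough)

random.seed(42)
for _ in range(100):
    series = [random.randint(1, 1000) for _ in range(50)]
    a = drawdown_bruteforce(series)
    b = drawdown_onepass(series)
    # Both must agree on the size of the worst peak-to-trough drop.
    assert a[0] - a[1] == b[0] - b[1]
```

On a simple series like [100, 105, 98, 110, 90, 95] both versions find the 110 peak and the 90 trough.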

article programming

Time: the unseen global variable

Just about everyone knows that global variables need to be used sparingly. The more you use, the more likely you are to capture complex state in places that are hard to maintain. Or something.

As well as all the globals you can see and measure there exists a shadowy league of ‘unseen’ globals in your programs. Some, like environment variables, are clearly designed as global variables and are desirable and understandable. However, some are wistful and ephemeral and dance round your program like wicked elves. Time is the biggest and most scary of these elves.

For most programs you write, time probably doesn’t matter; they are to all intents and purposes timeless. But as soon as you start entering the shadowy world of time, and the even more nebulous one of time-zones and daylight savings, a whole set of other state is being used. In my experience the programs and components I have written that depend on time have been some of the most complex to develop and maintain. This is for a variety of reasons, but in summary:

time is not constant and can be interpreted in more than one way.

This leads to all manner of difficulties:

  1. Code that depends on the current system time ‘Now()’ and doesn’t take it as a parameter is always going to be fragile. This is mostly because its behaviour can be non-deterministic unless you properly account for the fact that time is not constant. This is especially important because your programs are susceptible to hard-to-spot boundary effects if you write expressions that use Now() more than once and depend on it returning the same value for each call. Which of course it never will.
  2. Time and date should never, ever, ever be separated from one another. You get all sorts of tricky errors when you split the two. Especially when you are performing some sort of time zone or daylight savings calculation where the two should change together but do not.
  3. Some programming languages represent the date (no time) as a date with a time of 00:00:00. Which is intuitive, but consider then what happens when you load a date (with no time) from a database in the past, when there were daylight savings, into a time when there are no daylight savings. In the frame of reference of now, your localised past time will be an hour earlier and so will fall in the final hour of the previous day. This problem clearly applies to timezones too, but it arises because you made the mistake of not having a consistent view of time.
  4. Not only can the meaning of calendar time change after-the-fact (due to time-zones) but it can also be interpreted differently by different cultures.
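The first difficulty is easy to demonstrate with a small Python sketch (the function names here are mine):

```python
from datetime import datetime, timedelta

def window_fragile(span):
    # Two separate reads of 'now': the clock moves between the calls,
    # so the window may not be exactly 'span' wide and the result is
    # non-deterministic.
    return (datetime.now(), datetime.now() + span)

def window(now, span):
    # 'now' is read once by the caller and passed in, so the result is
    # deterministic and testable against any fixed point in time.
    return (now, now + span)

start, end = window(datetime(2007, 11, 1, 12, 0), timedelta(hours=1))
assert end - start == timedelta(hours=1)
```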

There’s probably a lot of other time related pickles you can get yourself into. If Harold Lloyd was a programmer ...

You’d probably not be surprised to hear me say that unit-testing is one way of addressing at least some of these problems. This does two things. If you are to get good coverage for your unit-tests you are practically forced to make time a parameter wherever it’s used, instead of calling Now(). As a direct consequence of this your code can now be called ‘As Of’ and you will be able to offer the historical view where appropriate.

Indeed, I would say that where a piece of software has a time-context, it will only be a matter of time before someone says: “Ok, that’s what it says today, but what if I want to rerun it for some time three weeks in the past?”.
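Once time is a parameter, the ‘As Of’ rerun comes almost for free. A sketch with a made-up report function (the names and data here are mine, purely for illustration):

```python
from datetime import date

def fees_due(accounts, as_of):
    # The valuation date is a parameter, not a hidden call to Now(),
    # so the same report can be rerun for any point in history.
    return sum(a["fee"] for a in accounts if a["opened"] <= as_of)

accounts = [
    {"opened": date(2007, 1, 15), "fee": 10},
    {"opened": date(2007, 6, 1), "fee": 20},
]

today_view = fees_due(accounts, date(2007, 7, 1))  # both accounts open
past_view = fees_due(accounts, date(2007, 3, 1))   # rerun 'as of' the past
```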

The time-zone and daylight savings problems can be nailed by having a consistent view on the treatment of time. For instance, storing all dates/times as UTC is one approach. But if you ever need to store a local time then it should be clear what frame of reference is being used to store it. So you might additionally need to know the calendar, the timezone and the daylight savings rules before you can correctly store a time.
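A sketch of that consistent view, using fixed UTC offsets only (a real system would also need the calendar and the full DST rules, as noted above; the function name is my own):

```python
from datetime import datetime, timezone, timedelta

def to_storage(local_wall_clock, utc_offset_hours):
    # Attach the frame of reference the time was observed in,
    # then normalise to UTC before storing.
    frame = timezone(timedelta(hours=utc_offset_hours))
    return local_wall_clock.replace(tzinfo=frame).astimezone(timezone.utc)

# 09:30 observed at UTC+1 (e.g. British Summer Time) stores as 08:30 UTC
stored = to_storage(datetime(2007, 6, 5, 9, 30), 1)
```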

Then and only then will time become your faithful and obedient friend.

article programming

Software Dream-ualisation

Ok, I admit it, by some measures I am a sad, sad individual. Why? Because sometimes I dream about programming. Now those that know me might be thinking that in my dreams I sit in amongst monster rigs hacking away at some monster problem.

I have been told that the best crackers in the world can do this under 60 minutes but unfortunately I need someone who can do this under 60 seconds. — Gabriel, from the movie Swordfish

Sadly the un-reality of my slumber is a little more prosaic and not like Swordfish at all. These dreams are always bizarrely specific programming tasks that would require only a small amount of thought if I were awake, but since I’m not conscious, they are a little harder.

Last night was different. I dreamt of nothing and woke to the sound of vomiting children (my own). Once that drama was resolved I couldn’t find a way to drift off to sleep again because, for some strange reason, I’d started thinking about software visualisation. I don’t even want to think about how I got onto that train of thought.

Anyway, the thought went something like this. Could software have colour? I couldn’t see why not. If software had colour then would it be useful? I reasoned that yes, it could be useful to give software a colour. For instance, colours could be assigned to code-patterns and this might ease the understanding of that code. Since code is easier to write than to read, this seemed like a worthwhile aim.

But then I thought perhaps a better visualisation would be to colour and orient objects on a plane based on the number of messages each object issues / receives, or some other arbitrary scheme. This sounded like a really rather jolly idea; I resolved to investigate it more fully in the morning and promptly fell asleep.

When daylight arrived I found a link to this research, which does something very similar to what I was dream-scribing, but the site hadn’t been touched since 1998. Other research, whilst relevant, seemed similarly dormant. The most recent research I could find is here. It uses Vizz3D to turn software into ‘cities’ that can be navigated. This is indeed exciting stuff, even if it was done in C/C++.

It has long fascinated me that the world of software is a dynamic, ever-shifting place, yet the tools with which we work on that software (especially for very large projects) don’t really help in trying to conceptualise it. Indeed, the code most of us see in the maintenance phase is at a lower level of abstraction than the overall structure of that software, and the structure can be very hard to see by just looking at the code.

Sure we can use various tools like profilers and coverage analysers to view different dimensions of the software plane but they are not the whole picture and compositing those analyses into a coherent whole is still not easy.

Fast-forward ten years: perhaps DevStudio or Eclipse will ship with a project visualizer. The information transmitted in a single visualisation could save hours of code-grokking. It probably won’t change the world but it would be very, very useful.

But perhaps in ten years we will have brains the size of water-melons and be able to program computers using only our minds (like in Firefox). I guess it’s time to go back to sleep now. Sweet dreams.