

Ok, so this is a little bit technical but it’s an intriguing puzzle that got me thinking quite hard. So here’s the problem. Sometimes investors want to be able to judge what the absolute worst case scenario would have been if they’d invested in something. Look at the following random graph of pretend asset prices:

Peak-To-Trough

You’ll see that there are two points on the graph (marked in red) where, if you had invested at the first point and pulled out at the second, you would have suffered the worst-case loss. That is the point of this analysis: it’s a way for investors in the asset to see how bad ‘bad’ has really been in the past. Clearly past prices are not an indicator of future losses. :-)

The upper one is the ‘peak’ and the lower one is the ‘trough’. Finding these two babies by eye is trivial. Doing it reliably (and quickly) on a computer is not so straightforward. Part of the problem is coming up with a consistent natural-language description of what you want your peak and trough to be. This took me some time. I believe what I really want is: the largest positive difference of high minus low where the low occurs after the high in time-order. This was the best I could do. This led to the first solution (in Python):


def drawdown(prices):
    maxi = 0  # index of the peak of the worst drawdown found so far
    mini = 0  # index of the trough of the worst drawdown found so far
    for i in range(len(prices) - 1):
        maxj = 0
        diff = 0
        # Find the worst outcome for money invested at point i.
        for j in range(i + 1, len(prices)):
            if prices[i] - prices[j] > diff:
                maxj = j
                diff = prices[i] - prices[j]
        # Keep it if it is the worst seen anywhere in the series so far.
        if diff > prices[maxi] - prices[mini]:
            maxi = i
            mini = maxj
    return (prices[maxi], prices[mini])

Now this solution is easy to explain. It’s what I have come to know as a ‘between’ analysis. I don’t know if that’s the proper term but it harks back to the days when I used to be a number-cruncher for some statisticians. The deal is relatively straightforward: compare each item against every item after it in the list and store the largest positive difference. If this difference is also the largest seen in the data-set so far then it becomes the largest positive difference of all points. At the end you just return the two points you found. This is a natural way to solve the problem because it looks at all possible start points and assesses what the worst outcome would be.
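
To make this concrete, here’s a minimal usage sketch of the function above. The price series is made up purely for illustration:

# A made-up price series, purely for illustration.
prices = [100, 102, 98, 105, 90, 95, 104, 88, 93]

peak, trough = drawdown(prices)
print(peak, trough)  # 105 88 -- investing at 105 and pulling out at 88 is the worst case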

The problem with this solution is that it has quadratic complexity. That is, for any data-series of size N the nested loops perform roughly N * (N - 1) / 2 comparisons; in shorthand this is O(N^2). For small N this doesn’t really matter, but for any decently sized data-series this baby will be slow-as-molasses. The challenge then is to find an O(N) solution to the problem and to save those much-needed cycles for something really important:


def drawdown(prices):
    prevmaxi = 0  # index of the peak of the worst drawdown found so far
    prevmini = 0  # index of the trough of the worst drawdown found so far
    maxi = 0      # index of the highest price seen so far

    for i in range(1, len(prices)):
        if prices[i] >= prices[maxi]:
            maxi = i
        else:
            # You can only extend the largest drawdown on a downward price!
            if (prices[maxi] - prices[i]) > (prices[prevmaxi] - prices[prevmini]):
                prevmaxi = maxi
                prevmini = i
    return (prices[prevmaxi], prices[prevmini])

This solution is a bit harder to explain. We move through the prices, and the first branch of the ‘if’ keeps track of the highest peak seen so far. The second branch is where the magic happens: if the current value is less than that maximum, we check whether the drop from the peak is larger than any previously encountered difference, and if it is then this is our new peak-to-trough.
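
As a quick sanity check, something like the following can be used to confirm that the two versions agree on random data. It assumes you rename the quadratic and linear versions to drawdown_quadratic and drawdown_linear (names invented here) so they can coexist:

import random

def check(trials=1000, n=50):
    for _ in range(trials):
        prices = [random.uniform(50.0, 150.0) for _ in range(n)]
        # With random floats a tie in drawdown size is vanishingly unlikely,
        # so both versions should pick out the same peak and trough.
        assert drawdown_quadratic(prices) == drawdown_linear(prices)

check()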

The purist in me likes the fact that the O(N) solution looks like easier code to understand than the O(N^2) solution. Although the O(N^2) solution is, I think, an easier concept to grapple with, when it’s translated into code it just doesn’t grok.

So, code reviews are great. Get the benefit of some ass-hole telling you that your comments should be C-style (/*) and not C++-style (//) and remind you that the member name ‘mSuckThis’ is not suitable, ever. No really, code reviews are great. It’s just that a lot of times they just don’t work.

The first time I encountered code-review was when my boss at the time had just read some book on how to manage programmers and was keen to inflict it on all his employees. His code-review process was to take all my work, print it out and go through it line-by-line. Master and student style.

This type of code-review, in the way that he implemented it, was meaningless. It concentrated on an important but largely automatable aspect of code review and that is: adherence to coding guidelines.

As I see it there are three types of defect that code review is trying to identify:

  1. Adherence to coding guidelines (or lack of it) and inter-package dependencies.
  2. Identification of localised errors: “that loop is infinite”, “that algorithm should be log(N) and not N^2”, or “that module is way too big”.
  3. Identification of non-local errors. Where local means local to the code-review. For instance the impact of adding a throw on a widely used method and how that affects all the dependent code paths.

I question anyone’s ability to fully understand the dynamic nature of any reasonable sized piece of software by just looking at a small excerpt. Every time you review that code you have to ‘load’ that supporting information into your head to be able to identify whether the code is totally awesome or tragically bogus. In my experience defects of a local type (type 2) were sadly rarely identified by code review and defects of a non-local type (type 3) almost never.

I’m passionate about improving code-quality. But I don’t see any realistic way to achieve what I want. To identify non-local errors you really need your code reviewer to sit next to you during the development or be as deeply involved in the code as you are. It would probably need a similar approach to reliably find local errors too. However your reviewer is rarely as involved as that. It seems that some judicious use of pair programming might be the answer but that comes with its own problems.

It seems that to get the best out of code-reviews you have to be very careful about how you implement them. Sure, let’s automate what we can automate and pair program on tricky items but the real code-review needs to be extremely skilfully handled to get the best bang-for-your-buck-chuck.

Just about everyone knows that global variables need to be used sparingly. The more you use the more likely you are to capture complex state in places that are hard to maintain. Or something.

As well as all the globals you can see and measure there exists a shadowy league of ‘unseen’ globals in your programs. Some, like environment variables, are clearly designed as global variables and are desirable and understandable. However, some are wistful and ephemeral and dance round your program like wicked elves. Time is the biggest and scariest of these elves.

For most programs you write, time probably doesn’t matter; they are, to all intents and purposes, time-less. But as soon as you start entering the shadowy world of time, and the even more nebulous one of time-zones and daylight savings, a whole set of other state is being used. In my experience the programs and components that I have written that depend on time have been some of the most complex to develop and maintain. This is for a variety of reasons but in summary:

time is not constant and can be interpreted in more than one way.

This leads to all manner of difficulties:

  1. Code that depends on the current system time ‘Now()’ and doesn’t pass it as a parameter is always going to be fragile (there’s a small sketch of the alternative after this list). This is mostly because its behaviour can be non-deterministic unless you properly account for the fact that time is not constant. This is especially important because your programs are susceptible to hard-to-spot boundary effects if you write expressions that use Now() more than once and depend on it returning the same value for each call. Which of course it never will.
  2. Time and date should never, ever, ever be separated from one another. You get all sorts of tricky errors when you split the two. Especially when you are performing some sort of time zone or daylight savings calculation where the two should change together but do not.
  3. Some programming languages represent a date (with no time) as a date with a time of 00:00:00. That is intuitive, but consider what happens when you load a date (with no time) from a database that was stored while daylight savings was in force, and view it at a time of year when daylight savings is not. In the frame of reference of now, your localised past time will be an hour earlier and so will fall in the final hour of the previous day. This problem clearly applies to timezones as well, and it arises because you did not keep a consistent view of time.
  4. Not only can the meaning of calendar time change after-the-fact (due to time-zones) but it can also be interpreted differently by different cultures.
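
Here is a minimal sketch of point 1 in Python. The function names are made up for illustration; the point is only that the caller supplies the instant:

from datetime import datetime, timezone

# Fragile: the answer changes from call to call, and two uses of 'now' within
# one expression may not even agree with each other.
def is_expired_fragile(expiry):
    return expiry < datetime.now(timezone.utc)

# Sturdier: the caller passes the instant in, so the function is deterministic
# and can be run 'as of' any moment, past or present.
def is_expired(expiry, as_of):
    return expiry < as_of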

There are probably a lot of other time-related pickles you can get yourself into. If Harold Lloyd were a programmer ...

You’d probably not be surprised to hear me say that unit-testing is one way of addressing at least some of these problems. This does two things. If you are to get good coverage for your unit-tests you are practically forced to make time a parameter wherever it’s used, instead of calling Now(). As a direct consequence of this your code can now be called ‘As Of’ and you will be able to offer the historical view where appropriate.
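
For instance, a sketch of such a test might look like this; the accrued_interest function is hypothetical, written in the parameterised style so it can be run ‘As Of’ any instant:

from datetime import datetime, timezone

def accrued_interest(principal, annual_rate, start, as_of):
    # Simple (non-compounding) interest accrued between start and as_of.
    days = (as_of - start).days
    return principal * annual_rate * days / 365.0

def test_accrued_interest_as_of_a_fixed_instant():
    start = datetime(2007, 1, 1, tzinfo=timezone.utc)
    as_of = datetime(2007, 7, 2, tzinfo=timezone.utc)  # 182 days later
    # The expected value never changes, no matter when the test is run.
    assert round(accrued_interest(1000.0, 0.05, start, as_of), 2) == 24.93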

Indeed, I would say that where a piece of software has a time-context, it will only be a matter of time before someone says: “Ok, that’s what it says today, but what if I want to rerun it as it stood three weeks ago?”.

The time-zone and daylight savings problems can be nailed by having a consistent view on the treatment of time. For instance storing all dates/times as UTC is one thing. But if you ever need to store a local time then it should be clear what frame of reference is being used to store that time. So you might need to additionally know: the calendar, the timezone, and the daylight savings rules before you can correctly store a time.
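
In Python that discipline might look something like the sketch below (zoneinfo is in the standard library from 3.9 onwards, and the zone name is just an example):

from datetime import datetime, timezone
from zoneinfo import ZoneInfo

london = ZoneInfo("Europe/London")

# A wall-clock reading only means something together with its zone rules.
local = datetime(2007, 6, 21, 14, 30, tzinfo=london)

# Store the unambiguous instant...
stored = local.astimezone(timezone.utc)

# ...and convert back into whatever frame of reference is needed on the way out.
print(stored.astimezone(london))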

Then and only then will time become your faithful and obedient friend.

Something that’s been concerning me for some time is the cost and benefit of courses and seminars. Most employers and employees would perceive programmer training as a positive benefit, and I think I’d have to agree, but there seems to be a common view that all training is good because it’s personal development. To deny that training to an employee would make you a bad employer because you are stunting your employee’s professional growth. Well, I’m not so sure. I’d even go as far as to say:

A lot of technical training is of limited value.

There I said it. It’s out. I’m probably never going to get to go on a course ever again, ever.

The last purely technical course I went on was a compulsory learn Java course in-or-around 1999 (yes I’ve been avoiding courses since then). I remember it not for the content, which was forgettable, but for the fact that I’d snapped my wrist 1 week before and I could only type with one hand. The course, however, was custom designed for our company and our tutors had been briefed about what we needed to know. I would say that this sort of training, i.e. directed, has good benefit but again it only teaches the how. The why is lost.

Compare this with the ‘shrink-wrapped’ course, which is offered by a training company on a technology and is a generic product. In my experience I probably end up using a small-ish fraction of the material learnt on such courses. This is because, to attract the candidates, they need to give the course a broad appeal. However, the chances that I’m going on a course for its broad appeal are low; it’s more likely I’m doing it for a very narrow reason, usually defined by the next biggest project of the moment. Sure, it’s helpful to know all the aspects of a particular technology, but the things I don’t need to know right now will very soon be forgotten.

This is not the only problem with shrink-wrapped courses. There is also a tendency for candidates to choose ‘advanced’ courses that are sometimes beyond their current ability, secure in the knowledge that “it can’t be that hard” and they will pick it up. When this happens the tutor has to work very hard to bring everyone up to the same level so that he/she can teach some of the more advanced aspects.

So this sort of learning is inefficient in that the information conveyed is often greater than is needed, but there’s another, deeper problem with courses. I think that programmers would sometimes be better schooled if they learnt good approach first and implementation details later. Take security, for instance: almost every application these days needs some sort of built-in security. I’d argue that it would be more useful for programmers to go on a ‘security for programmers’ course, covering lots of different security aspects relevant to programmers, than on a technical course for a particular technology which attempts to teach security amongst a lot of other things. As I’m starting to learn, it’s the principles that matter, not the implementations. At least this way you’ll see the entire security picture, and then when faced with a situation which could be a security risk you can say ‘here’s a potential risk, now I need to find a way to mitigate it’.

In some ways this sort of training is perhaps best delivered inside the organisation by mentors. Big-brothers (and sisters) who can guide the novice through the general principles, leaving the rookie to grapple with the fine details of the implementation. Sadly, when you get to my age, big-brothers are most likely to be grandparents, so I guess I’ll just have to keep getting my training from Amazon.

Sometime in 1983 my mother noticed that a high street chemist, which also sold photographic and electrical goods, was selling ZX81 computers for £20. It wasn’t Christmas or a birthday but she thought that it might be useful for me. It wasn’t even something that I had asked for. She just thought it might be useful.

When we halved the polystyrene casing we revealed the little black marvel, purest black with its name embossed in red. In case you ever forgot. I ran my fingers across its highly sensitive keypad and was sure I was witnessing something special. The form of the ZX81 is well known, but just as prominent in my memory is the little blue book that was the manual. I spent a lot of time reading and referring to the ZX81 BASIC manual, its artwork as much imprinted on my mind as the little black flashing  K  that was the ‘ear’ of the ZX81. The cover of the manual is futuristic, with two tiny spacecraft parked on top of some space port or something. Completely stark raving crazy, but its futuristic look added to the mystique of this little black box. One of the great parts about Sinclair Research was their marketing. Their products, although remarkable for the time, were very poorly built and unreliable. But somehow they created desire.

The most memorable part of the book was a clock program from Chapter 19: Time & Motion. It’s hard to say quite what was so magical about this program, but I was awestruck when I ran it. The chapter doesn’t very clearly state that the code will draw a clock and a second hand but after typing it and pressing ‘RUN’ a numbered-dial slowly appears and then a dot sweeps around the outer-edge of the dial. For your pleasure I found a ZX81 emulator and typed the program in again and have recreated the magic for you right here.

Chapter 19: Time & Motion

Pretty heady stuff, I’m sure you’ll agree. Over the next two or so years I spent a lot of time buying Sinclair magazines and typing in programs from them. It was great. You bought a magazine that you could read and then you could type in the program and also get a game to play. All for 60p. The games mostly blew goats and I spent more time checking my typing than playing the game but that didn’t really matter. One of Sir Clive’s great ideas was to attach keywords to the keys themselves. This meant that there wasn’t really any need for a full-parser because the ZX81 knew what to expect and would make the keyboard accept the right keystrokes at the right time. Whilst not terribly flexible this solution also meant that there was a whole lot less typing.

I can’t claim that I learned a lot about computers or programming in those halcyon days, but my clichéd 1,000-mile journey had started with a single clichéd step.

Oh Sinclair, oh my Sinclair ZX81,
We used to laugh and have such fun,
During our time together I have no regret,
I cherish the day that we met.

Everything about you was so damn fine,
From your RAM pack wobble to your sleek lines,
But now you do what time says you must,
You sit in a corner and you pick up dust.

Sniffle.
