Categories
article project management

Understand stakeholders better for the best chances of success

Two Problems

Someone has given you a shiny new project to do, so you’ve got a problem that needs a solution. The thing is, now you have two problems: the easy one is probably the shiny project; the harder one is managing the stakeholders.

To avoid a poor project outcome you need to handle each kind of stakeholder differently: some (like managers) will just want information, some are providing resources to the project, and some have to be managed to prevent them derailing or threatening the project’s success.

First Steps in Stakeholder Analysis

As with anything in project management, the first step is to make a list that captures something of the problem you’re facing; from that we can pull out some germ of truth to help us figure out what to do next.

So in your list gather:

  • The stakeholder’s name, email, handle, etc.
  • The stakeholder’s role or relationship to the project, and any groups you can identify
  • The stakeholder’s project power vs project interest (optional)

We’ll explore these topics in a bit more depth now.

Roles

When people become stakeholders in a project, their expectations differ according to the roles they play. For example, the managing director of your firm will have a different point of view on the project to the lead developer, which will be different again from the point of view of the firm’s clients.

Some key roles to look out for in your project are:

  • Sponsor – the person who wants this project and whose word is final on the project. This person can clear the path of potential issues around and above the project (if necessary). Ideally it’s just one person; if it’s more than one, try to make it one, because that removes the political hurdle that arises when dual sponsors have different views.
  • Subject Matter Experts (SMEs) – these are easy to spot, they’re the people that you can consult on more technical aspects or specifics. They may or may not be people that contribute directly to the project, but having an SME agree with you (or you agreeing with them!) is never a bad way to be.
  • Project Managers – you might think you’re the project manager, but often there are other people on the periphery who will exert control over the project without explicitly being given authority to do so. It’s good to identify the people that might do this, if only to communicate what the lines of responsibility actually are.
  • Project Team – the workers; this might include you, but generally it’s anyone who will work on the project in whatever capacity. If you have lead engineers or support staff it’s useful to identify this too (so that when you share this information other people can use it like a directory).
  • Externals – if you have stakeholders external to your organisation then the chances are their opinion is quite important, usually because they’re clients, beneficiaries or otherwise affected by the project in some way. They will probably not contribute much to the project but you need to understand them. If you have more than one type of external stakeholder, record that as well.

Power vs Interest (optional)

There are many things you can do with stakeholder analysis, all designed to give you insight into what people might expect. Power vs interest is a very common and reasonably useful approach, but there are plenty of others, each with its own strengths.

To be honest there’s no absolute need for this but you might do it for two reasons:

  1. It’s sometimes quite revealing to formulate opinions on people’s expectations through this lens.
  2. It can be quite fun!

Before you make the list, though, consider who might see it. Even though you try to be truthful and accurate, you may find that the people you have captured information on disagree with your categorisations, which can be both embarrassing and frustrating. So consider whether to make two lists: one to communicate and share with the other stakeholders, and one that is just for you, to understand their motivations.

If you do decide to try it, it’s very straightforward. For each of your stakeholders, score their power/influence on this project from 0 to 10 (0 = no power or influence, 10 = very powerful/influential) and then score their interest the same way (0 = no interest, 10 = very interested).

Then plot them on a scatter chart; where they appear on that diagram will influence how you communicate with them.

  • Bottom left – no power/no interest – you don’t want too many people in this quadrant; if you do, you might have a problem delivering the project.
  • Top left – lots of power/no interest – keep these people up-to-date; they have the ability to derail your efforts.
  • Bottom right – no power/lots of interest – hopefully the junior members of your project team are in here somewhere.
  • Top right – lots of power/lots of interest – hopefully your sponsor, PM and more senior project team members are in here.
Power/influence vs interest graph showing the four quadrants

These things are very subjective, so the first time you plot it the result probably won’t feel quite right. You might have scored some of the people on different aspects of power/interest (since there are many). So tweak it a bit until it feels like you can justify the diagram as a whole.
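If you’d rather generate the chart than draw it by hand, a few lines of Python will do it. This is only a minimal sketch, assuming matplotlib is available; the stakeholder names and scores below are invented examples.

import matplotlib.pyplot as plt

# Hypothetical stakeholders: name -> (power, interest), both scored 0-10
stakeholders = {
    "Sponsor": (9, 8),
    "Lead developer": (5, 9),
    "Managing director": (8, 2),
    "Client contact": (3, 7),
}

fig, ax = plt.subplots()
for name, (power, interest) in stakeholders.items():
    ax.scatter(interest, power)
    ax.annotate(name, (interest, power), textcoords="offset points", xytext=(5, 5))

# Quadrant boundaries at the midpoint of the 0-10 scales
ax.axhline(5, color="grey", linestyle="--")
ax.axvline(5, color="grey", linestyle="--")
ax.set_xlim(0, 10)
ax.set_ylim(0, 10)
ax.set_xlabel("Interest")
ax.set_ylabel("Power / influence")
ax.set_title("Power vs interest")
plt.show()

Who sits in which quadrant then falls straight out of the picture, exactly as described in the list above.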

Putting it all together

You’ve got your list and you’ve plotted your scatter diagram – what now? Well, if you’re lazy like me you might leave it there. You’ve done the analysis and you know more now than you did at the start, but some more useful things you could do with your stakeholder analysis are:

  • Communicate On It – I’ve hinted at this already, but it’s very helpful for everyone to know what their role is on a project. It helps set their expectations of you and your expectations of them. I’d personally share just the version with names and roles in it at the start of the project. If questions arise at this stage it’s good to get them out in the open.
  • Plan From It – a lot of the people in your stakeholder analysis are likely to be key resources in your project. Having them, their diaries and their availability easily accessible in your project folder makes everything a bit simpler.
  • Reorganise the Team – if after doing the power vs interest analysis you decide that your team isn’t balanced, and you have scope to do it, use the analysis to reorganise the team or make some substitutions. To be honest though I’ve rarely had this luxury.
  • Communicate With It – a lot of successful project management is about communication. If you understand the stakeholders then you can tailor the communications to suit the stakeholders. For example the sponsor is employing you to handle the details, so high level progress and issues will probably be fine for them – but it depends on their level of interest. The SMEs and externals probably just like to know high level progress but you might phrase it differently for each audience. Finally, the project team probably want short term detailed plans. So, come up with a small suite of reports and decide who gets what report and use your analysis as a guide to the distribution and content.
Categories
article graphics shaders

New Toys

I have been interested in computer graphics for a long time but never really interested enough to make some positive steps toward it. Like a lot of people, I have tried to get a basic app working in OpenGL or DirectX but never really got very far. It was all a bit intimidating.

However things have changed. These days, on DirectX, there are a load of fantastic tutorials on the internet, as well as seriously helpful libraries. If you’re interested in getting some physics working there are a few good books, and often the content is backed up with source code. For WebGL (which is a flavour of OpenGL) there are great tutorials to help, and really cool insights into how it works.

There’s quite a lot to take in before you can really grok what the graphics pipeline does, whatever API flavour you choose. It was whilst trying to figure out how shaders work that I stumbled across some stunning examples of what is possible. What I didn’t realise, at first, is that the examples I was looking at are simply fragment shaders.

If you don’t know how 3D graphics works you might say ‘so what?’. But let’s just say it’s not the easy route to getting 3D computer graphics done – not to me anyway. The mathematics involved looks harder in a lot of ways, but the programming looks way easier. Programming this way is mostly declarative, there’s no bonkers API, and there are far fewer loops because the magic is in the power of the GPU and the shader loop.

My only problem is that shadertoy.com doesn’t quite work how I want it to. For one thing it keeps timing out (I guess they need more funds, so I added some through Patreon), but that aside I wanted a bit more control over the shader and how it can be embedded – partly for this blog, but also to learn a bit more WebGL.

That happened two weeks ago. Since then I’ve been messing around trying to get something working for this site and making some shaders that I can embed directly here. I think I’m almost there …

Categories
article

Server timed out. Rebooting.

TL;DR: The hat is back!

It has been over 10 years since I last wrote something on this blog. Some usual life stuff happened that I won’t trouble you with. The rain fell, but mostly, the sun shone.

After 5 years of letting the site rot I’ve just spent the best part of 2 days getting it back up. It’s had:

  • an OS upgrade;
  • a WordPress upgrade;
  • but most importantly, a new logo!

5 years of rot has gifted me about 4,000 accounts created by SEO spam-bots, so in a fit of rage I deleted all registered accounts. As far as I can tell there is very little spam (if any) on the site, which brings me to my next point.

I’ve disabled comments. I realise a blog without comments isn’t really a blog at all. But the problem is that as the number of spam posts increases it becomes painful to moderate, and so I don’t. Turning off comments is fairer to potential commenters, who might write a comment and never have it published. At least I’m saving you some time.

I’m looking into alternatives, but so far in all the usual places. Perhaps I’ll find something that addresses my problems with spam that doesn’t involve Google. We’ll see.

After all the work of getting the site up I felt like I should probably write something. I’ve been doing some interesting (to me) hobby research into knowledge engineering and, separately, computer graphics and shaders. Alongside my day job of project management, there are a lot of potential topics, and perhaps some of those musings will end up right here – in the hat.

To keep the pace up, the promise I’m making myself is that new posts will be shorter. So … let’s end it there and see what happens next.

Categories
article programming

The Spread-able System

Spreadsheets are everywhere. They are simple to create and an immensely powerful tool. Unsurprisingly, then, a lot of areas of business rely on spreadsheets to function correctly. But spreadsheets are dangerous too: they suffer from well-known, fundamental flaws.

The problem is that spreadsheets are a special type of code, and I’m not talking about Excel ‘macros’, I’m talking about the formulas. As such they probably need to be treated the same way as other types of code, but their very nature makes this difficult. I’m getting ahead of myself, though; let’s first look at some of what is good and bad about spreadsheets.

Pros

Spreadsheets are remarkable for their:

  • Utility – we can bend them into almost any shape we want because they give one way to represent almost any business process;
  • Portability – we can pick up our little gobbets of data and logic and relocate them to almost anywhere inside or outside the company, in file-systems, mail servers and web-sites;
  • Simplicity – you don’t have to explain a spreadsheet to anyone. They might have to be a proto-genius to figure out how it works but the working knowledge they would need to get started is pre-loaded in their heads and ready-to-run.

Cons

So they sound pretty useful, and I like to think that I’m a pragmatic guy, so why do I hate them so much? Many have noted the shortcomings of spreadsheets. The page on spreadsheets at Wikipedia spells it out clearly enough, so I’ll paraphrase:

  1. Productivity – Working with spreadsheets requires a lot of “sheet-shuffling” to reach the required goal. The bigger the sheet, the more time is spent copying, cutting and pasting cells around.
  2. Reliability – Although what constitutes an error in a spreadsheet is subjective, the paper “A Critical Review of the Literature on Spreadsheet Errors” (PDF) reveals a series of studies (some more recent than others) that have shown that approximately 5% of cells contain errors.
  3. Collaboration – Sharing a spreadsheet is difficult. Having two independent people working on the same sheet and merging their results is, as far as I know, impossible.

The first two items don’t bother me overly. Yes, it’s a problem, but then the alternatives aren’t that great either. Consider what you would do if you didn’t have a spreadsheet to fulfill the task. You’d either do it with a bit of paper and a calculator (i.e. simulate a spreadsheet) or get a programmer to do the task for you. Either way the productivity loss/gain and the number of errors aren’t going to be that significantly different from using a spreadsheet. Don’t get me wrong, I love my fellow programmer, but we make a LOT of mistakes too. The difference perhaps is that bespoke systems usually end up getting audited (and hence fixed) and spreadsheets often don’t. Although this point is probably moot.

Good + Bad = Too Bad

My real beef is with what happens when you combine the ‘pro’ of high portability with the ‘con’ of low collaborative power. You have no way of knowing which version of the spreadsheet you have is the “true” one, and which version is duff. Every copy, whether made inadvertently by forwarding a sheet by email to someone else or explicitly by taking a ‘backup’, is a 12-foot-tall, baby-eating, business-crushing monster waiting to rip you and everyone you love apart.

Hug the Monster, Then Run

The thing is we kind of have to embrace the baby-business-beating monster because it’s about all we’ve got. There are some tasks, as a programmer, that I’m really happy you as the non-programmer don’t bother me with and solve yourself in sheets. Want to set up an intra-company phone-book as a spreadsheet so you don’t have to bother with all that “Access” voodoo? Be my guest, but I’m watching you. Want to set up a spreadsheet to run your fantasy football so you don’t have to add two numbers together? Go right ahead, I’ll even drive you to the game so you don’t miss the turn. Want to set up a spreadsheet to calculate payments and do a mail-merge with the results … STOP. RIGHT. NOW.

The truth is though that you might not know that you’re creating the mother-of-all spreadsheets when you start. I might not know it either but there will probably come a time when a line is crossed and then I will want to know what you’ve been doing and who you’ve been doing it with. I’m just like that.

Unless you are a small company (and hence don’t have a lot of choice) you have to be very afraid of trusting anything that might lose you money to a spreadsheet. You need to be very aware of the risks and the potential costs you are letting yourself in for. Here in Europe there is even a special interest group dedicated to highlighting the risks of spreadsheets. Those guys must throw wild parties …

The Missing Links

In my opinion there is something missing, something that can fill the gap between spreadsheet and system.

I think we need something that can:

  1. Track spreadsheet changes – Not knowing which spreadsheet is “true” and which lies, and not being able to merge sheets, is a problem. Being able to identify revisions of the sheet that happened after yours was ‘branched’ would go a long way (a minimal cell-level diff is sketched just after this list). Perhaps someone has solved it already; if they have, that would be great.
  2. Track spreadsheets themselves – Having some more information about what sort of corporate-data was being accessed, who was using it and how frequently they ran it might alert us to potential spreadsheet monsters being born.
  3. Narrow the gap – Making spreadsheets more like traditional software systems, without significantly castrating the usefulness of the spreadsheet, would be great too. This is a little like asking for the moon on a stick though.
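To give a flavour of the first item, here is a minimal sketch of a cell-level diff between two versions of a sheet. It deliberately avoids any particular spreadsheet library and just compares two grids of cell values (lists of rows); the function name and sample data are invented for illustration.

def diff_sheets(old, new):
    """Report the cells that differ between two versions of a sheet.

    Both arguments are grids (lists of rows) of cell values; missing
    cells are treated as empty strings.
    """
    changes = []
    for r in range(max(len(old), len(new))):
        old_row = old[r] if r < len(old) else []
        new_row = new[r] if r < len(new) else []
        for c in range(max(len(old_row), len(new_row))):
            before = old_row[c] if c < len(old_row) else ""
            after = new_row[c] if c < len(new_row) else ""
            if before != after:
                changes.append((r + 1, c + 1, before, after))
    return changes

# Example: one cell changed between two copies of the same sheet
v1 = [["Name", "Payment"], ["Smith", 100]]
v2 = [["Name", "Payment"], ["Smith", 150]]
print(diff_sheets(v1, v2))   # [(2, 2, 100, 150)]

That doesn’t get you merging, but knowing exactly which cells diverged between two copies is at least half the battle.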

Perhaps I’ll make something like this one day. I have to admit it’s not a terribly exciting project but it has some potential I think. Perhaps I could spice it up by throwing a party and invite the guys from the “European Spreadsheet Risks Interest Group”. Now we’re talking. How will I budget for the 7-up, party hats and streamers? In a spreadsheet of course.

Categories
article

Educate A Business Person Today!

One recurring theme I have noticed with users of systems I’ve worked on is that they aren’t nearly as stupid as I think. They often try to make sense of the system thrust in front of them. This is mostly out of necessity, since it stands between them and getting their job done, so they need to make sense of it. Having written some truly awful systems myself I wish all of them the very best of luck.

Another, seemingly unrelated, observation is that business people are truly astounded by, and often suspicious of, how long it takes to provide a solution to a particular problem. Writing software is simply hard, which partly explains it. Sometimes, however, the solutions that are asked for can come quickly. This might happen if the system was expressly designed to handle new cases of the particular solution being requested, or if producing the solution requires little more than a configuration or script change. Or it might just be dumb luck that the release cycle has worked in their favour.

Disconnect

This disconnect between implementation times, with no reason apparent to the business user, can cause problems. Sometimes it feeds the suspicion that they are being ‘had’ in some elaborate con:

“If change ‘x’ takes a week then surely change ‘y’ should take half as long. How could it not? It only takes half as many words to say out loud. Those guys in IT need firing.”

When that business person is a manager it can lead to awkward situations for developers:

“Change ‘x’ took a week, change ‘y’ will take half as long. How can it not? It’s the only thing that stands between us and product success. If it doesn’t I’ll fire those IT guys”.

The simple truth is that unless a business person also has the developer’s view of the system they will not be able to make sound judgements about it. Hell, I have a developer’s view and not even my judgements are particularly sound.

However, humans are pretty adaptable creatures, and rather than simply telling them the answer we should explain the answer in a way that they can understand. If they want to listen then educating them has a few potential benefits. For one thing it might make you look like you care about your users, rather than being that IT jerk who steals everyone’s food from the refrigerator. Better still, if you get your point across without sounding (to them) like a lunatic then you might improve their mental model of how the system actually works.

Breed

There is a breed of programmer out there in the world today that has either evolved or engineered themselves into a situation where they are the only one who ‘knows’. Yes, you know who you are. Sometimes they do this as a survival instinct to make themselves indispensable, sometimes because they’re not great communicators or educators. These are the people that need to be fired, because their value is way less than they think it is. They actually harm the productivity of the company by being obstructive or uncommunicative, plus they’re a real pain-in-the-ass to work with.

Yes, you will need to keep a watchful eye on your newly educated fledglings. Especially the managers, but there’s nothing new about that.

Categories
article programming

Floating Point Flotsam

I have never been particularly clear about when to choose single or double precision floating point arithmetic. I think I have been operating on a sort of ‘trial-and-error’ approach for some time. It seems ludicrous that I should not, after all these years, know exactly when to choose single or double precision arithmetic. The truth is that I don’t, and it’s now time to fix that.

First, a little background. I was, for reasons that are too sad to explain, interested in how to calculate (accurately) the time of the Vernal Equinox. That is the exact time of the year (to the minute, if possible) that the sun is directly overhead (90°) at the equator. Anyway, I found some code which is an implementation of an astronomical algorithm designed expressly for the purpose of calculating the vernal equinox and is accurate to about 20 minutes. Which is good enough for now.

Now the next piece of background is that I was also doing this in Lisp, and up until now I have let the reader interpret all the literals I enter. When I tried to do the calculation in the REPL I could get no closer than the right part of the day, with the result somewhat rounded to the nearest half day. After some head scratching it finally occurred to me that the computation required more significant digits than I really had. It seems that the formula I was entering was being interpreted as single precision floating point numbers, because if I had wanted doubles I would have suffixed my literal numbers with a ‘d’. It would seem that d could also stand for D’uh. Seems fair enough. Time to do some research then …

You see, according to the IEEE standard 754-1985 a single precision floating point number has 23 significant bits and a double has 53 significant bits. For me to answer the question of how many significant figures I can get in decimal would require me to know what the smallest binary fraction I can represent is. This number is 1/2^23, which is about 0.00000011920928955078125.

Now, floating point numbers are built from a fractional part (the mantissa) and an exponent part, which together give the decimal representation of the number. Therefore you can never get more accuracy than the smallest binary fraction multiplied by the exponent you have. This means that the significant figures should be something like:

log10(1/2^23) ≈ -6.9

Therefore when a number has 7 significant figures you are already losing a little accuracy; the more significant figures you add, the worse it gets. It was then clear that my astronomical antics were less than stellar, since the first literal in the computation has 11 significant figures.
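You can see the effect directly by forcing a value through single precision. Here is a minimal Python sketch using the standard struct module; the Julian-date-style literal is just an invented example with 11 significant figures.

import struct

def to_single(x):
    # Pack as a 32-bit IEEE 754 float and unpack again, discarding
    # whatever precision a single can't hold.
    return struct.unpack('f', struct.pack('f', x))[0]

value = 2451623.8077      # 11 significant figures
print(value)              # 2451623.8077
print(to_single(value))   # 2451623.75 - the fractional part is mangled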

Indeed, whilst I was thinking about this problem it occurred to me that if I wanted to continue using single precision I could split the fractional part from the integer part and continue that way. However, this is still inferior to a double because it would give me 7 significant figures for each part and therefore a total of 14 significant figures. This is inferior because, using my shiny new brain, I can show that a double will give about 16 significant figures. Of course I could have also concluded that doubles are better than two floats by noting that 23 bits + 23 bits < 53 bits, but that would never have been as much fun.

Type            Word Size (bits)   Mantissa (bits)   Dec. Sig. Figs.
Single          32                 23                7
Double          64                 53                16
Extended        96                 63                19
Quad-Extended   128                113               34

You could argue that I could have saved myself 20 minutes of time by looking the answer up but then again I would never remember the answer unless I proved it to myself first. So, now my spring occurs at the same time as everyone else’s and I know why. Oh it happened already? Sheeeeiiiitttt.

Categories
article

A Version Aversion

I like war stories. They remind me that I’m not alone on this journey.

This is a war story. It involves a lot of entrails, questionable surgery and plenty of walking wounded. I carry the scars so perhaps you don’t have to …

… when I joined the team the system consisted of:

  1. A few thousand lines of C++ code which ran as 10 processes on two Solaris hosts;
  2. Some Java code that ran on an NT4 J2EE server;
  3. A bunch of Windows client PCs all running the same (but different to the server) Java VM;
  4. A collection of ‘glue’ scripts written in shell script and Tcl.

It ran in two locales, and the hardware was broadly equivalent in both, from what I recall. Now compare this to what we had by the time I left:

  1. A larger body of C++ code with 15 or so processes on one Solaris host, as well as a set of additional x86 UNIX hosts (of mixed hardware pedigree) sprinkled with various flavours of Solaris and Linux (RedHat) that would run between 2-4 processes each, depending on the number of cores;
  2. 2 J2EE servers of the same vendor;
  3. A further 2 J2EE servers of a different vendor (don’t ask!);
  4. A collection of scripts written in shell script & Python (thankfully we stamped out the Tcl);
  5. 3 primary locations each running 2 different versions of our software, and a single satellite location (hanging from a primary location).

Each server release involved somewhere around 10 hosts running different hardware, OS & JVM. It’s similar to, and different from, the problem that software vendors must have when they need to make their product run on multiple platforms. However, the difference for us was that our software was a distributed system and each component needed to seamlessly interact with its peers. Something that not many software vendors make a habit of, other than Microsoft I suppose.

Against this background was a team of 10 developers in 3 timezones developing software for a constantly changing and fairly lucrative business. Quickly made enhancements could secure profits, instability and failures might secure losses, so it was important to try and keep the system running as smoothly as possible. However, the large code base (>100,000 lines) and confusing deployment array made every release a roller coaster ride. In my last two years of the job the release cycle, whilst somewhat improved from when I started, had increased from 1-2 months to almost 6. This had an unforeseen consequence: developers would, out of necessity, place new features onto release branches to be able to get features out faster. That’s when the madness started.

There were too many release versions, operating system versions and client library versions to contend with. Sometimes even trivial changes became enormous chess games where the order of the changes we made would determine whether the system would actually run or not. Eventually it was bound to grind to a halt, because with that many deployment configurations each release had too many testing dependencies. There were two problems here. Firstly, since this was now a very widely distributed system, it was difficult for us to have an accurate test deployment that worked. Secondly, making distributed systems work is hard anyway, and the more configurations you have to manage the more complex it’s going to be. We were trying to help ourselves by retrospectively adding unit tests, but the coverage was still fairly low and so we could never have very much confidence that a built system was actually going to work. What we really desperately needed were integration tests, but we never quite managed it.

That’s where Joel’s post from last week comes in. As described by Joel, we essentially had a SEQUENCE-MANY situation, where to be sure of stability we had to test many releases against many deployment configurations. It would be fair to say that we failed to do this adequately. I sometimes wonder if we could have done it a little better.

  1. Could We Have Had Tighter Control Over The Hardware? Unlike the problem of enforcing standards in Web browsers we of course had full control over the deployment environment so we could have mandated a common platform for it. As enticing as this sounds, talking to system admins now and then would tend to suggest that this is simply not possible if the hardware is to be purchased incrementally. This is because after you buy the first 2 Dell servers with a standard specification, a month later that specification will have changed. As more time passes the drift between the hardware is larger.

    If, however, you sourced a job lot of the hardware in the same place at the same time, you could buy extra (for spares and future requirements) and attempt to keep this variable constant. It would have been expensive to do but it is at least possible in this scenario. I think that this probably would have reduced the number of different cross-compilations that were required and reduced the number of different JVMs that we had to manage. The biggest problem though is that we would have, to a certain extent, needed to know the future to be able to predict what sorts and what amounts of hardware we would need when we set out. That kind of makes it a non-starter, coupled with the fact that I’ve never actually heard of anyone doing this for real.

  2. Could We Have Had Tighter Control Over The Software? This is the thing that concerns me the most and is definitely a place we didn’t do as well as we should have. We let people go ahead and implement locale-specific solutions that were unworkable globally, and those created internal system dependencies that would later need to be ‘undone’. Anyone who has ever worked on a system after its release will know it’s much easier to get it right first time. This is because if you create an intermediate solution that ends up being used then you have to manage the old intermediate version ‘out’.

    Indeed, there was a story here too. The original system architect moved on a year or so after I joined. He used to worry about 80% of the code that got committed; when he left no-one really had his insight into the architecture and the rust quickly set in. Related to the loss of the architect, as already mentioned, was the lack of integration tests. Both would have helped us to identify which code was bogus and have it fixed before it reached a release stage.

The one thing we did successfully manage to do was to stop developers changing release branches. But the effect of that was to make us look like chumps when the business had to be denied features until new releases could be rolled out. Ho hum.

The idealist in me thinks we could have done a few things to make it work better, but the pragmatist thinks that we did what we had to do. Whilst the idealist in my head makes a lot of noise and gets listened to an awful lot, the pragmatist is the one who gets the most results. When you are faced with a daily tightrope walk, like we were, you have to try and be both idealist and pragmatist: choosing the idealist’s course when you think you can get away with it and the pragmatist’s when you can’t.

But when all else fails just hope for the best. The scars will heal. Eventually.

Categories
article finance programming python

Calculating peak-to-trough drawdown

Ok, so this is a little bit technical but it’s an intriguing puzzle that got me thinking quite hard. So here’s the problem. Sometimes investors want to be able to judge what the absolute worst case scenario would have been if they’d invested in something. Look at the following random graph of pretend asset prices:

Peak-To-Trough

You’ll see that there are two points on the graph (marked in red) where, if you had invested at the first point and pulled out at the second, you would have made the worst-case loss. This is the point of the analysis: it’s a way for investors in the asset to see how bad ‘bad’ has really been in the past. Clearly past prices are not an indicator of future losses. 🙂

The upper one is the ‘peak’ and the lower one is the ‘trough’. Well, finding these two babies by eye is trivial. To do it reliably (and quickly) on a computer is not that straightforward. Part of the problem is coming up with a consistent natural-language description of what you want your peak and trough to be. This took me some time. I believe what I really want is: the largest positive difference of high minus low where the low occurs after the high in time-order. This was the best I could do. It led to the first solution (in Python):


def drawdown(prices):
    maxi = 0  # index of the peak found so far
    mini = 0  # index of the trough found so far
    for i in range(len(prices) - 1):
        best_j = i
        best_diff = 0
        # Compare this point against every later point and keep the
        # largest positive fall from here.
        for j in range(i + 1, len(prices)):
            if prices[i] - prices[j] > best_diff:
                best_j = j
                best_diff = prices[i] - prices[j]
        # If the biggest fall from this point beats the best seen so far,
        # record it as the new peak-to-trough.
        if best_diff > prices[maxi] - prices[mini]:
            maxi = i
            mini = best_j
    return (prices[maxi], prices[mini])

Now this solution is easy to explain. It’s what I have come to know as a ‘between’ analysis. I don’t know if that’s the proper term, but it harks back to the days when I used to be a number-cruncher for some statisticians. The deal is relatively straightforward: compare each item against every item after it in the list and store the largest positive difference. If this difference is also the largest seen in the data set so far then make it the largest positive difference of all points. At the end you just return the two points you found. This is a natural way to solve the problem because it looks at all possible start points and assesses what the worst outcome would be.

The problem with this solution is that it has quadratic complexity. That is, for any data series of size N it will perform roughly N * (N-1) / 2 comparisons; in shorthand this is O(N^2). For small N this doesn’t really matter, but for any decently sized data series this baby will be slow-as-molasses. The challenge then is to find an O(N) solution to the problem and save those much-needed cycles for something really important:


def drawdown(prices):
    prevmaxi = 0  # index of the peak of the largest drawdown seen so far
    prevmini = 0  # index of the trough of the largest drawdown seen so far
    maxi = 0      # index of the highest price seen so far

    for i in range(1, len(prices)):
        if prices[i] >= prices[maxi]:
            maxi = i
        else:
            # You can only extend the largest drawdown on a downward price!
            if (prices[maxi] - prices[i]) > (prices[prevmaxi] - prices[prevmini]):
                prevmaxi = maxi
                prevmini = i

    return (prices[prevmaxi], prices[prevmini])

This solution is a bit harder to explain. We move through the prices, and the first part of the ‘if’ keeps track of the highest price seen so far. The second part of the ‘if’ is where the magic happens: if the next value is less than that maximum, we check whether the difference is larger than any previously encountered difference; if it is, then this is our new peak-to-trough.

The purist in me likes the fact that the O(N) solution looks like easier code to understand than the O(N^2) solution. Although the O(N^2) solution is, I think, an easier concept to grapple with, when it’s translated into code it just doesn’t grok.
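For completeness, here’s how either version might be called; the price list is just made-up sample data, and both functions return the same pair for it.

prices = [100.0, 102.5, 101.0, 98.0, 99.5, 103.0, 97.0, 101.5]

peak, trough = drawdown(prices)
print("peak:", peak, "trough:", trough)   # peak: 103.0 trough: 97.0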

Categories
article project management

You think your code don’t smell?

So, code reviews are great. Get the benefit of some ass-hole telling you that your comments should be C-style (/*) and not C++-style (//) and reminding you that the member name ‘mSuckThis’ is not suitable, ever. No really, code reviews are great. It’s just that a lot of the time they just don’t work.

The first time I encountered code review was when my boss at the time had just read some book on how to manage programmers and was keen to inflict it on all his employees. His code-review process was to take all my work, print it out and go through it line-by-line, master-and-student style.

This type of code review, in the way that he implemented it, was meaningless. It concentrated on an important but largely automatable aspect of code review: adherence to coding guidelines.

As I see it there are three types of defect that code review is trying to identify:

  1. Adherence to coding guidelines (or lack of it) and inter-package dependencies.
  2. Identification of localised errors: “that loop is infinite”, “that algorithm should be log(N) and not N^2”, or “that module is way too big”.
  3. Identification of non-local errors. Where local means local to the code-review. For instance the impact of adding a throw on a widely used method and how that affects all the dependent code paths.

I question anyone’s ability to fully understand the dynamic nature of any reasonably sized piece of software by just looking at a small excerpt. Every time you review that code you have to ‘load’ that supporting information into your head to be able to identify whether the code is totally awesome or tragically bogus. In my experience defects of a local type (type 2) were sadly rarely identified by code review, and defects of a non-local type (type 3) almost never.

I’m passionate about improving code quality, but I don’t see any realistic way to achieve what I want. To identify non-local errors you really need your code reviewer to sit next to you during the development or be as deeply involved in the code as you are. It would probably need a similar approach to reliably find local errors too. However, your reviewer is rarely as involved as that. It seems that some judicious use of pair programming might be the answer, but that comes with its own problems.

It seems that to get the best out of code-reviews you have to be very careful about how you implement them. Sure, let’s automate what we can automate and pair program on tricky items but the real code-review needs to be extremely skilfully handled to get the best bang-for-your-buck-chuck.

Categories
article programming

Time: the unseen global variable

Just about everyone knows that global variables need to be used sparingly. The more you use the more likely you are to capture complex state in places that are hard to maintain. Or something.

As well as all the globals you can see and measure, there exists a shadowy league of ‘unseen’ globals in your programs. Some, like environment variables, are clearly designed as global variables and are desirable and understandable. However, some are wistful and ephemeral and dance round your program like wicked elves. Time is the biggest and scariest of these elves.

For most programs you write, time probably doesn’t matter; they are, to all intents and purposes, timeless. But as soon as you start entering the shadowy world of time, and the even more nebulous one of time zones and daylight savings, a whole set of other state is being used. In my experience the programs and components I have written that were dependent on time have been some of the most complex to develop and maintain. This is for a variety of reasons, but in summary:

time is not constant and can be interpreted in more than one way.

This leads to all manner of difficulties:

  1. Code that depends on the current system time ‘Now()’ and doesn’t take it as a parameter is always going to be fragile. This is mostly because its behaviour can be non-deterministic unless you properly account for the fact that time is not constant. This is especially important because your programs are susceptible to hard-to-spot boundary effects if you write expressions that use Now() more than once and depend on it returning the same value for each call. Which of course it never will.
  2. Time and date should never, ever, ever be separated from one another. You get all sorts of tricky errors when you split the two. Especially when you are performing some sort of time zone or daylight savings calculation where the two should change together but do not.
  3. Some programming languages represent a date (with no time) as a date with a time of 00:00:00, which is intuitive. But consider what happens when you load such a date from a database: it was stored in the past, when daylight savings applied, and is loaded now, when there are no daylight savings. In the frame of reference of now, your localised past time will be an hour earlier and so will fall in the final hour of the previous day. This problem clearly applies to timezones also, and it arises because you made the mistake of not having a consistent view of time.
  4. Not only can the meaning of calendar time change after-the-fact (due to time-zones) but it can also be interpreted differently by different cultures.

There are probably a lot of other time-related pickles you can get yourself into. If Harold Lloyd were a programmer ...

You’d probably not be surprised to hear me say that unit testing is one way of addressing at least some of these problems. It does two things: to get good coverage from your unit tests you are practically forced to make time a parameter wherever it’s used, instead of calling Now(); and as a direct consequence your code can now be called ‘as of’ a given time, so you can offer the historical view where appropriate.
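As a minimal sketch of what that looks like in practice (the function and field names here are invented for illustration), compare a report that calls Now() internally with one that accepts the time as a parameter:

from datetime import datetime

# Fragile: the hidden global "now" makes the result non-deterministic
# and hard to unit test.
def overdue_invoices_fragile(invoices):
    now = datetime.now()
    return [inv for inv in invoices if inv["due"] < now]

# Better: time is an explicit parameter, so the same code can be run
# 'as of' any moment - including in a test with a fixed value.
def overdue_invoices(invoices, as_of):
    return [inv for inv in invoices if inv["due"] < as_of]

invoices = [{"id": 1, "due": datetime(2008, 3, 1)}]
print(overdue_invoices(invoices, as_of=datetime(2008, 4, 1)))   # invoice 1 is overdue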

Indeed, I would say that where a piece of software has a time context, it will only be a matter of time before someone says: “Ok, that’s what it says today, but what if I want to rerun it as of 3 weeks ago?”.

The time-zone and daylight-savings problems can be nailed by having a consistent view on the treatment of time. For instance, storing all dates/times as UTC is one approach. But if you ever need to store a local time then it should be clear what frame of reference is being used to store it. So you might additionally need to know the calendar, the timezone and the daylight-savings rules before you can correctly store a time.
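As a small sketch of that discipline, using Python’s standard zoneinfo module (the zone name is just an example): store the instant in UTC, keep the zone name alongside it, and only localise at the edges.

from datetime import datetime, timezone
from zoneinfo import ZoneInfo

zone = ZoneInfo("Europe/London")

# A local wall-clock time, made unambiguous by attaching its zone ...
local = datetime(2008, 7, 1, 9, 30, tzinfo=zone)

# ... stored as UTC plus the zone name, so the frame of reference survives.
stored_utc = local.astimezone(timezone.utc)
stored_zone = "Europe/London"

# Later, localise again only when displaying it.
print(stored_utc)                                     # 2008-07-01 08:30:00+00:00
print(stored_utc.astimezone(ZoneInfo(stored_zone)))   # 2008-07-01 09:30:00+01:00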

Then and only then will time become your faithful and obedient friend.