Virtual Host Development: Part 2

Last week I made the case for using virtual hosts for software development purposes. So this week we’re going to actually try it out. There are a number of considerations to make before we get stuck in so let’s go …

Choosing a Virtual Host Manager

As far as I can tell there are two main equivalent software products for virtual host solutions. They are: Microsoft Virtual PC 2007 (MVPC) and VMware Workstation 6 (WS6). The most notable difference between the hosts is that WS6 costs (at time of writing) $189 and MVPC is free.

However, free comes at a cost. There are a number of disadvantages (for me at least) with MVPC that mean that I will be ponying up the cashwa for vmware’s offering.

  • Guest Display Hardware – Since we have a virtual host it also has a virtual display card. The one that sits inside MVPC is only capable of screen resolutions upto 1600×1200. Since I have a wide-screen display that is set to 1920×1200 it’s a bit of a waste. However vmware can also display on multiple monitors making use of dual head displays.
  • Guest Support – Yes I love Linux and I have to say (because I tried it) that trying to create an Ubuntu virtual guest on Virtual PC was like trying to herd cats. I’d get one part working only to find that something else was bust, whilst it’s possible it’s a pain. If you decide to go ahead with this you should take a look at this article which explains the hoops you must get your kitty’s through. In theory, I guess, you should be able to get any x86 OS that works in WS6 working in MVPC (as long as it isn’t 64 bit). So far though, and this is an observation based on one data point, I think the virtual hardware in WS6 must be more generic than MVPC.
  • Guest OS Tools – Both MVPC and WS6 require you to install additional tools after the core OS install that enhance the operation of the virtual machine. Why I couldn’t really figure out exactly what the tools do it seems reasonable to assume that they’re useful. Indeed it’s not until you install the tools on WS6 that you can access the higher resolutions. With WS6 then it provides tools for Linux where as MVPC does not. Whether this makes an observable improvement to the WS6 experience for Linux would require more time to tell ..
  • Host OS Support – VMware will run on Linux. So should I decide that I want to take my removable disks and put them somewhere else I could. At least in theory
  • Converters – WS6 has a free ‘converter’ which allows the creation of WS6 virtual windows (only) hosts from MVPC ones. Which is nice. Not only that though the converters are bootable so that you can create a virtual ‘copy’ of an already installed non-virtual OS. Whilst I did successfully convert MVPC hosts I didn’t try and virtualise an existing host
  • Snapshot Manager – vmware has a snapshot manager which makes the management of snapshots a lot simpler to comprehend. You get a sort of snapshot time-line that you can chose from and cycle through. MVPC has a similar concept but it’s not as clean.

Snapshot Manager

The virtual host manager you chose will depend on what you need. For me MVPC is not a choice, however there are probably a fair few people who could get good results from MVPC making the benefits of WS6 not that important. Since you can download a trial version of WS6 I would suggest you at least try it before deciding. To me WS6 product feels mature and the user interface seems more intuitive. Your mileage may vary.

For the remainder of this discussion I will talk about using WS6. Whilst you could in theory use MVPC and mostly apply the same ideas the two are not equivalent, as already discussed.

Creating The Host

Having settled on vmware I should say a little about the settings I chose. I’m not saying that these are ideal but for development purposes they seemed like sensible defaults to me. Firstly I instructed WS6 to create me a custom virtual host. For the most part I used the presented defaults but deviated on a few items:

  1. Non-Local Disk – When I say non-local I mean non-local to the host OS. If the disk of a virtual host is the same physical disk of the non-virtual host then you will have I/O performance issues. This is especially true for laptops that tend to have slower hard disks than their desktop equivalents (4200rpm vs 7200rpm). That’s why I went and bought an external drive enclosure and speedy hard disk before I started
  2. Memory – Since these hosts are development hosts I’m probably going to need some RAM, especially since I intend to run VS2005 on the windows host. Therefore I chose 1Gb RAM for the windows hosts and 512Mb for the Ubuntu host. Note that for windows changing the amount of memory after the OS is activated can force you to reactivate your license. So getting it right-enough first time would be a bonus. After running both systems on this configuration for a couple of weeks I can say that the settings are convenient enough for me and I’ve not noticed any excessive paging and performance is fine.
  3. Networking – I elected to chose NAT as my virtual networking setup. This puts my virtual hosts on their own subnet, they access the internet as the non-virtual host and then the results are routed back by vmware. The other alternative might be to bridge virtual and real host but this, I think, would make my virtual host a peer of the real one on the domain. Something I want to avoid for now.
  4. Disk Size – I allocated 80Gb disks to each host. In practice I would probably only use about 10% of that space but WS6 does not allocate the space all at once. This means I only use as much space as I need. This obviously incurs an overhead because the disk is allocated on demand. However I wanted to see if this really causes a noticeable problem before I elect to allocate all the disk at once. Keeping the space to a minimum obviously has advantages for backup purposes but also (I would guess) mean that snapshots take less time.

Building The Host

Obviously I must now load an appropriate OS onto each of my virtual hosts. You do this by simply attaching an ISO to the machine and installing in the normal way. Once installed you should install the VMware tools that improve the integration between your host and guest OS. After installing the extensions (and not before) you should activate your license on Windows based-hosts.

Next you should apply all the updates that are necessary to make your host a good netizen, either through Microsoft Update or whatever package manager your OS uses.

It was at this point that I ran into an issue with Ubuntu which relates to the networking. By default Ubuntu 7.10 turns ‘roaming’ network mode on and this doesn’t play nice with WS6. So turning it off and selecting DHCP insted was needed.

Network Roaming Mode Is Bad

Once built and patched I created checkpoints for both systems to rollback to or clone. This will be useful when I need to create a ‘test’ host for deployment or system testing.

Stack ‘Em High

One of my complaints about developing for Windows is the height of the development stack. Not only do I have to install dev studio but I have to install a bunch of service packs, add-ins and 3rd party software, and their service packs, to have anything like a workable development system. The problem is, when something goes wrong and your installation gets knackered in some way (it’s happened to me twice in the last year) you can kiss goodbye to 1 days development time whilst you sort it out.

That’s why, once I had all the development tools installed, I then took another snapshot to record the base development install so I can revert to it in seconds rather than hours. It’s my intention that when I went to install a new software tool I will clone a snapshot, install the software and see what it does before I commit to installing it into the main disk image.

Shared Disks

One important feature of having a virtual host is being able to easily share data between the guest and host OS. The clipboard works fine for a lot of things but when you want to expose a 1Gb ISO from the host to the guest it becomes a bit painful. Sure you could FTP the files from one to the other but that would be a pain. What you really need is to expose a part of the host file system to your virtual machine. Under windows and WS6 this results in the creation of \\.host URI that is accessible from the guest windows OS.

However for Ubuntu it failed because the installation of the tools failed without me noticing!! It turns out there is a bug in one of the vmware headers that incorrectly sets which kernel API to use. This is one of the problems with Linux I guess in as much as vendors have to release source code drivers because of all the different versions the guest OS could be. However, because it’s source code it’s also easy to fix! By following the instructions on the Ubuntu forums I had the matter resolved in minutes.

Accessing Domain Hosts

There’s a couple of problems with interacting with other network hosts when using virtual hosts.

  1. If you’re developing on a shared network and you are unable to make your virtual machine a peer on the network then you will probably have chosen NAT networking. This, as discussed, puts you machines on their own subnet. The consequence of this is that you will need to start using qualified domain names to be able to access the hosts that are in the same domain as your non-virtual machine. This isn’t a huge discomfort and it’s arguably the right thing to do if you’re embedding those names into software that you’re writing anyway.
  2. The second problem relates to the use of Windows authentication. Applications that use windows authentication (like SQL Server Management Studio) will fail because your virtual machine is not part of the domain. One solution is again to make your virtual host a peer on the network. Whilst this would be the ideal it might create implications for how you’re going to manage a battery of virtual machines and might result in some loss of liberty because you’re system administrator got scared. The solution is to use a feature of windows networking that allows the user to manage multiple identities. This will allow you to enter a set of credentials for each machine that you will connect to. After making these changes applications that use Windows authentication work flawlessly.

Stored Passwords


After spending a week of development on virtual machines I like it. I don’t notice any performance lags on the widows host everything works beautifully. On Linux I occasionally have mouse problems where the mouse refuses to roll over a particular screen area. This needs to be reset by disconnecting control from the VM (Ctrl+Alt) and then clicking (to regain the control) and trying again. It’s possible that this problem is something to do with known mouse problems, perhaps more investigation is required.

The WS6 ribbon makes switching between machines and displays simple. The ribbon also makes using WS6 analgous to using remote desktop which feels natural.

WS6 Ribbon

It had been my original intention to run only my development needs inside the virtual machines and keep the real machine for my office needs like e-mail, spreadsheet and word processing. However that becomes awkward to juggle between and means that I sometimes miss e-mail or IMs when the development host is maximized over the top of the host desktop. Having said that though maybe not getting distracted is a good thing 😉

However, I’m starting to think that I should adjust the windows virtual machine RAM and install all of the office applications and IM software into the windows development host. Reducing the host OS to simply a shell for the VMware software. All in all though, I’d recommend all developers to try it out at least. It ticks all the boxes I wanted and after the setup and configuration it makes development a saner process. Which has to be good, right?


Virtual Host Development: Part 1

I got a new PC this week. Oooooh. It instantly presented me with a challenge:

Q. Is it possible to make full use of a standard mid-range specification development desktop host?
A. … errr … dunno …

As a developer I have long believed that we, me included, fool ourselves into purchasing new hardware and software that we don’t need. This constant upgrade cycle blind-sides us from separating what’s important from what is aesthetically pleasing. Since 2004 (when we approached the PC clock-speed ceiling) I can’t think of any advance in software or hardware that has really compelled me to want to go and buy something new. I discovered some time back that the biggest productivity bonus you can get from any developer is to give them two monitors (oh, and a graphics card that can support them too!). Giving those developers Vista or a bigger/faster machine won’t buy you much other less help-desk time supporting old kit and a better relationship with your hardware vendor.

The things that have changed since 2004 is the cheapness of RAM & disk and the addition of multi-core CPUs. Although 64 bit has also come online too that in itself only really gives you more addressable RAM. For the moment the circa 3Gb addressable in most 32 bit OS’s is fine for me and my development needs. This means that by default I get as much RAM as the OS can handle, 2 large disks and a multi-core CPU.

So, in the absence of a new OS to tax it the new hardware was begging for a thrashing and I thought knew just the thing. I would start doing virtual host development. That is, install a VMware style virtual host manager and run my development environment entirely inside a guest operating system. The host operating system, i.e. the real one, would be used for sending email, surfing, browsing documents and running the virtual host manager.

This way of working has some desirable consequences:

  1. Snapshots – Most virtual host managers allow the taking of snapshots. So the plan is to take a snapshot after each major milestone of the virtual host build. That is:
    • Right after the install (& patch) of the base OS
    • After the install of the standard development tools
    • After the install of personal development tools that I use

    Regardless of the OS you use your development environment will usually take you a fairly long time to build. This is because you have to load: IDEs, source control, useful editors and whatever else you fancy. The installation of the DevStudio stack alone can take the best part of a day. The ability to go back in time to anyone of those points will be very useful if something goes wrong. It’s not only that though, most virtual host managers will allow you to ‘clone’ a snapshot so that you can take your current environment, clone it, fire it up, install something wacky, and see if it works. If it doesn’t work as you expected then you live to fight another day.

  2. Security – I have never particularly liked installing my custom tools and utilities on my company’s hardware. It complicates matters when system administrator’s are needed to try and resolve problems. This is because, and rightly so, the first thing they’ll blame when something goes wrong is that piece of custom software I installed last week. Therefore, I think, at least in theory that I will no longer need administrator privileges on my own machine because I will have an administrator account on my virtual hosts. I’m not convinced of this view just yet, but I aim to demonstrate it here!
  3. Testing – Deployment of my developed code is always hard to test. It’s hard because you need a clean machine to work with to be able to prove that the test was indeed a success. With the aid of snapshots I can boot a clean machine and attempt a deployment to it. If it succeeds great, if it fails I just start again safe in the knowledge that I haven’t soiled my clean machine and the test is still a good one.
  4. OS Choice – I still think UNIX is far and away the most powerful and transparent OS, but that might be because I’ve been banging my head against it for a very long time. Anyway, I like having UNIX around and when I don’t have it it makes me kind-of sad and angry. Angry that I can’t: grep a file for a regex pattern, pipe it through gawk to get a particular column, pipe that through sort, then pipe that through uniq -c and pipe the whole lot through sort -g. How else would you answer the question of how can you do a:
    SELECT Value, COUNT(*) 
    WHERE Value LIKE '%XYZ%'
    GROUP BY Value 

    … on a flat file? On second thoughts don’t answer that.

In Part 2 I will look at what I discovered. The pitfalls of running a development host within a virtual host and more on why I really do love UNIX an awful lot.

(Ok perhaps I’ll leave that bit out :-))


The Free & The Damned

I sometimes wonder if I worked in a company that made software for others, instead of a company that makes software for itself, whether I would be a better: programmer, ventroliquist and lover. Ok, scratch the bit about the lover.

There are many potential reasons why this might be so but I want to focus on just two.

  1. When the software is the company’s business then as a developer you are closer to that business with all the benefits that brings. Essentially it’s the difference between supporting the business and being the business
  2. The second point is the related point that when software is the business, rather than for internal use, it is necessarily of a higher standard.

Item #2 is because your internal users pay for their software only indirectly. They don’t sign a EULA, and the cheques that they do sign are for salaries and aren’t budgeted in quite the same way. Our internal users will put up with sub-standard sofrware that people who have to sign a EULA and pay hard-green for will not.

The thing is that working for a company that doesn’t produce software as its primary function is still great fun. For much of what I’ve been doing for that last few years the business area I work in is only tangentially computer related. That is computers & technology are critical to the business because of the level of automation they provide not for any other reason. This makes the technology they use the means and not the ends.

This is both liberating and damning all at once. It’s liberating because I’m free to ignore decades of good software engineering practice if it is profitable and expedient to do so. Put trivially, if my software could directly generate revenues then time spent making it compile without warnings is time that it could be accumulating wealth. It’s damning because all those unfixed warnings are going to make an expensive mistake one day. I, as the programmer, have to choose where to draw the line.

Good quality software should be well-designed on very many levels: interaction, architecture, performance, etc. Anyone who buys software should demand good quality software or their money back. Therefore I would think (obviously I don’t know :-)) that being both free & damned isn’t necessarily a bad place to be. At least you’ll get the chance to make a bit of lucre while you’re free and when you’re in hell you’ll always have a job refactoring.


Constrained By Types (In Another Dimension)

I just read this. Which was interesting. I love the way that Steve has a simple point to make and spends 000s of words doing it, the posts always seem to ramble a bit (a little like mine!) but they’re usually fully of interesting tidbits and insights into software development so it’s usually worth spending 30 minutes or so on his issuances.

You can write C++ like straight C code if you like, using buffers and pointers and nary a user-defined type to be found. Or you can spend weeks agonizing over template metaprogramming with your peers, trying to force the type system to do something it’s just not powerful enough to express. Guess which group gets more actual work done? My bet would be the C coders. C++ helps them iron things out in sticky situations (e.g. data structures) where you need a little more structure around the public API, but for the most part they’re just moving data around and running algorithms, rather than trying to coerce their error-handling system to catch programmatic errors. It’s fun to try to make a bulletproof model, but their peers are making them look bad by actually deploying systems. In practice, trying to make an error-proof system is way more work than it’s worth.

This post raises a point that I hadn’t really considered before which is that perhaps we should consider static types to be a form of metadata, much like comments. The more static types you have the more constraining your model will be on the system that you’re creating. This is as it should be because that’s why you created those static types in the first place, right? But that model could just as well not exist. You could have created a system without all that new-fangled OOP crap and it might be a lot less complex. You could have the whole thing written in half-a-day and still be home in time for the footy.

A few years ago I was assigned to a trading system project that was to replace an existing legacy system. The existing trading system was single threaded, multi-user and suffered all sorts of performance & concurrency problems. One of it’s strengths, though, was that it was partly written in Tcl. Now Tcl isn’t one of the worlds greatest languages but it is dynamically typed and that gives it a certain flexibility. For instance, the shape and content of our core data was changing fairly constantly. Now, because that data was basically a bunch of name-value pairs inside the system it was possible to change the shape of the data in the system while it was running. I doubt that this ‘feature’ was ever consciously designed in to the legacy system from the beginning, but the flexibility and simplicity it gave was really very powerful.

When the replacement system came it was written in C++ and Java and had its own language embedded within it. It was a technical triumph in many ways and represented a tremendous leap forwards in many other ways. But the flexibility had gone because it would have taken extra effort to preserve that flexibility using statically typed languages. As a result the new system was faster and cleaner but more fragile and much harder to maintain. It was only when we started turning the legacy system off that it occurred to me that we had sacrificed something really quite important without really knowing it.

This flexibility, then, obviously has the downside that if our system is not very constrained it will most likely be a total mess. That was one of the drivers for replacing the legacy system in the first place. Moreover, this is especially likely to be true if there’s more than one person working on the system. Those static types clearly served a purpose after all in allowing us to categorise our thoughts about the behaviours of the system and then communicate them to others.

I suspect the solution is like almost everything, it’s a compromise. Experience will tell you when you need to constrain and when not to. Indeed, this is pretty close to the actual conclusion Steve comes to as well. In practice though I suspect that on large complex projects you would have to be very disciplined about the programming idioms you are going to use in the absence of those static types. It’s another dimension to the programming plane, and I don’t need to tell you that there are quite a few other dimensions already.


The Cracked Mirror

I remember reading somewhere (perhaps here but I can’t find the reference now) that the business of programming is the act of producing a simplified mirror-model of the business processes that it is trying to encapsulate.

This seems an intuitive statement. If I come to your company and make software to help your business it must capture at least some of the essence of what your business does. Indeed the programs that I and my colleagues write should in some way mirror the businesses that they belong to. If they don’t then we have to consider that I and my colleagues have probably failed.

In 1997 I entered the finance industry knowing almost nothing about finance. Pure green. So, eager to learn, I thought that if I looked at the code that I would be able to understand some of what was going on in the business. It turned out that this was as true as it was false. Yes the code mirrored the business but it turns out that that particular mirror was cracked. What I thought I understood about the business was distorted by irrelevant detail.

It’s obvious when I think about it now but the code that I was looking at had not been placed infront of me by an alien life-force (although some of the dudes were pretty strange) it had evolved. Code had been added to support business ventures that had subsequently been ended or even worse code had been added that was just plain wrong. In both these circumstances the users of the system compensated for the semantic gap between business and system, by doing what humans do best: working around the problem.

It seems then that this cracked mirror is inevitable because software decays. To really know what’s going on in your organisation you have to bang on doors. You have to ask the users the questions that make you look like Mr. Stupid. Only then can you build the model in your head of what is really going on.

What’s that you say? You’ve got business analysts? FIRE THEM! They don’t work people. I mean, yes they do work, but unless they are top-notch they create more problems than they solve.

Perhaps I’m preaching to the choir. Perhaps the choir went home. Perhaps I’ve been abducted by aliens and I’m still living in my 1997. Perhaps not.

linux programming

Caring For Your Environment

I’ve been thinking about where all my spare time goes recently because I just don’t have the time to do things I want to do, development-wise. In my life I have a couple of obvious (non-family related) time sinks like:

  • XBOX 360: Bioshock, Assain’s Creed, Gears of War, …
  • Physical Exercise

Now clearly physical exercise is unnecessary for a desk-jockey like me right? But there’s evidence to suggest that physical exercise might make me smarter. I need all the smarts I can get so I guess I’ll be continuing that for now.
What about gaming? Well I’ve been gaming since Jet Set Willy so I don’t think this is really a time luxury anymore. It’s now a defining character trait.

So that’s not given me very much wiggle room so I started to look more closely at what I actually do when I’m the big fat “H” in HCI. I discovered that I was spending some time on Project Euler which is definitely not time-wasted but is perhaps a little frivolous so I stopped it. But after that still no development work was getting done. Then I found I would spend a fair amount of time tending my development environment garden.

Gardening Again

Recent projects have included:

  • Switching my development machine from Gentoo to Ubuntu, ooo-ooo-ooo
  • Setting up SVN over SSH
  • Getting Emacs to provide true type font support
  • Upgrading Ubuntu to Gutsy for Compiz-Fusion support
  • Trying to get Gutsy to work with my f*cking ATI Radeon 9600 so I can actually use Compiz-Fusion
  • Trialling Lisp based tiling X window managers

And so it goes on. I can always think of something to do, and it’s very much like gardening I think. I admit that I haven’t really done much real gardening but when I did have a garden I found I could spend hours in it mowing, pruning, removing stones, painting fences, … you get the idea. The only difference is that with gardening, gardening and aesthetics are the objectives. The objectives of my development environment “gardening” are less clear. I’m clearly not getting very much productivity benefit from trying to get Compiz-Fusion to work, only that it makes me feel very powerful to be able to make an ATI graphics card work as expected with Linux.

What’s in your garden? Are the weeds ten feet high but you just can’t see them or could you submit it for a competition and win? Is this sort of OCD the preserve of the programmer or have I really lost it this time?


That Funny Nose Thing Again

In one of my previous jobs we had an unwritten rule that if you wanted to introduce a new programming language into the organisation you had to have a pretty good reason for it. When I joined that company around 1999 the company was using C/C++, Tcl and a little Java. By the time I had left they were using a lot of Java, a lot of Python (thanks, at least in part to me), a bunch more C++ and a steadily growing amount of C#. I wasn’t exactly responsible for the addition of the other languages but I contributed code to all of them I think.

Back then I decided that a small company should not adopt and keep that many technologies simultaneously within a single team without retiring some of the older code. From a company’s point of view it is in their interests to stop this in-house proliferation of tools & programming languages because it makes the code base both harder to support and harder to integrate. But it seems, to me at least that tools and programming languages are one part of the programmer condition. I just can’t get enough of them. They look shiny and new and full of promise. They are quite simply bewitching to me.

No one nose what I feel for you

Unlike Elizabeth Montgomery programming tools have little sex appeal and don’t do that funny nose thing. You know, the nose thing that makes everything better, when it all turns to shit at the end-of-the-show.

It is therefore in my company’s interests to pick languages & tools that are general purpose because it will reduce the possibility of tools proliferation later. But I know the drill. Hey, I practically wrote the drill. Find something I want to use, find a reason why I want to use it or why what we have now is deficient. Then bitch and moan until I get my way.

Sometimes though the benefits of a switch to a different tool or programming language can be compelling. Steve Yegge claimed last month that his MUD Wyvern has so many lines-of-code in Java that it is simply unsupportable by one tool/person and so he’s going to use Mozilla’s Rhino to reduce the LoCs. Yeah, that does sound like a good plan, but I think I’ll check back with Steve in 2010 to see how he’s getting on.

As I already mentioned in the “Towers of blub” I have been on a personal quest for about 1.5 years now to find a more powerful programming language. At the moment I have been learning Common Lisp. So it was that this week that my Tabitha nose picked up the strong scent of a new programming language gaining ground. The new player is Scala. I read a couple of blog posts about it, had a look at a tutorial a reference manual or two and was, as you Americans say, pretty stoked. I was thinking about when I was going to download it to see what it could do for me.

But then I was hit by a 10 foot wall of apathy. Whilst it’s interesting to be able to see as much of the language & tools landscape as is humanly possible I’m starting to wonder if it’s a very worthwhile use of my time. Perhaps I should stop evaluating all these different tools and languages and actually write some code. In fact if I was going to list over the years which technologies I’ve learned and subsequently forgotten, instead of coding, it would probably make quite a long list.

So I think I’ll do what Dan Weinreb’s going to do and just keep an eye on Scala to see what happens next. Now, since he is way smarter than me I reckon this is a pretty safe bet. BTW, I’ve tried it before people and this technique really does work. Pick someone whose opinion you respect (this is obviously never going to be a politician, a teacher or a member of law enforcement) and simply base your opinion on theirs. You don’t really need a great deal of rhetoric to back your arguments up just remember whose opinion you copied and come back to it later. So I’ll keep an eye on Scala and remember that I might need it one day and save the rest of my time for some more serious keyboard intercourse and beer.

Then I had the other slightly larger epiphany. More important, at least in hindisght, are not the tools and languages I use but the things that I do with those tools & languages. I’ll be more specific. The important things about what I do can be broken into technical and non-technical. The technical things like distributed computing, defensive coding, testing, multi-threading, relational databases and networking are important knowledge and experience that I draw on all the time and are language and tool independent. Those are the things that I really need to know to know how to program. But the things that make me (or would make me if I was any good at them!) really effective are the non-technical things like communication, interview and planning skills. I need to spend time working on developing all these other skills rather than finding the next new programming language or tool.

Bewitched first aired in the US on 17 September, 1964. A time when COBOL was pretty shiny and new. In 1991 I had to learn COBOL for my undergraduate degree. I have not used COBOL since the last programming assignment we did and I remember almost nothing about it. But I did learn something valuable from that assignment because it was the first time I had ever tried to produce a piece of software in a team.

So, it seems then that I really shouldn’t care very much, about what I have to create my solutions ‘in’, as long as I don’t have to use too many. I think that there’s ways round most programming language deficiencies, unless you use COBOL of course.

article finance programming python

Calculating peak-to-trough drawdown

Ok, so this is a little bit technical but it’s an intriguing puzzle that got me thinking quite hard. So here’s the problem. Sometimes investors want to be able to judge what the absolute worst case scenario would have been if they’d invested in something. Look at the following random graph of pretend asset prices:


You’ll see that there are two points on the graph (marked in red) where if you had invested at the first point and pulled out on the second point you would have the worst-case loss. This is the point of this analysis and is a way for investors in the asset to see how bad, ‘bad’ has really been in the past. Clearly past prices are not an indicator of future losses. 🙂

The upper one is the ‘peak’ and the lower one is the ‘trough’. Well, finding these two babys by eye is trivial. To do it reliably (and quickly) on a computer, is not that straight forward. Part of the problem is coming up with a consistent natural language of what you want your peak and trough to be. This took me some time. I believe what I really want is: the largest positive difference of high minus low where the low occurs after the high in time-order. This was the best I could do. This led to the first solution (in Python):

def drawdown(prices):
	maxi = 0
	mini = 0
	for i in range(len(prices))[1:]:
	   maxj = 0
	   max = 0
	   for j in range(i+1, len(prices)):
		if prices[i] - prices[j] > max:
		    maxj = j
		    max = prices[i] - prices[j]
	   if max > prices[maxi] - prices[mini]:
	   	maxi = i
		mini = maxj
	return (prices[maxi], navs[mini])

Now this solution is easy to explain. It’s what I have come to know as a ‘between’ analysis. I don’t know if that’s the proper term but it harks back to the days when I used to be a number-cruncher for some statisticians. The deal is relatively straight-forward: compare the fist item against every item after it in the list and store the largest positive difference. If this difference is also the largest seen in the data-set so far then make it the largest positive difference of all points. At the end you just return the two points you found. This is a natural way to solve the problem because it looks at all possible start points and assesses what the worst outcome would be.

The problem with this solution is that it has quadratic complexity. That is for any data-series of size N the best and worst case will result in N * N-1 iterations, in shorthand this is O(N^2). For small n this doesn’t really matter, but for any decently sized data-series this baby will be slow-as-molasses. The challenge then is to find an O(N) solution to the problem and to save-those-much-needed-cycles for something really important:

def drawdown(prices):
  prevmaxi = 0
  prevmini = 0
  maxi = 0

  for i in range(len(prices))[1:]:
    if prices[i] >= prices[maxi]:
      maxi = i
      # You can only determine the largest drawdown on a downward price!
      if (prices[maxi] - prices[i]) > (prices[prevmaxi] - prices[prevmini]):
	prevmaxi = maxi
	prevmini = i
      return (prices[prevmaxi], prices[prevmini])

This solution is a bit harder to explain. We move through the prices and the first part of the ‘if’ will find the highest part of the peak so far. However, the second part of the ‘if’ is where the magic happens. If the next value is less than the maximum then we see if this difference is larger than any previously encountered difference, if it is then this is our new peak-to-trough.

The purist in me likes that fact that the O(N) solution looks like easier code to understand than the O(N^2) solution. Although the O(N^2) solution is, I think, an easier concept to grapple with, when it’s translated into code it just doesn’t grok.

article programming

Time: the unseen global variable

Just about everyone knows that global variables need to be used sparingly. The more you use the more likely you are to capture complex state in places that are hard to maintain. Or something.

As well as all the globals you can see and measure there exists a shadowy league of ‘unseen’ globals in your programs. Some, like environment variables, are clearly designed as global variables and are desirable and understandable. However, some are wistful and ephemeral and dance round your program like wicked elves. Time is the biggest and most scarey of these elves.

For most programs you write time probably doesn’t matter, they are to all intents-and-purposes time-less. But as soon as you start entering the shadowy world of time, and the even more nebulous one of time-zones and daylight savings, a whole set of other state is being used. In my experience the programs and components that I have written that have been dependent on time have been some of the most complex to develop and maintain. This is for a variety of reasons but in summary:

time is not constant and can be interpreted in more than one way.

This leads to all manner of difficulties:

  1. Code that depends on the current system time ‘Now()’ and doesn’t pass it as a parameter is always going to be fragile. This is mostly because its behaviour can be non-deterministic unless you properly account for the fact that time is not-constant. This is especially important because your programs are susceptible to hard-to-spot boundary effects if you write expressions that use Now() more than once and depend on it returning the same value for each call. Which of course it never will.
  2. Time and date should never, ever, ever be separated from one another. You get all sorts of tricky errors when you split the two. Especially when you are performing some sort of time zone or daylight savings calculation where the two should change together but do not.
  3. Some programming languages represent the date (no time) as a date with a time of 00:00:00. Which is intuitive, but consider then what happens when you load a date (with no time) from a database in the past, when there were daylight savings, into a time when when there are no daylight savings. In the frame of reference of now your localised past time will now be an hour earlier and so will be in the final hour of the previous day. This problem clearly applies to timezones also but is because you made the mistake of not having a consistent view of time.
  4. Not only can the meaning of calendar time change after-the-fact (due to time-zones) but it can also be interpreted differently by different cultures.

There’s probably a lot of other time related pickles you can get yourself into. If Harold Lloyd was a programmer ...

You’d probably not be surprised to hear me say that unit-testing is one way of addressing at least some of these problems. This does two things. If you are to get good coverage for your unit-tests you are practically forced to make time a parameter wherever it’s used, instead of calling Now(). As a direct consequence of this your code can now be called ‘As Of’ and you will be able to offer the historical view where appropriate.

Indeed, I would say that where a piece of sotware has a time-context, then it will only be a matter of time before someone says: “Ok, that’s what it says today but what if I want to rerun it for that time in the past 3 weeks ago?”.

The time-zone and daylight savings problems can be nailed by having a consistent view on the treatment of time. For instance storing all dates/times as UTC is one thing. But if you ever need to store a local time then it should be clear what frame of reference is being used to store that time. So you might need to additionally know: the calendar, the timezone, and the daylight savings rules before you can correctly store a time.

Then and only then will time become your faithful and obedient friend.

article programming

Software Dream-ualisation

Ok, I admit it, by some measures I am a sad, sad individual. Why? Because sometimes I dream about programming. Now those that know me might be thinking that in my dreams I sit in amongst monster rigs hacking away at some monster problem.

I have been told that the best crackers in the world can do this under 60 minutes but unfortunately I need someone who can do this under 60 seconds. — Gabriel, from the movie Swordfish

Sadly the un-reality of my slumber is a little more prosaic and not like Swordfish at all. These dreams are always bizarrely specific programming tasks that would require a small amount of thought if I was awake but since I’m not conscious, they are a little harder.

Last night was different, I dreamt of nothing and woke to the sound of vomitting children (my own). Once that drama was resolved I couldn’t find a way to drift off to sleep again because for some strange reason I’d started thinking about software visualisation. I don’t even want to think about how I got onto that train of thought.

Anyway, the thought went something like this. Could software have colour? I couldn’t see why not. If software had colour then would it be useful? I reasoned that yes, it could be designed to be useful to give software a colour. For instance, colours could be assigned to code-patterns and this might ease in the understanding of that code. Since code is easier to write than to read this seemed like a worthwhile aim.

But then I thought perhaps a better visualisation would be to colour and orient objects on a plane based on the amount of messages that object issues / receives, or some other arbitary scheme. This sounded like a really rather jolly idea and I resolved to investigate it more fully in the morning and I promptly fell asleep.

When daylight arrived I found a link to this research that does something very similar to what I was dream-scribing but the site hadn’t been touched since 1998. Other research, whilst relevant, seemed similarly dormant. The most recent research I could find is here. It uses Vizz3D to turn software into ‘cities’ that can be navigated. This is indeed exciting stuff, even if it was done in C/C++.

It’s long since fascinated me that the world of software is a dynamic ever-shifting place but the tools with which we work on that software (especially for very large projects) don’t really help in trying to conceptualise that software. Indeed, the code most of us see in the maintenance phase is at a lower level of abstraction than that of the overall structure of that software and the structure can be very hard to see by just looking at the code.

Sure we can use various tools like profilers and coverage analysers to view different dimensions of the software plane but they are not the whole picture and compositing those analyses into a coherent whole is still not easy.

Fast forward ten years, perhaps DevStudio or Eclipse will ship with a project visualizer. The information transmitted in a single visualisation could save hours of code-grokking. It probably won’t change the world but it would be very, very, useful.

But perhaps in ten years we will have brains the size of water-melons and be able to program computers using only our minds (like in Firefox). I guess it’s time to go back to sleep now. Sweet dreams.