Fundamentally Lacking

A while’s back I wrote up my experiences of trying to learn Lisp through various books and I got this comment:

If you want to go down the rabbit hole however (and since you asked), here’s my advice: read SICP. If you’ve been programming in Algol-inspired languages for the last 15 years, your head will probably explode. It is worth it. Do the examples. Follow along with the videos: they contain all kinds of little nuggets of wisdom that may take time to sink in.

SICP, I thought, what the hell is that? It transpires that this book is, as they say in colloquial circles, like frikkin’ huge man. Huge in fame (it’s taught in a lot of places that I could only dream of attending when I was as an undergraduate) and, as it turns out, also huge in terms of sheer quantity of information inside it. So I ordered it on Amazon, waited a few weeks for it to arrive and then queued it up on my bookshelf.

About a month ago I started gently plodding my way through it, scratching my head, doing the exercises and all. It really is very good, but I have no idea how I would have done this as an introductory programming course at Uni. Let’s just say the exercises can be a little challenging, to me at least.

The amazing thing is I’ve only really just finished Chapter 1 and I feel like I’ve put my head through a wringer. So I’d really like it if you can come and visit me at the funny farm after I’ve done Chapter 5. Visiting hours are 9-5 and please don’t shout at the other patients it freaks them out. Anyway, today’s mind melt is Exercise 2.5 and it asks us to …

Exercise 2.5. Show that we can represent pairs of nonnegative integers using only numbers and arithmetic operations if we represent the pair a and b as the integer that is the product (2^a * 3^b). Give the corresponding definitions of the procedures cons, car, and cdr.

Up until now all the really hard problems had ‘Hints:’ in parentheses. That’s usually how I know when I’m going to be screwed. I wrote some numbers down, first on paper then in Egg-Smell but I just couldn’t see how I was going to extract two numbers from the encoding. Then, bizarrely, whilst reading about Church numerals (eek) I discovered that there’s a fundamental theorem of arithmetic that I never knew about. How does that happen? How did something that fundamental escape me all these years? Reading it was like being told by a German co-worker that the only time an apostrophe is appropriate in “it’s” is for a contraction of “it is”. Scheiße.

Not only is there a fundamental theorem but it suggests the answer to Exercise 2.5. Perhaps one day I’ll be able to program & punctuate. One day.


The Free & The Damned

I sometimes wonder if I worked in a company that made software for others, instead of a company that makes software for itself, whether I would be a better: programmer, ventroliquist and lover. Ok, scratch the bit about the lover.

There are many potential reasons why this might be so but I want to focus on just two.

  1. When the software is the company’s business then as a developer you are closer to that business with all the benefits that brings. Essentially it’s the difference between supporting the business and being the business
  2. The second point is the related point that when software is the business, rather than for internal use, it is necessarily of a higher standard.

Item #2 is because your internal users pay for their software only indirectly. They don’t sign a EULA, and the cheques that they do sign are for salaries and aren’t budgeted in quite the same way. Our internal users will put up with sub-standard sofrware that people who have to sign a EULA and pay hard-green for will not.

The thing is that working for a company that doesn’t produce software as its primary function is still great fun. For much of what I’ve been doing for that last few years the business area I work in is only tangentially computer related. That is computers & technology are critical to the business because of the level of automation they provide not for any other reason. This makes the technology they use the means and not the ends.

This is both liberating and damning all at once. It’s liberating because I’m free to ignore decades of good software engineering practice if it is profitable and expedient to do so. Put trivially, if my software could directly generate revenues then time spent making it compile without warnings is time that it could be accumulating wealth. It’s damning because all those unfixed warnings are going to make an expensive mistake one day. I, as the programmer, have to choose where to draw the line.

Good quality software should be well-designed on very many levels: interaction, architecture, performance, etc. Anyone who buys software should demand good quality software or their money back. Therefore I would think (obviously I don’t know :-)) that being both free & damned isn’t necessarily a bad place to be. At least you’ll get the chance to make a bit of lucre while you’re free and when you’re in hell you’ll always have a job refactoring.


Constrained By Types (In Another Dimension)

I just read this. Which was interesting. I love the way that Steve has a simple point to make and spends 000s of words doing it, the posts always seem to ramble a bit (a little like mine!) but they’re usually fully of interesting tidbits and insights into software development so it’s usually worth spending 30 minutes or so on his issuances.

You can write C++ like straight C code if you like, using buffers and pointers and nary a user-defined type to be found. Or you can spend weeks agonizing over template metaprogramming with your peers, trying to force the type system to do something it’s just not powerful enough to express. Guess which group gets more actual work done? My bet would be the C coders. C++ helps them iron things out in sticky situations (e.g. data structures) where you need a little more structure around the public API, but for the most part they’re just moving data around and running algorithms, rather than trying to coerce their error-handling system to catch programmatic errors. It’s fun to try to make a bulletproof model, but their peers are making them look bad by actually deploying systems. In practice, trying to make an error-proof system is way more work than it’s worth.

This post raises a point that I hadn’t really considered before which is that perhaps we should consider static types to be a form of metadata, much like comments. The more static types you have the more constraining your model will be on the system that you’re creating. This is as it should be because that’s why you created those static types in the first place, right? But that model could just as well not exist. You could have created a system without all that new-fangled OOP crap and it might be a lot less complex. You could have the whole thing written in half-a-day and still be home in time for the footy.

A few years ago I was assigned to a trading system project that was to replace an existing legacy system. The existing trading system was single threaded, multi-user and suffered all sorts of performance & concurrency problems. One of it’s strengths, though, was that it was partly written in Tcl. Now Tcl isn’t one of the worlds greatest languages but it is dynamically typed and that gives it a certain flexibility. For instance, the shape and content of our core data was changing fairly constantly. Now, because that data was basically a bunch of name-value pairs inside the system it was possible to change the shape of the data in the system while it was running. I doubt that this ‘feature’ was ever consciously designed in to the legacy system from the beginning, but the flexibility and simplicity it gave was really very powerful.

When the replacement system came it was written in C++ and Java and had its own language embedded within it. It was a technical triumph in many ways and represented a tremendous leap forwards in many other ways. But the flexibility had gone because it would have taken extra effort to preserve that flexibility using statically typed languages. As a result the new system was faster and cleaner but more fragile and much harder to maintain. It was only when we started turning the legacy system off that it occurred to me that we had sacrificed something really quite important without really knowing it.

This flexibility, then, obviously has the downside that if our system is not very constrained it will most likely be a total mess. That was one of the drivers for replacing the legacy system in the first place. Moreover, this is especially likely to be true if there’s more than one person working on the system. Those static types clearly served a purpose after all in allowing us to categorise our thoughts about the behaviours of the system and then communicate them to others.

I suspect the solution is like almost everything, it’s a compromise. Experience will tell you when you need to constrain and when not to. Indeed, this is pretty close to the actual conclusion Steve comes to as well. In practice though I suspect that on large complex projects you would have to be very disciplined about the programming idioms you are going to use in the absence of those static types. It’s another dimension to the programming plane, and I don’t need to tell you that there are quite a few other dimensions already.

database unit testing

Database Unit Testing: It’s Tricky

I’ve been aware for a while that I really should be doing some database unit-testing along with the main unit-testing I’m already doing of the other code. With databases there’s a bunch of stuff I’d like to test apart from the logic inside any executable code:

  • Relationships – the database model should support certain types of queries, make sure that the more common ones work as needed
  • Query Execution Time – to verify that indices are present on SELECT’s and to monitor the cost of insertions and as a general (and fairly gross) monitor of performance
  • Arbitary Data Values – some data in the database is ‘special’. It’s always like that, it’s data that you don’t get from another source. It’s static data that makes your abstractions work. When it’s there everything is ok, when it’s gone you’re screwed
  • Constraints & Triggers – constraints and triggers occassionally get dropped when maintenance occurs when they don’t get put back things go south
  • Permissions – certain types of activity in a database should be prohibited ensure that these protections are in place for a class of users

There’s probably a lot more I could do too, but this will do to begin with. In the past I’ve spent significant investigative time hunting down problems that originated from some initial assumption being violated. Since I don’t like feeling violated, at least not on a Thursday, it seems like I should unit test what I can to get early warnings before the shit-storm hits.

So I did what any lazy, self-respecting developer would do I went looking for some code that someone else had written that I could steal. T-SQLUnit looked like the best match for my needs so I downloaded it and checked it out. Now, before I upset anyone I should say that T-SQLUnit is ok. But it suffers from a few fairly major drawbacks. There’s nothing wrong with TSQLUnit per-se it’s just that all database unit testing frameworks that are written in SQL are broken. Here’s a few lowlights:

  1. Results of expressions can not be inputs to stored procedures making traditional unit testing Assert statements awkward
  2. Catching all exceptions and reporting automatically that as a failure (ala jUnit) is messy requiring a BEGIN/END TRY around every test block

It’s the first one that makes life hard because all your assertions have to read something like:

IF @MyVar <> @TheirVar
     EXEC ts_failure 'My variable doesn't equal your variable'

When what you really want to write is something like:

EXEC ts_assert @MyVar <> @TheirVariable,  'My variable doesn't equal your variable'

I don’t know, perhaps I’m just not smart enough but I could not see anyway to make something like the above work without ending up with a syntax so verbose and ugly that even Eurocrats would balk a little. So bad that you might as well have just used a bunch of IF statements in the first place. Also, a direct consequence of not being able to have an ‘assert’ stored proc is that you can’t easily count the number of assertions you make to report them later. Now whilst this is just icing on the unit-test cake it’s still a nice feature to have and adds to the warm fuzz after a test run.

If that was hard then testing permissions related activities is next-to impossible. This is because your unit-testing framework is running as a particular user in a particular session. For you to be able to test particular permissions you might need to disconnect and reconnect as a different user. Well, it’s not impossible it’s just … a bit tricky

The obvious thing to do, then, is to completely give up on SQL unit testing frameworks and go back to your programming language of choice. As far as is possible you want to hide all the internals of dealing with the database and leave just the pieces you need to setup, run and teardown your tests. To do this I made a helper class to do all the heavy lifting by: connecting to the database, running any query, timing the execution, processing the results and storing them somewhere. Finally I made my class provide a function based query interface so that I could then write test code using NUnit style assertions against it. Creating this custom class took only a few hours. Once I’d created it I could hand all the testing framework stuff to my old-friend NUnit. This technique worked well for me and integrated nicely with my existing code tests.