database lisp

cl-mysql v0.2

I am pleased to announce that cl-mysql 0.2 is ready for use!

Here are some of the highlights of v0.2

  • Connection pooling – Thread safe allocation and release of connections from a central pool.
  • Use result/Store result – Ability to use mysql_use_result as well as mysql_store_result. This means that CL-MYSQL should be able to handle the processing of very large datasets without running out of memory.
  • Convenience functions/macros – with-rows / nth-row

The main difference between v0.1 and v0.2 is that version 0.1 didn’t really manage its connections. I decided that allowing the user to choose between pooled and non-pooled connections is a hassle. Much better then to allow the user to create as many connection pools as they want and allow them to specify the maximum and minimum number of connections that the pool can hold. After all, a single connection is simply a special case of a pool with only one connection.

However, in theory this could hurt performance when attempting to do large number of INSERT/UPDATE’s because every call would require the connection pool to be locked and a connection to be aquired. This could be overcome though by making use of the fact that CL-MYSQL will correctly pass multiple statements to the server so you could concatenate a large string of updates and execute them all at once.

The good news though is that the API has changed only very slightly in the optional arguments it accepts. However I have changed the way the result data comes back from query. Because CL-MSQL returns multiple result sets it’s necessary to place all of them into a sequence. Additionally, I did not like the way I was placing the column headers into the first item of the result data. It means you always have to allow for it. I considered doing it the way that CLSQL does it by returning the column data in a value struct but I find this awkward to manage. This is because every layer of the API (and client code) must multiple-value-bind the columns out and either repackage them as a sequence or create a new value structure to pass them up the call-chain.

Therefore I have changed the result sequence structure to be as follows:

query-result ::= (<result-set>*)
result-set ::= (<result-data> <column-data>)
result-data ::= (<row>*) | <rows-affected>
row ::= (<column>*)
column-data ::= ((<column-name> <column-type> <column-flags>)*)

I appreciate that this is a little complex, I did consider turning the result data into a struct but this complicates how the user processes the data. For this reason I have added: with-rows and nth-row to simplify the processing of this result data.

Finally, the whole thing is still only SBCL/x86 Linux compatible, that might change :-).

More information is available here. As always, any feedback is appreciated.

database rant

BizTalk Server 2006 (a.k.a The Beast)

I don’t really like to repeat myself but I’ve found another example of code that isn’t. It has a name, and that name is evil BizTalk. I’ve wanted to write about BizTalk for a while but I’ve been holding back because it’s a large and complex product that I wanted to do proper justice to.

Ok, so let’s get the nasty out of the way first. BizTalk is partly the reason that I’ve done very few blog posts recently. It is also the cause of an annoying pain at the base of my skull that didn’t go away until I stopped ‘doing’ BizTalk. There are many reasons why and I could go on and on about:

  • how it neatly hides important detail from you that you need to problem solve issues;
  • cryptic error messages;
  • property boxes in property boxes in property boxes;
  • lack of effective development environment;
  • high license cost;
  • … yada yada yada …;

Ironically at the same time I was using BizTalk I was also reading the design of everyday things. Whilst not strictly a book about technology, the ubiquity of computers these days makes it a compelling read for programmers like me. Anyway, the point is that BizTalk very neatly violates almost everything that Donald Norman holds dear.

If it’s so bad then why are you using it?

Good question. You read my mind. You see it solves a couple of problems for me, firstly a very expensive software product requires it out of the box so I really have no choice. However, and here’s the interesting part, it does two “big picture” things very well. Indeed, if you ignore the nitty gritty pain of what you had to do to get it to work it solves a couple issues of system integration reasonably well.

  1. Message Handling – the BizTalk server wants to receive a message and do something with it. It is able to take messages from a variety of sources: database, file drop, FTP, HTTP, SOAP, Email, etc. Once this message is received it can be routed to an ‘Orchestration’ which takes the message and applies some logic to it. This can involve changing the shape of the message and or sending it on to a variety of other systems.
  2. Business Focus – the fact that BizTalk server can take messages from a variety of sources means that it becomes the natural place to put things that ‘process external data’. This doesn’t sound like a big deal but when you’ve worked in companies that have attempted this solution without something like BizTalk you’ll know what a mess this becomes. Many programmers, many ideas, little consistency, integration headaches.

So it’s good right?

Well yes and no. I can’t pretend that I’ve even scratched the surface of what BizTalk does, it’s a beast. However, for what I need to do I’ve created about a half a dozen or so orchestrations and a few mappings and every one was as painful as the last. It just doesn’t seem to get any easier.

You see, the bits of BizTalk I like, are relatively quite small. The bits I don’t like are the bits that try and take programming control away from me by providing me with some half-baked UI that ultimately is going to produce me a piece of executable code. And that’s my point. If it’s going to produce executable code why not just give me some powerful libraries (which must exist anyway) and let me write it? Indeed, I’ve had a couple of people tell me that they don’t really like BizTalk either and when faced with a ‘BizTalk challenge’ their solution is to write a custom pipeline to handle it (for those not in the know, this solution gets the job done but it’s like buying a tractor to get your groceries with).

The part of BizTalk that I would pay money for is probably not worth that much. However, I’m smart enough (just) to know that if I wanted to roll my own which just contained the features I wanted I would still be doing it in 2009. So for now, I guess, The Beast and I will get along fine. If there’s ever a viable alternative The Beast and I will be parting ways.

database unit testing

Database Unit Testing: It’s Tricky

I’ve been aware for a while that I really should be doing some database unit-testing along with the main unit-testing I’m already doing of the other code. With databases there’s a bunch of stuff I’d like to test apart from the logic inside any executable code:

  • Relationships – the database model should support certain types of queries, make sure that the more common ones work as needed
  • Query Execution Time – to verify that indices are present on SELECT’s and to monitor the cost of insertions and as a general (and fairly gross) monitor of performance
  • Arbitary Data Values – some data in the database is ‘special’. It’s always like that, it’s data that you don’t get from another source. It’s static data that makes your abstractions work. When it’s there everything is ok, when it’s gone you’re screwed
  • Constraints & Triggers – constraints and triggers occassionally get dropped when maintenance occurs when they don’t get put back things go south
  • Permissions – certain types of activity in a database should be prohibited ensure that these protections are in place for a class of users

There’s probably a lot more I could do too, but this will do to begin with. In the past I’ve spent significant investigative time hunting down problems that originated from some initial assumption being violated. Since I don’t like feeling violated, at least not on a Thursday, it seems like I should unit test what I can to get early warnings before the shit-storm hits.

So I did what any lazy, self-respecting developer would do I went looking for some code that someone else had written that I could steal. T-SQLUnit looked like the best match for my needs so I downloaded it and checked it out. Now, before I upset anyone I should say that T-SQLUnit is ok. But it suffers from a few fairly major drawbacks. There’s nothing wrong with TSQLUnit per-se it’s just that all database unit testing frameworks that are written in SQL are broken. Here’s a few lowlights:

  1. Results of expressions can not be inputs to stored procedures making traditional unit testing Assert statements awkward
  2. Catching all exceptions and reporting automatically that as a failure (ala jUnit) is messy requiring a BEGIN/END TRY around every test block

It’s the first one that makes life hard because all your assertions have to read something like:

IF @MyVar <> @TheirVar
     EXEC ts_failure 'My variable doesn't equal your variable'

When what you really want to write is something like:

EXEC ts_assert @MyVar <> @TheirVariable,  'My variable doesn't equal your variable'

I don’t know, perhaps I’m just not smart enough but I could not see anyway to make something like the above work without ending up with a syntax so verbose and ugly that even Eurocrats would balk a little. So bad that you might as well have just used a bunch of IF statements in the first place. Also, a direct consequence of not being able to have an ‘assert’ stored proc is that you can’t easily count the number of assertions you make to report them later. Now whilst this is just icing on the unit-test cake it’s still a nice feature to have and adds to the warm fuzz after a test run.

If that was hard then testing permissions related activities is next-to impossible. This is because your unit-testing framework is running as a particular user in a particular session. For you to be able to test particular permissions you might need to disconnect and reconnect as a different user. Well, it’s not impossible it’s just … a bit tricky

The obvious thing to do, then, is to completely give up on SQL unit testing frameworks and go back to your programming language of choice. As far as is possible you want to hide all the internals of dealing with the database and leave just the pieces you need to setup, run and teardown your tests. To do this I made a helper class to do all the heavy lifting by: connecting to the database, running any query, timing the execution, processing the results and storing them somewhere. Finally I made my class provide a function based query interface so that I could then write test code using NUnit style assertions against it. Creating this custom class took only a few hours. Once I’d created it I could hand all the testing framework stuff to my old-friend NUnit. This technique worked well for me and integrated nicely with my existing code tests.


The Code That Was Not There and the Badly Fitting Suit

There seems to be a common, not unfounded, view that programming is really hard. The response from tool vendors is to produce products that make programming simpler. They do this by introducing a user interface that ‘simplifies’ what would ordinarily be a programming task into something that mere mortals can understand and program without understanding too many of the fundamentals of the task at hand. I think the view that such tools succeed at what they set out do is broadly untrue.

Up until this week I had intimate experience of only one such tool and that was BusinessObjects. Now I’m going back a bit here but in 1998 BusinessObjects made a tool that allowed a designer to take your ugly enterprise database and turn it into ‘Business Objects’. These objects contain a little more ‘information’ about the data you were looking at and made it possible to compose those objects in ways of your choosing, but crucially, they did this with the aid of graphical tools that didn’t make it look like you were really writing SQL queries at all. This then, in-turn lets a common Jo-user compose a view of that data for a report or particular query they need without having to learn SQL. In concept the idea has tremendous power because you can do things that you could never do before without a lot of expensive IT expertise. The reality, for us at least, was a little different and this was I think for two reasons.

Unapparent Inefficiencies

A normal SQL programmer would know about the indices and relationship cardinalities on a table, hence they would know which joins would suck. The abstraction provided by BusinessObjects would happily hide those inefficiencies of the underlying representation. That made it it really easy to kill the database with a single mouse-click and drag. You can’t just blame Jo-user for this either, when you hide the inefficiencies I would not be surprised if a seasoned programmer would sometimes make the same mistakes that Jo did.

Apparent deficiencies

BusinessObjects, of the time, was a language in itself. Indeed the BusinessObjects language is in fact an abstraction of another much more general purpose language that we know as: SQL. Programming languages that are developed for a specific domain (or Domain-Specific Languages) tend to exhibit their own behaviour. They make some things easy at the expense of making other things less so. The trick, with these tools, is to make sure that all the things you make hard are things people don’t really do very often anyway. The problem for us, at the time, was that we were putting a clean layer of abstraction on-top of a not-so-great database design. BusinessObjects wasn’t able to cope very well with those apparent deficiencies and so we had to go hunting for workarounds to put into place until we could get back and refactor the pimple on our software arse away.

In the end the project limped on for a bit and then I got moved onto another task and I lost track of it. Perhaps it lives on, but I doubt it. This week I discovered that Microsoft have a related but different purpose tool: SQL Server Integration Services (SSIS). Apparently it’s been going on for years under a previous guise of DTS but I’d never seen-nor-heard of it before. Anyway, I was initially very excited when I started working with it, I really believed that I could take my complicated data import requirements and transform them into a few diagrams and have all the work done for me. Yeah right. The reality was somewhat different, and like our friend BusinessObjects, SSIS coped badly with the unapparent inefficiencies and the apparent deficiencies.

The conclusion is that tools like this are complex and that complexity includes, but is not limited to, a lot of the complexity of the lower-level underlying representation that they must use. Often then, it will be far better to just use a general purpose tool (like SQL or Perl) to get you your reports or data transformations done. Don’t mess around with some factory-made ill fitting suit if you think you can tailor one yourself for less. No matter how pretty the buttons are.

In the end I surmised that using gawk and bash I could produce something equivalent to what I wanted to do in Integration Services in a fraction of the time and hence cost. If I’d used Perl or Python I could have done it even better and/or faster. I had been hoodwinked my friends. Hoodwinked into thinking that I could write a program that was not there and discovered in the end that it was far easier to just knuckle down and make my own suit and have it fit nice, than use the pretty off-the-shelf one with the gaping crotch.

The problem, it seems, is that there is a market for tools that allow us to create code that seemingly isn’t. We still believe, as I did, that tools will somehow save the day. Dammit, why didn’t someone remind me earlier that there is still not, and never has and never will be any silver bullet.

database sql

rozenshtein method for pivoting relational data

I came across this blog entry while trying to make up for the fact that SQL server 8 does not have the PIVOT statement that I needed. It’s taken from a book (that I don’t have) by Rozenshtein on advanced database queries.
The blog explanation is lengthy and a little confusing but the idea is deceptively simple. It hinges around being able make a column expression that can ‘select’ data.

database sql

secondary school mathematics that was almost useful

For the first time in my career I came across the need to make use of a PRODUCT() SQL aggregate function today. To my dismay SQL Server 8 doesn’t have such a function and so I figured out it can be simulated by use of logarithims and the SUM() function (since (log10(a) + log10(b))^10 = a * b) Therefore my query could read:

SELECT POWER(10, SUM(LOG10(value)))
FROM table

But this query doesn’t work if there are negative numbers in the PRODUCT() list because LOG10(x) where x<0 is undefined. Guess what, there’s a bunch of negative numbers in my list. Arghh …