database - hackinghat.com

cl-mysql v0.2

Steve Knight — Mon, 27 Apr 2009 06:35:11 +0000

I am pleased to announce that cl-mysql 0.2 is ready for use!

Here are some of the highlights of v0.2

Connection pooling – Thread safe allocation and release of connections from a central pool.
Use result/Store result – Ability to use mysql_use_result as well as mysql_store_result. This means that CL-MYSQL should be able to handle the processing of very large datasets without running out of memory.
Convenience functions/macros – with-rows / nth-row

The main difference between v0.1 and v0.2 is that version 0.1 didn’t really manage its connections. I decided that allowing the user to choose between pooled and non-pooled connections is a hassle. Much better then to allow the user to create as many connection pools as they want and allow them to specify the maximum and minimum number of connections that the pool can hold. After all, a single connection is simply a special case of a pool with only one connection.
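The "a single connection is a pool of one" idea can be sketched as follows. This is a rough Python illustration, not cl-mysql's actual implementation: the class name, the `connect` factory and the `min_size`/`max_size` parameters are all assumptions made for the example.

```python
import threading


class ConnectionPool:
    """Hand out at most max_size connections; pre-open min_size of them."""

    def __init__(self, connect, min_size=1, max_size=1):
        if not 0 <= min_size <= max_size or max_size < 1:
            raise ValueError("need 0 <= min_size <= max_size and max_size >= 1")
        self._connect = connect          # hypothetical factory returning a new connection
        self._max_size = max_size
        self._idle = [connect() for _ in range(min_size)]
        self._open = min_size            # connections created so far
        self._cond = threading.Condition()

    def acquire(self):
        with self._cond:
            while not self._idle and self._open >= self._max_size:
                self._cond.wait()        # pool exhausted: block until a release
            if self._idle:
                return self._idle.pop()
            self._open += 1
            return self._connect()

    def release(self, conn):
        with self._cond:
            self._idle.append(conn)
            self._cond.notify()


# A "single connection" is just a pool whose minimum and maximum are both 1.
single = ConnectionPool(connect=object, min_size=1, max_size=1)
conn = single.acquire()
single.release(conn)
same_again = single.acquire()            # the one connection is reused
single.release(same_again)
```

Every acquire and release takes the pool lock, which is the cost the next paragraph worries about.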

However, in theory this could hurt performance when doing a large number of INSERTs or UPDATEs, because every call requires the connection pool to be locked and a connection to be acquired. This can be overcome by making use of the fact that CL-MYSQL will correctly pass multiple statements to the server, so you can concatenate a large string of updates and execute them all at once.
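Concatenating statements might look something like the sketch below. It is Python rather than Lisp, the helper name `batch_statements` is made up, and it assumes every statement already has its values safely escaped. Each yielded string would then be sent to the server in a single call, so the pool is locked once per batch rather than once per statement.

```python
def batch_statements(statements, batch_size):
    """Yield ';'-joined strings containing at most batch_size statements each."""
    batch = []
    for statement in statements:
        # Drop any trailing semicolon so the join does not double it up.
        batch.append(statement.rstrip().rstrip(";"))
        if len(batch) == batch_size:
            yield "; ".join(batch)
            batch = []
    if batch:
        yield "; ".join(batch)


# Made-up statements purely for illustration.
updates = [f"UPDATE t SET n = n + 1 WHERE id = {i}" for i in range(5)]
batches = list(batch_statements(updates, batch_size=2))
```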

The good news, though, is that the API has changed only very slightly in the optional arguments it accepts. However, I have changed the way the result data comes back from query. Because CL-MYSQL returns multiple result sets it’s necessary to place all of them into a sequence. Additionally, I did not like the way I was placing the column headers into the first item of the result data, because it means you always have to allow for it. I considered doing it the way that CLSQL does it, by returning the column data in a value struct, but I find this awkward to manage. This is because every layer of the API (and client code) must multiple-value-bind the columns out and either repackage them as a sequence or create a new value structure to pass them up the call-chain.

Therefore I have changed the result sequence structure to be as follows:

query-result ::= (<result-set>*)
result-set ::= (<result-data> <column-data>)
result-data ::= (<row>*) | <rows-affected>
row ::= (<column>*)
column-data ::= ((<column-name> <column-type>)*)

I appreciate that this is a little complex. I did consider turning the result data into a struct, but this complicates how the user processes the data. For this reason I have added with-rows and nth-row to simplify the processing of this result data.
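To give a feel for why helpers are useful with a nested result like this, here is a small Python analogue. It is only a sketch: the layout (a list of result sets, each pairing its rows with its column descriptions) is an assumption based on the description above, the sample data is invented, and these functions are not the cl-mysql with-rows / nth-row API, whose signatures are not shown in this post.

```python
# Hypothetical data in the nested layout: a list of result sets, each a
# (rows, columns) pair, where columns is a list of (name, type) pairs.
query_result = [
    ([(1, "alice"), (2, "bob")], [("id", "LONG"), ("name", "STRING")]),
    ([(42,)], [("total", "LONG")]),
]


def nth_row(query_result, set_index, row_index):
    """Pick one row out of one result set without unpacking by hand."""
    rows, _columns = query_result[set_index]
    return rows[row_index]


def rows_as_dicts(query_result, set_index=0):
    """Pair each row of a result set with its column names."""
    rows, columns = query_result[set_index]
    names = [name for name, _type in columns]
    return [dict(zip(names, row)) for row in rows]
```

Keeping the column data beside the rows, rather than as the first row, means the row helpers never have to skip a header.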

Finally, the whole thing is still only SBCL/x86 Linux compatible, though that might change :-).

More information is available here. As always, any feedback is appreciated.

The post cl-mysql v0.2 first appeared on hackinghat.com.

BizTalk Server 2006 (a.k.a The Beast)

Steve Knight — Fri, 12 Sep 2008 06:57:19 +0000

I don’t really like to repeat myself but I’ve found another example of code that isn’t. It has a name, and that name is ~~evil~~ BizTalk. I’ve wanted to write about BizTalk for a while but I’ve been holding back because it’s a large and complex product that I wanted to do proper justice to.

Ok, so let’s get the nasty out of the way first. BizTalk is partly the reason that I’ve done very few blog posts recently. It is also the cause of an annoying pain at the base of my skull that didn’t go away until I stopped ‘doing’ BizTalk. There are many reasons why and I could go on and on about:

how it neatly hides the important detail you need to solve problems;
cryptic error messages;
property boxes in property boxes in property boxes;
lack of effective development environment;
high license cost;
… yada yada yada …;

Ironically, at the same time I was using BizTalk I was also reading The Design of Everyday Things. Whilst not strictly a book about technology, the ubiquity of computers these days makes it a compelling read for programmers like me. Anyway, the point is that BizTalk very neatly violates almost everything that Donald Norman holds dear.

If it’s so bad then why are you using it?

Good question. You read my mind. Firstly, a very expensive software product requires it out of the box, so I really have no choice. However, and here’s the interesting part, it does two “big picture” things very well. Indeed, if you ignore the nitty-gritty pain of what you had to do to get it to work, it solves a couple of system integration issues reasonably well.

Message Handling – the BizTalk server wants to receive a message and do something with it. It is able to take messages from a variety of sources: database, file drop, FTP, HTTP, SOAP, email, etc. Once this message is received it can be routed to an ‘Orchestration’ which takes the message and applies some logic to it. This can involve changing the shape of the message and/or sending it on to a variety of other systems.
Business Focus – the fact that BizTalk server can take messages from a variety of sources means that it becomes the natural place to put things that ‘process external data’. This doesn’t sound like a big deal but when you’ve worked in companies that have attempted this solution without something like BizTalk you’ll know what a mess this becomes. Many programmers, many ideas, little consistency, integration headaches.

So it’s good right?

Well yes and no. I can’t pretend that I’ve even scratched the surface of what BizTalk does, it’s a beast. However, for what I need to do I’ve created about a half a dozen or so orchestrations and a few mappings and every one was as painful as the last. It just doesn’t seem to get any easier.

You see, the bits of BizTalk I like are relatively small. The bits I don’t like are the bits that try to take programming control away from me by providing me with some half-baked UI that ultimately is going to produce me a piece of executable code. And that’s my point. If it’s going to produce executable code, why not just give me some powerful libraries (which must exist anyway) and let me write it? Indeed, I’ve had a couple of people tell me that they don’t really like BizTalk either and when faced with a ‘BizTalk challenge’ their solution is to write a custom pipeline to handle it (for those not in the know, this solution gets the job done but it’s like buying a tractor to get your groceries with).

The part of BizTalk that I would pay money for is probably not worth that much. However, I’m smart enough (just) to know that if I wanted to roll my own which just contained the features I wanted I would still be doing it in 2009. So for now, I guess, The Beast and I will get along fine. If there’s ever a viable alternative The Beast and I will be parting ways.

Database Unit Testing: It’s Tricky

Steve Knight — Thu, 07 Feb 2008 22:11:57 +0000

I’ve been aware for a while that I really should be doing some database unit-testing along with the main unit-testing I’m already doing of the other code. With databases there’s a bunch of stuff I’d like to test apart from the logic inside any executable code:

Relationships – the database model should support certain types of queries; make sure that the more common ones work as needed
Query Execution Time – to verify that indices are present for SELECTs, to monitor the cost of insertions, and as a general (and fairly gross) monitor of performance
Arbitrary Data Values – some data in the database is ‘special’. It’s always like that: it’s data that you don’t get from another source. It’s static data that makes your abstractions work. When it’s there everything is ok, when it’s gone you’re screwed
Constraints & Triggers – constraints and triggers occasionally get dropped during maintenance, and when they don’t get put back things go south
Permissions – certain types of activity in a database should be prohibited; ensure that these protections are in place for a class of users

There’s probably a lot more I could do too, but this will do to begin with. In the past I’ve spent significant investigative time hunting down problems that originated from some initial assumption being violated. Since I don’t like feeling violated, at least not on a Thursday, it seems like I should unit test what I can to get early warnings before the shit-storm hits.

So I did what any lazy, self-respecting developer would do: I went looking for some code that someone else had written that I could steal. TSQLUnit looked like the best match for my needs so I downloaded it and checked it out. Now, before I upset anyone I should say that TSQLUnit is ok. But it suffers from a few fairly major drawbacks. There’s nothing wrong with TSQLUnit per se, it’s just that all database unit testing frameworks that are written in SQL are broken. Here are a few lowlights:

Results of expressions cannot be passed as inputs to stored procedures, making traditional unit-testing Assert statements awkward
Catching all exceptions and automatically reporting them as failures (à la JUnit) is messy, requiring a BEGIN TRY/END TRY around every test block

It’s the first one that makes life hard because all your assertions have to read something like:

-- T-SQL has no THEN; the body of the IF is wrapped in BEGIN ... END
IF @MyVar <> @TheirVar
BEGIN
     EXEC ts_failure 'My variable doesn''t equal your variable'
END

When what you really want to write is something like:

EXEC ts_assert @MyVar = @TheirVar, 'My variable doesn''t equal your variable'

I don’t know, perhaps I’m just not smart enough, but I could not see any way to make something like the above work without ending up with a syntax so verbose and ugly that even Eurocrats would balk a little. So bad that you might as well have just used a bunch of IF statements in the first place. Also, a direct consequence of not being able to have an ‘assert’ stored proc is that you can’t easily count the number of assertions you make to report them later. Now whilst this is just icing on the unit-test cake, it’s still a nice feature to have and adds to the warm fuzz after a test run.

If that was hard then testing permissions-related activities is next to impossible. This is because your unit-testing framework is running as a particular user in a particular session. For you to be able to test particular permissions you might need to disconnect and reconnect as a different user. Well, it’s not impossible, it’s just … a bit tricky.

The obvious thing to do, then, is to give up on SQL unit testing frameworks completely and go back to your programming language of choice. As far as possible you want to hide all the internals of dealing with the database and leave just the pieces you need to set up, run and tear down your tests. To do this I made a helper class to do all the heavy lifting: connecting to the database, running any query, timing the execution, processing the results and storing them somewhere. Finally I made my class provide a function-based query interface so that I could write test code using NUnit-style assertions against it. Creating this custom class took only a few hours. Once I’d created it I could hand all the testing framework stuff to my old friend NUnit. This technique worked well for me and integrated nicely with my existing code tests.
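For a flavour of the approach, here is a minimal sketch of such a helper. It is not the class described above: it uses Python, an in-memory SQLite database and the standard unittest module as stand-ins for the original language, SQL Server and NUnit, and the `QueryRunner` name, its methods and the `currency` table are all invented for the example.

```python
import sqlite3
import time
import unittest


class QueryRunner:
    """Connects, runs a query, times it, and keeps the rows for assertions."""

    def __init__(self, database=":memory:"):
        self.conn = sqlite3.connect(database)
        self.rows = []
        self.elapsed = 0.0

    def run(self, sql, params=()):
        start = time.perf_counter()
        cursor = self.conn.execute(sql, params)
        self.rows = cursor.fetchall()
        self.elapsed = time.perf_counter() - start   # seconds for this query
        return self

    def scalar(self):
        """First column of the first row, or None when there are no rows."""
        return self.rows[0][0] if self.rows else None

    def close(self):
        self.conn.close()


class StaticDataTest(unittest.TestCase):
    def setUp(self):
        # Made-up schema and data standing in for a real database.
        self.db = QueryRunner()
        self.db.run("CREATE TABLE currency (code TEXT PRIMARY KEY)")
        self.db.run("INSERT INTO currency VALUES ('GBP'), ('USD')")

    def tearDown(self):
        self.db.close()

    def test_special_rows_exist(self):
        sql = "SELECT COUNT(*) FROM currency WHERE code = ?"
        self.assertEqual(self.db.run(sql, ("GBP",)).scalar(), 1)

    def test_query_is_fast_enough(self):
        self.db.run("SELECT * FROM currency")
        self.assertLess(self.db.elapsed, 1.0)
```

Because the assertions live in the host language, the test runner counts them and catches exceptions for free, and a permissions test only needs a second `QueryRunner` connected as a different user.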

The Code That Was Not There and the Badly Fitting Suit

Steve Knight — Fri, 23 Nov 2007 05:53:02 +0000

There seems to be a common, not unfounded, view that programming is really hard. The response from tool vendors is to produce products that make programming simpler. They do this by introducing a user interface that ‘simplifies’ what would ordinarily be a programming task into something that mere mortals can understand and program without understanding too many of the fundamentals of the task at hand. I think the view that such tools succeed at what they set out to do is broadly untrue.

Up until this week I had intimate experience of only one such tool and that was BusinessObjects. Now I’m going back a bit here, but in 1998 BusinessObjects made a tool that allowed a designer to take your ugly enterprise database and turn it into ‘Business Objects’. These objects contained a little more ‘information’ about the data you were looking at and made it possible to compose those objects in ways of your choosing, but crucially, they did this with the aid of graphical tools that didn’t make it look like you were really writing SQL queries at all. This, in turn, lets a common Jo-user compose a view of that data for a report or particular query they need without having to learn SQL. In concept the idea has tremendous power because you can do things that you could never do before without a lot of expensive IT expertise. The reality, for us at least, was a little different, and this was, I think, for two reasons.

Unapparent Inefficiencies

A normal SQL programmer would know about the indices and relationship cardinalities on a table, hence they would know which joins would suck. The abstraction provided by BusinessObjects would happily hide those inefficiencies of the underlying representation. That made it really easy to kill the database with a single mouse-click and drag. You can’t just blame Jo-user for this either: when you hide the inefficiencies, I would not be surprised if a seasoned programmer sometimes made the same mistakes that Jo did.

Apparent deficiencies

BusinessObjects, at the time, was a language in itself. Indeed the BusinessObjects language is in fact an abstraction of another, much more general-purpose language that we know as SQL. Programming languages that are developed for a specific domain (Domain-Specific Languages) tend to exhibit their own behaviour. They make some things easy at the expense of making other things less so. The trick, with these tools, is to make sure that all the things you make hard are things people don’t really do very often anyway. The problem for us, at the time, was that we were putting a clean layer of abstraction on top of a not-so-great database design. BusinessObjects wasn’t able to cope very well with those apparent deficiencies and so we had to go hunting for workarounds to put into place until we could get back and refactor the pimple on our software arse away.

In the end the project limped on for a bit and then I got moved onto another task and I lost track of it. Perhaps it lives on, but I doubt it. This week I discovered that Microsoft have a related tool with a different purpose: SQL Server Integration Services (SSIS). Apparently it’s been going for years under a previous guise of DTS but I’d neither seen nor heard of it before. Anyway, I was initially very excited when I started working with it. I really believed that I could take my complicated data import requirements and transform them into a few diagrams and have all the work done for me. Yeah right. The reality was somewhat different, and like our friend BusinessObjects, SSIS coped badly with the unapparent inefficiencies and the apparent deficiencies.

The conclusion is that tools like this are complex and that complexity includes, but is not limited to, a lot of the complexity of the lower-level underlying representation that they must use. Often, then, it will be far better to just use a general-purpose tool (like SQL or Perl) to get your reports or data transformations done. Don’t mess around with some factory-made, ill-fitting suit if you think you can tailor one yourself for less. No matter how pretty the buttons are.

In the end I surmised that using gawk and bash I could produce something equivalent to what I wanted to do in Integration Services in a fraction of the time and hence cost. If I’d used Perl or Python I could have done it even better and/or faster. I had been hoodwinked, my friends. Hoodwinked into thinking that I could write a program that was not there, only to discover in the end that it was far easier to just knuckle down and make my own suit and have it fit nicely than to use the pretty off-the-shelf one with the gaping crotch.

The problem, it seems, is that there is a market for tools that allow us to create code that seemingly isn’t. We still believe, as I did, that tools will somehow save the day. Dammit, why didn’t someone remind me earlier that there is not, never has been and never will be any silver bullet?

rozenshtein method for pivoting relational data

Steve Knight — Tue, 21 Nov 2006 07:27:19 +0000

I came across this blog entry while trying to make up for the fact that SQL Server 8 does not have the PIVOT statement that I needed. It’s taken from a book (that I don’t have) by Rozenshtein on advanced database queries.
The blog explanation is lengthy and a little confusing but the idea is deceptively simple. It hinges on being able to make a column expression that can ‘select’ data.

Making a numeric column select data in a row is reasonably straightforward. You need a column expression where the data in the row will return zero if it is non-matching and 1 if it is matching. You can then simply multiply the column value by the expression.

The particular problem I needed to solve was how to present a year report where each column contained the total number of transactions done in a month.

Therefore I had columns like ‘Now’, ‘1 month ago’, ‘2 months ago’, …, ‘5 months ago’

So it seems like I need to work out for each piece of data which month it falls in. This was easy:

 
SELECT MONTH(date) 
FROM table

Now all I have to do is to figure out the distance from this month to now for the last year. So something like:

SELECT DATEDIFF(month, date, GETDATE())
FROM table
WHERE date >= DATEADD(month, -11, GETDATE())

So how does this help? Well, all rows in my table for the current month and year will return zero, and all preceding months will return a positive number of months from now. What we’d rather have is a function that returns 1 for this month and zero for all others. ABS(SIGN()) of our DATEDIFF() result gets us halfway: it returns 0 for the current month and 1 for every other month, so we only need to invert it by taking 1-ABS(SIGN()). That term makes everything in the current month return 1 and everything in a different month return zero. This gives us exactly what we need, because we can then multiply any quantity by it to select it (i.e. times 1) or remove it (i.e. times 0). You can see this at work in the following table:


x    SIGN(x)   ABS(SIGN(x))   1-ABS(SIGN(x))
-2   -1        1              0
-1   -1        1              0
0    0         0              1
1    1         1              0
2    1         1              0

So now we just need to take that and apply it to all the months we want to pivot for: subtracting k from the DATEDIFF() result moves the zero, and hence the 1, to the month k months ago. Of course we’re aggregating data here, so we additionally have to SUM() each column; each sum only picks up the rows for its own month, so no GROUP BY on the month is needed. So, say we want the current month plus the 5 before it, the final SQL looks like this:

SELECT SUM(val * (1 - ABS(SIGN(DATEDIFF(month, date, GETDATE()) - 0)))) AS Now,
       SUM(val * (1 - ABS(SIGN(DATEDIFF(month, date, GETDATE()) - 1)))) AS Month1,
       SUM(val * (1 - ABS(SIGN(DATEDIFF(month, date, GETDATE()) - 2)))) AS Month2,
       SUM(val * (1 - ABS(SIGN(DATEDIFF(month, date, GETDATE()) - 3)))) AS Month3,
       SUM(val * (1 - ABS(SIGN(DATEDIFF(month, date, GETDATE()) - 4)))) AS Month4,
       SUM(val * (1 - ABS(SIGN(DATEDIFF(month, date, GETDATE()) - 5)))) AS Month5
FROM table
WHERE date >= DATEADD(month, -11, GETDATE())
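The arithmetic is easy to check outside the database. The sketch below is Python rather than SQL, runs on made-up sample rows, and uses helper names (`month_diff`, `selector`, `pivot_by_month`) that are only illustrative. `month_diff` imitates DATEDIFF(month, ...) by counting calendar-month boundaries.

```python
from datetime import date


def sign(x):
    """-1, 0 or 1, like SQL's SIGN()."""
    return (x > 0) - (x < 0)


def month_diff(earlier, later):
    """Calendar-month boundaries crossed, like DATEDIFF(month, earlier, later)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def selector(x):
    """1 when x == 0, otherwise 0: the 1-ABS(SIGN(x)) term."""
    return 1 - abs(sign(x))


def pivot_by_month(rows, today, months=6):
    """rows: iterable of (date, val). Returns totals for 0..months-1 months ago."""
    totals = [0] * months
    for when, val in rows:
        age = month_diff(when, today)
        for k in range(months):
            # Each column adds val only when the row is exactly k months old.
            totals[k] += val * selector(age - k)
    return totals


# Made-up sample data.
rows = [
    (date(2006, 11, 3), 10),
    (date(2006, 11, 20), 5),
    (date(2006, 10, 31), 7),
    (date(2006, 8, 1), 2),
    (date(2005, 11, 15), 99),   # a year old: matches no column
]
totals = pivot_by_month(rows, today=date(2006, 11, 21))
# totals == [15, 7, 0, 2, 0, 0]
```

Note that every row is multiplied into every column, which is exactly what the SQL does too: the trick trades a little wasted arithmetic for a single pass over the table.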

That’s it. Note that the only offensive thing about this is that the column names aren’t very descriptive. I had the benefit of doing this in SQL Server 8, so I could build queries inside the stored procedure and build descriptive column names like ‘Jan’, ‘Feb’, ‘Mar’, …. On the whole this is usually a bad thing to do inside a stored procedure because it means that the stored procedure can’t be fully compiled until it is run, and precompilation is one of the reasons for using stored procedures in the first place! If performance was a consideration and this stored procedure was being called a lot, I would probably choose to have the client application rename the columns.

secondary school mathematics that was almost useful

Steve Knight — Fri, 17 Nov 2006 09:12:18 +0000

For the first time in my career I came across the need to make use of a PRODUCT() SQL aggregate function today. To my dismay SQL Server 8 doesn’t have such a function, and so I figured out it can be simulated by use of logarithms and the SUM() function (since 10^(log10(a) + log10(b)) = a * b). Therefore my query could read:

SELECT POWER(10, SUM(LOG10(value)))
FROM table

But this query doesn’t work if there are negative numbers in the PRODUCT() list because LOG10(x) where x<0 is undefined. Guess what, there’s a bunch of negative numbers in my list. Arghh … So this complicates matters a bit. But not very much. We can simply do the same as before but now on the ABS() value of the number, and additionally count the number of negative values: if that count is odd then the result is negative, otherwise it’s positive. So to do this we will need to:

SELECT COUNT(CASE WHEN CONVERT(INT,SIGN(value))<0 THEN 1 ELSE NULL END)
FROM table

Which will give us a count of the number of negative values. If we take the modulo two of this number then we will get a zero if the overall result should be positive (i.e. it's even) and a one if it should be negative (i.e. it's odd). The rest is easy. Since, for y=-2x+1 then when x=0 or x=1 we will have 1 and -1 which we can multiply our result by to get the correct sign! Here is the full query:

SELECT POWER(10, SUM(LOG10(ABS(value)))) *
(-2 * (COUNT(CASE WHEN CONVERT(INT,SIGN(value))<0 THEN 1 ELSE NULL END) % 2) + 1)
FROM table
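The same calculation is easy to sanity-check in Python. This is only a sketch with an invented function name, and it assumes the list contains no zeros, since the logarithm of zero is undefined just as it is for negative numbers (`math.log10(0)` raises `ValueError`).

```python
import math


def product_via_logs(values):
    """Product of non-zero numbers using only a sum of logs and a count."""
    log_sum = sum(math.log10(abs(v)) for v in values)
    negatives = sum(1 for v in values if v < 0)
    sign = -2 * (negatives % 2) + 1      # y = -2x + 1 maps 0 -> 1 and 1 -> -1
    return sign * 10 ** log_sum


# Three negative values, so the result should be negative: about -60.
result = product_via_logs([2, -3, 4, -5, -0.5])
```

Because the result comes from floating-point logarithms it is only approximately equal to the true product, so compare it with a tolerance rather than with `==`.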

I wish I could thank my secondary school mathematics teacher Mrs Bishop for this mathematical insight. But unfortunately she was crap, so I won't. Finally, why was it only almost useful? Well, ingenious as I think it is, it's also a bit twisted. A lot of clarity has been sacrificed for the sake of a performance gain and this isn't usually good. If the tables were very large then it might be worth using the above instead of the cursor that I was trying to replace. But only then. Which raises the question: why don't SQL vendors supply a PRODUCT() function? Got to be almost a cut and paste job. I see that in SQL Server 2005 one can make one's own aggregates via the CLR. I almost like this idea but I also fear all manner of nasty / filthy code being inserted into the server. How long before someone writes a SQL Server 2005 virus?
