rozenshtein method for pivoting relational data

Steve Knight — Tue, 21 Nov 2006 07:27:19 +0000

I came across this blog entry while trying to make up for the fact that SQL Server 8 does not have the PIVOT operator that I needed. The technique comes from a book (which I don’t have) by Rozenshtein on advanced database queries.
The blog’s explanation is lengthy and a little confusing, but the idea is deceptively simple. It hinges on being able to make a column expression that can ‘select’ data.

Making a numeric column select data in a row is reasonably straightforward. You need a column expression that evaluates to 1 when the row matches and 0 when it doesn’t. You can then simply multiply the column value by that expression: matching rows keep their value and non-matching rows contribute zero.
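A minimal sketch of the same idea outside SQL, in Python, assuming a hypothetical list of (key, value) rows: multiplying each value by a 0/1 indicator keeps the matching rows and zeroes the rest, so a plain sum totals only the matches.

```python
def select_value(value: float, is_match: bool) -> float:
    """Return value for a matching row and 0 for a non-matching one."""
    indicator = int(is_match)  # 1 when the row matches, 0 otherwise
    return value * indicator


# Hypothetical rows of (key, value); only the "a" rows should be totalled.
rows = [("a", 10.0), ("b", 2.5), ("a", 4.0)]
total_a = sum(select_value(v, key == "a") for key, v in rows)
print(total_a)  # 14.0
```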

The particular problem I needed to solve was how to present a year report in which each column contained the total number of transactions made in one month.

So I had columns like ‘Now’, ‘1 month ago’, ‘2 months ago’, …, ‘5 months ago’.

So for each row I need to work out which month it falls in. This was easy:

SELECT MONTH(date)
FROM [table]

Next I need the distance, in months, from each row’s month to the current month, keeping only the last year. Something like:

SELECT DATEDIFF(month, date, GETDATE())
FROM [table]
-- keep rows from the first day of the month 11 months ago onwards
WHERE date >= DATEADD(month, DATEDIFF(month, 0, GETDATE()) - 11, 0)
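DATEDIFF(month, ...) counts month boundaries crossed rather than elapsed time, which is why every row from the current calendar month gives 0 however early in the month it falls. A small Python model of that behaviour (the function names and the fixed ‘today’ are assumptions for illustration) also shows the 12-month window the WHERE clause is meant to keep:

```python
from datetime import date


def month_diff(start: date, end: date) -> int:
    """Model of T-SQL DATEDIFF(month, start, end): month boundaries crossed,
    ignoring the day of the month."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def in_last_year(d: date, today: date) -> bool:
    """True for the current month and the 11 months before it."""
    return 0 <= month_diff(d, today) <= 11


today = date(2006, 11, 21)  # fixed "now" so the example is reproducible
print(month_diff(date(2006, 11, 1), today))     # 0: current month
print(month_diff(date(2006, 10, 31), today))    # 1: previous month, only three weeks earlier
print(month_diff(date(2005, 12, 15), today))    # 11: oldest month in the window
print(in_last_year(date(2005, 11, 30), today))  # False: 12 months ago
```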

So how does this help? Well, every row from the current month returns zero, and every row from a preceding month returns a positive number: how many months before now it falls. What we’d rather have, though, is an expression that returns 1 for this month and 0 for all others. ABS(SIGN(x)) gets us halfway there: it returns 0 when x is zero and 1 for any other value. Inverting that with 1-ABS(SIGN(x)) gives exactly what we need, so 1-ABS(SIGN(DATEDIFF(month, date, GETDATE()))) is 1 for rows in the current month and 0 for rows in any other month. To pick out the month k months ago instead, subtract k before taking the sign: 1-ABS(SIGN(DATEDIFF(month, date, GETDATE()) - k)). We can then multiply the result by whatever quantity we like, keeping it (times 1) or removing it (times 0). You can see this at work in the following table:


x     SIGN(x)   ABS(SIGN(x))   1-ABS(SIGN(x))
-2    -1        1              0
-1    -1        1              0
 0     0        0              1
 1     1        1              0
 2     1        1              0
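The table can be reproduced with a short Python sketch. The sign() helper below is our own stand-in for T-SQL SIGN() on integers, and month_selector(diff, k) is a hypothetical name for the 1-ABS(SIGN(diff - k)) expression that picks out rows exactly k months ago.

```python
def sign(x: int) -> int:
    """-1, 0 or 1, like T-SQL SIGN() for an integer."""
    return (x > 0) - (x < 0)


def is_zero(x: int) -> int:
    """1 - ABS(SIGN(x)): 1 when x is 0, otherwise 0."""
    return 1 - abs(sign(x))


def month_selector(diff: int, k: int) -> int:
    """1 when a row's month difference equals k, otherwise 0."""
    return is_zero(diff - k)


print("x", "SIGN(x)", "ABS(SIGN(x))", "1-ABS(SIGN(x))")
for x in (-2, -1, 0, 1, 2):
    print(x, sign(x), abs(sign(x)), is_zero(x))
```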

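So now we just need to apply that to each month we want to pivot. Of course we’re aggregating data here, so each pivoted column has to be a SUM() of the selected values. We must not GROUP BY the month itself, though: that would put each month on its own row, with only one non-zero column per row, and undo the pivot. If you want one row per something else (a customer, say), that column is what goes in the GROUP BY. So, for the current month plus the five months before it, the final SQL looks like this:

SELECT SUM(val * (1 - ABS(SIGN(DATEDIFF(month, date, GETDATE()) - 0)))) AS Now,
       SUM(val * (1 - ABS(SIGN(DATEDIFF(month, date, GETDATE()) - 1)))) AS Month1,
       SUM(val * (1 - ABS(SIGN(DATEDIFF(month, date, GETDATE()) - 2)))) AS Month2,
       SUM(val * (1 - ABS(SIGN(DATEDIFF(month, date, GETDATE()) - 3)))) AS Month3,
       SUM(val * (1 - ABS(SIGN(DATEDIFF(month, date, GETDATE()) - 4)))) AS Month4,
       SUM(val * (1 - ABS(SIGN(DATEDIFF(month, date, GETDATE()) - 5)))) AS Month5
FROM [table]
-- keep rows from the first day of the month five months ago onwards
WHERE date >= DATEADD(month, DATEDIFF(month, 0, GETDATE()) - 5, 0)
-- the parentheses around 1 - ABS(...) matter: without them,
-- val * 1 - ABS(...) is evaluated as (val * 1) - ABS(...)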
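To check the shape of the result without a database, here is a Python simulation of the query over hypothetical (date, val) rows. It reuses a model of DATEDIFF(month, ...) and the 1-ABS(SIGN()) selector; like the SQL, it produces a single row whose six entries are the Now, Month1, …, Month5 totals.

```python
from datetime import date


def month_diff(start: date, end: date) -> int:
    """Model of DATEDIFF(month, start, end)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def selector(x: int) -> int:
    """1 - ABS(SIGN(x)): 1 when x == 0, else 0."""
    return 1 - abs((x > 0) - (x < 0))


def pivot_months(rows, today: date, n_months: int = 6) -> list:
    """One SUM per column: column k totals val for rows exactly k months ago."""
    # The WHERE clause: drop rows older than the earliest reported month.
    kept = [(d, val) for d, val in rows if month_diff(d, today) < n_months]
    return [sum(val * selector(month_diff(d, today) - k) for d, val in kept)
            for k in range(n_months)]


today = date(2006, 11, 21)
rows = [  # hypothetical transactions
    (date(2006, 11, 2), 5), (date(2006, 11, 20), 1),
    (date(2006, 10, 15), 3), (date(2006, 7, 1), 2),
    (date(2006, 6, 30), 7), (date(2006, 5, 31), 100),  # May is 6 months ago: filtered out
]
print(pivot_months(rows, today))  # [6, 3, 0, 0, 2, 7]
```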

That’s it. The only offensive thing about this is that the column names aren’t very descriptive. Since I was doing this in SQL Server 8, I could build the query dynamically inside the stored procedure and give the columns descriptive names like ‘Jan’, ‘Feb’, ‘Mar’, and so on. On the whole this is a bad thing to do inside a stored procedure, because the dynamically built query can’t be compiled until it is run, and precompilation is one of the reasons for using stored procedures in the first place! If performance were a consideration and this stored procedure were called a lot, I would probably have the client application rename the columns instead.
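If the client application does the renaming, it only needs to map each generic alias to a month name. A minimal Python sketch, assuming the client knows the same ‘today’ the server used; the helper name and the hard-coded English abbreviations are illustrative assumptions:

```python
from datetime import date

MONTH_NAMES = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def column_labels(today: date, n_months: int = 6) -> dict:
    """Map the query's aliases (Now, Month1, ...) to month abbreviations."""
    aliases = ["Now"] + [f"Month{k}" for k in range(1, n_months)]
    # Step back k months from the current month, wrapping from Jan to Dec.
    return {alias: MONTH_NAMES[(today.month - 1 - k) % 12]
            for k, alias in enumerate(aliases)}


print(column_labels(date(2006, 11, 21)))
# {'Now': 'Nov', 'Month1': 'Oct', 'Month2': 'Sep', 'Month3': 'Aug', 'Month4': 'Jul', 'Month5': 'Jun'}
```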

The post rozenshtein method for pivoting relational data first appeared on hackinghat.com.

secondary school mathematics that was almost useful

Steve Knight — Fri, 17 Nov 2006 09:12:18 +0000

For the first time in my career I needed a PRODUCT() SQL aggregate function today. To my dismay SQL Server 8 doesn’t have one, so I figured out that it can be simulated with logarithms and the SUM() function, since 10^(log10(a) + log10(b)) = a * b. Therefore my query could read:

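-- POWER() returns the type of its first argument, so the base is cast to FLOAT;
-- with the integer literal 10 the result would be truncated to an integer
SELECT POWER(CAST(10 AS FLOAT), SUM(LOG10(value)))
FROM [table]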
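The identity behind that query can be checked in Python. This sketch assumes all values are strictly positive, just as the SQL does: math.log10 raises ValueError for zero or negative input, since the logarithm is undefined there. The result is a floating-point approximation, so it is compared with math.isclose rather than ==.

```python
import math


def product_via_logs(values) -> float:
    """10 ** sum(log10(v)): the product of strictly positive values."""
    return 10 ** sum(math.log10(v) for v in values)


print(product_via_logs([2, 3, 4]))                    # close to 24, up to rounding
print(math.isclose(product_via_logs([2, 3, 4]), 24))  # True
print(math.isclose(product_via_logs([0.5, 8]), 4))    # True
```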

But this query doesn’t work if there are negative numbers in the PRODUCT() list, because LOG10(x) is undefined for x < 0 (and for x = 0, for that matter). Guess what, there’s a bunch of negative numbers in my list. Arghh … So this complicates matters a bit. But not very much. We can do the same as before on ABS(value) instead, and additionally count the negative numbers: if that count is odd the result is negative, otherwise it’s positive. So to count them we need:

SELECT COUNT(CASE WHEN value < 0 THEN 1 END)
FROM [table]

This gives us the number of negative values. Taking that number modulo two gives 0 if the overall result should be positive (the count is even) and 1 if it should be negative (the count is odd). The rest is easy: y = -2x + 1 maps x = 0 to 1 and x = 1 to -1, so we can multiply our result by y to get the correct sign! Here is the full query:
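The sign correction on its own, as a Python sketch over a hypothetical list of values: count the negatives (the COUNT(CASE ...) step), take the count modulo 2, and map 0 to 1 and 1 to -1 with -2x + 1.

```python
def sign_factor(values) -> int:
    """1 for an even number of negatives, -1 for an odd number."""
    negatives = sum(1 for v in values if v < 0)  # COUNT(CASE WHEN value < 0 THEN 1 END)
    parity = negatives % 2                       # 0 for even, 1 for odd
    return -2 * parity + 1                       # 0 -> 1, 1 -> -1


print(sign_factor([3, -2, 5]))   # -1: one negative
print(sign_factor([-3, -2, 5]))  # 1: two negatives
```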

SELECT POWER(CAST(10 AS FLOAT), SUM(LOG10(ABS(value)))) *
       (-2 * (COUNT(CASE WHEN value < 0 THEN 1 END) % 2) + 1)
FROM [table]
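Putting the two halves together, here is a Python sketch of the full query, checked against ordinary multiplication (math.prod, Python 3.8+) for a hypothetical list mixing signs. Like the SQL, it assumes no value is zero, since log10(0) is undefined.

```python
import math


def signed_product(values) -> float:
    """Magnitude from summed log10(|v|), sign from the parity of the negatives."""
    magnitude = 10 ** sum(math.log10(abs(v)) for v in values)
    negatives = sum(1 for v in values if v < 0)
    return magnitude * (-2 * (negatives % 2) + 1)


values = [2, -3, 4, -0.5, -5]  # hypothetical data with three negatives
print(signed_product(values))  # approximately -60
print(math.prod(values))       # -60.0, by direct multiplication
print(math.isclose(signed_product(values), math.prod(values)))  # True
```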

I wish I could thank my secondary school mathematics teacher, Mrs Bishop, for this mathematical insight. But unfortunately she was crap, so I won’t. Finally, why was it only almost useful? Well, ingenious as I think it is, it’s also a bit twisted. A lot of clarity has been sacrificed for the sake of a performance gain, and that isn’t usually good. If the tables were very large then it might be worth using the above instead of the cursor I was trying to replace. But only then. Which raises the question: why don’t SQL vendors supply a PRODUCT() function? It’s got to be almost a cut and paste job. I see that in SQL Server 2005 one can write one’s own aggregates via the CLR. I almost like this idea, but I also fear all manner of nasty / filthy code being inserted into the server. How long before someone writes a SQL Server 2005 virus?

The post secondary school mathematics that was almost useful first appeared on hackinghat.com.
