Friday 20 January 2012

SQL Server - Join the UNION

Wow, no posts for months then two in a week. What's going on?

Well, the prompt for this post occurred whilst going through another developer's code in order to performance tune it (got it down from over half-an-hour to 1 second btw) and found a derived table using code similar to the following:

 SELECT *
 FROM [dbo].[taba]
UNION ALL
 SELECT *
 FROM [dbo].[tabb]

Well, first off I would like to say that you should never use "SELECT *" other than for truely adhoc queries, ever. Never ever. Did I say ever? Just to be clear then, never ever. Okay, got it? One more time then, never ever use "SELECT *" other than for adhoc queries, ever.

The problem with SELECT * is that what happens when the underlying table is modified? A new column is added, or removed. What about if an update re-orders the columns? This last one I'll come back to but note that depending on the context of the full query, the "SELECT *" part can fail in different ways. Inserting into a given table will fail if a column is added or removed. You may get incorrect information if the columns are re-ordered or it may fail if the data types are incompatible making this especially dangerous. You  might argue that a "SELECT * INTO" statement is okay but you once again are either copying data that you don't need, in the case where a column is added, or may just remove a required column without knowing it if a column is dropped from the underlying table.

Trust me on this, production systems should not contain "SELECT *" queries. And don't use them in views either.

Anyhoo, back to the reason for the post. So coming back to the re-ordered columns part. What do you think happens in the UNION ALL statement above if the columns in [taba] are in a different order to [tabb]? Let's assume compatible data types.

Well, try it...

-- DROP TABLE [#t1], [#t2];

CREATE TABLE [#t1] ([col1] int, [col2] int);

CREATE TABLE [#t2] ([col2] int, [col1] int);


INSERT INTO [#t1] ([col1], [col2]) VALUES (1, 10);
INSERT INTO [#t2] ([col1], [col2]) VALUES (2, 20);



 SELECT *
 FROM [#t1]
UNION ALL
 SELECT *
 FROM [#t2];

Expected behaviour? Would you like to see this in your production systems? I'll leave you with that thought.

No comments:

Post a Comment