BI Log

Wednesday, 25 January 2012

SQL Server - Primary Keys

I will say that it's amazing what you find out when you try things a different way. Presumably we all know that columns taking part in a primary key cannot be NULL but have you ever tried the different ways of creating them and seeing what happens? Take the following for instance:

CREATE TABLE [#tab1]
(
        [my_key_col1] INT PRIMARY KEY CLUSTERED,
        [multiplier]  FLOAT
);

Note how the default for a column is to be nullable when not specified yet the statement succeeds. Looking at the table definition (i.e. change into [tempdb], highlight the name of the table and press ALT-F1) you can see that the column taking part in the primary key has actually been set to NOT NULL. So how about this way:

CREATE TABLE [#tab2]
(
        [my_key_col1] INT,
        [multiplier]  FLOAT,
                PRIMARY KEY CLUSTERED([my_key_col1])
);

This works as well, also setting the column taking part in the primary key to NOT NULL. Clearly SQL Server will set any columns taking part in a primary key to be not nullable when "nullability" is not specified in the column definition. i.e. Leaving it to use the defaults. If you do specifically set the column to be NOT NULL then the create statement will fail. Indeed, books online for SQL Server 2008 for the CREATE TABLE construct states:

All columns defined within a PRIMARY KEY constraint must be defined as NOT NULL.
If nullability is not specified, all columns participating in a PRIMARY KEY constraint have their nullability set to NOT NULL.

Now what about the following?

CREATE TABLE [#tab3]
(
        [my_key_col1] INT,
        [multiplier]  FLOAT
);
ALTER TABLE [#tab3] ADD PRIMARY KEY CLUSTERED([my_key_col1]);

As expected this fails to add the primary key because the "nullability" of the column has been defined at the time the primary key is to be added. Once the table has been materialised all aspects of its definition have been put in place. Thus trying to add a primary key to columns defined as nullable must fail.

I always explicitly specify the "nullability" of columns in the definition of all tables I create, including temporary tables, which is why the first two examples surprised me. I was not expecting SQL Server to override the defaults. Hmmmm, just goes to show that we are never too old to learn eh?

Now before we finish, let's clean up after ourselves...

DROP TABLE [#tab1], [#tab2], [#tab3];

Friday, 20 January 2012

SQL Server - Join the UNION

Wow, no posts for months then two in a week. What's going on?

Well, the prompt for this post occurred whilst going through another developer's code in order to performance tune it (got it down from over half-an-hour to 1 second btw) and found a derived table using code similar to the following:

 SELECT *

 FROM [dbo].[taba]

UNION ALL

 SELECT *

 FROM [dbo].[tabb]

Well, first off I would like to say that you should never use "SELECT *" other than for truely adhoc queries, ever. Never ever. Did I say ever? Just to be clear then, never ever. Okay, got it? One more time then, never ever use "SELECT *" other than for adhoc queries, ever.

The problem with SELECT * is that what happens when the underlying table is modified? A new column is added, or removed. What about if an update re-orders the columns? This last one I'll come back to but note that depending on the context of the full query, the "SELECT *" part can fail in different ways. Inserting into a given table will fail if a column is added or removed. You may get incorrect information if the columns are re-ordered or it may fail if the data types are incompatible making this especially dangerous. You might argue that a "SELECT * INTO" statement is okay but you once again are either copying data that you don't need, in the case where a column is added, or may just remove a required column without knowing it if a column is dropped from the underlying table.

Trust me on this, production systems should not contain "SELECT *" queries. And don't use them in views either.

Anyhoo, back to the reason for the post. So coming back to the re-ordered columns part. What do you think happens in the UNION ALL statement above if the columns in [taba] are in a different order to [tabb]? Let's assume compatible data types.

Well, try it...

-- DROP TABLE [#t1], [#t2];

CREATE TABLE [#t1] ([col1] int, [col2] int);

CREATE TABLE [#t2] ([col2] int, [col1] int);


INSERT INTO [#t1] ([col1], [col2]) VALUES (1, 10);
INSERT INTO [#t2] ([col1], [col2]) VALUES (2, 20);



 SELECT *

 FROM [#t1]

UNION ALL

 SELECT *

 FROM [#t2];

Expected behaviour? Would you like to see this in your production systems? I'll leave you with that thought.