BI Log: January 2012

Wednesday 25 January 2012

SQL Server - Primary Keys

I will say that it's amazing what you find out when you try things a different way. Presumably we all know that columns taking part in a primary key cannot be NULL but have you ever tried the different ways of creating them and seeing what happens? Take the following for instance:

CREATE TABLE [#tab1]
(
        [my_key_col1] INT PRIMARY KEY CLUSTERED,
        [multiplier]  FLOAT
);

Note how the default for a column is to be nullable when not specified yet the statement succeeds. Looking at the table definition (i.e. change into [tempdb], highlight the name of the table and press ALT-F1) you can see that the column taking part in the primary key has actually been set to NOT NULL. So how about this way:

CREATE TABLE [#tab2]
(
        [my_key_col1] INT,
        [multiplier]  FLOAT,
                PRIMARY KEY CLUSTERED([my_key_col1])
);

This works as well, also setting the column taking part in the primary key to NOT NULL. Clearly SQL Server will set any columns taking part in a primary key to be not nullable when "nullability" is not specified in the column definition. i.e. Leaving it to use the defaults. If you do specifically set the column to be NOT NULL then the create statement will fail. Indeed, books online for SQL Server 2008 for the CREATE TABLE construct states:

All columns defined within a PRIMARY KEY constraint must be defined as NOT NULL.
If nullability is not specified, all columns participating in a PRIMARY KEY constraint have their nullability set to NOT NULL.

Now what about the following?

CREATE TABLE [#tab3]
(
        [my_key_col1] INT,
        [multiplier]  FLOAT
);
ALTER TABLE [#tab3] ADD PRIMARY KEY CLUSTERED([my_key_col1]);

As expected this fails to add the primary key because the "nullability" of the column has been defined at the time the primary key is to be added. Once the table has been materialised all aspects of its definition have been put in place. Thus trying to add a primary key to columns defined as nullable must fail.

I always explicitly specify the "nullability" of columns in the definition of all tables I create, including temporary tables, which is why the first two examples surprised me. I was not expecting SQL Server to override the defaults. Hmmmm, just goes to show that we are never too old to learn eh?

Now before we finish, let's clean up after ourselves...

DROP TABLE [#tab1], [#tab2], [#tab3];

Friday 20 January 2012

SQL Server - Join the UNION

Wow, no posts for months then two in a week. What's going on?

Well, the prompt for this post occurred whilst going through another developer's code in order to performance tune it (got it down from over half-an-hour to 1 second btw) and found a derived table using code similar to the following:

 SELECT *

 FROM [dbo].[taba]

UNION ALL

 SELECT *

 FROM [dbo].[tabb]

Well, first off I would like to say that you should never use "SELECT *" other than for truely adhoc queries, ever. Never ever. Did I say ever? Just to be clear then, never ever. Okay, got it? One more time then, never ever use "SELECT *" other than for adhoc queries, ever.

The problem with SELECT * is that what happens when the underlying table is modified? A new column is added, or removed. What about if an update re-orders the columns? This last one I'll come back to but note that depending on the context of the full query, the "SELECT *" part can fail in different ways. Inserting into a given table will fail if a column is added or removed. You may get incorrect information if the columns are re-ordered or it may fail if the data types are incompatible making this especially dangerous. You might argue that a "SELECT * INTO" statement is okay but you once again are either copying data that you don't need, in the case where a column is added, or may just remove a required column without knowing it if a column is dropped from the underlying table.

Trust me on this, production systems should not contain "SELECT *" queries. And don't use them in views either.

Anyhoo, back to the reason for the post. So coming back to the re-ordered columns part. What do you think happens in the UNION ALL statement above if the columns in [taba] are in a different order to [tabb]? Let's assume compatible data types.

Well, try it...

-- DROP TABLE [#t1], [#t2];

CREATE TABLE [#t1] ([col1] int, [col2] int);

CREATE TABLE [#t2] ([col2] int, [col1] int);


INSERT INTO [#t1] ([col1], [col2]) VALUES (1, 10);
INSERT INTO [#t2] ([col1], [col2]) VALUES (2, 20);



 SELECT *

 FROM [#t1]

UNION ALL

 SELECT *

 FROM [#t2];

Expected behaviour? Would you like to see this in your production systems? I'll leave you with that thought.

Wednesday 18 January 2012

SQL Server - Validation During Function Creation

Now here's a little thing I didn't know until recently. Seems like this is happening a lot lately and I really should get around to documenting these at some point. Anyhoo, my latest finding is as follows...

We have a process that, every time a script is checked into our version control system, rebuilds all of the databases from scratch, starting from Release 1 and then applying all updates in order. This normally ensures a problem free release into production, however, this most recent time was different. This time the release into production failed when trying to create a function in one of the databases. So what was different between production and the build process? If I mention that we have a stored procedure that creates a temporary table that is used by the function, does that prompt any ideas of what went wrong?

Well, here's the thing...
For inline table functions, attempting to create one when the underlying object does not exist will fail. However, for scalar and multi-statement table functions things are a little different. If the underlying objects do not exist the functions will be created with no issue. However, if the underlying objects do exist, a check is performed to ensure that their definition matches that expected by the function.

So in our case, because the system is live in production the SP has been called many times and thus the table has been created and dropped many times. However, in the new update the structure of this table has been changed. Now because the SP is never actually called during the build process no table has been created and the function can be deployed. In production however, the SP has been called and the old version of the table does exist, causing the installation to fail.

Stored procedures exhibit this same behaviour but don't have the differences related to inline and multi-statement/scalar functions.

Try it yourself:

/**
* Create functions.
*/
-- Successful.
CREATE FUNCTION [dbo].[udf_scalar_works]()
RETURNS INT
AS
BEGIN
RETURN ( SELECT TOP 1 [col2] FROM [dbo].[bar] );
END;
GO

-- Successful.
CREATE FUNCTION [dbo].[udf_multi_works]()
RETURNS @x TABLE (aaa INT)
AS
BEGIN
INSERT INTO @x (aaa)
SELECT [col2]
FROM [dbo].[bar];

RETURN;
END;
GO

-- Error: Invalid object name 'dbo.bar'.
CREATE FUNCTION [dbo].[udf_inline_doesnt_work]()
RETURNS TABLE
AS
RETURN ( SELECT [col2]
           FROM [dbo].[bar] );
GO

/**
* Create table without the required column.
*/
CREATE TABLE [dbo].[bar] ( [col1] INT );GO

/**
* Try create functions again.
*/
-- Invalid column name 'col2'.
CREATE FUNCTION [dbo].[udf_scalar_doesnt_work]()
RETURNS INT
AS
BEGIN
RETURN ( SELECT TOP 1 [col2] FROM [dbo].[bar] );
END;
GO

-- Error: Invalid column name 'col2'.
CREATE FUNCTION [dbo].[udf_multi_doesnt_work]()
RETURNS @x TABLE (aaa INT)
AS
BEGIN
INSERT INTO @x (aaa)
SELECT [col2]
FROM [dbo].[bar];

RETURN;
END;
GO

-- Error: Invalid column name 'col2'.
CREATE FUNCTION [dbo].[udf_inline_doesnt_work]()
RETURNS TABLE
AS
RETURN ( SELECT [col2]
           FROM [dbo].[bar] );
GO