Back to the Basics of SQL Server

Back to the SQL Server Basics with Wayne

The Myth

I recently was attending a presentation where the presenter stated that the statistic with stats_id = 1 is always for the primary key (PK). The presenter made this statement based on the behavior that he has seen. However, is it true?

The relationship between index_id and stats_id

The first thing that needs understood is the relationship between indexes and statistics. In the sys.stats documentation, it states that every index has a statistics row with index_id = stats_id, as seen in this screen shot from that article:

Is the index with index_id = 1 special?

So now we need to consider whether there is any special meaning to an index with an index_id of 1? The index_id values are in the sys.indexes documentation, as seen in this screen shot:

 

Here we can see that index_id 1 means that the index is a clustered index. Note that there is no mention of a primary key. So where did this observed behavior come from? We need to dig a bit further into how SQL Server makes a primary key constraint.

Primary Key constraints

In the constraint documentation, in the remarks section, the second paragraph states that the primary key and unique constraints have an index created. In fact, unique indexes enforce these two constraints. At the top of this article, it discusses when those constraints create clustered or non-clustered indexes:


PK defaults to clustered. If there is a clustered index already, PK defaults to non-clustered. Note that this is the default behavior. You can specify that it should be created as non-clustered.

Now we can understand why the presenter’s observed behavior was that a PK always creates a statistic with stats_id of 1. This is the default behavior if that table doesn’t already have a clustered index.

Testing

Let’s run through a few examples showing these conditions. Let’s create a table with a million rows, and run some tests against it:

Now that we have the test data, we will run a simple query to create a statistic:

This shows the following statistic:

This doesn’t say anything about the PK being stats_id 1 since there isn’t a PK. So, let’s create a clustered index, then add a PK:

Here we can see that the clustered index (IX1) does have stats_id of 1, and the PK has stats_id of 3. This proves that when there is an existing clustered index, creating the PK will create a statistic with a stats_id other than 1.

Sidebar…

If you were to look at sys.indexes, you would see that these two indexes use index_id values of 1 and 3. The value 2 is skipped. It’s not because there used to be an index that was deleted after the index_id 3 index was created. It’s simply because of the relationship that index_id = stats_id, and there is already a statistic with stats_id = 2. When creating the index for the primary key, index_id 2 had to be skipped.

Now let’s recreate the test table (run the above code), and then add a non-clustered PK.

Which returns:

This shows that the non-clustered PK (index_id > 1) created a statistics with a corresponding stats_id that is also > 1.

The only thing left to prove is that creating a PK where there is not a clustered index, and allowing the use of defaults, will create a stats_id = 1:

This shows us that the statistic for the PK has a stats_id of 1:

Summary

When creating a primary key, it creates an underlying index and statistic for that index. The index will be the clustered index on the table if there is not a clustered index on the table already, and if the statement to create the primary key did not specify non-clustered. If there is an existing clustered index on the table, or it was specified to create a non-clustered PK, then a non-clustered index is created for the PK. The clustered index will have an index_id (and stats_id) of 1, and all other non-clustered indexes will have a >1 value.

I’ve talked about other clustered index myths here, here, here and here.