The Myth

One misconception that I see a lot deals with how data is stored in a clustered index. Specifically – is the data in a clustered index stored on the page in physical order? Most people will say “Yes”, and they get this from the definition (see BOL: Clustered and Nonclustered Indexes Described), which states:

Clustered indexes sort and store the data rows in the table or view based on their key values.

Does this mean that the data is stored in strict physical order, that is Row1 is always followed by Row2 which is always followed by Row3? Well, let us investigate this in a bit more depth.

How does a page work?

The first thing to understand is the layout of a data page. (A data page is what SQL Server uses to store the data in a heap (a table without a clustered index), and in the leaf level of a clustered index. If we look at BOL (Understanding Pages and Extents), it has a nice little diagram that shows the basic layout of a data page:


The key thing to point out in the data page is that the Row Offset Array (also known as the Slot Array) contains the offset on the page where that data row exists. The Slot Array will always be in the index key order, but depending on the data and how it changes, the actual row of data can move around on the page. (And yes, I do see that the sentence immediately prior to the picture that states “The entries in the row offset table are in reverse sequence from the sequence of the rows on the page.” But, as we will soon find out, this is only true when records are initially added to this page.)

Time to test!

So, let’s see what is happening. Let’s start off by creating a new database to play around in, and making a table in this database to use:

Next, insert 5 rows into the PageTest table. (Each row will use 500 of the 1000 characters available in the Col1 column.)

The next step is to get the page id for this table in the database:

Get the PagePID value for the row with a NOT NULL IAMFID, and examine the contents of that page using the DBCC PAGE command (Note that for my example, the PagePID has a value of 264 – if you get a different value, use your value in all of the subsesquent DBCC PAGE commands. Additionally, DBCC PAGE sends its results to the SQL Log. To get the results to return to the screen, turn trace flag 3604 on as shown below.)

(Note: DBCC PAGE is not documented in BOL, but you can find out how it works by clicking here.)

As you examine the contents of the page, pay particular attention to the slot array, and the page offset for where the row data starts at:

The next step is to update the data in the row where RowID=3 to have 1000 characters in the Col1 column:
(Note that the array is zero-based, so the third row is in Slot 2.)

Finally, rerun the DBCC PAGE command and look at the slot array and the row offsets:

What this demonstrates is that even though the assigned location did not change (Row 3 is still in slot 2), we can tell that it’s length has been updated, and that its offset location on the page has been moved to be after the other rows on this page. This leaves a gap in the page where the row was.

Filling the gap

This gap can be used by SQL if it attempts to add a new row into this page, and there is not enough room at the end of the page, but there is room on the page to fit this row. SQL will move all of the existing rows up to utilize the unused space, and add the new row at the end of the page where the consolidated unused space is now at. Let’s watch this in action by first adding several rows of data to the table (but just short of the number necessary to cause this consolidation) and examining the page again:

And the slots and offsets are:

Add one more row and examine the page:

 

When this row was added, there was not enough space at the end of the page. There was enough unused space on this page to hold the row, so the unused space was consolidated together at the end of the page by moving the rows subsequent to the unused space up. Finally, the row was added at the end of the page. Of particular note is that the rows on the page were not sorted by the slot array; all that was performed was that the unused space was moved to the end of the page.

What we have looked at so far is modifying the non-key data that causes the row location in the page to change. Based on what has been seen so far, we can see that the actual data was inserted at the unused space at the end of the page. The slot array is kept in the proper physical order. The key data hasn’t been changed, and the keys were automatically generated through an IDENTITY column, causing the slot array to be populated in the proper order. But things don’t always work this way.

Inserting non-sequential key data

Let’s look into what happens if we insert the data that is not in the order of the key – something that is likely to happen with a natural key that doesn’t have an ever-increasing value.

For this, let’s drop the table, and recreate it without the identity column. Insert some data with a gap in the key. Examine the slot array on the page, then insert the missing value and examine the slot array again.

We can see that the slot array was sorted to be in the proper key value order. However, the actual data for this row was put into the unused space at the end of the page, and the slot array for this row reflects this in the offset location.

Summary

The slot array is always maintained in key value order, and the rows on each page are those that should be on that page as reflected by the key value of the row. However, the physical storage of the row on that page is not necessarily maintained in key value order. Changes to data so that the data no longer fits into its original space, or inserts that are not in the key value order can cause the actual data to not be physically stored in the key order and will be stored at the beginning of the unused space at the end of the page. When the unused space is consolidated, existing rows are moved up the page, but they are not resorted.