Wayne Sheffield

My blog about SQL Server

Next up on the SQL Server 2012 Performance testing circuit: Gap Detection. You know, where you find missing values from a table of sequential numbers.

SQL Server 2012 introduces the LEAD function, which I’ve previously blogged about here. In that blog post, I covered how to do Gap Detection using the LEAD function. Now, it’s time to put it to the performance test.

The Test

For this test, I want a large table, with at least a million rows. Obviously, gaps are needed. We need to start the table off with a gap. There needs to be gaps throughout the table. Some as small as one row, and others need to encompass many rows. Each method to test needs to identify the starting point and ending point of a gap.

The table was created with the following code:

IF OBJECT_ID('dbo.GapTest','U') IS NOT NULL DROP TABLE dbo.GapTest;
CREATE TABLE dbo.GapTest (N BIGINT CONSTRAINT PK__GapTest PRIMARY KEY CLUSTERED);
-- start with a number to make a large gap at the start of the table
INSERT INTO dbo.GapTest (N) VALUES (11000000);  
 
-- insert rows to make 2 million
WITH
TENS      (N) AS (SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL
                  SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0),
THOUSANDS (N) AS (SELECT 1 FROM TENS t1 CROSS JOIN TENS t2 CROSS JOIN TENS t3),
MILLIONS  (N) AS (SELECT 1 FROM THOUSANDS t1 CROSS JOIN THOUSANDS t2),
TenMillion(N) AS (SELECT 1 FROM MILLIONS CROSS JOIN TENS),
TALLY     (N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) FROM TenMillion)
INSERT INTO dbo.GapTest (N)
SELECT TOP (1999999) (82011000000000) + N
  FROM TALLY;
 
-- delete some records to make gaps
DELETE dbo.GapTest
 WHERE (N BETWEEN 82011000402000 AND 82011000500000
        AND N % 2000 = 0)  -- delete one row every 2000 rows
    OR (N BETWEEN 82011000600001 AND 82011000700000); -- delete this larger gap of 100,000 rows

This code starts off with a starting gap of eleven million. Next, more records are added to this to bring the table up to two million rows. Finally, many records are deleted, giving many gaps of one row, and an additional large gap of 100,000 rows.

The Contenders

The first method to test is the LEAD function. The code for doing this is very simple, and hopefully, extremely efficient and fast:

WITH cte AS
(
SELECT curr = N,
       nxt  = LEAD(N, 1) OVER (ORDER BY N)
  FROM dbo.GapTest
)
SELECT [Start Of Gap] = curr+1,
       [End Of Gap] = nxt-1
  FROM cte
 WHERE nxt-curr > 1;
GO

The second contender is the method that Itzik Ben-Gan published in SQL Server MVP Deep Dives:

SELECT StartRange = N+1,
       EndRange = (SELECT min(lt2.N)
                     FROM dbo.GapTest lt2
                    WHERE lt2.N > lt1.N) - 1
  FROM dbo.GapTest lt1
 WHERE NOT EXISTS
       (SELECT N
          FROM dbo.GapTest lt3
         WHERE N = lt1.N+1)
   AND N < (SELECT max(N) FROM dbo.GapTest lt4);

The third contender is the method that I learned from Jeff Moden on SQLServerCentral.com:

SELECT GapStart = ISNULL((SELECT MAX(lo.N+1)
                      FROM dbo.GapTest lo
                     WHERE lo.N < hi.N),1),
        GapEnd = hi.N - 1
   FROM dbo.GapTest hi
  WHERE hi.N NOT IN (SELECT N + 1 FROM dbo.GapTest)
    AND hi.N > (SELECT MIN(N) FROM dbo.GapTest);

Looking at the code from Jeff and Itzik, you can see that they are very similar, though they do have their differences. Notably, they each have three sub-queries against the table, so it looks like the Reads will be higher than what the LEAD function will have. So, let the battle of the performance gurus vs. SQL Server 2012 begin!

Running the Test

Each of the contenders code is put into a stored procedure, and modified to dump the results to a temporary table instead of returning the output to the screen. The following code is used to generate IO and Time statistics, and to run each method ten times:

SET STATISTICS IO,TIME ON;
GO
EXECUTE dbo.GapTest_LEAD;
EXECUTE dbo.GapTest_Itzik;
EXECUTE dbo.GapTest_Jeff;
GO 10
SET STATISTICS IO,TIME OFF;
GO

The Results

The statistics returned were made into a table and copied into Excel where a line chart was made showing how well each performed. (Units are milli-seconds.)

Well, right off I can see that LEAD does not perform as fast and efficient as I had hoped for. The methods from Itzik and Jeff are extremely close – a virtual tie. So, I took the average of the ten runs and charted that as a bar graph:

Here you can see that they are indeed extremely close – a difference of only 16 milli-seconds. The code from either one of these performance gurus runs 3.5 times faster than what the LEAD function from SQL Server 2012 gives us.

The other statistics are interesting: The LEAD function has 1 scan with 4014 logical reads; the gurus both have 55 scans with 8189 logical reads. So, my guess about more reads was dead on. Unfortunately, this did not lend itself to a performance improvement.

The Wrap Up

Well, this is pretty obvious. The code from either Itzik or Jeff will run astonishingly fast. They are so close, that I won’t even declare one of them a winner. Use either one.

The Invitation

If you have some code that you think is a contender for these, please post a reply with it. I’ll add it to the test suite, and update this post to show how it fares if it’s close or better than what these methods from Itzik or Jeff do.

This month, T-SQL Tuesday is being hosted by Nigel Sammy. Since the RTM of SQL Server 2012 just occurred, he wants to know: “What do you think is a useful feature of SQL Server 2012?”

Well, if you’ve been following my blog, then this should not be much of a surprise to you. My #1 useful feature of SQL Server 2012 is the enhancements made around the OVER clause.

Why the OVER clause?

One thing that I have been following for a few years now is the various methods of performing running totals – a calculation of the total from the previous row plus a value from the current row. I’ve seen several methods, from cursors to utilizing clustered index scans to self-joins. There are pros and cons with using each method, but regardless of how you did it, there were shortcomings to overcome.

How does the OVER clause solve this?

When Microsoft introduced the OVER clause in SQL Server 2005, its use in aggregate functions was pretty limited – you could use the aggregate function with just the PARTITION BY clause. In SQL Server 2012, with the enhancements to the OVER clause following the ANSI standard, we can add to this the use of the ORDER BY clause, and the ROWS|RANGE clause. With this combination, a running total (heck, a running aggregation on most of the aggregate functions) is extremely easy – specify when the aggregation starts over with the PARTITION BY clause; use the ORDER BY clause to specify what order to evaluate the rows; and use the ROWS|RANGE clause to specify which rows to use in the calculation. By specifying the first row through the current row, you have a running aggregation. My testing shows that it is indeed very fast. And testing by others shows this also.

But wait! – There’s More

The new Analytic functions also require the OVER clause, and some of them are bound to be extremely useful as well. I’ve covered the Analytic Functions in a series on my blog,  but the ones I find the most promise for are LAG/LEAD, and FIRST_VALUE/LAST_VALUE.

Conclusion

I think that the OVER clause enhancements, plus the new analytic functions, are a game changer. They are fast; set-based; documented; simple to use. While there might be some methods out there that can still perform slightly better, I see these enhancements taking over. The only con… you need SQL Server 2012.

On Saturday, April 14, 2012, I’ll be presenting “Table Vars and Temp Tables – What you NEED to know!” at SQL Saturday #111 in Alpharetta, GA (just north of Atlanta). Forget the tax man… come out and join me in this code filled session, where we’ll discover the differences and similarities of Temporary Tables and Table Variables, dispel some widespread myths about each, and answer the most important questions of them all, “When do I use one or the other and what are the various impacts of doing so?”

SQL Server 2012 introduces several new date/time functions that allow you to build a date/time from the individual parts of a date/time value. As I was experimenting with them, I was thinking that this is pretty neat, and bound to be very useful. And then I started wondering how well it performed.

Before I had time to test it, I was working on something else where I needed to truncate a date/time value to the first day of the month. I immediately whipped out my toolbelt – it has the routine that I learned from Lynn Pettis a few years back on SQLServerCentral.com from his blog post on Common Date Routines. This routine does the DateAdd(MONTH, DateDiff(MONTH, ’1900-01-01′, @Date), ’1900-01-01′) trick so see how many month boundaries exist from a known date to the tested date, and then to add this number of month boundaries back to the known date. With the month datepart being used, the result ends up being the first of the month of the tested date. Using other dateparts gives us other dates with remainders being truncated.

As I was looking at that wonderfully efficient code, I realized that I could get the same results with the DATETIMEFROMPARTS function in SQL Server 2012. So I coded an example, and it worked beautifully.

And now, I had a burning question. Performance-wise, just how do these different methods of truncating dates compare against each other? So, stop everything that I’m doing, and code a test.

First things first. A performance test requires a bunch of test data. And I mean a bunch. So, I grabbed some code from an article that Jeff Moden wrote on SQLServerCentral.com at the beginning of 2011. Sure enough, there’s a million row dataset of random dates… just what I’m looking for. So, by borrowing some code from Jeff, I have my test data:

Test Data   
SELECT TOP 1000000
       RowNum   = IDENTITY(INT,1,1),
       SomeDate = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME)
  INTO dbo.DateTest
  FROM sys.All_Columns t1,
       sys.All_Columns t2;

Next up is to generate a test and to produce timings. In the following code, I truncate each date to the first day of the year, month and quarter, and then I truncate the time down to the hour, minute, and second. I perform this conversion twice, the first time using Lynn’s DateAdd/DateDiff code, and the second time by stripping out the various parts of the date, and putting it back together with the DATETIMEFROMPARTS function. Both queries dump the results into temp tables. I run this block of code with SET STATISTICS TIME ON just before each query to capture the time to run each query, and that batch is performed 25 times.

For the DateAdd/DateDiff, each calculation is counting the number of boundaries being crossed. Each boundary that we’re truncating the date to has to be counted individually. To use the SQL Server 2012 DATETIMEFROMPARTS function, we first have to extract each appropriate part before they can be passed back into the function to build the date; this code extracts it out once for each record in the CROSS APPLY, and then uses that value in each of the functions.

First up, the code:

DECLARE @Base   DATETIME = '1900-01-01',
        @Base2  DATETIME = '2000-01-01';
IF OBJECT_ID('tempdb..#test1','U') IS NOT NULL DROP TABLE #test1;
IF OBJECT_ID('tempdb..#test2','U') IS NOT NULL DROP TABLE #test2;
 
RAISERROR ('DateAdd/DateDiff test', 10, 1) WITH NOWAIT;
SET STATISTICS TIME ON;
SELECT  SomeDate,
        DATEADD(YEAR,    DATEDIFF(YEAR,    @Base,  SomeDate), @Base)  AS [FirstDayOfYear],
        DATEADD(MONTH,   DATEDIFF(MONTH,   @Base,  SomeDate), @Base)  AS [FirstDayOfMonth],
        DATEADD(QUARTER, DATEDIFF(QUARTER, @Base,  SomeDate), @Base)  AS [FirstDayOfQuarter],
        DATEADD(HOUR,    DATEDIFF(HOUR,    @Base,  SomeDate), @Base)  AS StartOfHour,
        DATEADD(MINUTE,  DATEDIFF(MINUTE,  @Base,  SomeDate), @Base)  AS StartOfMinute,
        DATEADD(SECOND,  DATEDIFF(SECOND,  @Base2, SomeDate), @Base2) AS StartOfSecond
  INTO  #test1
  FROM  dbo.DateTest;
SET STATISTICS TIME OFF;
 
RAISERROR ('DateTimeFromParts test', 10, 1) WITH NOWAIT;
SET STATISTICS TIME ON;
SELECT  SomeDate,
        DATETIMEFROMPARTS(ca.Yr, 1,      1, 0, 0, 0, 0)                  AS [FirstDayOfYear],
        DATETIMEFROMPARTS(ca.Yr, ca.Mth, 1, 0, 0, 0, 0)                  AS [FirstDayOfMonth],
        DATETIMEFROMPARTS(ca.Yr, ca.Qtr, 1, 0, 0, 0, 0)                  AS [FirstDayOfQuarter],
        DATETIMEFROMPARTS(ca.Yr, ca.Mth, ca.Dy, ca.Hr, 0, 0, 0)          AS StartOfHour,
        DATETIMEFROMPARTS(ca.Yr, ca.Mth, ca.Dy, ca.Hr, ca.Mn, 0, 0)      AS StartOfMinute,
        DATETIMEFROMPARTS(ca.Yr, ca.Mth, ca.Dy, ca.Hr, ca.Mn, ca.Sec, 0) AS StartOfSecond
INTO #test2
FROM dbo.DateTest dt
     CROSS APPLY (SELECT DATEPART(YEAR,   dt.SomeDate) AS Yr,
                         DATEPART(MONTH,  dt.SomeDate) AS Mth,
                         ((CEILING(MONTH(dt.SomeDate)/3.0)*3)-2) AS Qtr,
                         DATEPART(DAY,    dt.SomeDate) AS Dy,
                         DATEPART(HOUR,   dt.SomeDate) AS Hr,
                         DATEPART(MINUTE, dt.SomeDate) AS Mn,
                         DATEPART(SECOND, dt.SomeDate) AS Sec ) ca;
 
SET STATISTICS TIME OFF;
GO 25

I then took the results, and used a split function to get the timings from each query into a result set. This was coped into Excel, where the following line graphs tell the rest of the story:

 

 

I think I’ll continue to use Lynn’s DateAdd/DateDiff method when I need to truncate a date.

(Test environment: Windows 7 x64; SQL Server 2012 x64; 8GB ram)

This month, Argenis Fernandez (blog, @DBArgenis) is hosting our monthly T-SQL Tuesday, and he wants to know: Are you specialized? On something? Or anything at all? Has that been a good or a bad thing? Why?

I do specialize, in SQL Server, in development and administration areas.

Why I decided to specialize

My decision to specialize occurred many years ago. I was working as a long-term consultant, programming in Delphi (what I still feel is the best programming language ever) and using SQL Server 6.5 as the back-end database. Of course, this meant that I fell into the accidental DBA role that so many of us have started our SQL Server careers in. After being in this long term consultant role for almost 8 years, the company made what I refer to as a “penny-wise, but pound-foolish” decision to eliminate the consultant line item on their expense sheet, and to replace me with a cheaper programmer. (They couldn’t find any Delphi programmers willing to work for as little as they were offering, so they then decided to convert the application from Delphi to Visual Basic, where programmers were a dime a dozen. At last check, they had three programmers working on just the application side, plus database support, and they are now paying more in salary for them than what they were for me as a consultant.)

Why I decided to specialize in SQL Server

Well, the Delphi market has never been strong in the United States, and during this time that I was consulting I had noticed it dropping even further, what with companies switching to the .NET platform. So, an analysis of my skills was pretty easy: a programming language that is becoming obsolete (especially in the geographic area where I work), and SQL Server. Now, I feel that I’m a good programmer, so I know that I could learn .Net and C# myself… but I really didn’t want to both learn and keep up with all the advances in this language, plus all of the advances in SQL Server. And thus, the decision was made to specialize in SQL Server.

My progression to specialize

With having made my decision to specialize in SQL Server, I then focused on learning all that I could about it. One key area to me in this was in taking and obtaining some of the various certification exams and certifications. (Even though the certification process has a bad reputation in the community, let’s face it: prospective employers like to see them, and I needed a job – I needed that edge over other candidates.) As I would study the areas being testing, and practice the skills being taught, I learned so much more. When reaching out to the internet to understand some of these items, I encountered the wonderful SQL community that we are all a part of. As good as I thought I was, I quickly learned that there was a lot more to this path than what I had envisioned – and everything that I was learning intrigued me even more. Even though I didn’t pass every certification exam on the first try, I strove on. As I earned these certifications, I was able to land jobs that enabled me to focus on improving my SQL skills even further.

The best side-effects of my decision to specialize

Most importantly, and not beating around the bush, specializing in SQL Server brings in a pretty comfortable income. I’m able to take care of my family, and to do things with them that we otherwise wouldn’t be able to do.

Secondly is the connections that I’ve been able to make with the SQL Community. The encouragement and friendships that I’ve made with all of you’ll has been extremely up-lifting. The sharing between people, both about SQL and personally, is very special to me.

And this leads directly into the next best reason – through the SQL Community, I’ve made several good friends. My life is immensely better for having them in it.

What’s next?

Several years ago, due to the encouragement of the SQL community, I pushed beyond my comfort zone and starting writing some articles. And I found out that I like writing articles, and am pretty good at it. And then, with more encouragement from the community (specifically those of you that pushed me really hard), I took one of my popular articles and made a presentation out of it. I have to tell you that I was very scared… remembrances of my poor speaking while in the Navy a lifetime ago had me terrified. And as I stepped out of my comfort zone yet again, and started doing SQL presentations at user groups and SQL Saturdays, I found out yet once again that this is something that I like and enjoy very much. So much so that I can envision myself in the future, becoming a traveling trainer in SQL – I really do enjoy teaching people, and seeing them learn something from what I’m saying.

Why I’m glad I specialize in SQL Server

So, in summary, there are several reasons why I’m glad that I made the decision to specialize in SQL Server:

  1. SQL Server – it’s a fabulous product, and I find out more and more about it all the time.
  2. Money – I’m able to provide for my family.
  3. Job Satisfaction – I really love the work that I do with SQL Server.
  4. Community – what can I say, we all know and love the SQL Community for what it is, and for how it is so different from other online communities for technology. It’s one of a kind, to be emulated by all. I’m honored to be a part of it.
  5. Friendships – I’ve met and become friends with so many great people that I otherwise would never have met.
  6. Training – I don’t think that I would have started writing or teaching without having immersed myself into SQL Server by specializing in it. And to think that I can envision myself in several years doing just this as something to do in retirement – it just wouldn’t have been possible without having made the decision to specialize in SQL Server.

So, my decision to specialize has had a dramatic effect on my life. I wouldn’t be where I am today, or have the plans that I have for the future, without it. I feel like I am living the blessing spoken by Mr Spock so frequently, and which I now wish upon all of you: “Live Long, and Prosper”.

On Saturday, March 10, 2012, I’ll be presenting “Table Vars and Temp Tables – What you NEED to know!” at SQL Saturday #110 in Tampa, FL. Come out and join me in this code filled session, where we’ll discover the differences and similarities of Temporary Tables and Table Variables, dispel some widespread myths about each, and answer the most important questions of them all, “When do I use one or the other and what are the various impacts of doing so?”

I’m really blessed to have been selected for this event – there were hundreds of sessions submitted by many presenters. Let’s hope that they really like this session so that I will be invited back.

Personally, I’m very happy about this for another reason… my wonderful mother-in-law lives in the area, so I’ll get to visit with her for the weekend also. (And for those questioning… yes, she is that great!)

Table-Valued Functions. What a wonderful addition to SQL they make. They take parameters, do some work, and return a result set that can be used in queries. You can select directly against them, or utilize them with the APPLY operator. These are truly versatile additions to SQL -and  since you can pass parameters to them, they are like a parametrized view. And we have two different types to work with: Inline Table-Valued Functions (ITVF) and Multi-Statement Table-Valued Functions (MTVF).

But how do they compare with each other? Well, let’s start off by looking at the syntax of each:

CREATE FUNCTION Util.MyITVFunction (@Parameters INT)
RETURNS TABLE
AS
RETURN
SELECT TOP (@Parameters) N
  FROM Util.Tally
 ORDER BY N;
GO
CREATE FUNCTION Util.MyMTVFunction (@Parameters INT)
RETURNS @FunctionResultTableVariable TABLE (N INT)
AS
BEGIN
  INSERT INTO @FunctionResultTableVariable (N)
  SELECT TOP (@Parameters) N
    FROM Util.Tally
   ORDER BY N;
  RETURN;
END
GO

The changes in the syntax are that MTVFs must first declare a table variable that is to be returned. Secondly, the MTVF must have a BEGIN/END block. Third, inside the BEGIN/END block you need code that populates the table variable. And finally, you return from the function. In comparison, the ITVF just returns a select statement – there is no table variable to mess around with, no inserts, no code blocks. Just a SELECT statement.

So, how do these perform? For this comparison, let’s use the example that Microsoft supplies in Books Online (BOL) for the APPLY operator (http://technet.microsoft.com/en-us/library/ms175156.aspx):

First, make and populate two tables: Employees and Departments:

--Create Employees table and insert values.
CREATE TABLE Employees
(
    empid   int         NOT NULL
    ,mgrid   int         NULL
    ,empname varchar(25) NOT NULL
    ,salary  money       NOT NULL
    CONSTRAINT PK_Employees PRIMARY KEY(empid)
);
GO
INSERT INTO Employees VALUES(1 , NULL, 'Nancy'   , $10000.00);
INSERT INTO Employees VALUES(2 , 1   , 'Andrew'  , $5000.00);
INSERT INTO Employees VALUES(3 , 1   , 'Janet'   , $5000.00);
INSERT INTO Employees VALUES(4 , 1   , 'Margaret', $5000.00);
INSERT INTO Employees VALUES(5 , 2   , 'Steven'  , $2500.00);
INSERT INTO Employees VALUES(6 , 2   , 'Michael' , $2500.00);
INSERT INTO Employees VALUES(7 , 3   , 'Robert'  , $2500.00);
INSERT INTO Employees VALUES(8 , 3   , 'Laura'   , $2500.00);
INSERT INTO Employees VALUES(9 , 3   , 'Ann'     , $2500.00);
INSERT INTO Employees VALUES(10, 4   , 'Ina'     , $2500.00);
INSERT INTO Employees VALUES(11, 7   , 'David'   , $2000.00);
INSERT INTO Employees VALUES(12, 7   , 'Ron'     , $2000.00);
INSERT INTO Employees VALUES(13, 7   , 'Dan'     , $2000.00);
INSERT INTO Employees VALUES(14, 11  , 'James'   , $1500.00);
GO
--Create Departments table and insert values.
CREATE TABLE Departments
(
    deptid    INT NOT NULL PRIMARY KEY
    ,deptname  VARCHAR(25) NOT NULL
    ,deptmgrid INT NULL REFERENCES Employees
);
GO
INSERT INTO Departments VALUES(1, 'HR',           2);
INSERT INTO Departments VALUES(2, 'Marketing',    7);
INSERT INTO Departments VALUES(3, 'Finance',      8);
INSERT INTO Departments VALUES(4, 'R&D',          9);
INSERT INTO Departments VALUES(5, 'Training',     4);
INSERT INTO Departments VALUES(6, 'Gardening', NULL);

In this example, most (but not all) of the departments in the Departments table have a manager ID that corresponds to an employee in the Employees table. The following table-valued function accepts an employee ID as an argument and returns that employee and all of his/her subordinates.

CREATE FUNCTION dbo.fn_getsubtree(@empid AS INT)
    RETURNS @TREE TABLE
(
    empid   INT NOT NULL
    ,empname VARCHAR(25) NOT NULL
    ,mgrid   INT NULL
    ,lvl     INT NOT NULL
)
AS
BEGIN
  WITH Employees_Subtree(empid, empname, mgrid, lvl)
  AS
  (
    -- Anchor Member (AM)
    SELECT empid, empname, mgrid, 0
    FROM Employees
    WHERE empid = @empid
 
    UNION all
 
    -- Recursive Member (RM)
    SELECT e.empid, e.empname, e.mgrid, es.lvl+1
    FROM Employees AS e
      JOIN Employees_Subtree AS es
        ON e.mgrid = es.empid
  )
  INSERT INTO @TREE
    SELECT * FROM Employees_Subtree;
 
  RETURN
END
GO

To return all of the subordinates in all levels for the manager of each department, use the following query:

SELECT D.deptid,
       D.deptname,
       D.deptmgrid,
       ST.empid,
       ST.empname,
       ST.mgrid
  FROM Departments AS D
       CROSS APPLY fn_getsubtree(D.deptmgrid) AS ST;

Which returns this result set:

deptid deptname deptmgrid empid empname mgrid
1 HR 2 2 Andrew 1
1 HR 2 5 Steven 2
1 HR 2 6 Michael 2
2 Marketing 7 7 Robert 3
2 Marketing 7 11 David 7
2 Marketing 7 12 Ron 7
2 Marketing 7 13 Dan 7
2 Marketing 7 14 James 11
3 Finance 8 8 Laura 3
4 R&D 9 9 Ann 3
5 Training 4 4 Margaret 1
5 Training 4 10 Ina 4

This is a MTVF. Now, let’s convert it to an ITVF by removing the table variable declaration, the begin/end block, the insert statement, and move the RETURN to the start:

CREATE FUNCTION dbo.fn_getsubtreeITVF(@empid AS INT)
    RETURNS TABLE
AS
RETURN
  WITH Employees_Subtree(empid, empname, mgrid, lvl)
  AS
  (
    -- Anchor Member (AM)
    SELECT empid, empname, mgrid, 0
    FROM Employees
    WHERE empid = @empid
 
    UNION all
 
    -- Recursive Member (RM)
    SELECT e.empid, e.empname, e.mgrid, es.lvl+1
    FROM Employees AS e
      JOIN Employees_Subtree AS es
        ON e.mgrid = es.empid
  )
    SELECT * FROM Employees_Subtree;
GO

As you would expect, this function returns exactly the same result set. So, let’s look at how they perform.

Let’s get the statistics of each by running SET STATISTICS IO,TIME ON before the two queries. We’ll also grab the actual execution plans, and capture the activity with Profiler. The following statistics are returned:

MTVF:

Table '#15502E78'. Scan count 6, logical reads 6...
Table 'Departments'. Scan count 1, logical reads 2...
SQL Server Execution Times:
CPU time = 15 ms,  elapsed time = 12 ms.

ITVF:

Table 'Worktable'. Scan count 7, logical reads 85...
Table 'Employees'. Scan count 1, logical reads 35...
Table 'Departments'. Scan count 1, logical reads 2...
SQL Server Execution Times:
CPU time = 0 ms,  elapsed time = 114 ms.

Note that for the MTVF, that the Employees table doesn’t appear to have been touched. Instead, what we are seeing is some read activity on a table variable – some very low read activity. Also notice that the elapsed CPU time for the MTVF is substantially greater than the ITVF. (The total elapsed time is related to how long it takes to return the information to the client, so I disregard this value.) Since the function is being called 6 times, the table variable is built, populated, and then read from 6 times, hence the value of 6. But how many reads were being performed against the Employees table? We know that the function is accessing the Employees table, but we have no clue as to what the IO statistics are for that table. Instead, the statistics show only the reading of the data from the MTVF table variable, not the reading and inserting of data into it.

In looking at the statistics, it appears that the ITVF is doing a lot more work than the MTVF – but it is running considerably faster. (For this small set of data, both are extremely fast, but you can see that the ITVF is so fast that it can’t be registered at the millisecond level.)

Let’s compare this to what Profiler caught:

As you can see, they have the same CPU time that the statistics caught. But here we can see the total read activity that the MTVF is doing, and that the ITVF is doing 2/3 the reads of the MTVF. Again, I’m ignoring the Duration column since it can be impacted by other activity going on across the network / client computer.

Finally, let’s look at the execution plans:

MTVF:

ITVF:

In the MTVF, you see only an operation called “Table Valued Function”. Everything that it is doing is essentially a black box – something is happening, and data gets returned. For MTVFs, SQL can’t “see” what it is that the MTVF is doing since it is being run in a separate context. What this means is that SQL has to run the MTVF as it is written, without being able to make any optimizations in the query plan to optimize it.

In the ITVF, everything that it is doing is being shown… just like a view, it’s activity is being “inlined” into the query plan. Since SQL now can see everything that is going on (across the entire query), it can make optimizations in the query plan to be more efficient.

This example is just on a small handful of records in both tables, but we can already see a performance difference. When you expand these tables to tens of thousands of records, this difference is really magnified. I have seen performance improvements that take a query from running in tens of minutes to seconds, simply by converting a MTVF to an ITVF.

Another thing to note in these execution plans – look at the query cost (relative to the batch) percentages. They show that the MTVF is 20%, and the ITVF is 80%. So, that portion of the execution plan is also misleading.

By the way… scalar functions work nearly essentially the same as MTVFs. You can get a similar substantial performance boost by converting these to ITVFs, or possibly by just JOINing to the table.

So, in closing, use those ITVFs. And try to keep away from MTVFs and scalar functions. Those functions are necessary sometimes, but use the ITVFs if possible… your SQL Server will thank you.

PS – another closing note. When you have MTVFs, remember that both the IO statistics, and the execution plan total cost percentages are misleading.

I’m honored to have been selected to present two sessions at SQL Saturday #96 in our nation’s capital, Washington DC on Saturday, November 5, 2011.

The first session is in room 7091 at 1:15pm: “Table Vars and Temp Tables – What you NEED to know!” Come join me in this code filled session, where we’ll discover the differences and similarities of Temporary Tables and Table Variables, investigate some widespread myths about each, and answer the most important questions of them all, “When do I use one or the other and what are the various impacts of doing so?”

The second session is in room 5184 at 3:45pm: “Banish RBAR!” Come join me in examining several recent additions to SQL Server that can greatly improve the performance of your code. Learn how the APPLY operator works; iterate through incoming data just once with the MERGE statement; “slice and dice” your data with the Windowing (ranking) functions; re-write your multi-statement table-valued functions to inline functions to help the optimizer and speed up your queries; learn how to create a grouped delimited list – without loops! Code demos are throughout this session.

If you’re in the area, I hope to see you in one or both of these sessions!

On Saturday, October 29, 2011 at 2:45 pm, I’ll be presenting “Table Vars and Temp Tables – What you NEED to know!” at SQL Saturday #87 in Louisville, KY. Come out and join me in this code filled session, where we’ll discover the differences and similarities of Temporary Tables and Table Variables, dispel some widespread myths about each, and answer the most important questions of them all, “When do I use one or the other and what are the various impacts of doing so?”


TC Winner
Actually, I only submitted the winning entry; Denali CTP3 SQL Server 2012 is the real winner. After having a blog posted here recently about running totals performance in Denali CTP3 SQL Server 2012, I happened to run across this T-SQL Challenge for doing running totals. On a lark, I posted a Denali CTP3 SQL Server 2012 solution. I fully expected it to be discounted (on a not released version of SQL, other solutions would be faster, heck, anything)… but this simple solution turned out to completely trounce all of the other solutions and it was selected as the top winner! The SQL Server 2012 solution came in with 529 reads, no writes… some of the other solutions that produced the correct answer (and thus became a “winning” solution) came in with MILLIONS of reads, and some with THOUSANDS of writes. Duration? The closest competitor came in at twice the time. Honestly, I’d hate to see what the performance is of some of the other solutions that produced the correct results, but were not performing well enough to make it into the top echelon.

What I’d like is for you to first check out the specifics of the challenge here, then check out all of the winning solutions here. Check out how simply and elegantly SQL Server 2012 lets us do this rather complicated calculation, and the performance stats it produced. Then, check out all of the other solutions. Look at how complicated the code can get, and glance briefly (a longer look might frighten you!) at the performance stats of those solutions. Remember, these other winning solutions are the best performing solution that can be devised on prior versions of SQL Server within the rules of the challenge that produces the correct results.

So, for all those fine folks at Microsoft that implemented the ANSI windowing: thank you for what you did. This just goes to show that this can now enable you to write simpler, faster code quicker than ever before. Thanks!!!