Table-Valued Functions. What a wonderful addition to SQL they make. They take parameters, do some work, and return a result set that can be used in queries. You can select directly against them, or utilize them with the APPLY operator. These are truly versatile additions to SQL -and since you can pass parameters to them, they are like a parametrized view. And we have two different types to work with: Inline Table-Valued Functions (ITVF) and Multi-Statement Table-Valued Functions (MTVF).
But how do they compare with each other? Well, let’s start off by looking at the syntax of each:
1 2 3 4 5 6 7 8 |
CREATE FUNCTION Util.MyITVFunction (@Parameters INT) RETURNS TABLE AS RETURN SELECT TOP (@Parameters) N FROM Util.Tally ORDER BY N; GO |
1 2 3 4 5 6 7 8 9 10 11 |
CREATE FUNCTION Util.MyMTVFunction (@Parameters INT) RETURNS @FunctionResultTableVariable TABLE (N INT) AS BEGIN INSERT INTO @FunctionResultTableVariable (N) SELECT TOP (@Parameters) N FROM Util.Tally ORDER BY N; RETURN; END GO |
The changes in the syntax are that MTVFs must first declare a table variable that is to be returned. Secondly, the MTVF must have a BEGIN/END block. Third, inside the BEGIN/END block you need code that populates the table variable. And finally, you return from the function. In comparison, the ITVF just returns a select statement – there is no table variable to mess around with, no inserts, no code blocks. Just a SELECT statement.
So, how do these perform? For this comparison, let’s use the example that Microsoft supplies in Books Online (BOL) for the APPLY operator (//technet.microsoft.com/en-us/library/ms175156.aspx):
First, make and populate two tables: Employees and Departments:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 |
--Create Employees table and insert values. CREATE TABLE Employees ( empid int NOT NULL ,mgrid int NULL ,empname varchar(25) NOT NULL ,salary money NOT NULL CONSTRAINT PK_Employees PRIMARY KEY(empid) ); GO INSERT INTO Employees VALUES(1 , NULL, 'Nancy' , $10000.00); INSERT INTO Employees VALUES(2 , 1 , 'Andrew' , $5000.00); INSERT INTO Employees VALUES(3 , 1 , 'Janet' , $5000.00); INSERT INTO Employees VALUES(4 , 1 , 'Margaret', $5000.00); INSERT INTO Employees VALUES(5 , 2 , 'Steven' , $2500.00); INSERT INTO Employees VALUES(6 , 2 , 'Michael' , $2500.00); INSERT INTO Employees VALUES(7 , 3 , 'Robert' , $2500.00); INSERT INTO Employees VALUES(8 , 3 , 'Laura' , $2500.00); INSERT INTO Employees VALUES(9 , 3 , 'Ann' , $2500.00); INSERT INTO Employees VALUES(10, 4 , 'Ina' , $2500.00); INSERT INTO Employees VALUES(11, 7 , 'David' , $2000.00); INSERT INTO Employees VALUES(12, 7 , 'Ron' , $2000.00); INSERT INTO Employees VALUES(13, 7 , 'Dan' , $2000.00); INSERT INTO Employees VALUES(14, 11 , 'James' , $1500.00); GO --Create Departments table and insert values. CREATE TABLE Departments ( deptid INT NOT NULL PRIMARY KEY ,deptname VARCHAR(25) NOT NULL ,deptmgrid INT NULL REFERENCES Employees ); GO INSERT INTO Departments VALUES(1, 'HR', 2); INSERT INTO Departments VALUES(2, 'Marketing', 7); INSERT INTO Departments VALUES(3, 'Finance', 8); INSERT INTO Departments VALUES(4, 'R&D', 9); INSERT INTO Departments VALUES(5, 'Training', 4); INSERT INTO Departments VALUES(6, 'Gardening', NULL); |
In this example, most (but not all) of the departments in the Departments table have a manager ID that corresponds to an employee in the Employees table. The following table-valued function accepts an employee ID as an argument and returns that employee and all of his/her subordinates.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 |
CREATE FUNCTION dbo.fn_getsubtree(@empid AS INT) RETURNS @TREE TABLE ( empid INT NOT NULL ,empname VARCHAR(25) NOT NULL ,mgrid INT NULL ,lvl INT NOT NULL ) AS BEGIN WITH Employees_Subtree(empid, empname, mgrid, lvl) AS ( -- Anchor Member (AM) SELECT empid, empname, mgrid, 0 FROM Employees WHERE empid = @empid UNION all -- Recursive Member (RM) SELECT e.empid, e.empname, e.mgrid, es.lvl+1 FROM Employees AS e JOIN Employees_Subtree AS es ON e.mgrid = es.empid ) INSERT INTO @TREE SELECT * FROM Employees_Subtree; RETURN END GO |
To return all of the subordinates in all levels for the manager of each department, use the following query:
1 2 3 4 5 6 7 8 |
SELECT D.deptid, D.deptname, D.deptmgrid, ST.empid, ST.empname, ST.mgrid FROM Departments AS D CROSS APPLY fn_getsubtree(D.deptmgrid) AS ST; |
Which returns this result set:
deptid | deptname | deptmgrid | empid | empname | mgrid |
---|---|---|---|---|---|
1 | HR | 2 | 2 | Andrew | 1 |
1 | HR | 2 | 5 | Steven | 2 |
1 | HR | 2 | 6 | Michael | 2 |
2 | Marketing | 7 | 7 | Robert | 3 |
2 | Marketing | 7 | 11 | David | 7 |
2 | Marketing | 7 | 12 | Ron | 7 |
2 | Marketing | 7 | 13 | Dan | 7 |
2 | Marketing | 7 | 14 | James | 11 |
3 | Finance | 8 | 8 | Laura | 3 |
4 | R&D | 9 | 9 | Ann | 3 |
5 | Training | 4 | 4 | Margaret | 1 |
5 | Training | 4 | 10 | Ina | 4 |
This is an MTVF. Now, let’s convert it to an ITVF by removing the table variable declaration, the begin/end block, and the insert statement, and move the RETURN to the start:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
CREATE FUNCTION dbo.fn_getsubtreeITVF(@empid AS INT) RETURNS TABLE AS RETURN WITH Employees_Subtree(empid, empname, mgrid, lvl) AS ( -- Anchor Member (AM) SELECT empid, empname, mgrid, 0 FROM Employees WHERE empid = @empid UNION all -- Recursive Member (RM) SELECT e.empid, e.empname, e.mgrid, es.lvl+1 FROM Employees AS e JOIN Employees_Subtree AS es ON e.mgrid = es.empid ) SELECT * FROM Employees_Subtree; GO |
As you would expect, this function returns exactly the same result set. So, let’s look at how they perform.
Let’s get the statistics of each by running SET STATISTICS IO, TIME ON before the two queries. We’ll also grab the actual execution plans, and capture the activity with Profiler. The following statistics are returned:
MTVF:
1 2 3 4 5 6 7 |
Table '#15502E78'. Scan count 6, logical reads 6... Table 'Departments'. Scan count 1, logical reads 2... SQL Server Execution Times: CPU time = 15 ms, elapsed time = 12 ms. |
ITVF:
1 2 3 4 5 6 7 8 9 |
Table 'Worktable'. Scan count 7, logical reads 85... Table 'Employees'. Scan count 1, logical reads 35... Table 'Departments'. Scan count 1, logical reads 2... SQL Server Execution Times: CPU time = 0 ms, elapsed time = 114 ms. |
Note that for the MTVF, the Employees table doesn’t appear to have been touched. Instead, what we are seeing is some read activity on a table variable – some very low read activity. Also, notice that the elapsed CPU time for the MTVF is substantially greater than the ITVF. (The total elapsed time is related to how long it takes to return the information to the client, so I disregard this value.) Since the function is being called 6 times, the table variable is built, populated, and then read from 6 times, hence the value of 6. But how many reads were being performed against the Employees table? We know that the function is accessing the Employees table, but we have no clue as to what the IO statistics are for that table. Instead, the statistics show only the reading of the data from the MTVF table variable, not the reading and inserting of data into it.
In looking at the statistics, it appears that the ITVF is doing a lot more work than the MTVF – but it is running considerably faster. (For this small set of data, both are extremely fast, but you can see that the ITVF is so fast that it can’t be registered at the millisecond level.)
Let’s compare this to what Profiler caught:
As you can see, they have the same CPU time that the statistics caught. But here we can see the total read activity that the MTVF is doing and that the ITVF is doing 2/3 the reads of the MTVF. Again, I’m ignoring the Duration column since it can be impacted by other activities going on across the network/client computer.
Finally, let’s look at the execution plans:
MTVF:
ITVF:
In the MTVF, you see only an operation called “Table Valued Function”. Everything that it is doing is essentially a black box – something is happening, and data gets returned. For MTVFs, SQL can’t “see” what it is that the MTVF is doing since it is being run in a separate context. What this means is that SQL has to run the MTVF as it is written, without being able to make any optimizations in the query plan to optimize it.
In the ITVF, everything that it is doing is being shown… just like a view, its activity is being “inlined” into the query plan. Since SQL now can see everything that is going on (across the entire query), it can make optimizations in the query plan to be more efficient.
This example is just on a small handful of records in both tables, but we can already see a performance difference. When you expand these tables to tens of thousands of records, this difference is really magnified. I have seen performance improvements that take a query from running in tens of minutes to seconds, simply by converting an MTVF to an ITVF.
Another thing to note in these execution plans – look at the query cost (relative to the batch) percentages. They show that the MTVF is 20%, and the ITVF is 80%. So, that portion of the execution plan is also misleading.
By the way… scalar functions work nearly essentially the same as MTVFs. You can get a similar substantial performance boost by converting these to ITVFs, or possibly by just JOINing to the table.
So, in closing, use those ITVFs. And try to keep away from MTVFs and scalar functions. Those functions are necessary sometimes, but use the ITVFs if possible… your SQL Server will thank you.
PS – another closing note. When you have MTVFs, remember that both the IO statistics and the execution plan total cost percentages are misleading.
Related Posts:
How do different types of functions compare performance wise?