One of the things that I’m the most excited about in “Denali” CTP3 is the enhancements to the OVER clause. One of the things that I’m particularly excited about is the addition of the ROWS/RANGE clause support. One of the reasons that I’m excited about this is that this provides a supported, documented (in BOL) way to perform running totals.
My current favorite method of performing running totals is using the so-called “Quirky Update” / “Pseudo-Cursor” method. However, there is plenty of controversy surrounding the use of this method, primarily centered around the need for following a list of rules – rules that are not documented in BOL. (Please see this article for how to perform running totals using the Quirky Update method and for the rules necessary to achieve the proper results.)
Now that running totals are supported in “Denali” CTP3, I decided to test out the various methods of doing so. Using the environment setup and test code from the above article, I added the various methods of performing running totals available in “Denali” CTP3. (Code reprinted with permission of Jeff Moden.)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 |
/* /************************************************************************************* Create the test table with a non-clustered Primary Key and a separate clustered index This code has been tested in SQL Server 2000 and 2005. *************************************************************************************/ --===== Do this testing in a nice, "safe" place that everyone has USE TempDB GO --===== If the test table already exists, drop it in case we need to rerun. -- The 3 part naming is overkill, but prevents accidents on real tables. IF OBJECT_ID('TempDB.dbo.TransactionDetail') IS NOT NULL DROP TABLE TempDB.dbo.TransactionDetail GO --===== Create the test table (TransactionDetail) with a NON clustered PK CREATE TABLE dbo.TransactionDetail ( TransactionDetailID INT IDENTITY(1,1), --Temporal "tie-breaker" Date DATETIME, AccountID INT, Amount MONEY, AccountRunningTotal MONEY, --Running total across each account AccountRunningCount INT, --Like "Rank" across each account NCID INT, --For "proof" later in the article CONSTRAINT PK_TransactionDetail_TransactionDetailID PRIMARY KEY NONCLUSTERED (TransactionDetailID) WITH FILLFACTOR = 100 ) GO --===== Add a clustered index that will easily cause page splits because -- of the randomized data being inserted. This index also represents -- the expected sort order of most of the code examples. CREATE CLUSTERED INDEX IXC_Transaction_AccountID_Date_TransactionDetailID ON dbo.TransactionDetail (AccountID, Date, TransactionDetailID) --===== Add a non-clustered index on the NCID column sorted in -- descending order. This is for some "proofs" later on -- in the article. CREATE NONCLUSTERED INDEX IX_Transaction_NCID ON dbo.TransactionDetail (NCID DESC) GO /************************************************************************************* Populate the table using a rather slow method but one that's sure to cause lots of Page splits and that will fragment the table with over 99% fragmentation. *************************************************************************************/ --===== Preset the environment for appearance and speed SET NOCOUNT ON --===== Populate the table in "segments" to force page splits. -- Normally this would NOT have a While loop in it. -- Because the While loop is there and page splits are happening, -- this takes a whopping 00:02:50 to create on my box. WHILE (ISNULL(IDENT_CURRENT('TransactionDetail'),0)) < 1000000 BEGIN INSERT INTO dbo.TransactionDetail (Date, AccountID, Amount) SELECT TOP 10000 --10 years worth of dates with times from 1/1/2000 to 12/31/2009 CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME) AS Date, --100 different account numbers ABS(CHECKSUM(NEWID()))%100+1, --Dollar amounts from -99.99 to + 99.99 CAST(CHECKSUM(NEWID())%10000 /100.0 AS MONEY) FROM Master.dbo.SysColumns sc1 CROSS JOIN Master.dbo.SysColumns sc2 END --===== Update the NCID column to be the reverse of the TransactionDetailID column UPDATE dbo.TransactionDetail SET NCID = 1000000 - TransactionDetailID + 1 GO USE TEMPDB GO CREATE PROCEDURE dbo.Verify AS /************************************************************************************* Code to verify that the account running total calculation worked correctly. Please read the comments to see how it works. *************************************************************************************/ --===== Conditionally drop the verification table to make -- it easy to rerun the verification code IF OBJECT_ID('TempDB..#Verification') IS NOT NULL DROP TABLE dbo.#Verification --===== Define a variable to remember the number of rows -- copied to the verification table DECLARE @MyCount INT --===== Copy the data from the test table into the -- verification table in the correct order. -- Remember the correct order with an IDENTITY. SELECT IDENTITY(INT,1,1) AS RowNum, AccountID, Amount, AccountRunningTotal INTO #Verification FROM dbo.TransactionDetail ORDER BY AccountID, Date, TransactionDetailID --===== Remember the number of rows we just copied SELECT @MyCount = @@ROWCOUNT --===== Check the running total calculations SELECT CASE WHEN COUNT(hi.RowNum) + 1 = @MyCount THEN 'Account Running Total Calculations are correct' ELSE 'There are some errors in the Account Running Totals' END FROM #Verification lo INNER JOIN #Verification hi ON lo.RowNum + 1 = hi.RowNum WHERE (-- Compare lines with the same AccountID hi.AccountID = lo.AccountID AND hi.AccountRunningTotal = lo.AccountRunningTotal + hi.Amount) OR (-- First line of account has running total same as amount hi.AccountID <> lo.AccountID AND hi.AccountRunningTotal = hi.Amount) GO /************************************************************************************** This stored procedure will clear the calculated columns in the test table without disturbing the randomized data in the table so that we can repeat tests and use different methods without changing the test data. **************************************************************************************/ USE TEMPDB GO CREATE PROCEDURE dbo.ResetTestTable AS UPDATE dbo.TransactionDetail SET AccountRunningTotal = NULL, AccountRunningCount = NULL GO */ /************************************************************************************* Pseduo-cursor update using the "Quirky Update" to calculate both Running Totals and a Running Count that start over for each AccountID. Takes 24 seconds with the INDEX(0) hint and 6 seconds without it on my box. *************************************************************************************/ --===== Supress the auto-display of rowcounts for speed an appearance SET NOCOUNT ON -- Quirky Update running totals --===== Declare the working variables DECLARE @PrevAccountID INT DECLARE @AccountRunningTotal MONEY DECLARE @AccountRunningCount INT --===== Update the running total and running count for this row using the "Quirky -- Update" and a "Pseudo-cursor". The order of the UPDATE is controlled by the -- order of the clustered index. UPDATE dbo.TransactionDetail SET @AccountRunningTotal = AccountRunningTotal = CASE WHEN AccountID = @PrevAccountID THEN @AccountRunningTotal + Amount ELSE Amount END, @AccountRunningCount = AccountRunningCount = CASE WHEN AccountID = @PrevAccountID THEN @AccountRunningCount + 1 ELSE 1 END, @PrevAccountID = AccountID FROM dbo.TransactionDetail WITH (TABLOCKX) OPTION (MAXDOP 1) GO EXEC dbo.Verify GO EXEC dbo.ResetTestTable GO -- DENALI CTP3 Running Totals using DEFAULT RANGE option ; WITH cte AS ( SELECT AccountRunningTotal, AccountRunningCount, ART = SUM(Amount) OVER (PARTITION BY AccountID ORDER BY Date, TransactionDetailID), ARC = SUM(1) OVER (PARTITION BY AccountID ORDER BY Date, TransactionDetailID) FROM dbo.TransactionDetail ) UPDATE cte SET AccountRunningTotal = ART, AccountRunningCount = ARC GO EXEC dbo.Verify GO EXEC dbo.ResetTestTable GO -- DENALI CTP3 Running Totals subquery using DEFAULT RANGE option -- this is the same as the cte method ; UPDATE target SET AccountRunningTotal = ART, AccountRunningCount = ARC FROM (SELECT AccountRunningTotal, AccountRunningCount, ART = SUM(Amount) OVER (PARTITION BY AccountID ORDER BY Date, TransactionDetailID), ARC = SUM(1) OVER (PARTITION BY AccountID ORDER BY Date, TransactionDetailID) FROM dbo.TransactionDetail ) target GO EXEC dbo.Verify GO EXEC dbo.ResetTestTable GO -- DENALI CTP3 Running Totals with self-join using DEFAULT RANGE option. ; WITH cte AS ( SELECT TransactionDetailID, [Date], AccountID, Amount, AccountRunningTotal = SUM(Amount) OVER (PARTITION BY AccountID ORDER BY Date, TransactionDetailID), AccountRunningCount = SUM(1) OVER (PARTITION BY AccountID ORDER BY Date, TransactionDetailID), NCID FROM dbo.TransactionDetail ) UPDATE td SET AccountRunningTotal = cte.AccountRunningTotal, AccountRunningCount = cte.AccountRunningCount FROM dbo.TransactionDetail td JOIN cte ON cte.TransactionDetailID = td.TransactionDetailID GO EXEC dbo.Verify GO EXEC dbo.ResetTestTable GO -- DENALI CTP3 Running Totals using ROWS UNBOUNDED PRECEDING ; WITH cte AS ( SELECT AccountRunningTotal, AccountRunningCount, ART = SUM(Amount) OVER (PARTITION BY AccountID ORDER BY Date, TransactionDetailID ROWS UNBOUNDED PRECEDING), ARC = SUM(1) OVER (PARTITION BY AccountID ORDER BY Date, TransactionDetailID ROWS UNBOUNDED PRECEDING) FROM dbo.TransactionDetail ) UPDATE cte SET AccountRunningTotal = ART, AccountRunningCount = ARC GO EXEC dbo.Verify GO EXEC dbo.ResetTestTable GO -- DENALI CTP3 Running Totals using RANGE UNBOUNDED PRECEDING ; WITH cte AS ( SELECT AccountRunningTotal, AccountRunningCount, ART = SUM(Amount) OVER (PARTITION BY AccountID ORDER BY Date, TransactionDetailID RANGE UNBOUNDED PRECEDING), ARC = SUM(1) OVER (PARTITION BY AccountID ORDER BY Date, TransactionDetailID RANGE UNBOUNDED PRECEDING) FROM dbo.TransactionDetail ) UPDATE cte SET AccountRunningTotal = ART, AccountRunningCount = ARC GO EXEC dbo.Verify GO EXEC dbo.ResetTestTable GO -- DENALI CTP3 Running Totals using ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ; WITH cte AS ( SELECT AccountRunningTotal, AccountRunningCount, ART = SUM(Amount) OVER (PARTITION BY AccountID ORDER BY Date, TransactionDetailID ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), ARC = SUM(1) OVER (PARTITION BY AccountID ORDER BY Date, TransactionDetailID ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) FROM dbo.TransactionDetail ) UPDATE cte SET AccountRunningTotal = ART, AccountRunningCount = ARC GO EXEC dbo.Verify GO EXEC dbo.ResetTestTable GO -- DENALI CTP3 Running Totals using RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ; WITH cte AS ( SELECT AccountRunningTotal, AccountRunningCount, ART = SUM(Amount) OVER (PARTITION BY AccountID ORDER BY Date, TransactionDetailID RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), ARC = SUM(1) OVER (PARTITION BY AccountID ORDER BY Date, TransactionDetailID RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) FROM dbo.TransactionDetail ) UPDATE cte SET AccountRunningTotal = ART, AccountRunningCount = ARC GO EXEC dbo.Verify GO EXEC dbo.ResetTestTable GO -- DENALI CTP3 Running Totals using ROWS BETWEEN 2147483647 PRECEDING AND CURRENT ROW ; WITH cte AS ( SELECT AccountRunningTotal, AccountRunningCount, ART = SUM(Amount) OVER (PARTITION BY AccountID ORDER BY Date, TransactionDetailID ROWS BETWEEN 2147483647 PRECEDING AND CURRENT ROW), ARC = SUM(1) OVER (PARTITION BY AccountID ORDER BY Date, TransactionDetailID ROWS BETWEEN 2147483647 PRECEDING AND CURRENT ROW) FROM dbo.TransactionDetail ) UPDATE cte SET AccountRunningTotal = ART, AccountRunningCount = ARC GO EXEC dbo.Verify GO |
And the results, as shown in Profiler, are:
(Click the image to view the full Profiler screen.)
When performing the running totals with the new OVER clause enhancements in “Denali”, it has 3 million to 18 million reads, vs. 8 thousand for the Quirky Update. This difference is reflected in the time each takes: the OVER clause enhancements take 22 – 75 seconds to perform the running totals, while the Quirky Update performs it in 3.5 seconds.
Let’s hope that all of this is due to it being in a beta state and that it will improve when it reaches RTM. When it does reach the RTM status, I’ll test again to decide which method I’ll use. My impression right now is that it looks like I’ll be keeping the Quirky Update in my toolbag for a long time when performing running totals.
Update: 2014-01-30
Well, it’s been a while since SQL 2012 went to RTM, so I finally decided to rerun this performance test. I’ve waited so long, it’s actually running on SQL 2012 SP1. (And, my hardware has been upgraded, which you will notice with the improved times). The 2012 performance results are:
TextData | CPU | Reads | Writes | Duration | Rank by Duration |
---|---|---|---|---|---|
Quirky Update running totals | 1,623 | 8,764 | 0 | 1,675 | 1 |
SQL 2012 SP1 running totals using DEFAULT RANGE option | 24,773 | 17,943,842 | 4,005 | 26,018 | 7 |
SQL 2012 SP1 running totals subquery using DEFAULT RANGE option | 25,334 | 17,943,846 | 4,013 | 26,556 | 8 |
SQL 2012 SP1 running totals with self-join using DEFAULT RANGE option | 23,289 | 12,053,268 | 4,159 | 17,226 | 4 |
SQL 2012 SP1 running totals using ROWS UNBOUNDED PRECEDING | 13,120 | 8,902,436 | 0 | 13,714 | 2 |
SQL 2012 SP1 running totals using RANGE UNBOUNDED PRECEDING | 24,897 | 17,943,908 | 4,125 | 25,945 | 6 |
SQL 2012 SP1 running totals using ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW | 13,292 | 8,902,440 | 0 | 13,930 | 3 |
SQL 2012 SP1 running totals using RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW | 24,461 | 17,943,851 | 4,056 | 25,899 | 5 |
SQL 2012 SP1 Running Totals using ROWS BETWEEN 2147483647 PRECEDING AND CURRENT ROW | 27,971 | 17,882,461 | 10,224 | 29,210 | 9 |
As you can see, the Quirky Update still blows the other methods away. However, the OVER() clause enhancements are still way better than a cursor, or even a self-join. So, my recommendation is to use the OVER() clause, unless you need the extra performance boost of the Quirky Update. This is based solely upon the need to follow undocumented rules for the Quirky Update. Testing will reveal which method you need to use.
Something that you may notice in the results: all of these tests are updating a table with the running total. So how is it that the Quirky Update and the next two fastest means have zero writes? I’ve noticed this before and I filed a connect item about it, which Microsoft has decided that they won’t fix (not enough people affected). If you would like to see this fixed, please go vote for this connect item.