In my previous post, I introduced you to the Advent of Code programming challenge, and explained how I’m going through the days, solving both parts of each puzzle. Continuing on, today we have day 10. In this challenge, Santa’s elves are playing look-and-say. Never heard of it before? Well, let me explain it to you.
1
How many of what do you see there? There is one number 1. To represent this, you would have 11.
Now repeat this. What you have now is two ones, represented as 21.
When you repeat this again, you have one two and one one, which is represented as 1211.
Repeated again, you end up with 111221.
Then 312211.
Then 13112221.
Do you get it? Okay then. Part one of the puzzle is, given a specific input string, what is the length of the 40th result? The input string provided to me is 1113222113. The first time processing this number, I need to get 3113322113.
Analyzing these requirements, I need to parse through the string, counting the number of times a number is in the string until the number changes. This process needs to repeat until the end of the string. Since counting is an aggregate function, if I can make this string into a table of the numbers, I should be able to perform the COUNT function against this, and to group by the particular group that the number is in. To convert this string into a table of the numbers, I’ll again use the virtual tally table:
1 2 3 4 5 6 7 8 9 10 11 |
DECLARE @Input VARCHAR(MAX) = '1113222113'; WITH Tens (N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1), Hundreds(N) AS (SELECT 1 FROM Tens t1, Tens t2), Millions(N) AS (SELECT 1 FROM Hundreds t1, Hundreds t2, Hundreds t3), Tally (N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM Millions) -- Get each character from the string SELECT TOP (LEN(@Input)) N , CONVERT(INTEGER, SUBSTRING(@Input, N, 1)) AS Num FROM Tally |
This produces the following results, where N is the position in the string and Item is the number at that position:
Now what I need to do is to group by the sequentially identical numbers, then count them. Essentially, I need to create data islands for each sequentially identical number. Most data islands are used to identify sequentially incrementing numbers, but this is not what is happening here. To accomplish this challenge, I tried to use a simple grouping, but I just wasn’t able to come up with a way to do this. Therefore, I’m going to go back to the quirky update, which I talked about in a previous post. As a reminder, there are undocumented rules to follow when using this format, and a complete explanation of how this works and those rules can be found at //www.sqlservercentral.com/articles/T-SQL/68467/.
I’m going to start off by creating a table that will hold the above results:
1 2 3 4 |
DECLARE @temp TABLE ( RowID INTEGER PRIMARY KEY CLUSTERED, Item INTEGER, Grp INTEGER); |
And then insert this insert line between lines 8 and 9 of the “Extract numbers into table” code block above:
1 |
INSERT INTO @temp (RowID, Item) |
Now that I have this in a table that I can work with, and a column to assign the grouping to, let’s now perform the quirky update:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
DECLARE @RowID INTEGER, @Grp INTEGER = 0, @LastItem INTEGER; WITH cte AS ( -- get the row number to ensure being updated in the proper order SELECT RowID, Item, Grp, ROW_NUMBER() OVER (ORDER BY RowID) AS SafetyCheck FROM @temp ) UPDATE cte -- if the RowID <> the calculated sequence, then abort with a divide by zero error SET @RowID = CASE WHEN RowID = SafetyCheck THEN RowID ELSE 1/0 END, -- calculate the group - the next number - when the Num changes @Grp = CASE WHEN Item = @LastItem THEN @Grp ELSE @Grp + 1 END, -- assign the group Grp = @Grp, -- carry over the last item for the @Grp calculation for the next row @LastItem = Item FROM cte OPTION (MAXDOP 1); -- prevent parallelism |
This will now produce the following results:
Here we can see that each Item is grouped by a unique Grp value. As the Item changes, the Grp changes. If the Item changes back to one of the Items already used, the Grp value still increments to the next number, instead of using the previous value.
At this point, we just need to get the Item and the count for each grp, and then put them back into a string to be displayed:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
WITH cte AS ( -- get the item and number of this item per group SELECT Grp, MAX(Item) AS Item, COUNT(*) AS Counter FROM @temp GROUP BY Grp ) -- insert the results into the functions output table variable SELECT ( -- use FOR XML to make the string of all of the records SELECT CONVERT(VARCHAR(15), Counter) + CONVERT(VARCHAR(15), Item) FROM cte FOR XML PATH(''), TYPE).value('.', 'varchar(max)') |
Please read my article at //www.sqlservercentral.com/articles/comma+separated+list/71700/ for how this is building the string in what I believe is the fastest possible way in SQL to create a list from a record set.
This code is returning the string “3113322113”. This is exactly what we are looking for. Now we just need to repeat this 40 times, where each time this calls upon the previous result. This sounds like a good job for a recursive common table expression (cte), so let me start off by putting all of the above code into a multi-statement table-valued function to make it easier to call from the cte:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 |
CREATE FUNCTION dbo.Day10MSTVF (@Input VARCHAR(MAX)) RETURNS @Output TABLE (OutputData VARCHAR(MAX)) AS BEGIN -- Create a table variable to hold the individual numbers from the input string DECLARE @temp TABLE ( RowID INTEGER PRIMARY KEY CLUSTERED, Item INTEGER, Grp INTEGER); -- virtual tally table WITH Tens (N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1), Hundreds(N) AS (SELECT 1 FROM Tens t1, Tens t2), Millions(N) AS (SELECT 1 FROM Hundreds t1, Hundreds t2, Hundreds t3), Tally (N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM Millions) INSERT INTO @temp (RowID, Item) -- Get each character from the string SELECT TOP (LEN(@Input)) N , CONVERT(INTEGER, SUBSTRING(@Input, N, 1)) AS Num FROM Tally ORDER BY N; -- declare and initialize variables for the quirky update DECLARE @RowID INTEGER, @Grp INTEGER = 0, @LastItem INTEGER; WITH cte AS ( -- get the row number to ensure being updated in the proper order SELECT RowID, Item, Grp, ROW_NUMBER() OVER (ORDER BY RowID) AS SafetyCheck FROM @temp ) UPDATE cte -- if the RowID <> the calculated sequence, then abort with a divide by zero error SET @RowID = CASE WHEN RowID = SafetyCheck THEN RowID ELSE 1/0 END, -- calculate the group - the next number - when the Num changes @Grp = CASE WHEN Item = @LastItem THEN @Grp ELSE @Grp + 1 END, -- assign the group Grp = @Grp, -- carry over the last item for the @Grp calculation for the next row @LastItem = Item FROM cte OPTION (MAXDOP 1); -- prevent parallelism WITH cte AS ( -- get the item and number of this item per group SELECT Grp, MAX(Item) AS Item, COUNT(*) AS Counter FROM @temp GROUP BY Grp ) -- insert the results into the functions output table variable INSERT INTO @Output (OutputData) SELECT ( -- use FOR XML to make the string of all of the records SELECT CONVERT(VARCHAR(15), Counter) + CONVERT(VARCHAR(15), Item) FROM cte FOR XML PATH(''), TYPE).value('.', 'varchar(max)') RETURN; END; |
Now I can call this with a recursive cte. Remember that we are looking for the length of the string, so I’ll get that also:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
WITH cte AS ( -- use a recursive cte to run the number of desired processes SELECT 1 AS Lvl, OutputData FROM dbo.Day10MSTVF('1113222113') UNION ALL SELECT Lvl + 1, ca.OutputData FROM cte CROSS APPLY dbo.Day10MSTVF(cte.OutputData) ca ) SELECT Lvl, LEN(cte.OutputData), cte.OutputData FROM cte WHERE Lvl <= 40 OPTION (MAXRECURSION 40); -- prevent going further than necessary |
In a recursive cte, the first part of the cte (lines 4-5) gathers an initial result set, and provides the “anchor” member. It is then combined with a “recursive” member through the UNION ALL operator. The recursive member must reference the anchor member, which can be seen above where the recursive member is selecting from the anchor member. Additionally, I’m provided a column to show the level of recursion that is occurring; in the anchor member it is initialized to 1, and in the recursive member it is incremented. Finally, I just select out the information from this cte. I use the maxrecursion option here to prevent this code from running any longer than necessary. And the results that I get from this code for level 40 is a final length of 252594. Wow, that quickly built up a pretty long string.
For Part 2, it specifies to expand this to get the length at the 50th level. All that I need to do is to change the last two lines (13-14) by replacing the 40 with 50. When I run this, at level 46 I get a length of 1,239,364. If you look in the “Extract numbers into table” module, the virtual tally table will only generate numbers up to 1,000,000, which means that this isn’t enough to handle the string. I expanded this virtual tally table to handle 100 million rows by changing line 7 to:
1 |
Tally (N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM Millions, Hundreds) |
And when this is run, I can get my final answer of over 3.5 million. And that is a long string. I wonder if Santa’s elves can do all of this in their heads?