In my previous post, I introduced you to the Advent of Code programming challenge, and explained how I’m going through the days, solving both parts of each puzzle. Continuing on, today we have day 5. Since you can only see this if you are up to this level in the challenge, let me explain it to you. In part one of this puzzle, Santa needs to figure out how many of the strings in his text file are “nice” strings. The text file to use can be found here. The string is a “nice” string if it contains at least 3 vowels, has any letter that appears twice consecutively, and it does not contain the strings “ab”, “cd”, “pq” or “xy”.
Once again, this list exceeds the limitations of my favorite string splitter function, so I’m going to use the alternate manner I used earlier to load it:
1 2 3 4 5 6 7 8 9 |
DECLARE @InputText VARCHAR(MAX); SELECT @InputText = REPLACE(FileText, CHAR(10), '') FROM OPENROWSET(BULK 'D:\AdventOfCode\Day05.txt', SINGLE_CLOB) UselessAlias(FileText) DECLARE @TotalLength INTEGER = LEN(@InputText), @LengthNoCR INTEGER = LEN(REPLACE(@InputText, CHAR(13), '')); DECLARE @TotalRows INTEGER = @TotalLength - @LengthNoCR; DECLARE @SizePerRow INTEGER = @TotalLength / @TotalRows; DECLARE @CRPosition INTEGER = CHARINDEX(CHAR(13), @InputText, @TotalLength / 2); |
The list (in the @InputText variable) actually contains 1000 lines. I’m declaring a few variables to determine how many rows there are, and the number of characters per row. In the next phase, I’m once again using a virtual tally table to split this string. I then check each line for whether it contains any of those conditions by comparing the length of the string to the string after the characters to search for have been removed, and finally I get a count of the number that meet the nice criteria.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 |
WITH Tens (N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1), Hundreds(N) AS (SELECT 1 FROM Tens t1, Tens t2), Millions(N) AS (SELECT 1 FROM Hundreds t1, Hundreds t2, Hundreds t3), Tally (N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM Millions) , cte AS ( SELECT TOP (@TotalRows) N, ca.LineText, -- if difference in length between the original string and after all vowels have been removed is >= 3, then it has enough vowels. CASE WHEN ca2.LineTextLen-3 >= LEN(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(ca.LineText, 'a', ''), 'e', ''), 'i', ''), 'o', ''), 'u', '')) THEN 1 ELSE 0 END AS Has3Vowels, -- if there is a difference in length between the original string and after the non-allowed strings have been removed, then it has the non-allowed strings. CASE WHEN ca2.LineTextLen > LEN(REPLACE(REPLACE(REPLACE(REPLACE(ca.LineText, 'ab', ''), 'cd', ''), 'pq', ''), 'xy', '')) THEN 1 ELSE 0 END AS HasNotAllowedStrings, -- if there is a difference in length between the original string and after pairs of letters have been removed, then there is at least one pair of letters present. CASE WHEN ca2.LineTextLen > LEN(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(ca.LineText, 'aa', ''), 'bb', ''), 'cc', ''), 'dd', ''), 'ee', ''), 'ff', ''), 'gg', ''), 'hh', ''), 'ii', ''), 'jj', ''), 'kk', ''), 'll', ''), 'mm', ''), 'nn', ''), 'oo', ''), 'pp', ''), 'qq', ''), 'rr', ''), 'ss', ''), 'tt', ''), 'uu', ''), 'vv', ''), 'ww', ''), 'xx', ''), 'yy', ''), 'zz', '')) THEN 1 ELSE 0 END AS Has2Consecutive FROM Tally -- split the string into the lenghts for each item CROSS APPLY (SELECT SUBSTRING(@InputText, (N*@SizePerRow)-@SizePerRow+1, @SizePerRow)) ca(LineText) -- and get the length of it, so that the calculation is only performed once instead of three times. CROSS APPLY (SELECT LEN(ca.LineText)) ca2(LineTextLen) ) SELECT COUNT(*) AS IsNice FROM cte WHERE Has3Vowels = 1 AND HasNotAllowedStrings = 0 AND cte.Has2Consecutive = 1; |
After looking at this list, Santa realizes that the nice criteria isn’t valid, so some new criteria comes out. A “nice” string now must contain a pair of letters that appears in a row twice without overlapping, and a letter that repeats in the string with exactly one letter between those two repeating letters. In SQL, this is just pattern matching, and these can be performed with the LIKE operator. The hard part here is getting all of the valid pattern matching strings. I do this with a virtual tally table to get all of the letters of the alphabet, and then performing a set of queries to get all of these combinations (including the necessary wildcard characters to make the complete pattern to match). They are inserted into a table along with a number to identify which set of matching criteria that they belong to:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 |
-- make a table to hold all of the allowed combinations DECLARE @Nice TABLE (SetNbr INTEGER, Allowed VARCHAR(10)); WITH Tens (N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1), Hundreds(N) AS (SELECT 1 FROM Tens t1, Tens t2), Millions(N) AS (SELECT 1 FROM Hundreds t1, Hundreds t2, Hundreds t3), Tally (N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM Millions) , Letters AS ( SELECT TOP 26 CHAR(96+N) AS Letter FROM Tally ) -- fill the table with pattern matches. -- first, use "%A_A%" - there is one character between the letter and itself (the "_" specifies one character, and the "%" on both ends allows this pattern to be anywhere in the string INSERT INTO @Nice (SetNbr, Allowed) SELECT 1, '%' + Letter + '_' + Letter + '%' FROM Letters UNION ALL -- and then cross join the letters with itself to get all combinations of letters. Surround this with "%" to allow there to be any # of characters on either side, and between the pair combinations. SELECT 2, '%' + a.Letter + b.Letter + '%' + a.Letter + b.Letter + '%' FROM Letters a CROSS JOIN Letters b; |
At this point, all that is needed is to join the tables of lines to this table twice (once for each matching condition) using the LIKE operator, and to get a count of the number of distinct strings that are nice:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 |
WITH Tens (N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1), Hundreds(N) AS (SELECT 1 FROM Tens t1, Tens t2), Millions(N) AS (SELECT 1 FROM Hundreds t1, Hundreds t2, Hundreds t3), Tally (N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM Millions) , cte AS ( SELECT TOP (@TotalRows) N, ca.LineText FROM Tally -- split the string into the lenghts for each item CROSS APPLY (SELECT SUBSTRING(@InputText, (N*@SizePerRow)-@SizePerRow+1, @SizePerRow)) ca(LineText) -- and get the length of it, so that the calculation is only performed once instead of three times. CROSS APPLY (SELECT LEN(ca.LineText)) ca2(LineTextLen) ) -- get a count of the distinct LineText SELECT COUNT (DISTINCT cte.LineText) AS IsNice FROM cte -- where there is a match on one of the pattern combinations in set 1 JOIN @Nice N1 ON cte.LineText LIKE N1.Allowed AND N1.SetNbr = 1 -- and in set 2 JOIN @Nice N2 ON cte.LineText LIKE N2.Allowed AND N2.SetNbr = 2; |
The distinct is in there since the matching patterns can produce multiple rows for a single line.