Advent of Code 2018 – Day 3
As I explained in a recent post, I’m participating in this year’s Advent of Code challenge, with the twist of doing the challenges in T-SQL. In case you don’t know what “Advent of Code” is, Eric Wastl (t) created it for the purpose of, as Eric describes it:
Advent of Code is a series of small programming puzzles for a variety of skill levels. They are self-contained and are just as appropriate for an expert who wants to stay sharp as they are for a beginner who is just learning to code. Each puzzle calls upon different skills and has two parts that build on a theme.
The full explanation of “Advent of Code” is at //adventofcode.com/2018/about. You can see my other posts in the “Advent of Code” category.
2018 Advent of Code recap
We started of with someone going back in time and altering Santa’s past. You went back in time to correct it, but the device to fix the problems hadn’t been calibrated before sending you on your way. You calibrated the device on Day 1. On day 2, we located the boxes in the warehouse where the new (500 years ago) Santa suit material is.
Day 3, Part 1 – Loading the file
Now that the new suit material has been found, all the elves are trying to make suits. They’ve submitted claims for different pieces of the material (the list of claims is the input file). The list being processed looks like this:
The first like breaks down to:
- Claim #1
- 185″ from the left edge
- 501″ from the top edge
- 17″ wide
- 15″ tall
The task is to find the number of square inches of fabric that overlap with multiple claims. The piece of fabric is 1000 square inches. The first thing to do is to load the file and separate out all the values:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
SELECT ca4.claimId, ca4.pLeft, ca4.pTop, ca4.pWidth, ca4.pHeight FROM OPENROWSET(BULK 'D:\AdventOfCode\2018\Input\Day03.txt', SINGLE_CLOB) dt(FileData) CROSS APPLY dbo.Split(dt.FileData, CHAR(13)) ca1 CROSS APPLY (VALUES (REPLACE(ca1.Item, CHAR(10), ''))) ca2(Item) CROSS APPLY (VALUES (CHARINDEX('#', ca2.Item), -- start of claim ID CHARINDEX(' @', ca2.Item), -- starting position of rectangle CHARINDEX(': ', ca2.Item), -- size of rectangle CHARINDEX(',', ca2.Item), -- dimension split of starting position CHARINDEX('x', ca2.Item) -- split of rectangle size )) ca3(posClaimId, posRectStart, posRectSize, posStartSplit, posSizeSplit) CROSS APPLY (VALUES (CONVERT(INTEGER, SUBSTRING(ca2.Item, ca3.posClaimId+1, ca3.posRectStart - ca3.posClaimId - 1)), -- claimID CONVERT(INTEGER, SUBSTRING(ca2.Item, ca3.posRectStart + 3, ca3.posStartSplit - ca3.posRectStart - 3)), -- left CONVERT(INTEGER, SUBSTRING(ca2.Item, ca3.posStartSplit + 1, ca3.posRectSize - ca3.posStartSplit - 1)), -- top CONVERT(INTEGER, SUBSTRING(ca2.Item, ca3.posRectSize + 2, ca3.posSizeSplit - ca3.posRectSize - 2)), -- width CONVERT(INTEGER, SUBSTRING(ca2.Item, ca3.posSizeSplit+1, LEN(ca2.Item)) ) -- height )) ca4(claimId, pLeft, pTop, pWidth, pHeight) |
As I’ve done with the other puzzles, I used OPENROWSET to load the file. This loads the file as one long string, so I use the SPLIT function to break this down into rows (split on CR). Next, I remove the LF from the string, and then find the starting positions of the various delimiters. Finally, I use the SUBSTRING function with all of those positions to extract the data. This query returns a result set looking like:
Day 3, Part 1 – Processing the data
My first thought was to create a temporary table, 1000 columns by 1000 rows. I would increment the appropriate columns / rows for each part that each claim used. As I thought about the work involved in accomplishing this, I decided that this wasn’t a feasible idea. There would be a lot of dynamically generated code to update the data. Additionally, analyzing the results would be a real pain. I need to come up with another idea.
My next thought was to use a table with three columns. One for the X axis, one for the Y axis, and the third is the counter. I added 1,000,000 rows, to handle 1,000 rows by 1,000 columns. It would be straight-forward to both update the coordinates being used, and to analyze the data. This table looks like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
IF OBJECT_ID('tempdb.dbo.#Fabric') IS NOT NULL DROP TABLE #Fabric; CREATE TABLE #Fabric ( X INTEGER, Y INTEGER, Z INTEGER, CONSTRAINT PK_Fabric PRIMARY KEY CLUSTERED (X,Y)); WITH Tens (N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1), Hundreds(N) AS (SELECT 1 FROM Tens t1, Tens t2), Millions(N) AS (SELECT 1 FROM Hundreds t1, Hundreds t2, Hundreds t3), Tally (N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM Millions) INSERT INTO #Fabric (X, Y, Z) SELECT t1.N AS X, t2.N AS Y, 0 AS Z FROM Tally t1 -- for the horizontal (X) axis CROSS JOIN Tally t2 -- for the vertical (Y) axis WHERE t1.N <= 1000 AND t2.N <= 1000; |
This table looks like:
However, this would require updating the coordinates claim by claim. If I were to do an update of all the claims at once, each row would only be updated (incremented) one time, and not once per claim as I needed. What I need is a way to count how many times the elves use each coordinate.
Day 3, Part 1 – The final way to process the data
One way to overcome the single update issue is to use a cursor to perform updates claim by claim. I don’t like to use cursors unless there is no other way to do things – they cause a lot of performance issues. I get paid to fix performance issues… not to create them.
The method that I decided to go with is to join the table load query with two tally tables, one for the X-axis and one for the Y-axis. This lets me get the full range of values involved in each axis. I then just group on the X and Y axis, and do a count. This is all put into a sub query, and the outer query performs a count to get the total number of coordinates (square inches) being used by more than one claim. This query looks like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 |
SELECT COUNT(*) AS Part1Answer FROM ( SELECT x.N AS X, y.N AS Y, COUNT(*) AS Z FROM OPENROWSET(BULK 'D:\AdventOfCode\2018\Input\Day03.txt', SINGLE_CLOB) dt(FileData) CROSS APPLY dbo.Split(dt.FileData, CHAR(13)) ca1 CROSS APPLY (VALUES (REPLACE(ca1.Item, CHAR(10), ''))) ca2(Item) CROSS APPLY (VALUES (CHARINDEX('#', ca2.Item), -- start of claim ID CHARINDEX(' @', ca2.Item), -- starting position of rectangle CHARINDEX(': ', ca2.Item), -- size of rectangle CHARINDEX(',', ca2.Item), -- dimension split of starting position CHARINDEX('x', ca2.Item) -- split of rectangle size )) ca3(posClaimId, posRectStart, posRectSize, posStartSplit, posSizeSplit) CROSS APPLY (VALUES (CONVERT(INTEGER, SUBSTRING(ca2.Item, ca3.posClaimId+1, ca3.posRectStart - ca3.posClaimId - 1)), -- claimID CONVERT(INTEGER, SUBSTRING(ca2.Item, ca3.posRectStart + 3, ca3.posStartSplit - ca3.posRectStart - 3)), -- left CONVERT(INTEGER, SUBSTRING(ca2.Item, ca3.posStartSplit + 1, ca3.posRectSize - ca3.posStartSplit - 1)), -- top CONVERT(INTEGER, SUBSTRING(ca2.Item, ca3.posRectSize + 2, ca3.posSizeSplit - ca3.posRectSize - 2)), -- width CONVERT(INTEGER, SUBSTRING(ca2.Item, ca3.posSizeSplit+1, LEN(ca2.Item)) ) -- height )) ca4(claimId, pLeft, pTop, pWidth, pHeight) JOIN dbo.Tally x ON x.N BETWEEN ca4.pLeft AND ca4.pLeft + ca4.pWidth - 1 JOIN dbo.Tally y ON y.N BETWEEN ca4.pTop AND ca4.pTop + ca4.pHeight - 1 GROUP BY x.N, y.N ) RequiredAlias WHERE Z > 1 |
Two notes to make: First, the results of the sub-query are the same as the used part of the proposed temporary table.
Secondly, I’m not joining to a virtual tally table here – this is a physical tally table. This is for performance reasons – I need an index on the tally table to quickly do these range searches. I created the physical tally table with this code:
1 2 3 4 5 6 7 8 9 10 |
CREATE TABLE dbo.Tally (N INTEGER CONSTRAINT PK_Tally PRIMARY KEY); WITH Tens (N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1), Hundreds(N) AS (SELECT 1 FROM Tens t1 CROSS JOIN Tens t2), Millions(N) AS (SELECT 1 FROM Hundreds t1 CROSS JOIN Hundreds t2 CROSS JOIN Hundreds t3), Tally (N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM Millions) INSERT INTO dbo.Tally (N) SELECT N FROM Tally; |
Yes, I use a virtual tally table to create the physical tally table.
Day 3, Part 2
With the correct answer entered in for Part 1, Part 2 unlocks. For this, we need to find out the claims that do not overlap with any other claims.
My approach to this part of the puzzle is to look at all the coordinates of a claim, and to add up their usage using the SUM function. If the SUM is equal to the area of the claim (height x width), this all the coordinates are not shared. This requires getting the actual usage (I can use most of the above sub-query), and then adding this up for each claim.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 |
WITH cte AS ( SELECT ca4.claimId, ca4.pLeft, ca4.pTop, ca4.pWidth, ca4.pHeight FROM OPENROWSET(BULK 'D:\AdventOfCode\2018\Input\Day03.txt', SINGLE_CLOB) dt(FileData) CROSS APPLY dbo.Split(dt.FileData, CHAR(13)) ca1 CROSS APPLY (VALUES (REPLACE(ca1.Item, CHAR(10), ''))) ca2(Item) CROSS APPLY (VALUES (CHARINDEX('#', ca2.Item), -- start of claim ID CHARINDEX(' @', ca2.Item), -- starting position of rectangle CHARINDEX(': ', ca2.Item), -- size of rectangle CHARINDEX(',', ca2.Item), -- dimension split of starting position CHARINDEX('x', ca2.Item) -- split of rectangle size )) ca3(posClaimId, posRectStart, posRectSize, posStartSplit, posSizeSplit) CROSS APPLY (VALUES (CONVERT(INTEGER, SUBSTRING(ca2.Item, ca3.posClaimId+1, ca3.posRectStart - ca3.posClaimId - 1)), -- claimID CONVERT(INTEGER, SUBSTRING(ca2.Item, ca3.posRectStart + 3, ca3.posStartSplit - ca3.posRectStart - 3)), -- left CONVERT(INTEGER, SUBSTRING(ca2.Item, ca3.posStartSplit + 1, ca3.posRectSize - ca3.posStartSplit - 1)), -- top CONVERT(INTEGER, SUBSTRING(ca2.Item, ca3.posRectSize + 2, ca3.posSizeSplit - ca3.posRectSize - 2)), -- width CONVERT(INTEGER, SUBSTRING(ca2.Item, ca3.posSizeSplit+1, LEN(ca2.Item)) ) -- height )) ca4(claimId, pLeft, pTop, pWidth, pHeight) ) SELECT c.claimId, c.pWidth, c.pHeight, SUM(F.Z) AS Area FROM cte c JOIN ( SELECT x.N AS X, y.N AS Y, COUNT(*) AS Z FROM cte JOIN dbo.Tally x ON x.N BETWEEN cte.pLeft AND cte.pLeft + cte.pWidth - 1 JOIN dbo.Tally y ON y.N BETWEEN cte.pTop AND cte.pTop + cte.pHeight - 1 GROUP BY x.N, y.N ) F ON F.X BETWEEN c.pLeft AND c.pLeft + c.pWidth - 1 AND F.Y BETWEEN c.pTop AND c.pTop + c.pHeight - 1 GROUP BY c.claimId, c.pWidth, c.pHeight HAVING SUM(F.Z) = c.pWidth * c.pHeight; |
This returns the only claim with no overlapping coordinates.
Summary
And here we have a T-SQL solution for Day 3 of the Advent of Code challenge. The key tasks that we can learn from today are:
- Loading a file.
- Split a string on a delimiter.
- Assigning a sequential number to a set of rows in a specific order.
- Construction of a physical TALLY table.
- Common Table Expressions (CTE)