Advent of Code 2018 – Day 3

As I explained in a recent post, I’m participating in this year’s Advent of Code challenge, with the twist of doing the challenges in T-SQL. In case you don’t know what “Advent of Code” is, Eric Wastl (t) created it for the purpose of, as Eric describes it:

Advent of Code is a series of small programming puzzles for a variety of skill levels. They are self-contained and are just as appropriate for an expert who wants to stay sharp as they are for a beginner who is just learning to code. Each puzzle calls upon different skills and has two parts that build on a theme.

The full explanation of “Advent of Code” is at //adventofcode.com/2018/about. You can see my other posts in the “Advent of Code” category.

2018 Advent of Code recap

We started of with someone going back in time and altering Santa’s past. You went back in time to correct it, but the device to fix the problems hadn’t been calibrated before sending you on your way. You calibrated the device on Day 1. On day 2, we located the boxes in the warehouse where the new (500 years ago) Santa suit material is.

Day 3, Part 1 – Loading the file

Now that the new suit material has been found, all the elves are trying to make suits. They’ve submitted claims for different pieces of the material (the list of claims is the input file). The list being processed looks like this:

Day 3 input file

Day 3 input file

The first like breaks down to:

  • Claim #1
  • 185″ from the left edge
  • 501″ from the top edge
  • 17″ wide
  • 15″ tall

The task is to find the number of square inches of fabric that overlap with multiple claims. The piece of fabric is 1000 square inches. The first thing to do is to load the file and separate out all the values:

As I’ve done with the other puzzles, I used OPENROWSET to load the file. This loads the file as one long string, so I use the SPLIT function to break this down into rows (split on CR). Next, I remove the LF from the string, and then find the starting positions of the various delimiters. Finally, I use the SUBSTRING function with all of those positions to extract the data. This query returns a result set looking like:

Extracted data from input file

Extracted data from input file

Day 3, Part 1 – Processing the data

My first thought was to create a temporary table, 1000 columns by 1000 rows. I would increment the appropriate columns / rows for each part that each claim used. As I thought about the work involved in accomplishing this, I decided that this wasn’t a feasible idea. There would be a lot of dynamically generated code to update the data. Additionally, analyzing the results would be a real pain. I need to come up with another idea.

My next thought was to use a table with three columns. One for the X axis, one for the Y axis, and the third is the counter. I added 1,000,000 rows, to handle 1,000 rows by 1,000 columns. It would be straight-forward to both update the coordinates being used, and to analyze the data. This table looks like this:

This table looks like:

work table

Work Table

However, this would require updating the coordinates claim by claim. If I were to do an update of all the claims at once, each row would only be updated (incremented) one time, and not once per claim as I needed. What I need is a way to count how many times the elves use each coordinate.

Day 3, Part 1 – The final way to process the data

One way to overcome the single update issue is to use a cursor to perform updates claim by claim. I don’t like to use cursors unless there is no other way to do things – they cause a lot of performance issues. I get paid to fix performance issues… not to create them.

The method that I decided to go with is to join the table load query with two tally tables, one for the X-axis and one for the Y-axis. This lets me get the full range of values involved in each axis. I then just group on the X and Y axis, and do a count. This is all put into a sub query, and the outer query performs a count to get the total number of coordinates (square inches) being used by more than one claim. This query looks like this:

Two notes to make: First, the results of the sub-query are the same as the used part of the proposed temporary table.

Secondly, I’m not joining to a virtual tally table here – this is a physical tally table. This is for performance reasons – I need an index on the tally table to quickly do these range searches. I created the physical tally table with this code:

Yes, I use a virtual tally table to create the physical tally table.

Day 3, Part 2

With the correct answer entered in for Part 1, Part 2 unlocks. For this, we need to find out the claims that do not overlap with any other claims.

My approach to this part of the puzzle is to look at all the coordinates of a claim, and to add up their usage using the SUM function. If the SUM is equal to the area of the claim (height x width), this all the coordinates are not shared. This requires getting the actual usage (I can use most of the above sub-query), and then adding this up for each claim.

This returns the only claim with no overlapping coordinates.

Summary

And here we have a T-SQL solution for Day 3 of the Advent of Code challenge. The key tasks that we can learn from today are:

  • Loading a file.
  • Split a string on a delimiter.
  • Assigning a sequential number to a set of rows in a specific order.
  • Construction of a physical TALLY table.
  • Common Table Expressions (CTE)