In my previous post, I introduced you to the Advent of Code programming challenge, and explained how I’m going through the days, solving both parts of each puzzle. Continuing on, today we have day 9. In this challenge, Santa needs to travel the world in the most efficient manner possible. The elves have provided him with a list of the distance between cities, and he needs to determine the shortest route to cover all cities, without repeating any of the cities. The elves put the list into an input file of consisting of “City1 to City2 = 3” (where the 3 is the number of miles between the two cities). For this exercise, the route is limited to just eight cities.

The input file has three parts that need to be extracted out: the starting city, the ending city and the distance between the two. However, the list also has each city combination listed once… and since the shortest route may include going from City2 to City1, we need to ensure that all combinations are included. So after reading the file in as I have been doing all throughout these challenges, I’ll extract out the two city names and the distance, and I’ll put them into a table where each city is listed as the starting city, thus having two entries for each distance, with starting from either city:

Once the data has been loaded, the next step is to get a list of all of the unique combinations of the eight cities, for possible routes. Each route needs to list all of the cities, but only once. The way that I’m going to do this is to use numbers to represent the cities. There are eight cities, so I will start off by creating a set of 8 numbers (1-8). I will then join this set to itself seven times, with the join criteria specified to get values that have not already been selected. I’ll then store this set of data into a temporary table. Since I’m going to need to join back to the first table to get the city names and distance, I’ll make this table to hold just one city per row, along with the position in the route and the route number (which is calculated with the ROW_NUMBER function):

At this point, we just need to join the tables together, get the distance between each city, and then sum up the distances between the cities for each route and return the shortest distance. I start off by getting the unique city names, and assigning them a number. Next I’ll join to this set the possible routes, and then join to the input file to get the distance between the two cities (I’ll use the LEAD function to get the city that this current row is traveling to next). Finally I’ll sum up the distance between each city for each route, and return the smallest distance.

And there we have the solution for Part 1. For Part 2, the next year Santa decides to be a showboat, and he decides to travel along the route with the longest distance. This requires a one-line change to the above code – insert the following line between lines 49 and 50 of the “Find the shortest route” section of code:

Addendum – why not just use a loop to get all the possible routes?

In most of the other programming languages, since they don’t have the capability to work in sets, they would have accomplished getting all of the possible routes via a looping mechanism to iterate through each city to get all of the possible combinations while ensuring that no city is duplicated. We could have done that here, but let me show you the performance difference in working with sets verses iterative programming in SQL Server. I’ll set up two identical temporary tables, and the loop will populate one while the set populates the other. I’ll repeat the population section 10 times for each, and capture the time that it takes to perform each (which will be stored into a third temporary table). Finally, I’ll average the elapsed time that each section took across the 10 times run.

The results that I get are:

It is easily seen that in SQL Server, working in sets is (in this case) 5 times faster – an entire order of magnitude faster. If you were to look at the individual run times, the set-based is normally ~10x faster (the first run for me was an outlier that occasionally ran slower than the loop).