In my previous post, I introduced you to the Advent of Code programming challenge, and explained how I’m going through the days, solving both parts of each puzzle. Continuing on, today we have day 9. In this challenge, Santa needs to travel the world in the most efficient manner possible. The elves have provided him with a list of the distance between cities, and he needs to determine the shortest route to cover all cities, without repeating any of the cities. The elves put the list into an input file of consisting of “City1 to City2 = 3” (where the 3 is the number of miles between the two cities). For this exercise, the route is limited to just eight cities.
The input file has three parts that need to be extracted out: the starting city, the ending city and the distance between the two. However, the list also has each city combination listed once… and since the shortest route may include going from City2 to City1, we need to ensure that all combinations are included. So after reading the file in as I have been doing all throughout these challenges, I’ll extract out the two city names and the distance, and I’ll put them into a table where each city is listed as the starting city, thus having two entries for each distance, with starting from either city:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 |
DECLARE @InputText VARCHAR(MAX); SELECT @InputText = REPLACE(FileText, CHAR(10), '') FROM OPENROWSET(BULK 'D:\AdventOfCode\Day09.txt', SINGLE_CLOB) UselessAlias(FileText) ; IF OBJECT_ID('tempdb.dbo.#temp') IS NOT NULL DROP TABLE dbo.#temp; CREATE TABLE dbo.#temp ( ItemNumber INTEGER , Item VARCHAR(50) , StartCity VARCHAR(50) , EndCity VARCHAR(50) , Distance INTEGER ); CREATE INDEX IX1 ON #temp (StartCity, EndCity) INCLUDE (Distance); WITH cte AS ( -- split the input file into rows. -- on each row, extract the starting city, ending city and distance SELECT * , LEFT(ds.Item, ca.ToPos-1) AS StartCity , SUBSTRING(ds.Item, ca.ToPos+4, ca.EqualPos-ca.ToPos-4) AS EndingCity , CONVERT(INTEGER, SUBSTRING(ds.item, ca.EqualPos+3, 8000)) AS Distance FROM Sandbox.dbo.DelimitedSplit8K(@InputText, CHAR(13)) ds CROSS APPLY (SELECT CHARINDEX(' to ', ds.Item), CHARINDEX(' = ', ds.Item)) ca(ToPos, EqualPos) ) -- get all cities as a starting city by reversing the order of the INSERT INTO #temp SELECT cte.ItemNumber, cte.Item, cte.StartCity, cte.EndingCity, cte.Distance FROM cte UNION ALL SELECT cte.ItemNumber, cte.Item, cte.EndingCity, cte.StartCity, cte.Distance FROM cte ORDER BY cte.StartCity, cte.Distance; |
Once the data has been loaded, the next step is to get a list of all of the unique combinations of the eight cities, for possible routes. Each route needs to list all of the cities, but only once. The way that I’m going to do this is to use numbers to represent the cities. There are eight cities, so I will start off by creating a set of 8 numbers (1-8). I will then join this set to itself seven times, with the join criteria specified to get values that have not already been selected. I’ll then store this set of data into a temporary table. Since I’m going to need to join back to the first table to get the city names and distance, I’ll make this table to hold just one city per row, along with the position in the route and the route number (which is calculated with the ROW_NUMBER function):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 |
IF OBJECT_ID('tempdb.dbo.#UniqueCityRoutes') IS NOT NULL DROP TABLE #UniqueCityRoutes; CREATE TABLE #UniqueCityRoutes ( RouteNbr INTEGER , CityPos INTEGER , CityNbr INTEGER); WITH nums AS ( -- get a number to represent each of the 8 cities SELECT * FROM (VALUES (1), (2), (3), (4), (5), (6), (7), (8)) dt(num) ), cte AS ( -- join the nums cte together 8 times, but ensure that each one does not have a value that any of the other joins are already using SELECT t1.num AS num1 , t2.num AS num2 , t3.num AS num3 , t4.num AS num4 , t5.num AS num5 , t6.num AS num6 , t7.num AS num7 , t8.num AS num8 -- assign a row number to be the route number for , ROW_NUMBER() OVER (ORDER BY t1.num, t2.num, t3.num, t4.num, t5.num, t6.num, t7.num, t8.num) AS RN FROM nums t1 JOIN nums t2 ON t1.num <> t2.num JOIN nums t3 ON t1.num <> t3.num AND t2.num <> t3.num JOIN nums t4 ON t1.num <> t4.num AND t2.num <> t4.num AND t3.num <> t4.num JOIN nums t5 ON t1.num <> t5.num AND t2.num <> t5.num AND t3.num <> t5.num AND t4.num <> t5.num JOIN nums t6 ON t1.num <> t6.num AND t2.num <> t6.num AND t3.num <> t6.num AND t4.num <> t6.num AND t5.num <> t6.num JOIN nums t7 ON t1.num <> t7.num AND t2.num <> t7.num AND t3.num <> t7.num AND t4.num <> t7.num AND t5.num <> t7.num AND t6.num <> t7.num JOIN nums t8 ON t1.num <> t8.num AND t2.num <> t8.num AND t3.num <> t8.num AND t4.num <> t8.num AND t5.num <> t8.num AND t6.num <> t8.num AND t7.num <> t8.num ) -- insert these results into the temp table -- put each number into a separate row to make the subsequent join easier INSERT INTO #UniqueCityRoutes (RouteNbr, CityPos, CityNbr) SELECT ca.* FROM cte CROSS APPLY (VALUES (RN, 1, num1), (RN, 2, num2), (RN, 3, num3), (RN, 4, num4), (RN, 5, num5), (RN, 6, num6), (RN, 7, num7), (RN, 8, num8)) ca(RN, CityPos, CityNbr) ORDER BY RN, num1, num2, num3, num4, num5, num6, num7, num8; |
At this point, we just need to join the tables together, get the distance between each city, and then sum up the distances between the cities for each route and return the shortest distance. I start off by getting the unique city names, and assigning them a number. Next I’ll join to this set the possible routes, and then join to the input file to get the distance between the two cities (I’ll use the LEAD function to get the city that this current row is traveling to next). Finally I’ll sum up the distance between each city for each route, and return the smallest distance.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 |
WITH cte1 AS ( -- get the distinct cities SELECT DISTINCT StartCity FROM #temp ), cte2 AS ( -- assign a number to each city SELECT StartCity, ROW_NUMBER() OVER (ORDER BY StartCity) AS RN FROM cte1 ), cte3 AS ( -- join each city to it's name. Assign a row number within each loop SELECT u.RouteNbr -- , ROW_NUMBER() OVER (PARTITION BY u.LoopID ORDER BY u.RowID) AS RN , u.CityPos AS RN , cte2.StartCity FROM #UniqueCityRoutes u JOIN cte2 ON u.CityNbr = cte2.RN ), cte4 AS ( -- get the next starting city SELECT RouteNbr , RN , StartCity , LEAD(StartCity) OVER (PARTITION BY RouteNbr ORDER BY RN) AS NextCity FROM cte3 ), cte5 AS ( -- join to the distance table to get the distance between this city and the next city SELECT cte4.RouteNbr , cte4.RN , cte4.StartCity , cte4.NextCity , t.Distance FROM cte4 JOIN #temp t ON cte4.StartCity = t.StartCity AND cte4.NextCity = t.EndCity ), cte6 AS ( -- sum up the total distance for this set of cities / order seen SELECT RouteNbr , SUM(Distance) AS TotalDistance FROM cte5 GROUP BY RouteNbr ) -- get the shortest and longest routes SELECT MIN(cte6.TotalDistance) AS ShortestRoute FROM cte6 |
And there we have the solution for Part 1. For Part 2, the next year Santa decides to be a showboat, and he decides to travel along the route with the longest distance. This requires a one-line change to the above code – insert the following line between lines 49 and 50 of the “Find the shortest route” section of code:
1 |
, MAX(cte6.TotalDistance) AS LongestRoute |
Addendum – why not just use a loop to get all the possible routes?
In most of the other programming languages, since they don’t have the capability to work in sets, they would have accomplished getting all of the possible routes via a looping mechanism to iterate through each city to get all of the possible combinations while ensuring that no city is duplicated. We could have done that here, but let me show you the performance difference in working with sets verses iterative programming in SQL Server. I’ll set up two identical temporary tables, and the loop will populate one while the set populates the other. I’ll repeat the population section 10 times for each, and capture the time that it takes to perform each (which will be stored into a third temporary table). Finally, I’ll average the elapsed time that each section took across the 10 times run.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 |
IF OBJECT_ID('tempdb.dbo.#UniqueCityRoutes1') IS NOT NULL DROP TABLE #UniqueCityRoutes1; CREATE TABLE #UniqueCityRoutes1 ( LoopID INTEGER , CityPos INTEGER , CityNbr INTEGER); IF OBJECT_ID('tempdb.dbo.#UniqueCityRoutes2') IS NOT NULL DROP TABLE #UniqueCityRoutes2; CREATE TABLE #UniqueCityRoutes2 ( LoopID INTEGER , CityPos INTEGER , CityNbr INTEGER); IF OBJECT_ID('tempdb.dbo.#TimeTracker') IS NOT NULL DROP TABLE dbo.#TimeTracker; CREATE TABLE #TimeTracker ( RowID INTEGER IDENTITY, RunType VARCHAR(20), StartTime DATETIME, EndTime DATETIME); GO DECLARE @StartTime DATETIME, @EndTime DATETIME; SET @StartTime = GETDATE(); -- start the loop DECLARE @CityCount INTEGER = 8; DECLARE @l1 INTEGER, @l2 INTEGER, @l3 INTEGER, @l4 INTEGER, @l5 INTEGER, @l6 INTEGER, @l7 INTEGER, @l8 INTEGER, @InsertNbr INTEGER = 0; SET @l1 = 0; WHILE @l1 < @CityCount BEGIN SET @l1 += 1; SET @l2 = 0; WHILE @l2 < @CityCount BEGIN SET @l2 += 1; SET @l3 = 0; IF @l2 NOT IN ( @l1) WHILE @l3 < @CityCount BEGIN SET @l3 += 1; SET @l4 = 0; IF @l3 NOT IN (@l1, @l2) WHILE @l4 < @CityCount BEGIN SET @l4 += 1; SET @l5 = 0; IF @l4 NOT IN (@l1, @l2, @l3) WHILE @l5 < @CityCount BEGIN SET @l5 += 1; SET @l6 = 0; IF @l5 NOT IN (@l1, @l2, @l3, @l4) WHILE @l6 < @CityCount BEGIN SET @l6 += 1; SET @l7 = 0; IF @l6 NOT IN (@l1, @l2, @l3, @l4, @l5) WHILE @l7 < @CityCount BEGIN SET @l7 += 1; SET @l8 = 0; IF @l7 NOT IN (@l1, @l2, @l3, @l4, @l5, @l6) WHILE @l8 < @CityCount BEGIN SET @l8 += 1; IF @l8 NOT IN (@l1, @l2, @l3, @l4, @l5, @l6, @l7) BEGIN SET @InsertNbr +=1; -- insert each value into a separate row. This will make the subsequent join later on a bit easier. INSERT INTO #UniqueCityRoutes1 (LoopID, CityPos, CityNbr) VALUES (@InsertNbr, 1, @l1), (@InsertNbr, 2, @l2), (@InsertNbr, 3, @l3), (@InsertNbr, 4, @l4), (@InsertNbr, 5, @l5), (@InsertNbr, 6, @l6), (@InsertNbr, 7, @l7), (@InsertNbr, 8, @l8); END; -- insert END; -- @l8 loop END; -- @l7 loop END; -- @l6 loop END; -- @l5 loop END; -- @l4 loop END; -- @l3 loop END; -- @l2 loop END; -- @l1 loop SET @EndTime = GETDATE(); INSERT INTO #TimeTracker (RunType, StartTime, EndTime) VALUES ('Loop', @StartTime, @EndTime); SET @StartTime = GETDATE(); -- start the set based WITH nums AS ( -- get a number to represent each of the 8 cities SELECT * FROM (VALUES (1), (2), (3), (4), (5), (6), (7), (8)) dt(num) ), cte AS ( -- join the nums cte together 8 times, but ensure that it does not have a value that any of the other joins are already using SELECT t1.num AS CityNbr1 , t2.num AS CityNbr2 , t3.num AS CityNbr3 , t4.num AS CityNbr4 , t5.num AS CityNbr5 , t6.num AS CityNbr6 , t7.num AS CityNbr7 , t8.num AS CityNbr8 -- assign a row number to be the route number for , ROW_NUMBER() OVER (ORDER BY t1.num, t2.num, t3.num, t4.num, t5.num, t6.num, t7.num, t8.num) AS RN FROM nums t1 JOIN nums t2 ON t1.num <> t2.num JOIN nums t3 ON t1.num <> t3.num AND t2.num <> t3.num JOIN nums t4 ON t1.num <> t4.num AND t2.num <> t4.num AND t3.num <> t4.num JOIN nums t5 ON t1.num <> t5.num AND t2.num <> t5.num AND t3.num <> t5.num AND t4.num <> t5.num JOIN nums t6 ON t1.num <> t6.num AND t2.num <> t6.num AND t3.num <> t6.num AND t4.num <> t6.num AND t5.num <> t6.num JOIN nums t7 ON t1.num <> t7.num AND t2.num <> t7.num AND t3.num <> t7.num AND t4.num <> t7.num AND t5.num <> t7.num AND t6.num <> t7.num JOIN nums t8 ON t1.num <> t8.num AND t2.num <> t8.num AND t3.num <> t8.num AND t4.num <> t8.num AND t5.num <> t8.num AND t6.num <> t8.num AND t7.num <> t8.num ) -- insert these results into the temp table -- put each number into a separate row to make the subsequent join easier INSERT INTO #UniqueCityRoutes2 (LoopID, CityPos, CityNbr) SELECT ca.* --FROM #UniqueCityRoutes3 FROM cte CROSS APPLY (VALUES (RN, 1, CityNbr1), (RN, 2, CityNbr2), (RN, 3, CityNbr3), (RN, 4, CityNbr4), (RN, 5, CityNbr5), (RN, 6, CityNbr6), (RN, 7, CityNbr7), (RN, 8, CityNbr8)) ca(RN, CityPos, CityNbr) ORDER BY RN, CityNbr1, CityNbr2, CityNbr3, CityNbr4, CityNbr5, CityNbr6, CityNbr7, CityNbr8; --*/ SET @EndTime = GETDATE(); INSERT INTO #TimeTracker (RunType, StartTime, EndTime) VALUES ('Set', @StartTime, @EndTime); GO 10 SELECT RunType, AVG(DATEDIFF(MILLISECOND, StartTime, EndTime)) AS [ElapsedTime (ms)] FROM #TimeTracker GROUP BY RunType ORDER BY RunType; |
The results that I get are:
1 2 3 4 |
RunType ElapsedTime (ms) -------------------- ---------------- Loop 1781 Set 339 |
It is easily seen that in SQL Server, working in sets is (in this case) 5 times faster – an entire order of magnitude faster. If you were to look at the individual run times, the set-based is normally ~10x faster (the first run for me was an outlier that occasionally ran slower than the loop).