Wayne Sheffield

My blog about SQL Server

Browsing Posts published by Wayne Sheffield

source: http://thistimeimeanit.com/wp-content/uploads/2011/06/officejuggler.jpg

One of the nightmare scenarios that a SQL Saturday organizer faces is selecting a speaker that is already going to another event on the same day. The speaker can only be at one event… and the site doesn’t really help us out with letting us know about these over-committed speakers.

So, I wrote this little PowerShell script to compare your selected speakers with the speakers selected at other SQL Saturday events that are on the same day. Now, by “selected speaker”, I mean those speakers that have been approved in the “Manage Sessions” section of the admin area of your event (those sessions don’t need to be on the schedule yet). Those sessions are then visible in the XML feed that is generated. This script will load the speakers for your event into an array, and then get the sessions for all of the other SQL Saturdays. If the other event is the same day as yours, it then compares the speakers. Finally, it will output those speakers that are selected in your event that are also selected in another event (aka the over-committed speakers) and the event that they are speaking at.

This script is based upon the script that Steve Jones’ wrote, however this script loads the information directly from the SQLSaturday.com web site each time that it runs. It could have used saved files (like Steve did), but this will be a bit easier to get changed speakers when either event changes the speakers that are presenting.

This script does not handle duplicate names (from a speaker presenting more than one session) at any of the events, so you may end up with duplicate output lines for a speaker.

Update: script updated on 2/9/2016 to handle > 1 duplicate speaker.

If anyone knows how to get a feed of the submitted sessions, I’ll create another script for comparing the sites for speakers that have submitted to multiple locations on the same day. I’d rather not even get to the selected phase if it can be avoided. If you know how to do this, just leave me a comment below please (or contact me directly).

source: http://cdn2.pcadvisor.co.uk/cmsdata/features/3380127/Infinity_Blade_II.jpg

In my previous post, I introduced you to the Advent of Code programming challenge, and explained how I’m going through the days, solving both parts of each puzzle. Continuing on, today we have day 21. In this challenge,you received a present – a role playing game. In this game, you (the player) and your enemy (the boss) take turns attacking each other. You each start off with 100 “Hit Points”, and you inflict damage based upon the weapons that you have. You are shielded by the armor that you have. You must purchase one weapon to use, and you may purchase one armor. You may purchase a ring for each hand. The store has just one of each item. The list of items in the store, along with their cost, how much damage they inflict, and how much protection they provide (as armor) is at the above link. The first person whose hit point score drops to zero (or less) loses. The damage inflicted is equal to the damage of all items of the attacker minus the armor total of the defender, and this damage is reduced from the hit points. Each attack will diminish the defender’s hit points by at least a value of one. The input file for this puzzle provides you with the boss’s stats. For Part 1 of this puzzle, you need to find out what is the least amount of gold that you can spend to win.

When the input file is examined, you see that it has three lines, one each for hit points, damage and armor. Each line lists the item, and then the amount of units for that item, with a colon between them. Therefore, we can use the string splitter routine to separate the lines, and then separate the items from their values by splitting again on the colon. The values returned will be stored into variables for future use. This code is:

The next section of code builds a common table expression (cte) for each weapon, armor and ring that is available. Note that for the armor and rings, I’ve added rows with 0 points to handle not purchasing those items. These ctes are (note that this will not run until all of the remaining sections are put together).

In the next cte, a list of all possible weapons with their costs, damage and armor scores are created. The actual weapon information is not necessary, just the values. This is accomplished by performing a CROSS JOIN between the three store items ctes. The rings cte is cross joined an additional time to handle a ring on each hand. Additionally, four CROSS APPLYs are performed. The first sums up the totals of the cost, damage and armor scores for the Player. The second calculates the damage to both the Player and Boss for that battle. The third adjusts the minimum damage to 1 in case it is less than 1. The fourth calculates the number of plays that the Boss and Player have for this battle by dividing the number of hit points (100) by the number of plays. Since this may return a fraction, and that fraction keeps the player alive for the next play, this is rounded up to the next integer value with the CEILING function. Additionally, the number of hits is handled as a numeric with a decimal position because in SQL Server, when an integer is divided by another integer, the result will be an integer with the fractional part truncated. A SELECT DISTINCT is performed because we only need to run the battle once for each condition:

Finally, the code returns the cost column from the first row where the number of Boss Plays is less than or equal to the number of Player Plays (since the player plays first, if they both would eliminate their opponent on that play, then the player eliminated the boss first):

And now we have the solution for Part 1. For Part 2, the challenge is to find the highest cost that the player can spend and still lose to the boss. All of the hard work has been done, and this requires only a minor change to the final query. Just replace the code starting at line 67 with this code:

Well, there we go, Day 21 has been solved. Come on back tomorrow to see how to solve Day 22.

source: http://www.myphillyalive.com/wp-content/uploads/2013/11/A-Elves-having-fun-Photo-by-Mark-Garvin.jpg

source: http://www.myphillyalive.com/wp-content/uploads/2013/11/A-Elves-having-fun-Photo-by-Mark-Garvin.jpg

In my previous post, I introduced you to the Advent of Code programming challenge, and explained how I’m going through the days, solving both parts of each puzzle. Continuing on, today we have day 20. In this challenge, it seems that Santa’s elves have too much time on their hands. To keep them busy, he is having them deliver presents. By hand. Door to door.

Santa sends the elves down a street. But this isn’t an ordinary street… this street is infinitely long. The houses on this street are sequentially numbered, starting with one. The elves (which is seems there is also an infinite supply of) are also numbered, starting with one. Each elf goes to every x house of the same number as the elf. In other words, elf #1 stops at every house. Elf #2 stops at every even numbered house. Elf #3 stops at every third house. And so on. At each stop, the elf leaves 10 times his number of presents. So elf 1 leave 10 presents, elf 2 leaves 20 presents, and so on.

For Part 1 of this puzzle, what is the lowest numbered house to receive at least a certain number of gifts? For me, I need to find out the lowest numbered house with 36 million gifts (and I’m left wondering just how long it would take to unwrap all of those presents).

To solve this problem, we need to determine what elves would be stopping at a particular house. We then need to sum up the elvf # times 10 for all of those elves, and see if it is at least the specified number. To determine if an elf would be stopping at a house, the elf’s number needs to divide evenly into the house number. In other words, we are looking for all of the factors of a number. In SQL Server, we can determine whether a number can be evenly divided into another with the modulo operator. This function returns the remainder, so we just need to look for results where the remainder is zero.

My first attempt to solve this is to use a set-based approach. In this approach, I use a virtual tally table, which is then cross-joined to itself (once for the sequential house numbers, and once for the elf numbers). The code checks that the elf number would go to the house number with the modulo operator, and it reduces the work by ensuring the elf number is less than or equal to the house number.

I did mention that this was my first attempt. This would be because this code locked up my laptop overnight for over 12 hours, and still didn’t complete (once I was able to determine the actual number, I came back to this and in running the code for just 1000 houses that included the correct house, this still took almost 90 seconds). Obviously, this is not a desired solution. So let’s look into an alternative.

In testing out this code, I found that this runs fine for an individual house. So I next decided to iterate through the houses, and to perform the determination of which elves go to that house in a set-based manner. The resulting code is:

In this newer incarnation of the code, the @HouseNbr variable is used to iterate through the house numbers. To keep track of the progress, I print out messages (with the RAISERROR function). At this point, patience is a virtue as you need to wait for the iteration to find the first house number that receives the specified number of presents. But at least while you wait, the laptop isn’t locked up and you can continue other work while waiting. And once the code returns the answer, you can enter it and move on to Part 2.

In Part 2, the elves will only stop at 50 houses each; however they will leave 11 presents at each house. With this new criteria, what is the new house number with this minimum number of presents?

Well, this is pretty straightforward change. Since each elf is now dropping of 11 presents instead of 10, change the 10 to 11 in line 13. And since each elf is only going to 50 houses, just add a new predicate where the Elf # * 50 is greater than or equal to the house number. Replace likes 13-16 of the above code with these 5 lines of code:

And there is my solution for day 20. I would love to find a better way to solve this – if you know of a way, please share it in the remarks.

source: http://www.brockpress.com/wp-content/uploads/2014/11/RudolphandSanta.jpg

In my previous post, I introduced you to the Advent of Code programming challenge, and explained how I’m going through the days, solving both parts of each puzzle. Continuing on, today we have day 19. In this challenge, Rudolph is sick and needs some medicine. Since Rudolph’s biology is different from other reindeer, he needs some specialized medicine. Therefore, Santa has ensured that the North Pole has a specialized Red-Nosed Reindeer nuclear fusion/fission plant, which is capable of constructing any Red-Nosed Reindeer molecule that you might need. This plant works by starting with an input molecule and then doing a series of replacements, one per step, until it has the right molecule.

Prior to using the machine, it needs to be calibrated. This involves determining the number of molecules that can be generated in one step from a given starting point. For instance, assume that the machine can make the following replacements:

H => HO
H => OH
O => HH

If you are starting with the molecule HOH, then the following replacements can be made:

  1. Replace the first H with HO to get HOOH.
  2. Replace the second H with HO to get HOHO.
  3. Replace the first H with OH to get OHOH.
  4. Replace the second H with OH to get HOOH.
  5. Replace the O with HH to get HHHH.

Today’s puzzle has an input file that lists all of the replacements that the machine can make, followed by the medicine molecule used to calibrate the machine. For Part 1, we need to determine how many unique molecule combinations can be made by doing just one replacement of all of the combinations. In the above example, there are 5 replacement molecules, but there are only 4 distinct molecules created (HOOH can be created two different ways).

The first challenge we run into when loading this file is that the input file has both the replacements and the medicine molecule. These need to be separated. After splitting the line on the carriage returns, the code then looks to see if the line has the “=>” string within the line. If so, then this line is a replacement line; otherwise it is the medicine molecule that we need. Since we will need to be extracting out from the replacement lines the string to be replaced, and the string to be replaced with, the code gets the position of the “=>” for further use. This information will be stored into a temporary table. Finally, get the line without the “=>” for the medicine molecule:

In the next step, we need to step through each replacement line, and replace just one occurrence of the characters to be replaced with the replacement characters. If I were to use the REPLACE function, then all occurrences of the characters to be replaced would be replaced in that line. Since we need to do just one at a time, the code needs to walk through the medicine molecule string to find the characters to be replaced, and then use the STUFF function to replace those characters with the replacement characters. I’ll be using the CHARINDEX function to find the next occurrence of the characters to be replaced, using the optional third parameter to specify the starting position within the string in order to find the next occurrence. Additionally, this is a case-sensitive string search, so I need to use the COLLATE option to specify that the medicine molecule string is of a collation that is case sensitive. After each replacement, the new medicine molecule string is stored into a table, and when all of the replacements have been performed, the code then will get the distinct count of all of these medicine molecules, using the DISTINCT keyword in the COUNT function. Since we need to work on each input line one at a time, and then to do the replacements one at a time upon the medicine molecule, I will be using a cursor to go line by line, and a nested WHILE loop to go through the medicine molecule. The cursor shreds the line into the Characters To Be Replaced and the Replacement Characters:

Part 1 has solved the issue for the calibration of the machine. For Part 2, we need to build the molecule. To build the molecule, we start with “e”, and run through the steps until the desired molecule has been created. For Part 2, starting with “e”, what is the minimum number of steps required to build the supplied medicine molecule using the same replacement list? At this point, I haven’t figured out how to solve this part, but I intend to come back to it and finish this up when time permits.

Well, I did get Part 1 finished. Come back tomorrow for the solutions for Day 20, and I’ll update this post when Part 2 is finished.

source: https://www.adafruit.com/images/145×109/2026-04.jpg

In my previous post, I introduced you to the Advent of Code programming challenge, and explained how I’m going through the days, solving both parts of each puzzle. Continuing on, today we have day 18. In this challenge, the fire department has cracked down on how many lights can be in your display. So the million light display that you used previously needs to be modified to have no more than 10,000 lights. You are working with them on a 100×100 grid. And Santa sends you instructions on how best to work with them.

With limiting the number of lights, you are going to need to move to an animated display. You have an input file of the initial state of all of the lights. In this file, a “#” means the light is on, and a “.” means that the light is off. The rules for turning on/off a light are:

  1. If the light is currently on, it will only stay on if 2 or 3 of it’s neighbors are also on. Otherwise, it is turned off
  2. If the light is currently off, it will only turn on if exactly 3 of it’s neighbors are on. Otherwise, it stays off.

The status of the lights you are comparing to is the status at the start of the step. All lights are compared based on this initial status.

In this puzzle, a “neighbor” is the eight “cells” that surround a particular cell. If the cell is on the top, bottom, left or right edge, there will be less than eight, and those “missing” neighbors are to be considered to be turned off.

For Part 1, we need to determine just how many lights are on after 100 steps have been performed.

The first step is going to be just like all the other steps with an input file – the input file needs to be loaded in to a temporary table. The file is larger than 8000 characters, so it will use the staged approach to extract the individual rows:

At this point, we have one hundred rows, and each row has the starting status of it’s one hundred lights. We need to extract the status of the individual lights. To accomplish this, let’s use a virtual tally table to get the character at each position (using the SUBSTRING function), and store the row, position and status in a second temporary table. Here I’m converting the status of “#” or “.” to 1 or 0:

Since we need to get the status of all lights based upon the initial state of the lights at the start of a step, we can use the update statement. However, since we need to perform 100 steps, we will have to perform this update 100 times. This is a combination of set-based and iterative code. I’ll accomplish the iterative part by using a while loop based upon the value of a counter. To determine the new status of an individual light, the query needs to get the status of the lights for the rows prior to and after the current row, and for the light positions prior to and after the current position. Since we are looking for a count of the number of neighbor lights that are on, we can utilize a sub-query to get the neighbors light status (remember that this is now a 1 or 0), and then sum them up. Next the query determines what the new light status should be, and updates the temporary table with this information:

If the lights current status is on (1), then the sum of all of the cells within the rows and positions will include this cell, therefore the neighbors count is looking for 3 or 4 instead of the specified 2 or 3 in order to compensate for this row (position) being included. If the current light is off, then we just look for whether 3 neighbors are on. And here we have the T-SQL solution for Part 1 of this puzzle.

For Part 2, you notice that the four corner lights are permanently on. What is the number of lights on after 100 steps now?

To solve this, we first need to set the four corner lights to being on initially. After the “Extract individual light status” code block, update those four rows to being on:

Next we need to alter the CASE statement in cte2. Replace line 68 and insert a new line 69 in the “Update grid 100 times” code block to these new lines:

The rest of the code remains the same. And here we have the solution for part 2 of day 18.

At this point, the solution is solved. But is this the most efficient way? That sub-query in the CROSS APPLY part of cte is being run for each row… or it’s being run 10,000 times. Let’s modify this query to use window functions instead – specifically the LAG and LEAD offset functions. In this new code, the LAG and LEAD functions will get the eight neighboring cells. A little bit of logic is needed to determine if this row is an edge, and if so then a 0 is used, otherwise the function is called to get the value from the offset row. The status of these eight neighboring cells are then added together to get the new Neighbors value. cte2 is also modified slightly – the query no longer needs to compensate for the current position being in the group being summed, and the case statement is changed slightly for determining when a status is turned on. The actual update statement is the same. The modified query is:

Performance wise, this updated code runs two-thirds faster. That’s a pretty nice improvement.

Come on back tomorrow for a T-SQL Solution for Day 19.

In my previous post, I introduced you to the Advent of Code programming challenge, and explained how I’m going through the days, solving both parts of each puzzle. Continuing on, today we have day 16. In this challenge, you have received a wonderful gift from your Aunt Sue, and you want to send her a thank-you card. There’s just one small problem… you have 500 Aunt Sue’s, and you don’t know which one sent you the gift.

The gift turns out to be your very own “My First Crime Scene Analysis Machine” (MFCSAM), and it can what specific compounds are in a sample. According to the instructions, it can detect:

  • children, by human DNA age analysis.
  • cats. It doesn’t differentiate individual breeds.
  • Several seemingly random breeds of dog: samoyeds, pomeranians, akitas, and vizslas.
  • goldfish. No other kinds of fish.
  • trees, all in one group.
  • cars, presumably by exhaust or gasoline or something.
  • perfumes, which is handy, since many of your Aunts Sue wear a few kinds.

In order to determine which Aunt Sue sent you the gift, you analyze the wrapping from the gift, and the machine tells you the following:

Well, you know a bit about your Aunts, so you start making a list of what you do know. However, there are items that you just don’t know about. They’re not missing, you just don’t know the value.

For part 1 of this puzzle, you need to return the number of the Aunt Sue that sent you the gift. The input file is in the format:

Sue 1: goldfish: 6, trees: 9, akitas: 0

You can see how there are some items that the MFCSAM returned that are not listed in the input file. As previously mentioned, these are not missing – you just don’t know what that Aunt Sue has for that item.

The first part of the solution is to load the input file, and to get for each Aunt the quantity of each item. This solution starts off by loading the input file into a variable:

Since the delimited string splitter only works on strings up to 8000 characters, you check the length and find that it is over 20,000 characters. So you decide to load the input file in steps, performing up to 8000 characters at a time. First off, create a temporary table to hold these lines. Next, get the first 8000 characters in the @InputText variable, and determine where the last CHAR(13) is. Get the string from @InputText up to this point. Then the string is split by the CHAR(13) to separate it into rows, and then each row is examined for which Sue # this is, and which items that we know about her by utilizing the CHARINDEX and SUBSTRING functions. This data is inserted into the temporary table, and the text that was just analyzed is removed from the beginning of @InputText. This process continues as long as @InputText has > 1 character (since we are not doing that final CHAR(13), @InputText will end up with just that in it):

At this point, we just need to analyze the data to see which Aunt Sue matches all of the criteria that the MFCSAM returned. However, since we don’t know all of the information for each Aunt, we need to assume that if the data isn’t there, that it does match. This will be accomplished with the ISNULL function, and if the item is NULL then it will return the value that was returned from the MFCSAM. The code for this is:

Great – now we know which Aunt sent us the gift. However, before sending the thank-you card, you notice in the instructions for the MFCSAM that for some of the items, the MFCSAM returns a range and not a specific value. Specifically, the cats and trees readings indicate that there are more than this number, and the pomeranians and goldfish readings indicate that there are fewer than this number. This is a relatively simple change to the code… for the cats and trees, change the value operator to a greater than, and change the value used in the ISNULL function to be higher than the number being tested for. Likewise, for the pomeranians and golffish, change the operator to a less than, and change the value used in the ISNULL function to be lower than the number being tested for:

And here is my T-SQL solution for Advent of Code, Day 16. Come on back tomorrow and see my next solution.

In my previous post, I introduced you to the Advent of Code programming challenge, and explained how I’m going through the days, solving both parts of each puzzle. Continuing on, today we have day 15. In this challenge, we are working on the cookie recipe. Specifically, different ingredients impact a few different aspects / properties of the cookie (flavor, texture, capacity (to absorb milk) and durability (how well the cookie stay intact when full of milk). We are provided an input file that for each ingredient, breaks down how 1 teaspoon of that ingredient impacts the various aspects. The input file is formatted like this:

Each cookie needs 100 teaspoons of the combination of ingredients. The task is to find the combination that when considering the number of teaspoons of the ingredients and how this impacts the properties of the cookies, gives the highest score for the cookie. The score is determined  by adding together, for each property, all the the ingredients. Negative amounts of the total are considered zero, and then the properties are multiplied together to determine the score.

The first step will be to load the input file, separate the lines, and then from each line extract out the various parts. For that, we proceed as we have through most of these puzzles –  loading the data into a temporary table, and then determining the position of various characters in the line with the CHARINDEX function and extracting this data with the SUBSTRING function. The CROSS APPLY is just introducing those columns into the query so that they can be subsequently used in multiple places, while only being calculated once. Note that several of the CHARINDEX calls use the optional third parameter which specifies the starting position. In these cases, the code is looking for a comma, so we are specifying the position of the last comma + 1 as the starting position for the next comma:

In the next phase, I use a virtual tally table of the numbers 1-100, and cross join this to itself once for each ingredient, and the resulting numbers are the teaspoons to be used for each ingredient. Since the number of teaspoons used needs to equal 100, the code only gets the values where the four numbers add up to 100. Additionally, a ROW_NUMBER is calculated for each row. This will be used to be able to perform aggregations and to group upon. Next, each property needs to be added up across all ingredients (the property for that ingredient multiplied by the number of teaspoons used for that ingredient).  This requires performing a CROSS JOIN to the input file. Finally, all of the properties are multiplied across each other (a zero is substituted for negative sums), and the highest one is returned with the MAX function.

For part 2, you are asked to restrict the cookies to 500 calories. What is the score of the highest scoring cookie that is 500 calories?

To handle this modification, two minor changes are needed. Since the calories are already being stored in the input table, no changes are necessary in that section. In cte2, we need to calculate the calories for the teaspoons used and to sum up the calories across all of the ingredients, and in the final query we need to add a predicate to only get the max for cookies with 500 calories. To make these changes, insert lines 1-5 below between lines 69-70 of cte2, and add line 5 to the end:

Now that the cookie recipe has been mastered, day 15 is complete. Come back Monday for day 16.

In my previous post, I introduced you to the Advent of Code programming challenge, and explained how I’m going through the days, solving both parts of each puzzle. Continuing on, today we have day 14. In this challenge, Santa is evaluating his reindeer, so he’s sending them through Reindeer Olympics. The reindeer can all fly at high speeds… but that comes at a cost where they need to rest also. In Part 1 of this challenge, Santa wants to find out which of the reindeer is the fastest. The supplied list tells the speed and length of time that each reindeer can fly at, and how long it must rest after that burst. So Santa sends the reindeer to a race, and he wants to know the distance the fastest reindeer traveled in 2503 seconds. Yeah, that’s a weird number all right. Santa’s elves have determined that this is the minimum time in order to get accurate results.

In the following code, the first cte imports the test data and uses the PATINDEX, CHARINDEX and SUBSTRING functions to find key parts of the string, and then extract the data for the reindeer. The second cte uses this data to calculate how far each reindeer travels during the time, and then the query just returns the longest distance.

The hardest part of this code is the formula that determines the distance that the reindeer travels. Starting off in the CROSS APPLY lines, I first introduce the time interval so that it can then be referenced in multiple locations later. I could have used the hard coded value in all of those locations, but I’m expecting that the part 2 will do something involving changing the time, and I’m taking steps now to make it easier later. The second CROSS APPLY calculates the number of cycles where that reindeer can move. The cycle is the time limit divided by the sum of the active moving and resting times. The third cte calculates how much time is left where the reindeer can travel after the last cycle. The code utilizes all of this information to calculate the distance that the reindeer traveled. It starts off with the number of cycles times the speed and duration. It then checks to see if the time left is longer that the duration the reindeer can fly before needing to rest, and determines whether it needs to use the active duration time or the time left. Once this has been determined, it is multiplied by the speed, and this result is added to the prior calculation for the total distance that the reindeer traveled during this time frame.

For part 2, Santa realizes that this scoring isn’t that good of a true indicator, so he decides to go to a point system. At each second of the race, the reindeer in the lead gets a point, and if multiple reindeer are tied then they both get a point. At the end of the race, the reindeer in the lead in the winner. For this, we need to return the number of points that the winning reindeer received.

For this solution, the code starts off with a dynamic tally table. The tally table is restricted to 2503 rows – one for each second. The code next (in cte) extracts the data from the input file just as in part 1. In cte2, the distance that each reindeer has traveled so far is calculated for each second. This is different from above by cross-joining to the tally table, and using this number to calculate the distance for that second. In cte3, the code calculates the reindeer in the lead at that second. Since multiple reindeer can be tied, the RANK function is used so that those ties will both be marked as being at #1 for that second. The partition by clause is used to recalculate the winner for each second, and by ordering by the distance descending, the reindeer(s) that has traveled the furthest are ranked with a value of 1. The next cte (cte4) gets the distinct list of reindeers that are ranked 1 for any time slot, and the query finally counts for each reindeer the number of times that it was leading during the race. By ordering this in descending order, and getting just the top row, we have returned the number of points that the winning reindeer has.

This completes day 14. Come back tomorrow to see how day 15 is solved in T-SQL!

In my previous post, I introduced you to the Advent of Code programming challenge, and explained how I’m going through the days, solving both parts of each puzzle. Continuing on, today we have day 13. In this challenge, it’s time for the family dinner, and you’re in charge of the seating arrangements. There is one oval shaped table, and you have a list of how happy a particular person is to be seated next to every other person in the family (a quantified ranking happiness unit, which can be positive or negative). Your task is to come up with the seating arrangement that produces the highest overall happiness unit. The input for this task is here, and the answer to be supplied is the total of all of the happiness units for this optimal seating arrangement.

Each line in the input file is in the format “Alice would lose 57 happiness units by sitting next to Bob.” The “lose” will be either “lose” or “gain”. So the first step is to load the file in, and to split out the two names and the number of happiness units that will be had. I’m going to load the list into a temporary table:

This code starts off by loading the file (using the OPENROWSET function) and splitting it into individual lines. For each line, the second CROSS APPLY gets the position where any number starts at using the PATINDEX function, and the third CROSS APPLY gets the first space after this number using the CHARINDEX function. In the column list, the CHARINDEX function is used to find the first space. Everything prior to that is the persons name (PersonA). It then looks to see if the word “gain” is in the string. If so it uses the value 1; if not, it uses the value -1. This value is multiplied by the number that is extracted by the SUBSTRING function using the values from the two CROSS APPLY operations. Finally, we get the person that PersonA is sitting next to by looking for the string ” to “. Everything after this is the person that PersonA is sitting next to. However, each line also end with a period, so this is also removed.

If we think back to Day 9, Santa had to create an optimal traveling route between cities. To solve that, we had to get every possible route, and then compare the distance between the cities to get the shortest travel route. Today’s puzzle is pretty much the same thing, with one difference. The difference is that the distance between the two cities is the same whether we are going from CityA to CityB, or CityB to CityA. In this case, each person has a different happiness unit for sitting next to another person. So, Alice may lose 57 happiness units when sitting next to Bob (because he talks soooooo much), but Bob may gain 83 happiness units when sitting next to Alice (because she listens so well). Therefore, both of these need to be considered.

Just like in Day 9, we need to generate every possible combination of seating arrangements. In Day 9, I used numbers to represent the cities, but then I had to join back to this information to get the city name. Today, we’ll use the names of the people, so let’s start off by getting every distinct name from the list and inserting that into another temporary table:

This next step does all of the work. The first cte starts off by getting all possible seating arrangements by joining this table to itself eight times (once for each person). In the second cte, this data is pivoted so that each arrangement has a seating position number and person in that position. Concurrently with this step, it determines who this person is sitting next to, on both the left and right sides. The third cte joins back to the table with the happiness units to get the units on both the left, right and total for that person. The fourth cte sums up the total happiness units for each sitting arrangement, and finally the seating arrangement with the highest happiness unit is returned.

And there we have our optimal seating arrangement.

For part 2 of this puzzle, you realize that you forgot to seat yourself. After realizing this, and considering that you are such a neutral person, you decide that the happiness unit for everyone involved with you is zero. So, the first step is to add you to this mix (this should be performed between the first two blocks of code above):

Next, the code needs to be modified slightly to handle the additional person. In the “Get the seating arrangement with the highest total happiness units”, insert line 1 below after the Person8 line (insert it as line 12); replace the ROW_NUMBER() calculation (lines 13-15) with lines 2-4 below, insert lines 5-8 below after the join to t8 (starting at line 33). All three of these are in the cteAllPossibleSeatingCombos cte. Lastly, replace the CROSS APPLY in the cte2SeatingCombosPivoted (lines 38-46) with the one starting at line 9 below.

This concludes day 13, which means that we are over 50% through the Advent of Code challenges. Come back tomorrow to see my solution for Day 14.

In my previous post, I introduced you to the Advent of Code programming challenge, and explained how I’m going through the days, solving both parts of each puzzle. Continuing on, today we have day 12. In this challenge, Santa’s elves need help in balancing the books. Their accounting software uses a weird storage format. Maybe you’ve heard of it… JSON. You’ve given a JSON string, and you need to add up all of the numbers that are contained within it.

Through the current version of SQL Server, there isn’t JSON support. So, we need to work with the string, identifying the numeric parts, and then adding them up:

This code starts off by loading the file into a variable. A virtual tally table is then used to split the file apart character-by-character, storing off just the numeric data (including the minus signs) and the position in the string. The next objective is to put the numbers back together so that they can be added together. To accomplish this, we need to work with the concept of data islands – data that is in sequential rows, but needs to be handled together. In this case, we need to put the numeric characters back together to form a number that can then be added together.

To create these islands, we use the ROW_NUMBER function to assign a sequential number to all of these numbers (ordered by the position the character was in the string). If you subtract the number calculated from the ROW_NUMBER from the original position, then the characters that are sequential will have the same difference. For example, assume that the first number is in the input string at characters 15-16. When assigned the ROW_NUMBER value, these will be 1 and 2. 15-1=14, and 16-2=14. See how that have the same difference? If the next number is in positions 33-35, and has the ROW_NUMBER values 3-5, then 33-3=30, 34-4=30 and 35-5=30 – again this grouping will have the same value. And it is this value that is used to separate these subsets of data into data islands.

After having all of these data islands, the next step is to put the numbers in each data island back together. For this, I use the XML PATH trick to concatenate strings together. Please read my article for a thorough explanation of how it works – no need to repeat that here. Please note that while the article is about creating a comma-delimited list, it can be used with any delimiter, or as in this case, no delimiter. The ORDER BY clause prior to the FOR XML PATH line puts the numbers back together in the proper order, and all that is left is to convert this data into a number so that it can be added up. The conversion and summing are performed at the same time.

So, this is the brute-force way of getting the numbers. However, this is a JSON string, and the new and upcoming version of SQL Server (2016) has implemented JSON support, so can this be accomplished there? Yes, it can be:

This code loads the file in, and then cross applies the input file into the new OPENJSON function. This function returns three columns: key, value and type, and returns just the first-level properties for this file. The key is the property name or index position of the value; the value is the data and the type indicates the type of data. In order to continue shredding this JSON file, you need to recursively call the OPENJSON, passing in the previous returned value. To accomplish this, the code utilizes a recursive cte. Typically when I use a recursive cte, I add a Lvl column to show at what level of recursion the particular row came from – this comes in handy many times, so I just normally do it automatically.

In the recursive member of the cte, the code also uses the new ISJSON function so that only data that is a valid JSON document will be evaluated. Without this, at some point in the recursion, the data gets to a point where it no longer is valid and subsequently crashes.

At this point, the code now has the key, value and type of every piece of data in the JSON document. What is needed now is to get all of the numbers, and add them up. As previously mentioned, the type column from the OPENJSON function indicates the type of data that it is. A type of 2 indicates that the data is a number, so we can get just those, convert them to an integer, and then sum them up.

While neither of these solutions are complex at all (well, the first one does use a few tricks, but once you understand those, it’s not a complex piece of code), the SQL 2016 version does end up being shorter and simpler.

In part 2, Santa’s elves discovered that everything that is red is wrong, and those numbers need to be skipped in the count. This is to skip all JSON objects with a property value of “red”, and any child objects. While I might be able to figure out a brute-force way to accomplish this, it will be a bit of a pain in the derriere to accomplish. So I’ll jump back into SQL 2016 to accomplish this with the built-in support for JSON.

I start off by creating a temporary table and storing the results from the OPENJSON function in it. Since we need to ignore not only the item, but any child item(s) also, I’m also going to calculate and store a hierarchy for the item, as well as the parent hierarchy for the item:

In this example, all of the properties are a single character, and all of the indexes are less than nine (represented by a single character), so I can get the parent hierarchy but just removing the last character from the current hierarchy. In the next step, I need to identify all of the objects that should not be evaluated, and update the “Exclude” column in the temp table to be set so that they can be excluded from the query.

In this section of code, I get all of the objects where the value = “red”, and where the key (remember, this is the property name or index position) is not a number (indicating an index position). For all of these, I set the items that are like the ParentHierarchy to be excluded. The final step is to count up the numbers, just like previously: