In my previous post, I introduced you to the Advent of Code programming challenge, and explained how I’m going through the days, solving both parts of each puzzle. Continuing on, today we have day 12. In this challenge, Santa’s elves need help in balancing the books. Their accounting software uses a weird storage format. Maybe you’ve heard of it… JSON. You’ve given a JSON string, and you need to add up all of the numbers that are contained within it.
Through the current version of SQL Server, there isn’t JSON support. So, we need to work with the string, identifying the numeric parts, and then adding them up:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 |
DECLARE @InputText VARCHAR(MAX); SELECT @InputText = REPLACE(FileText, CHAR(10), '') FROM OPENROWSET(BULK 'D:\AdventOfCode\Day12.txt', SINGLE_CLOB) UselessAlias(FileText); IF OBJECT_ID('tempdb.dbo.#temp') IS NOT NULL DROP TABLE #temp; CREATE TABLE #temp ( N INTEGER, InputText CHAR(1), RN INTEGER, Grp INTEGER); WITH Tens (N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1), Hundreds(N) AS (SELECT 1 FROM Tens t1, Tens t2), Millions(N) AS (SELECT 1 FROM Hundreds t1, Hundreds t2, Hundreds t3), Tally (N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM Millions) , cte1 AS ( SELECT N, ca.InputText, ROW_NUMBER() OVER (ORDER BY N) AS RN FROM Tally CROSS APPLY (VALUES (SUBSTRING(@InputText, N, 1))) ca(InputText) WHERE ISNUMERIC(ca.InputText) = 1 -- this will get the minus signs AND ca.InputText NOT LIKE '[.,]' AND ca.InputText <> CHAR(13) ) INSERT INTO #temp (N, InputText, RN, GRP) SELECT N, InputText, RN, N - RN AS Grp FROM cte1; WITH cte AS ( SELECT DISTINCT Grp FROM #temp ) --SELECT * FROM cte3; SELECT SUM(CONVERT(INTEGER, ca.Numbers)) AS Part1SUM FROM cte t1 CROSS APPLY (SELECT (SELECT '' + InputText FROM #temp t2 WHERE t2.Grp = t1.Grp ORDER BY N FOR XML PATH(''), TYPE).value('.','VARCHAR(MAX)')) ca(Numbers); |
This code starts off by loading the file into a variable. A virtual tally table is then used to split the file apart character-by-character, storing off just the numeric data (including the minus signs) and the position in the string. The next objective is to put the numbers back together so that they can be added together. To accomplish this, we need to work with the concept of data islands – data that is in sequential rows, but needs to be handled together. In this case, we need to put the numeric characters back together to form a number that can then be added together.
To create these islands, we use the ROW_NUMBER function to assign a sequential number to all of these numbers (ordered by the position the character was in the string). If you subtract the number calculated from the ROW_NUMBER from the original position, then the characters that are sequential will have the same difference. For example, assume that the first number is in the input string at characters 15-16. When assigned the ROW_NUMBER value, these will be 1 and 2. 15-1=14, and 16-2=14. See how that have the same difference? If the next number is in positions 33-35, and has the ROW_NUMBER values 3-5, then 33-3=30, 34-4=30 and 35-5=30 – again this grouping will have the same value. And it is this value that is used to separate these subsets of data into data islands.
After having all of these data islands, the next step is to put the numbers in each data island back together. For this, I use the XML PATH trick to concatenate strings together. Please read my article for a thorough explanation of how it works – no need to repeat that here. Please note that while the article is about creating a comma-delimited list, it can be used with any delimiter, or as in this case, no delimiter. The ORDER BY clause prior to the FOR XML PATH line puts the numbers back together in the proper order, and all that is left is to convert this data into a number so that it can be added up. The conversion and summing are performed at the same time.
So, this is the brute-force way of getting the numbers. However, this is a JSON string, and the new and upcoming version of SQL Server (2016) has implemented JSON support, so can this be accomplished there? Yes, it can be:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
WITH cteShredJson AS ( SELECT 1 AS Lvl, ca.* FROM OPENROWSET(BULK 'C:\Users\Administrator\Desktop\Day12.txt', SINGLE_CLOB) ImportFile(FileText) CROSS APPLY OPENJSON(ImportFile.FileText) ca UNION ALL SELECT Lvl+1, ca.* FROM cteShredJson CROSS APPLY OPENJSON(cteShredJson.value) ca WHERE ISJSON(cteShredJson.value) = 1 -- as it goes thru the heirarchy, things become invalid json strings. So don't get those. ) SELECT SUM(CONVERT(INTEGER, [value])) FROM cteShredJson -- get just the integers WHERE cteShredJson.[type] = 2; |
This code loads the file in, and then cross applies the input file into the new OPENJSON function. This function returns three columns: key, value and type, and returns just the first-level properties for this file. The key is the property name or index position of the value; the value is the data and the type indicates the type of data. In order to continue shredding this JSON file, you need to recursively call the OPENJSON, passing in the previous returned value. To accomplish this, the code utilizes a recursive cte. Typically when I use a recursive cte, I add a Lvl column to show at what level of recursion the particular row came from – this comes in handy many times, so I just normally do it automatically.
In the recursive member of the cte, the code also uses the new ISJSON function so that only data that is a valid JSON document will be evaluated. Without this, at some point in the recursion, the data gets to a point where it no longer is valid and subsequently crashes.
At this point, the code now has the key, value and type of every piece of data in the JSON document. What is needed now is to get all of the numbers, and add them up. As previously mentioned, the type column from the OPENJSON function indicates the type of data that it is. A type of 2 indicates that the data is a number, so we can get just those, convert them to an integer, and then sum them up.
While neither of these solutions are complex at all (well, the first one does use a few tricks, but once you understand those, it’s not a complex piece of code), the SQL 2016 version does end up being shorter and simpler.
In part 2, Santa’s elves discovered that everything that is red is wrong, and those numbers need to be skipped in the count. This is to skip all JSON objects with a property value of “red”, and any child objects. While I might be able to figure out a brute-force way to accomplish this, it will be a bit of a pain in the derriere to accomplish. So I’ll jump back into SQL 2016 to accomplish this with the built-in support for JSON.
I start off by creating a temporary table and storing the results from the OPENJSON function in it. Since we need to ignore not only the item, but any child item(s) also, I’m also going to calculate and store a hierarchy for the item, as well as the parent hierarchy for the item:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 |
IF OBJECT_ID('tempdb.dbo.#temp') IS NOT NULL DROP TABLE #temp; CREATE TABLE #temp ( Lvl INTEGER, [key] NVARCHAR(4000), [value] NVARCHAR(MAX), [type] INTEGER, Hierarchy NVARCHAR(4000), ParentHierarchy NVARCHAR(4000), Exclude BIT DEFAULT (0)); WITH cteShredJson AS ( SELECT 1 AS Lvl , ca.* , CONVERT(NVARCHAR(4000), [key]) AS Hierarchy FROM OPENROWSET(BULK 'C:\Users\Administrator\Desktop\Day12.txt', SINGLE_CLOB) ImportFile(FileText) CROSS APPLY OPENJSON(ImportFile.FileText) ca UNION ALL SELECT Lvl+1 , ca.* , CONVERT(NVARCHAR(4000), Hierarchy + CONVERT(NVARCHAR(4000), ca.[key])) AS Hierarchy FROM cteShredJson CROSS APPLY OPENJSON(cteShredJson.value) ca WHERE ISJSON(cteShredJson.value) = 1 ) INSERT INTO #temp (Lvl, [key], value, type, Hierarchy, ParentHierarchy) SELECT *, LEFT(Hierarchy, LEN(Hierarchy)-1) FROM cteShredJson; |
In this example, all of the properties are a single character, and all of the indexes are less than nine (represented by a single character), so I can get the parent hierarchy but just removing the last character from the current hierarchy. In the next step, I need to identify all of the objects that should not be evaluated, and update the “Exclude” column in the temp table to be set so that they can be excluded from the query.
1 2 3 4 5 6 7 8 9 10 11 |
WITH cteRedObjects AS ( SELECT [key], Heirarchy, ParentHeirarchy FROM #temp WHERE [value] = N'red' AND [key] NOT LIKE N'[0-9]' -- 0-9 is an array element; anything else then this is the name of the object property ) UPDATE t SET Exclude = 1 FROM #temp t JOIN cteRedObjects c ON t.Heirarchy LIKE c.ParentHeirarchy + '%'; |
In this section of code, I get all of the objects where the value = “red”, and where the key (remember, this is the property name or index position) is not a number (indicating an index position). For all of these, I set the items that are like the ParentHierarchy to be excluded. The final step is to count up the numbers, just like previously:
1 2 3 4 5 |
SELECT SUM(CONVERT(INTEGER, [value])) FROM #temp -- get just the integers WHERE [type] = 2 AND Exclude = 0; |