Wayne Sheffield

My blog about SQL Server


Target. Home Depot. NASA. U.S. Army. Anthem. Wall Street Journal. MarketWired. Ashley Madison. What do they all have in common? They are all recent victims of cyber-attacks – several of these are confirmed to involve SQL Injection. Even though the exact method used to penetrate the other systems has not been released, experts believe that SQL Injection was involved in all of these cases. Take a look at http://codecurmudgeon.com/wp/sql-injection-hall-of-shame/ for places that are known victims of SQL Injection.

The part that is really astonishing is that we have known how to stop SQL Injection for more than a decade. Completely. Dead. In. Its. Tracks. Which means that the problems happening today are because of careless coding techniques, and as such, they can be prevented.

The “Funky query”

A few years ago, I was working for a company that developed and sold software. One day, my supervisor sends me an email titled “Funky query”. Hmm, that sounds interesting… I’ll open this up right away, instead of ignoring it until I finish what I’m currently working on (like I would usually do).

Wayne… one of our customer support engineers sent us this from the application error log of a customer. They were running a report. Any ideas on how this funky query came to be?

Query from error log (genericized… because I can’t use the real query):

SELECT * FROM sys.objects WHERE name = 'O'Malley'

Details from the error log:

Me: Umm, boss… Does this report allow users to enter in search criteria?

Boss: But of course!

Me: Well, I really hate to tell you this, but we have a SQL Injection problem.

And after a bit of back and forth where the developers were insisting that no way was there a SQL Injection problem, I sat down with the dev team lead and the boss and proved it to them. We created a dummy table in the database, went to the report criteria form, and I dropped the table.

Wayne: +1000

Development Team: -1000

Defining SQL Injection

Before we go into how SQL Injection occurs, let’s first examine what it is. SQL Injection is simply a code injection technique that allows SQL code that the user enters in the front-end to be run against the back-end database.

Please note a few things about this definition:

  1. SQL Server was not mentioned. Any and all database platforms are susceptible to SQL Injection – SQL Server, Oracle, PostgreSQL, DB2… it doesn’t matter.
  2. A generic front-end is mentioned. This is anything that connects to a back-end database, be it a web form or your favorite application. Anything that connects to a database can allow SQL Injection.
  3. Improperly written code within the database (such as a stored procedure) can allow SQL Injection. It’s all about how the query is created and run against the database.
  4. There are other forms of Injection attacks that can occur; SQL Injection is just one. Other forms include Cross-site scripting, Web Parameter Tampering, LDAP Injection and XPath Injection (as I write this, WordPress 4.3.1 is being released to fix two Cross-site scripting issues).

SQL Injection can happen in three different ways:

  1. Classic – this is when a user enters SQL code, and can immediately view the results on screen.
  2. Blind – this is essentially a classic attack, but the user does not see the results. Things may still happen, and the user may be able to send results to files that can be downloaded, but the results are not shown on screen.
  3. Second Order – the user supplied SQL code is actually stored in the database, and at a later point in time it is run.

Creating SQL Injection

SQL Injection happens when a SQL statement is created from the data supplied by the user, and it is run by the application or other process. Obviously the intention of the application is to only run the queries that it supplies. Consider a login form with user name and password data entry fields. Assume that the user enters “Mickey” as the user, and “1234” as the password. In this case, the application essentially performs:
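
A minimal sketch of that kind of string concatenation (the Users table and its columns are made up purely for this illustration):

```sql
-- Illustrative only: the table and column names are made up for this sketch
DECLARE @UserName VARCHAR(50) = 'Mickey',
        @Password VARCHAR(50) = '1234',
        @SQL      VARCHAR(MAX);

-- The user-supplied values are concatenated directly into the statement
SET @SQL = 'SELECT * FROM dbo.Users WHERE UserName = ''' + @UserName +
           ''' AND Password = ''' + @Password + ''';';
EXECUTE (@SQL);
```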

This will create the SQL statement:
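
Continuing with the sketch above, the concatenated string works out to:

```sql
SELECT * FROM dbo.Users WHERE UserName = 'Mickey' AND Password = '1234';
```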

And when we run this, we get the expected results. But now, a hacker becomes the user. Being wise to SQL Injection, the hacker enters for the password the string
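
(This is the same string that shows up again in the second-order example later in this post.)

```sql
1234' OR 1=1; --
```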

This results in the following SQL Statement being generated:
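
Sticking with the sketched-out Users table, the concatenated string now becomes:

```sql
-- the injected -- comments out the leftover closing quote at the end
SELECT * FROM dbo.Users WHERE UserName = 'Mickey' AND Password = '1234' OR 1=1; --';
```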

Since every row matches the “OR 1=1” predicate, every username and password will be returned.

Hacker: +1

Developer: -1

Notice that I showed this attack with T-SQL code – this means that you can have SQL Injection vulnerabilities within your T-SQL code (such as stored procedures), as well as in any SQL commands that the application might construct and send to the database. The application might be properly guarding against SQL Injection, but if it calls SQL code that is improperly written, you can still be vulnerable. Everything has to be done properly – all it takes is one point where SQL Injection is possible to become the latest victim.

What can a hacker do with a SQL Injection vulnerability?

At this point, the hacker knows that the application is susceptible to SQL Injection, and can now issue any SQL statement desired. Depending on the access level that the application has to connect to the database, the hacker can:

  1. Get a list of all of the databases in the instance.
  2. Get a list of all of the tables in the current database, or any other database.
  3. Query the data in any of the other tables that can be seen.
  4. Modify (insert / update / delete) data in any of the tables that can be seen.
  5. Create / modify / drop any object in any database on the instance.
  6. Create elevated permissions at the SQL Server instance level. This will allow the hacker to do anything they desire on this instance. Drop tables? Yes. Query any and all data? Yes.
  7. Create elevated permissions at the Operating System level.
  8. Install programs on the server.
  9. Access any server on the network, and perform all of the above on those servers.

Scary? Absolutely. However, as I mentioned before, we know how to prevent SQL Injection. And it turns out that it’s just not that hard. Instead of making SQL Statements that include data (either as directly supplied from the user, or from the database), pass that data to the SQL statement through parameters.

If we take the previous example, and modify it to use parameters, we would have:
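
One way to sketch this out in T-SQL is with sp_executesql – again, the Users table is just for illustration, and the injected text is now nothing more than an odd-looking password value:

```sql
DECLARE @UserName VARCHAR(50) = 'Mickey',
        @Password VARCHAR(50) = '1234'' OR 1=1; --';  -- the attack string, now just data

EXEC sys.sp_executesql
    N'SELECT * FROM dbo.Users WHERE UserName = @UserName AND Password = @Password;',
    N'@UserName VARCHAR(50), @Password VARCHAR(50)',
    @UserName = @UserName,
    @Password = @Password;
```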

What are the chances that you have a password of “1234' OR 1=1; --” in your system? Assuming that you don’t, this query will return zero rows. SQL Injection has been prevented.

So now you want an example of the difference between classic and blind SQL Injection? Well, I’m not going to show you an example… because this will normally come down to how the application handles the data being returned. In the “OR 1=1” example, if the application only shows the first record, then the hacker won’t see the results of the SQL Injection (unless they notice that it’s showing the wrong data). Or the application may have been coded to expect only 1 row, and if more are returned then it won’t display any.

Second Order SQL Injection

But how about a second order SQL Injection? Let me explain that with an example. Imagine a form for entering new users. The development team has learned about the SQL Injection on the forms, and they are now using parameters. When the hacker enters for the password:

1234' OR 1=1; --

The generated SQL statement is:
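
Since the form now uses parameters, there is no string concatenation – the statement is a parameterized INSERT, and the malicious text is passed in as an ordinary value (the Users table is still just an illustration):

```sql
DECLARE @NewUserName VARCHAR(50) = 'NewUser',
        @NewPassword VARCHAR(50) = '1234'' OR 1=1; --';

EXEC sys.sp_executesql
    N'INSERT INTO dbo.Users (UserName, Password) VALUES (@UserName, @Password);',
    N'@UserName VARCHAR(50), @Password VARCHAR(50)',
    @UserName = @NewUserName,
    @Password = @NewPassword;
```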

When the data is queried:
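
For example, with a simple lookup against the same illustrative table:

```sql
SELECT UserName, Password
FROM   dbo.Users
WHERE  UserName = 'NewUser';
-- the Password column now contains the stored string: 1234' OR 1=1; --
```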

The results show the attacker’s string returned exactly as it was entered.

The data has been stored in the database, and we are now primed for a second order SQL Injection attack. Some other process comes along (for example, a month-end report), and it creates dynamic SQL by building the SQL statement from the data. And the second order SQL Injection has just been released…

What have we learned so far?

  1. SQL Injection can happen to any database system.
  2. SQL Injection can be within SQL code (stored procedures), or it can be in the statements passed to the database from the application.
  3. Parameters will prevent the SQL code from being executed immediately, thereby preventing Classic and Blind SQL Injection attacks.
  4. The SQL Injection attempt can still be stored in the database, priming yourself for a Second Order SQL Injection attack.

Preventing SQL Injection

Whenever you create a SQL statement, never build the statement out of data (either data from the database, or data that the user passes in). Instead, all SQL statements must use parameters. Even the month-end report.

Within SQL Server, have your queries reference the parameters (ColumnName = @VariableName). If you must use dynamic SQL to build your statement, use sp_executesql – it allows for the building of SQL statements with parameters, and then defining and passing those parameters in as the procedure is called.
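
As a sketch of that pattern, here is what a parameterized dynamic search might look like inside a stored procedure (the procedure name and search criteria are made up):

```sql
CREATE PROCEDURE dbo.SearchObjects
    @SearchName sysname
AS
BEGIN
    DECLARE @SQL NVARCHAR(MAX);

    -- The search value is never concatenated into the statement...
    SET @SQL = N'SELECT name, object_id, type_desc
                 FROM sys.objects
                 WHERE name LIKE @SearchName;';

    -- ...it is defined and passed in as a parameter instead
    EXEC sys.sp_executesql
        @SQL,
        N'@SearchName sysname',
        @SearchName = @SearchName;
END;
```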

In your application, there are several component choices. Some directly execute SQL strings, and others support parameters. Always use the components that support parameters, and actually use those parameters.

Learning more

The Open Web Application Security Project is a community-driven, non-profit organization where experts analyze web application security issues and then compile a list of what the biggest issues are. For the last several years, the category “Injection” has been the biggest issue, and this includes SQL Injection.

This isn’t all that they do. They also have educational resources to not only instruct you on how these issues manifest themselves, but what to do about them. I strongly encourage everyone to take this resource back to your company, and to start identifying and addressing security issues in your code.

Another issue is that developers don’t really know about SQL Injection. In the workshops that I have given where this topic is presented, very few attendees have seen SQL Injection in a test environment. Most can barely give an adequate description of what it is. Those that have seen it are usually those who have been a victim of it. So I also recommend getting your company to arrange for SQL Injection training for the developers and DBAs. It would be good to send your QA team to this also.

This post is re-published from my original post on SQL Solutions Group.

In a prior post, I shared a script that will take a running trace and show you the XE events that it relates to, and what columns are available within those XE events. Specifically, this was for converting a deadlock trace into an XE session; however, the process is the same for converting any trace into an XE session. In today’s post, we’ll compare the deadlock trace and the new XE by running both, creating a deadlock, and comparing the captured data. We’ll look at the data captured in the XE from both script and GUI, and look at a few other differences between running a trace and an XE session.

The first step is to grab the trace and XE scripts from the prior post. Modify both scripts to put the output files in an appropriate place on your system. Run both of the scripts to start the trace and to create the XE session. Next, start the XE session with the following script:
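
Assuming the session was named “Deadlocks”, as it was in the prior post, starting it is just:

```sql
ALTER EVENT SESSION Deadlocks ON SERVER STATE = START;
```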

The next step is to create a deadlock. Open up a new query window, and run the following. Leave this query window open.
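
A minimal version of such a script, using two throw-away tables (the table names are made up for this sketch):

```sql
-- Window 1: create two small tables and take a lock on the first one
CREATE TABLE dbo.DeadlockTest1 (ID INT PRIMARY KEY, Value INT);
CREATE TABLE dbo.DeadlockTest2 (ID INT PRIMARY KEY, Value INT);
INSERT INTO dbo.DeadlockTest1 (ID, Value) VALUES (1, 0);
INSERT INTO dbo.DeadlockTest2 (ID, Value) VALUES (1, 0);

BEGIN TRANSACTION;
UPDATE dbo.DeadlockTest1 SET Value = 1 WHERE ID = 1;  -- exclusive lock on table 1
```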

Next, open up a second query window, and run the following code in that window:
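
Continuing the sketch, the second window locks the second table and then tries to update the first one – this statement will sit there blocked, waiting on window 1:

```sql
-- Window 2
BEGIN TRANSACTION;
UPDATE dbo.DeadlockTest2 SET Value = 1 WHERE ID = 1;  -- exclusive lock on table 2
UPDATE dbo.DeadlockTest1 SET Value = 2 WHERE ID = 1;  -- blocked by window 1
```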

Finally, return to the first query window and run the following code, at which point one of the statements in one of the query windows will be deadlocked:
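
Back in the first window, this final update creates the circular wait, and SQL Server will pick one of the two sessions as the deadlock victim:

```sql
-- Window 1
UPDATE dbo.DeadlockTest2 SET Value = 2 WHERE ID = 1;  -- blocked by window 2... deadlock
```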

Now that we’ve created a deadlock, let’s compare the trace output data to the XE output data. First, let’s grab the data from the trace file with this script, which selects all of the non-null columns from the table valued function (remember to change the filename/path as appropriate):
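
A sketch of that query, using sys.fn_trace_gettable and the columns discussed below (the file name and path are assumptions – use whatever you specified in the trace script):

```sql
SELECT  TextData, NTUserName, ClientProcessID, ApplicationName, LoginName,
        SPID, Duration, StartTime, EndTime, ObjectID, Mode, Type, ServerName
FROM    sys.fn_trace_gettable(N'C:\Temp\DeadlockTrace.trc', DEFAULT)
WHERE   EventClass IN (148, 25, 59);  -- the three deadlock-related events
```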

[Screenshots: the trace data collected for the deadlock]

We now want to compare this to the data that was collected by the Extended Event session. In SSMS, expand the Management Tree, then the Extended Events tree, Session tree, and the Deadlocks tree:

[Screenshot: the Deadlocks XE session and its event file target in Object Explorer]

The event file target can now be seen. Double-click the file target to view the data, and you can see the events that fired for the deadlock. Note that you can right-click on the grid header, and choose additional columns to put into the grid – for this XE session, I like to add the database_name, resource_description, resource_owner, resource_type and xml_report columns.

[Screenshot: the events captured by the XE session, shown in the event file viewer]

By selecting one of the rows, all of the data will be available in the lower grid in a Name/Value format.

Let’s compare the XE output to the trace output, and ensure that the data captured by the trace is present in the XE output. The first trace column is TextData. In that, we see multiple bits of data: SPID, lock resource, and the deadlock graph in XML format. In the XE, the resource_description column has the lock resource, the xml_report column has the deadlock graph in XML format, and the session_id column has the SPID. Skipping the BinaryData column, the next column is the NTUserName. Here we can see that this is collected in the XE nt_username column. A minor difference is that the XE includes the domain name, and the trace doesn’t.

The ClientProcessID column maps to the client_pid column. Likewise, ApplicationName maps to client_app_name, LoginName maps to server_principal_name, SPID maps to session_id, Duration maps to duration, StartTime maps to timestamp, and ObjectID maps to object_id. The EventClass, Mode and Type columns map via friendly names to the name, mode and resource_type columns. The only columns that aren’t explicitly in the XE results are the BinaryData, EndDate and ServerName columns. The EndDate can be calculated from the timestamp and duration columns, and we could have selected server_instance_name on the “Global Fields (Actions)” tab when configuring the XE session. So here, we can see that everything needed from the trace is available in the XE session.

For the next step, one might want to load the data into a table, which viewing it in the viewer doesn’t allow for. Or maybe you are working on a server prior to SQL Server 2012, and the GUI isn’t available for you to use. The following script can be used to query the data from the file target (again, remember to change the file name/path as necessary):
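
A sketch of that query (the file name and path are assumptions; on SQL Server 2008/2008 R2 the metadata file parameter may also be required):

```sql
SELECT  object_name AS event_name,
        CONVERT(XML, event_data) AS event_data
FROM    sys.fn_xe_file_target_read_file(N'C:\Temp\Deadlocks*.xel', NULL, NULL, NULL);
```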

[Screenshot: the results of querying the XE file target]

As can be seen from the output, the data that we most want to see is stored as XML (when working with Extended Events, all of the data is stored as XML) in the event_data column, so I have converted this column to XML. At this point, you can click the XML output, and it will be opened up in text, allowing you to see all of the data as it is stored in the XML output.

[Screenshot: the event_data XML from the file target]

From this point, it’s just a matter of modifying the query to return the columns that you are interested in.

This query extracts the data from XE file target. It also calculates the end date, and displays both the internal and user-friendly names for the resource_type and mode columns – the internal values are what the trace was returning.
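
A sketch of what that shredding might look like (the path is an assumption, and the field names match the events and actions configured for this session):

```sql
WITH XEData AS
(
    SELECT CONVERT(XML, event_data) AS event_data
    FROM   sys.fn_xe_file_target_read_file(N'C:\Temp\Deadlocks*.xel', NULL, NULL, NULL)
)
SELECT  event_data.value('(event/@name)[1]', 'varchar(50)')                    AS event_name,
        event_data.value('(event/@timestamp)[1]', 'datetime2')                 AS start_time,
        DATEADD(MICROSECOND,
                CONVERT(INT, ISNULL(event_data.value('(event/data[@name="duration"]/value)[1]', 'bigint'), 0)),
                event_data.value('(event/@timestamp)[1]', 'datetime2'))        AS end_time,
        event_data.value('(event/action[@name="session_id"]/value)[1]', 'int') AS session_id,
        event_data.value('(event/data[@name="mode"]/value)[1]', 'varchar(10)') AS mode_value,
        event_data.value('(event/data[@name="mode"]/text)[1]', 'varchar(30)')  AS mode_name,
        event_data.value('(event/data[@name="resource_type"]/value)[1]', 'varchar(10)') AS resource_type_value,
        event_data.value('(event/data[@name="resource_type"]/text)[1]', 'varchar(30)')  AS resource_type_name
FROM    XEData;
```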

For a quick recap: in Part 1, you learned how to convert an existing trace into an XE session, identifying the columns and stepping through the GUI for creating the XE session. In Part 2, you learned how to query both the trace and XE file target outputs, and then compared the two outputs and learned that all of the data in the trace output is available in the XE output.

This post is re-published from my original post on SQL Solutions Group.

In this brave new world of Extended Events (XE, XEvents), I find myself with a mixture of scripts for troubleshooting issues – some use XE, and some use traces. We’ve all been told that XE is a much better system (it is much more lightweight, causing less of an issue with the server). In fact, it is so much better that Microsoft has deprecated SQL Trace and SQL Profiler, and in the future, one will not be able to run any traces at all. Believe it or not, this is a good thing!

It just so happens that today is the 67th installment of the monthly T-SQL Tuesday blogging event. T-SQL Tuesday is that wonderful monthly blogging party started by Adam Machanic, where a selected host challenges the SQL Server universe to blog about a specific topic. This wild frenzy of SQL Server blog posting occurs on the second Tuesday of each month. This month, it is being hosted by my friend (and the person with the highest amount of energy known to mankind) Jes Borland (b / t), and the topic that she has chosen is Extended Events. Her specific challenge to the SQL Server universe is:

I want to know (and others do, too) how you’ve solved problems with Extended Events. What sessions have you created? What unique way have you used predicates or targets? What challenges have you overcome?

The biggest challenge that I have is not a technical challenge… it’s a personal challenge: actually getting started with XE. I have so many scripts for doing traces, that I just immediately use them instead of the better XE system. I really need to wean myself off of using Profiler / traces. Therefore, I’ve decided to start converting my trace scripts into XE scripts, and I’ll share with you how I go about doing it. Today, I’m going to look at my favorite trace script – a trace to capture deadlock information.

First off, let’s start with the trace script:
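
A trimmed-down sketch of that script is below – the real script sets many more columns per event, and the output path is an assumption:

```sql
DECLARE @TraceID INT, @MaxFileSize BIGINT = 25, @On BIT = 1;

EXEC sp_trace_create @TraceID OUTPUT, 0, N'C:\Temp\DeadlockTrace', @MaxFileSize, NULL;

-- Events: Deadlock Graph (148), Lock:Deadlock (25), Lock:Deadlock Chain (59)
-- Columns shown here: TextData (1), SPID (12), StartTime (14); repeat for the rest
EXEC sp_trace_setevent @TraceID, 148, 1,  @On;
EXEC sp_trace_setevent @TraceID, 148, 12, @On;
EXEC sp_trace_setevent @TraceID, 148, 14, @On;
EXEC sp_trace_setevent @TraceID, 25,  1,  @On;
EXEC sp_trace_setevent @TraceID, 25,  12, @On;
EXEC sp_trace_setevent @TraceID, 25,  14, @On;
EXEC sp_trace_setevent @TraceID, 59,  1,  @On;
EXEC sp_trace_setevent @TraceID, 59,  12, @On;
EXEC sp_trace_setevent @TraceID, 59,  14, @On;

EXEC sp_trace_setstatus @TraceID, 1;  -- start the trace
SELECT @TraceID AS TraceID;
```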

From this script, you can see that the trace is collecting data from three events (Deadlock Graph (148), Lock:Deadlock (25) and Lock:Deadlock Chain (59)) and several columns for each event. The next step is to convert all of this information into XE events / actions. For this, I’ll modify the script in BOL (at https://msdn.microsoft.com/en-us/library/ff878264.aspx) to the following to return the XE events / actions for a specified running trace. Since the trace needs to be running, execute the above trace script to start the trace, then run the following code:
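
A sketch of that cross-reference query (set @TraceID to the id of the running trace – the value 2 here is just an example):

```sql
DECLARE @TraceID INT = 2;

SELECT  te.name              AS [Trace event],
        tc.name              AS [Trace column],
        em.xe_event_name     AS [Extended Events event],
        am.xe_action_name    AS [Extended Events action]
FROM    sys.fn_trace_geteventinfo(@TraceID) AS ei
JOIN    sys.trace_events          AS te ON ei.eventid  = te.trace_event_id
JOIN    sys.trace_columns         AS tc ON ei.columnid = tc.trace_column_id
LEFT JOIN sys.trace_xe_event_map  AS em ON te.trace_event_id  = em.trace_event_id
LEFT JOIN sys.trace_xe_action_map AS am ON tc.trace_column_id = am.trace_column_id
ORDER BY te.name, tc.name;
```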

These results may contain NULL values in the “Extended Events action” column. A NULL value here means that there is not a corresponding event action for that column. For my deadlock trace, I get the following results:

[Screenshot: the trace-to-XE event and action cross-reference results]

With this information, I can now jump into SSMS (2014) and use the New Session wizard to create a new XE session (Expand the Management and Extended Events nodes. Right click on Sessions and select New Session). I name the session “Deadlocks”, and click the “Events” page to select the events that I want to use.

[Screenshot: the New Session wizard]

Since all of the events that I’m looking for contain “Deadlock”, I search for this and 5 events are displayed. I double-click each of the three listed above (or select them and click the “>” button) to move them to the “Selected Events” grid.

[Screenshot: searching for and selecting the deadlock events]

The next step is to configure the columns, so click the Configure button.

[Screenshot: the Configure button for the selected events]

Select the lock_deadlock event, and on the “Global Fields (Actions)” tab select the client_app_name, client_pid, database_name, nt_username, server_principal_name and session_id columns.

[Screenshot: the Global Fields (Actions) selections for lock_deadlock]

Click on the “Event Fields” tab, and check the checkboxes for database_name and resource_description. Fields that show up on this tab with checkboxes are customizable, and they don’t actually collect the information unless checked. Frequently this is because gathering the information requires an additional level of resource usage.

[Screenshot: the Event Fields selections for lock_deadlock]

Select the lock_deadlock_chain event, and select the session_id and database_name columns on the “Global Fields (Actions)” tab. Click on the “Event Fields” tab and check the checkbox for database_name and resource_description. Finally, select the xml_deadlock_report event, and select the server_principal_name and session_id columns on the “Global Fields (Actions)” tab.

Finally, click on the Data Storage page, and add an event_file target type. The file name defaults to the name that you gave the session earlier. I set the maximum size to 250 MB and 5 rollover files. If you choose to script the session to a new query window (HIGHLY recommended), you will have a script with all of the DDL necessary to create this XE session.

[Screenshot: the Data Storage page with the event_file target]

At this point, there are two changes that I like to make. First, I like to make my scripts re-runnable without errors – and if the XE session already exists, the generated script will produce one. To make it bullet-proof, I add an IF EXISTS check at the top and drop the XE session if it already exists. Secondly, I like to call xp_create_subdir to create the directory that the script points to, just in case the directory doesn’t exist. Note that xp_create_subdir succeeds if the directory already exists, so I don’t check whether the directory exists before executing this procedure.

My final XE Deadlock script looks like:
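
A sketch along those lines (the output path is an assumption, and the customizable field names in the SET clauses may vary slightly between versions):

```sql
IF EXISTS (SELECT 1 FROM sys.server_event_sessions WHERE name = N'Deadlocks')
    DROP EVENT SESSION Deadlocks ON SERVER;

EXEC master.dbo.xp_create_subdir N'C:\XE';

CREATE EVENT SESSION Deadlocks ON SERVER
ADD EVENT sqlserver.lock_deadlock
(
    SET collect_database_name = (1), collect_resource_description = (1)
    ACTION (sqlserver.client_app_name, sqlserver.client_pid, sqlserver.database_name,
            sqlserver.nt_username, sqlserver.server_principal_name, sqlserver.session_id)
),
ADD EVENT sqlserver.lock_deadlock_chain
(
    SET collect_database_name = (1), collect_resource_description = (1)
    ACTION (sqlserver.database_name, sqlserver.session_id)
),
ADD EVENT sqlserver.xml_deadlock_report
(
    ACTION (sqlserver.server_principal_name, sqlserver.session_id)
)
ADD TARGET package0.event_file
(
    SET filename = N'C:\XE\Deadlocks.xel', max_file_size = (250), max_rollover_files = (5)
);
```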

Now that you have this XE session scripted out, it can be easily installed on multiple servers. If you encounter a deadlock problem, you can easily start the XE session and let it run to trap your deadlocks. They will be persisted to a file dedicated for the deadlocks. You can use my Deadlock Shredder script at http://bit.ly/ShredDL to read the deadlocks from the file and shred the deadlock XML into a tabular output.

Note that the default system_health XE session also captures deadlocks. I like to have a dedicated session for just deadlocks. As lightweight as XE is, sometimes it may benefit a server to turn off the system_health session. Additionally, Jonathan Kehayias has a script that will take a running trace and completely script out an XE session for it. This script can be found at https://www.sqlskills.com/blogs/jonathan/converting-sql-trace-to-extended-events-in-sql-server-2012/. Even though this script is available, I like to figure things out for myself so that I can learn what is actually going on.

Did you catch the note above where I mentioned that the default system_health XE captures deadlocks? This means that if you are still enabling trace flags 1204 and/or 1222 to capture deadlock information in your error logs, you don’t need to do that anymore. Furthermore, by using this Deadlock XE, you can have the deadlocks persisted to a file where, in combination with my Deadlock Shredder script, it will be even easier to analyze the deadlocks than trying to figure it out from the captured information in the error logs.

I hope that this post will help someone with the conversion from using SQL Profiler to using XE instead. The process for converting other traces would be the same, so it can be easily adapted. I also hope that these deadlock scripts will be helpful to you!

In my next post, I will be comparing the results of the deadlock trace and the deadlock XE session.

And Jes – thanks for hosting this month! I’m really looking forward to seeing the roundup for this month.

This post is re-published from my original post on SQL Solutions Group.

I was recently reading this msdn article on Ghost Records, and it mentioned that you could get the number of ghost records on a page with DBCC DBTABLE… and it also mentioned that you need to be sure that you enable Trace Flag 3604 in order to see the results. So, two things immediately jumped out at me. First, I wanted to look at this to see where the ghost records were located. Secondly, I’ve just written a few articles (here, here, here and here) where I’ve been able to use the “WITH TABLERESULTS” option on the DBCC command to avoid using this trace flag and to provide automation for the process, and I wanted to see if that would work here also.

The good news is that “WITH TABLERESULTS” does indeed work with DBCC DBTABLE. The bad news is that I could not find the ghost record count in the results.

When I was looking for this information, I noted that the metadata of the results is identical to that of the DBCC PAGE output, which means that the automation processes already developed will work here as well. And as I was looking through the results for a ghost record counter, I noticed two interesting fields:

Field Value
m_FormattedSectorSize 4096
m_ActualSectorSize 512

Hmm, this is showing me the disk sector size for each database file. After checking things on a few different systems, it looks like m_ActualSectorSize is the sector size of the disk that the database file currently resides on, while m_FormattedSectorSize appears to be the sector size from when the database was created – it is copied from the model database, so it appears to reflect the disk that Microsoft created the model database on.

Since it’s a best practice to have the disk sector size (also known as the allocation unit size or block size) set to 64 KB (see this white paper), I decided to programmatically get this information. After digging through the Object and ParentObject columns, I developed this script to get the current allocation unit size (sector size) for each drive:
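
A sketch of that approach is below. DBCC DBTABLE is undocumented, so the Field names and result structure may differ between versions, and this simplified version reports the values per database rather than rolling them up per drive:

```sql
IF OBJECT_ID('tempdb..#DBTableResults') IS NOT NULL DROP TABLE #DBTableResults;
CREATE TABLE #DBTableResults
(
    DatabaseName sysname NULL,
    ParentObject VARCHAR(255),
    [Object]     VARCHAR(255),
    Field        VARCHAR(255),
    VALUE        VARCHAR(255)
);

DECLARE @DatabaseName sysname, @SQL NVARCHAR(MAX);

DECLARE db_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT name FROM sys.databases WHERE state_desc = 'ONLINE';
OPEN db_cursor;
FETCH NEXT FROM db_cursor INTO @DatabaseName;
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @SQL = N'DBCC DBTABLE (' + QUOTENAME(@DatabaseName, '''') + N') WITH TABLERESULTS, NO_INFOMSGS;';
    INSERT INTO #DBTableResults (ParentObject, [Object], Field, VALUE)
        EXECUTE (@SQL);
    UPDATE #DBTableResults SET DatabaseName = @DatabaseName WHERE DatabaseName IS NULL;
    FETCH NEXT FROM db_cursor INTO @DatabaseName;
END;
CLOSE db_cursor;
DEALLOCATE db_cursor;

SELECT DISTINCT DatabaseName, Field, VALUE
FROM   #DBTableResults
WHERE  Field IN ('m_FormattedSectorSize', 'm_ActualSectorSize');
```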

And here we have yet another way that a process can be automated by using “WITH TABLERESULTS” on a DBCC command. I think this one is a particularly good example of the possibilities – to get this information you have to hit multiple parts of the DBCC results, and repeat that for each file in each database. Doing this with the 3604 trace flag, finding the appropriate piece and then proceeding on to the next piece, would be very time consuming to do manually.

Finally, a quick note here: there are better ways of getting the disk sector size – you can get it with WMI calls, with PowerShell (or even from the command prompt), and there are command-line utilities that will also get you this information. This is just a way to do it from within SQL Server. Note also that this only covers the drives that contain database files on this SQL Server instance – if you are looking for other drives, then this won’t work for you.

Take a look at the other fields that are available in DBTABLE – you just might find another item that you’d like to be able to retrieve.

Previous related posts:

Persisting DBCC Output
Automating DBCC Page
Automating DBCC Page, Part 2
Automating DBCC Page, Part 3

This post is re-published from my original post on SQL Solutions Group.

SQLSaturday SpeedPASSes

I’ve been helping run SQLSaturdays in the Richmond, VA area for several years now. It seems that every year, I end up blogging something new. Frequently, this ends up being a script of some kind to help me out. This post follows that theme by introducing a script for working with SQLSaturday SpeedPASSes.

The SpeedPASS for SQLSaturday is used for many things. It has the attendee’s admission ticket, lunch ticket, name badge and raffle tickets for all of the vendors. Most organizers of a local SQLSaturday do whatever it takes to entice attendees to pre-print their tickets. However, it seems that there are always people who forget, or who didn’t realize that they should have done this. Last year, at our event, about 40% of the attendees needed their tickets printed for them on the spot. This causes lines as we hunt up their SpeedPASS on the registration site, or find the link in an Excel document for those that have already been downloaded.

I mentioned getting the link from an Excel document. This is because PASS generates the SpeedPASS into a PDF file that has a GUID as its filename – specifically, the “Invoice ID” guid from the attendee’s registration. The Excel document of the registrations that the site administrators can download has this GUID, along with the attendee’s name. So we look up the name, find the invoice id, and match this to the PDF file for that person. Open it up, and print it out. One by one. Does this sound like a cursor to you also?

Changing things up

This year, our organizing committee decided that this was too much hassle, and that we were going to print out the SpeedPASSes for everyone ahead of time and have them ready as attendees walk in the door. What we have accomplished is to shift the line from waiting to print a SpeedPASS to finding a SpeedPASS. However, we have hundreds of these SpeedPASSes to deal with. Now we have to get all of the PDFs, open them up (one by one), and print them out. After they have all been printed, they need to be sorted by last name. Manually.

Hello… this is insane. Surely, there has to be a better way. So I started looking.

A better way…

My good friend, Mr. Google, found this post by Kendal Van Dyke. This post has a PowerShell script that will download and merge all of the PDFs for selected attendees into one big PDF. This enables printing out all of the SpeedPASSes at once, instead of one by one. However, I have a couple of problems with this script in its current form. First, the instructions for how to get the information from the SQLSaturday admin site have changed (they did do a major web site change last year). Secondly, it downloads all of the PDFs one by one, and puts them into a temporary directory, where they are all merged together. In file-name order. Not alphabetically. This means that the manual sorting is still necessary. But hey – it’s PowerShell. Surely we can come up with a way to do this sorting for us!

So I decided to re-write this script to suit my needs. Kendal’s script downloads the SpeedPASS PDF files one by one. However, the admin site allows us to download all of them in one zip file. I like this approach better. I ended up making two major changes to the script. The first change requires pre-downloading and extracting all of the SpeedPASS files. The second change is to get them to merge alphabetically. Like Kendal’s script, this uses the PDFSharp assemblies. This requires using PowerShell 3.0 or higher.

The new instructions for using this script

  1. Download and save the PDFSharp Assemblies from http://pdfsharp.codeplex.com/releases.
    1. Unblock the zip file (right-click the file, select properties, and click “Unblock”. See Kendal’s post for pictures showing you how to do this), and save the files in the zip file to your desired location.
    2. Put this location in the $PdfSharpPath variable in the script.
  2. Log into the SQLSaturday admin site and navigate to Event Settings > Manage SpeedPASS.
    1. Click the button to “Generate All” SpeedPASSes.
    2. Click the button to “Download SpeedPASSes” and save to disk.
    3. Extract the files from the zip file.
    4. Put this location in the $SpeedPASSPDFPath variable in the script.
  3. Go to Event Settings > Manage Registrations.
    1. Click the “Export to Excel” button.
    2. Save to disk.
  4. Open the Excel file and Sort as desired (I used lastname / firstname).
    1. If you only want to do the speakers, filter for complimentary lunches or not.
  5. Copy the InvoiceID column from Excel to a text file and save the file.
    1. Put the complete filename for this file in the $SpeedPASSUrlPath variable in the script.
  6. Ensure that the following variables are set. Run the script to merge the PDF files into a single PDF.
    1. Set the $PdfSharpPath variable to the location of PDFSharp DLL (from #1 above).
    2. Set the $SpeedPASSPDFPath variable to the location where the extracted PDF files are (from #2 above).
    3. Set the $SpeedPASSUrlPath variable to the location of the text file with the InvoiceID values in it (from #5 above).
  7. Finally, open the merged PDF file. You will see that they are ordered by how you ordered the Excel spreadsheet. When you’re ready, print them out.

The PowerShell script.

Some of it is exactly as it is in Kendal’s script, but the Merge-PDF2 function is a re-write of his Merge-PDF function to handle the differences described above.

There you have it. Working with SQLSaturday SpeedPASSes to generate one PDF for all selected SpeedPASSes sorted by name. Eazy-Peazy.

You might be wondering what is up with that picture up above. Yes, it’s a Dr. Who picture. My friend, Grant Fritchey, challenged folks to incorporate this picture into a technical post – read his challenge here. I hope that I did it justice.

While you may not need to worry about the physical location of a row very often, every so often the need comes up. In a few of my earlier posts (here), I have needed this information to prove a point. This post shows how to identify a row’s physical location. It will also show the various methods used to work with this value once returned.

Acquiring the physical location of a row

SQL Server 2008 introduced a new virtual system column: “%%physloc%%”. “%%physloc%%” returns the file_id, page_id and slot_id information for the current row, in a binary format. Thankfully, SQL Server also includes a couple of functions to split this binary data into a more useful format. Unfortunately, Microsoft has not documented either the column or the functions.

The first of these functions is “sys.fn_PhysLocFormatter”. This scalar function receives the %%physloc%% virtual column as an input. It splits the input apart into the file, page and slot information, and then returns the information in a column formatted in the form file_id:page_id:slot_id. This is the function that I used in the above posts.

The second function is “sys.fn_PhysLocCracker”. As a table-valued function, it is used either directly in the FROM clause or by using the cross apply operator. It also receives the %%physloc%% virtual column as an input, and returns the file_id, page_id and slot_id as columns in a table. In at least one of my previous posts, this would have been a better function to have used.

Example usage:

Let’s see some examples of using the virtual column and these functions. In the following script, we start off by creating a temporary table with 700 rows of data. The first query after that uses the sys.fn_PhysLocFormatter function, returning all of the physical location information about the row into one column. The subsequent query uses the sys.fn_PhysLocCracker function, returning the file_id, page_id and slot_id values from the physical location of the row. This allows these values to be used in subsequent processing – say maybe for automating DBCC Page?
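
A sketch of that script (the temp table is made up for the demo):

```sql
IF OBJECT_ID('tempdb..#PhysLocDemo') IS NOT NULL DROP TABLE #PhysLocDemo;
CREATE TABLE #PhysLocDemo
(
    RowID  INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    Filler CHAR(100) NOT NULL
);

-- 700 rows of filler data
INSERT INTO #PhysLocDemo (Filler)
SELECT TOP (700) 'X'
FROM   sys.all_columns AS ac1
CROSS JOIN sys.all_columns AS ac2;

-- Composite format: one column in the form file_id:page_id:slot_id
SELECT sys.fn_PhysLocFormatter(%%physloc%%) AS PhysicalLocation, RowID
FROM   #PhysLocDemo;

-- Cracked format: file_id, page_id and slot_id as separate columns
SELECT plc.[file_id], plc.[page_id], plc.slot_id, d.RowID
FROM   #PhysLocDemo AS d
CROSS APPLY sys.fn_PhysLocCracker(%%physloc%%) AS plc;
```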

This returns:

[Screenshot: the results of the fn_PhysLocFormatter and fn_PhysLocCracker queries]

So there you go. %%physloc%% identifies the physical location of a row. Then the sys.fn_PhysLocFormatter function will decipher this and expose it in a composite form, while the sys.fn_PhysLocCracker function will decipher it and return the values as columns in a result set. You can use whichever one suits your needs.

Performance

When I ran these statements with the “Include Actual Execution Plans” option enabled, I saw:

[Screenshot: the actual execution plans for the two queries]

The bottom statement has the infamous “Table Valued Function” operator in it. SQL Server uses this operator when dealing with a multi-statement table-valued function. If you have read my recent post comparing functions, you will recall that these are the worst performing type of functions. So I decided to look at the code of these functions, to see if sys.fn_PhysLocCracker could be improved upon. Running sp_helptext on these functions returns the following code:

sys.fn_PhysLocFormatter

sys.fn_PhysLocCracker

It seems obvious to me that the multi-statement function was created by cloning the scalar function. Furthermore, this code can be improved upon and converted into an inline table-valued function. An inline table-valued function for this functionality would look like this:
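
A sketch of such a function is below; the byte layout assumed here is page_id in bytes 1–4, file_id in bytes 5–6 and slot_id in bytes 7–8 of the %%physloc%% value:

```sql
CREATE FUNCTION dbo.fn_PhysLocCrackerITVF (@physical_locator BINARY(8))
RETURNS TABLE
AS
RETURN
    SELECT CONVERT(INT, CONVERT(BINARY(2), REVERSE(SUBSTRING(@physical_locator, 5, 2)))) AS [file_id],
           CONVERT(INT, CONVERT(BINARY(4), REVERSE(SUBSTRING(@physical_locator, 1, 4)))) AS [page_id],
           CONVERT(INT, CONVERT(BINARY(2), REVERSE(SUBSTRING(@physical_locator, 7, 2)))) AS slot_id;
GO
```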

Now, since this is an undocumented function, I don’t expect that this function will ever be upgraded. But if you ever need it, you could create it yourself. Alternatively, you could just code it into the cross apply operator yourself, like so:
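
A sketch of that approach, using the same byte layout assumed above:

```sql
SELECT ca.[file_id], ca.[page_id], ca.slot_id, d.RowID
FROM   #PhysLocDemo AS d
CROSS APPLY (SELECT CONVERT(INT, CONVERT(BINARY(2), REVERSE(SUBSTRING(%%physloc%%, 5, 2)))) AS [file_id],
                    CONVERT(INT, CONVERT(BINARY(4), REVERSE(SUBSTRING(%%physloc%%, 1, 4)))) AS [page_id],
                    CONVERT(INT, CONVERT(BINARY(2), REVERSE(SUBSTRING(%%physloc%%, 7, 2)))) AS slot_id) AS ca;
```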

Which will return the same information, without the performance inhibiting multi-statement table-valued function being used.

In several of my last few blog posts, I’ve shared several methods of getting internal information from a database by using the DBCC PAGE command and utilizing the “WITH TABLERESULTS” option to be allowed to automate this process for further processing. This post will also do this, but in this case, we’ll be using it to bust a common myth—data in a clustered index is physically stored on disk in the order of the clustered index.

Busting this myth

To bust this myth, we’ll create a database, put a table with a clustered index into this database, and then we’ll add some rows in random order. Next, we will show that the rows are stored on the pages in logical order, and then we’ll take a deeper look at the page internals to see that the rows are not stored in physical order.

To start off with, let’s create a database and a table, and add a few rows to this table:
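
A sketch of that setup (the database and table names are made up; the row values match the results shown later):

```sql
CREATE DATABASE MythBuster;
GO
USE MythBuster;
GO
CREATE TABLE dbo.ClusteredOrder
(
    RowID    INT      NOT NULL CONSTRAINT PK_ClusteredOrder PRIMARY KEY CLUSTERED,
    RowValue CHAR(10) NOT NULL
);

-- Insert the odd rows first...
INSERT INTO dbo.ClusteredOrder (RowID, RowValue) VALUES (1, 'Row01'), (3, 'Row03'), (5, 'Row05');
-- ...and then the even rows
INSERT INTO dbo.ClusteredOrder (RowID, RowValue) VALUES (2, 'Row02'), (4, 'Row04');
```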

What we have is a table with 5 rows. The table’s clustered key is the RowID integer column. The rows are inserted so that the odd rows are inserted first, followed by the even rows.

Determine row’s physical location

At this point, let’s look at where the system reports these rows to be. To do this, we’ll utilize two undocumented system commands. The first is %%physloc%%, which returns the physical location in a hexadecimal format. The second is fn_PhysLocFormatter, which converts this into a format of FileID:PageID:SlotID. This is simply added to the select clause, so the query is:
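
Using the sketched-out table from above:

```sql
SELECT sys.fn_PhysLocFormatter(%%physloc%%) AS [File:Page:Slot], RowID, RowValue
FROM   dbo.ClusteredOrder;
```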

This query produces these results:

[Screenshot: the query results showing File:Page:Slot for each row]

From these results, you can see that the rows are all on the same page, and that the slot in the slot array on that page increments with the RowID value. Remember also that the slot array is zero-based, where the first slot is slot #0. This is the logical order.

How inserts change the slot the row is in

To see this changing around as the rows are inserted, just put the select statement (which I just introduced) after each of the prior insert commands and run the script to create the table and add the rows again. You will initially see RowID=1 put into slot #0, RowID=3 into slot #1 and RowID=5 into slot #2. When you then add RowID=2, this needs to be between RowID #s 1 and 3, so #2 is now in slot #1, #3 moves to slot #2, and #5 is moved to slot #3. When you add RowID=4, it gets inserted into slot #3 and RowID #5 is again pushed down, to slot #4. The logical order follows what we are expecting:

[Screenshot: the query results after all five rows have been inserted]

Check the physical storage

In order to show that these are not physically stored in this order, we will need to crack this page and look internally at where the data is actually stored. We will accomplish this with this code:
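
A sketch of that code is below. Plug in the file and page numbers from the fn_PhysLocFormatter output above (the page number 312 here is just a placeholder), and keep in mind that the ParentObject/Object/Field text of DBCC PAGE is undocumented and may vary:

```sql
IF OBJECT_ID('tempdb..#PageResults') IS NOT NULL DROP TABLE #PageResults;
CREATE TABLE #PageResults
(
    ParentObject VARCHAR(255),
    [Object]     VARCHAR(255),
    Field        VARCHAR(255),
    VALUE        VARCHAR(255)
);

-- Dump style 3 returns one set of rows per slot, with the row's offset in the Object column
INSERT INTO #PageResults (ParentObject, [Object], Field, VALUE)
    EXECUTE ('DBCC PAGE (MythBuster, 1, 312, 3) WITH TABLERESULTS, NO_INFOMSGS;');

SELECT [Object], Field, VALUE
FROM   #PageResults
WHERE  [Object] LIKE 'Slot%Offset%'
  AND  Field IN ('RowID', 'RowValue');
```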

Which produces these results:

[Screenshot: the DBCC PAGE results showing each row’s slot and offset]

The row’s offset is where the row starts on the page

As we look at these results, pay close attention to the Offset. This is where the row physically starts on the page. You can see that for RowID #2, this offset is higher than the offset for RowID #3… and even RowID #5. When the row was added, the data was added after the other existing rows on the page, and the offset where this data starts was entered into the slot array, after the remaining slot array entries were pushed down in order to maintain the correct logical order. We can see this happening once again when RowID #4 is inserted.

And, finally, let’s use DBCC PAGE to crack open this page and look at the raw data. For this, we want to use dump style 2:
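
Sticking with the same placeholder page number:

```sql
-- Reuse the #PageResults table from above
DELETE FROM #PageResults;
INSERT INTO #PageResults (ParentObject, [Object], Field, VALUE)
    EXECUTE ('DBCC PAGE (MythBuster, 1, 312, 2) WITH TABLERESULTS, NO_INFOMSGS;');

SELECT ParentObject, [Object], Field, VALUE
FROM   #PageResults;
```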

If you expand out the VALUE column, you can see it go from Row01 to Row03 to Row05, then to Row02 and Row04.

So there you go… the data is physically stored in the next available space on the page; however, the page’s slot array is in the clustered key order. Furthermore, we now know that we can look at the offset and determine the order on the page. And we also have yet another way to use DBCC PAGE in an automated manner to show this.

Previous related posts:

Persisting DBCC Output
Automating DBCC Page
Automating DBCC Page, Part 2

This post is re-published from my original post on SQL Solutions Group.

Today is yet another T-SQL Tuesday – the monthly blogging party started by Adam Machanic (b|t) to get everyone in the SQL Server blogosphere together and to blog about a topic. Today’s party is hosted by none other than Matt Gordon (b|t), and the topic that he has chosen for us to blog about is “Fixing old problems with Shiny New Toys”. Specifically, he says:

What I’d like to see from the blog responses for this T-SQL Tuesday is how you’ve used a “new” Microsoft data platform toy to fix an old problem.

I’ve decided that I’m going to turn this around. Sometimes, those Shiny New Toys cause problems. Let’s talk about one such case.

Recently, I was helping a client with an issue where when a job would run, the CPU on all processors would max out. This in turn would cause other performance issues on the server:

After a bit of investigation, I was able to determine that the statement causing this was a MERGE statement. The MERGE statement was a Shiny New Toy introduced in SQL 2008. This MERGE statement had both a WHEN MATCHED and a WHEN NOT MATCHED BY TARGET clause inside it.

What does this mean?

BOL defines the MERGE statement as:

Performs insert, update, or delete operations on a target table based on the results of a join with a source table.

This is performed with just one pass of the target table.

There are three possible operations that can be used within the MERGE statement. These are:

WHEN MATCHED – this allows you to perform an UPDATE or DELETE against the target table using the source table. There is a row existing in both tables for this match. If you were performing just the individual UPDATE or DELETE statement, you would use an INNER JOIN between the source and target table.

WHEN NOT MATCHED BY TARGET – this allows you to perform an INSERT of rows into the target table from rows in the source table that are not in the target table. If you were performing just the INSERT statement, you would drive it with a SELECT statement that performs a LEFT OUTER JOIN between the source and target tables, checking for a (not-null) column in the target table being NULL (signifying that there wasn’t a matching row).

WHEN NOT MATCHED BY SOURCE – this allows you to perform an UPDATE or DELETE against the target table when the target table has rows that the source table does not. If you were performing just the UPDATE or DELETE statement, you would use a RIGHT OUTER JOIN between the source and target tables… or how most people would write it, a LEFT OUTER JOIN between the target and source tables.

Keep in mind that my examples above are what you would do if you were separating the commands. Individually, these different conditions perform just fine with the MERGE statement. However, when the MERGE statement combines multiple match conditions, then think about what SQL has to do. See the chart at this link for when the different types of join conditions are used in the MERGE statement.

In this case, it’s doing both a WHEN MATCHED and a WHEN NOT MATCHED BY TARGET. In order to process the target table in just one pass, the MERGE statement will frequently end up performing worse, and it can also end up using more resources in the process. I often find it better to just use the individual statements.
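
As a sketch of the pattern (the tables and columns here are made up), compare a combined MERGE with the two individual statements it can be split into:

```sql
-- The combined form: one pass of the target, two match conditions
MERGE INTO dbo.TargetTable AS t
USING dbo.SourceTable AS s
    ON t.ID = s.ID
WHEN MATCHED THEN
    UPDATE SET t.SomeValue = s.SomeValue
WHEN NOT MATCHED BY TARGET THEN
    INSERT (ID, SomeValue) VALUES (s.ID, s.SomeValue);

-- The separated form: an UPDATE (inner join)...
UPDATE t
SET    t.SomeValue = s.SomeValue
FROM   dbo.TargetTable AS t
JOIN   dbo.SourceTable AS s ON t.ID = s.ID;

-- ...followed by an INSERT (left outer join, checking for no match)
INSERT INTO dbo.TargetTable (ID, SomeValue)
SELECT s.ID, s.SomeValue
FROM   dbo.SourceTable AS s
LEFT JOIN dbo.TargetTable AS t ON t.ID = s.ID
WHERE  t.ID IS NULL;
```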

By separating this out into the respective statements, each can be optimized to perform better. Which is what I did… an UPDATE statement followed by an INSERT statement. Not only did the job run faster overall, but this change also stopped the CPU from maxing out:

Ah, much better.

There are other issues with the MERGE statement. It has bugs… some of which can cause database corruption.

Here we have a Shiny New Toy (feature), supposed to make life easier, yet it causes problems. Until it can perform better (and the bugs are eliminated), I just don’t use it.

Beware the Shiny New Toys.

In a prior blog post, I demonstrated how using DBCC PAGE can be automated by using the “WITH TABLERESULTS” option. In this post, we will continue with another look at how this can be done.

On a nice wintry day, your city ended up being covered in several feet of snow. During the course of the night, your own house had several power outages. Being concerned about your databases, you shoveled your way into your office, so that you could check on things. (Okay… actually you would just VPN in, but this is my story after all…)

Once you get into your server, you check the jobs and find that your job that runs DBCC CHECKDB has failed. Let’s assume that a power glitch has caused corruption in your database. In order to find out what all is affected, you run DBCC CHECKDB WITH NO_INFOMSGS, ALL_ERRORMSGS. But, this happens to be on your 2TB database, and it will take a while for CHECKDB to finish so that you can find the scope of corruption.

You’d really like to know what tables are affected without having to wait. Luckily(?), this corruption was recorded in msdb.dbo.suspect_pages, and having just recently read Paul Randal’s post here, we know we can use DBCC PAGE to determine this information. And, after having read my prior blog post, you know that we can automate DBCC PAGE, so we can use our new friend “WITH TABLERESULTS” to find out what objects have been corrupted.

The suspect_pages table, documented here, has three particular columns of interest: database_id, file_id and page_id. These correspond nicely to the first three parameters needed for DBCC PAGE. To automate this, we need to know what information we need to return off of the page – and from Paul’s post, we know that this is the field “METADATA: ObjectId”. For this code example, let’s assume that this corruption is on page 11 of the master database (just change “master” to the name of your 2TB database).
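
A sketch of that lookup (a file id of 1 is assumed along with the example page number, and the exact casing of the DBCC PAGE field names may vary):

```sql
IF OBJECT_ID('tempdb..#PageResults') IS NOT NULL DROP TABLE #PageResults;
CREATE TABLE #PageResults
(
    ParentObject VARCHAR(255),
    [Object]     VARCHAR(255),
    Field        VARCHAR(255),
    VALUE        VARCHAR(255)
);

INSERT INTO #PageResults (ParentObject, [Object], Field, VALUE)
    EXECUTE ('DBCC PAGE (master, 1, 11, 0) WITH TABLERESULTS, NO_INFOMSGS;');

SELECT VALUE                                             AS ObjectId,
       OBJECT_NAME(CONVERT(INT, VALUE), DB_ID('master')) AS ObjectName
FROM   #PageResults
WHERE  Field = 'Metadata: ObjectId';
```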

And there you go… you now know which object it is that has the corruption. In the same fashion, another interesting field that is returned is the IndexId – the Field value is “Metadata: IndexId”. It would be a similar exercise to grab that from this page also, an exercise that I’ll leave to you.

An automated method for getting the object from all suspect pages would entail encapsulating this logic into a cursor to spin through each row in the suspect_pages table (and I’ll even throw in getting the index_id also):
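
A sketch of that cursor is below – as before, the DBCC PAGE field names are undocumented, so treat this as a starting point:

```sql
IF OBJECT_ID('tempdb..#SuspectPageObjects') IS NOT NULL DROP TABLE #SuspectPageObjects;
CREATE TABLE #SuspectPageObjects
(
    database_id INT,
    [file_id]   INT,
    [page_id]   BIGINT,
    [object_id] INT,
    index_id    INT
);

DECLARE @database_id INT, @file_id INT, @page_id BIGINT, @SQL NVARCHAR(MAX);
DECLARE @PageResults TABLE (ParentObject VARCHAR(255), [Object] VARCHAR(255), Field VARCHAR(255), VALUE VARCHAR(255));

DECLARE page_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT database_id, [file_id], [page_id] FROM msdb.dbo.suspect_pages;
OPEN page_cursor;
FETCH NEXT FROM page_cursor INTO @database_id, @file_id, @page_id;
WHILE @@FETCH_STATUS = 0
BEGIN
    DELETE FROM @PageResults;
    SET @SQL = N'DBCC PAGE (' + CONVERT(NVARCHAR(10), @database_id) + N', '
             + CONVERT(NVARCHAR(10), @file_id) + N', '
             + CONVERT(NVARCHAR(20), @page_id) + N', 0) WITH TABLERESULTS, NO_INFOMSGS;';
    INSERT INTO @PageResults EXECUTE (@SQL);

    INSERT INTO #SuspectPageObjects (database_id, [file_id], [page_id], [object_id], index_id)
    SELECT @database_id, @file_id, @page_id,
           MAX(CASE WHEN Field = 'Metadata: ObjectId' THEN CONVERT(INT, VALUE) END),
           MAX(CASE WHEN Field = 'Metadata: IndexId'  THEN CONVERT(INT, VALUE) END)
    FROM   @PageResults;

    FETCH NEXT FROM page_cursor INTO @database_id, @file_id, @page_id;
END;
CLOSE page_cursor;
DEALLOCATE page_cursor;

SELECT database_id, [file_id], [page_id], [object_id],
       OBJECT_NAME([object_id], database_id) AS object_name, index_id
FROM   #SuspectPageObjects;
```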

If you happen to be on SQL 2012 or higher, this can be greatly simplified by using the new (undocumented) DMO function sys.dm_db_database_page_allocations (and it also takes away the need to crack the page using DBCC PAGE).
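
A sketch of that approach; sys.dm_db_database_page_allocations is also undocumented, so the column names here reflect how it behaves in SQL Server 2012 and later:

```sql
SELECT sp.database_id, sp.[file_id], sp.[page_id],
       pa.[object_id],
       OBJECT_NAME(pa.[object_id], sp.database_id) AS object_name,
       pa.index_id
FROM   msdb.dbo.suspect_pages AS sp
CROSS APPLY sys.dm_db_database_page_allocations(sp.database_id, NULL, NULL, NULL, 'LIMITED') AS pa
WHERE  pa.allocated_page_file_id = sp.[file_id]
  AND  pa.allocated_page_page_id = sp.[page_id];
```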

And there we go – yet another time when you might want to automate using DBCC PAGE. By now you should be able to see other uses for it – as long as you can get the database_id, file_id and page_id, you can automate the usage of it to retrieve the information that you are looking for.

Previous related posts:

Persisting DBCC Output
Automating DBCC Page

This post is re-published from my original post on SQL Solutions Group.

On the second Tuesday of each month, the SQL Server universe all comes together to blog about a topic. Known as T-SQL Tuesday and started by Adam Machanic (Blog | @AdamMachanic), this month’s T-SQL Tuesday topic is hosted by friend and fellow MCM Brent Ozar. The theme that Brent has chosen for this month is “SQL Server Bugs & Enhancement Requests”. Specifically, Brent wants us to visit connect.microsoft.com, find an interesting request, and blog about it. Now Brent, this is a fantastic topic for a T-SQL Tuesday. I’m really looking forward to seeing what others blog about. I plan on upvoting those I like to get the attention of the SQL Server team at Microsoft. For what it’s worth, I think that this should be a yearly event to do with T-SQL Tuesday!

My favorite connect item

The connect item that I want to blog about is one that I submitted a few months back. Now, I’ve submitted a few enhancement requests over time (see this link), and there are two current ones that I’m really keen on. However, Brent only wants us to talk about one, so let’s explore what my favorite one is. It can be found at this link.

Track object last used data

At almost every place that I’ve worked (and many that I’ve consulted at), I’ve been asked to figure out what objects in the database are no longer being used. I’ll bet that you’ve been asked this also. While we can tell when objects were created, it’s not so easy to determine when they were last used.

Now, there are a couple of DMVs that can help us out some. Specifically, sys.dm_exec_procedure_stats will show us information about the procedures that are in the plan cache. In BOL there is some information about what this DMV does:

The view returns one row for each cached stored procedure plan, and the lifetime of the row is as long as the stored procedure remains cached.

Another DMV that can be helpful is sys.dm_exec_index_usage_stats, which shows us various types of index operations and when they were last performed. However, this data is only available as long as the database is online – when the database is shut down or detached (or when the SQL Server service is shut down), this data is lost.

In both of these DMVs, the data is not persisted. Now, we can persist this data by running a job every minute or so to write this information out to a table. However, this only tells us a little bit of the picture: when procedures and indexes are used. If an index is being used, then the table (or indexed view) that the index belongs to has been utilized.

What is missing is all of the other objects: triggers, functions, non-indexed views, user-defined data types, etc. In order to look for these, you would have to set up an Extended Events session to capture all of the SQL commands being run, and then parse the code to find out which objects are being used. Yuck.

My proposal

What I have proposed is to add two columns to sys.objects: last_used_time_user and last_used_time_system. Two columns are proposed to be consistent with how the sys.dm_exec_index_usage_stats DMV tracks usage by user or system objects. I also suggest adding these columns to sys.indexes. For sys.indexes, this could just be the max of the appropriate columns from the sys.dm_exec_index_usage_stats DMV.

I would imagine that tracking all of this information would cause a performance hit, so this should not be on all the time. I suggest an ALTER DATABASE command (ALTER DATABASE TRACK OBJECT USAGE = ON/OFF) to control whether this information is being collected (now if they can figure out a way to do this where the overhead is minimal, just ignore this command).

Taking this further?

Would it be advantageous to know whether a column is being used in a table? Do you have a column in a table that you believe is obsolete? Have you left it in the table because you’re afraid of breaking the application somewhere? So how about adding these same columns to sys.columns? I’d suggest another ALTER DATABASE command to control whether column usage is being tracked (ALTER DATABASE TRACK COLUMN USAGE = ON/OFF).

Summary

SQL Server can make it easier for us to know whether objects (or even columns) are being used. Having this information will make it easier for us to eliminate obsolete stuff hanging around. This stuff takes up space (including backups) and increases time for maintenance.