Wayne Sheffield

My blog about SQL Server


There is a lot of information within the Windows registry. Sometimes, it would sure be nice to work with the registry within your T-SQL scripts. As it turns out, there are a bunch of undocumented extended stored procedures that do just this. A listing of these procedures are:

Regular                        Instance-Aware
sys.xp_regread                 sys.xp_instance_regread
sys.xp_regenumvalues           sys.xp_instance_regenumvalues
sys.xp_regenumkeys             sys.xp_instance_regenumkeys
sys.xp_regwrite                sys.xp_instance_regwrite
sys.xp_regdeletevalue          sys.xp_instance_regdeletevalue
sys.xp_regdeletekey            sys.xp_instance_regdeletekey
sys.xp_regaddmultistring       sys.xp_instance_regaddmultistring
sys.xp_regremovemultistring    sys.xp_instance_regremovemultistring

As you can see, there are two categories of extended stored procedures that we can use, with complementary procedures between them. The “Instance-Aware” procedures utilize keyword substitution in order to navigate to the appropriate registry keys for the SQL Server instance that is running. I’ll show you an example shortly.

Registry Basics

Before we go too much further, let’s define a few key terms used when dealing with the registry:

Registry: The registry is a hierarchical database used by Windows. It is organized in a tree view manner.

Registry Hive: A hive is a logical group of keys, subkeys, and values in the registry that has a set of supporting files containing backups of its data.

Examples of Registry Hives are:

Registry Hive
HKEY_CURRENT_CONFIG
HKEY_CURRENT_USER
HKEY_LOCAL_MACHINE\SAM
HKEY_LOCAL_MACHINE\Security
HKEY_LOCAL_MACHINE\Software
HKEY_LOCAL_MACHINE\System
HKEY_USERS\.DEFAULT

Of all of these hives, you will probably use HKEY_LOCAL_MACHINE and HKEY_CURRENT_USER the most.

Registry Key: A node within a registry hive is called a key. A key can contain both sub-keys and values. Sometimes an application will only be looking for the presence of a key; at other times it will need to look at the value.

Registry Value: A registry value is simply a piece of data, of any data type.

The following image shows the tree view layout of the registry, and identifies the various items within it:

Registry

Querying the registry from SQL Server

Using xp_regread / xp_instance_regread

Let’s start off by querying some data within SQL Server. I have an instance of SQL Server named “SQL2014” (would you care to take a guess as to what version of SQL Server this is?). One of the items stored in the registry is the location of the SQL Agent working directory. These procedures can query the registry and return the specified values. For example:
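A sketch of the kind of call involved (the registry path assumes my MSSQL12.SQL2014 instance; adjust it for yours):

EXECUTE master.sys.xp_regread
    N'HKEY_LOCAL_MACHINE',
    N'SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL12.SQL2014\SQLServerAgent',
    N'WorkingDirectory';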

When I execute this statement, SQL returns the following result:

In order for you to run this, you may need to change the key as appropriate for the version and instance on your server. The result includes both the value name and the data for the specified path.

xp_instance_regread

In this example, I used xp_regread to read the direct registry path. If you remember from earlier, there are SQL Server instance-aware versions of each registry procedure. A comparable statement using the instance-aware procedure would be:
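Something along these lines:

EXECUTE master.sys.xp_instance_regread
    N'HKEY_LOCAL_MACHINE',
    N'SOFTWARE\Microsoft\MSSQLSERVER\SQLServerAgent',
    N'WorkingDirectory';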

This statement returns the exact same information. Let’s look at the difference between these – in the first query, the registry path is the exact registry path needed, and it includes “\Microsoft SQL Server\MSSQL12.SQL2014\”. In the latter query, this string is replaced with “\MSSQLSERVER\”. Since the latter function is instance aware, it replaces the “MSSQLSERVER” with the exact registry path necessary for this instance of SQL Server. Pretty neat, isn’t it? This allows you to have a script that will run properly regardless of the instance that it is being run on. The rest of the examples in this post will utilize the instance-aware procedures to make it easier for you to follow along and run these yourself.

Syntax

The syntax for these procedures is:
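Roughly (these procedures are undocumented, so this positional form is my best understanding):

sys.xp_regread N'rootkey', N'key' [, N'value_name' [, @variable OUTPUT]]
sys.xp_instance_regread N'rootkey', N'key' [, N'value_name' [, @variable OUTPUT]]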

The first parameter is the registry hive that you want to query, the second parameter is the key path, and the third is the value name. The third parameter is optional – if it is provided, then the procedure returns the data for the specified value; if it is not provided, then the procedure only returns whether the specified key exists. There is also an optional fourth parameter – an output parameter that receives the data. The parameters are positional, and while you can specify a name for each parameter, any name will work; since the parameters are positional, my recommendation is to not use the parameter names. The following example utilizes the optional fourth parameter, returning the data for the specified value into the variable:
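A sketch using that fourth (output) parameter:

DECLARE @WorkingDirectory NVARCHAR(512);

EXECUTE master.sys.xp_instance_regread
    N'HKEY_LOCAL_MACHINE',
    N'SOFTWARE\Microsoft\MSSQLSERVER\SQLServerAgent',
    N'WorkingDirectory',
    @WorkingDirectory OUTPUT;

SELECT @WorkingDirectory AS WorkingDirectory;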

You would think that if you specify either NULL or an empty string for the third parameter, you could send whether the key exists to an output variable – however, I have not been able to figure out a way to accomplish this. Specifying either of these values results in an error when running the statement. If you know how to do this, please leave a comment so that this post can be updated with that information.

Using xp_regenumvalues / xp_instance_regenumvalues

These procedures will enumerate through all of the values of the specified key, returning a separate result set for each value. For instance, the following statement will return all of the values in the above SQLServerAgent key:
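For instance (a sketch, using the instance-aware path from above):

EXECUTE master.sys.xp_instance_regenumvalues
    N'HKEY_LOCAL_MACHINE',
    N'SOFTWARE\Microsoft\MSSQLSERVER\SQLServerAgent';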

Having all of these result sets makes it difficult to work with this procedure. Thankfully, you can put these all into one result set – just create a temporary table (or table variable) to hold the output, and then use INSERT / EXECUTE to fill it, like the following example does:
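A sketch of that approach (the SQL_VARIANT columns are my assumption; each result set is a value/data pair):

IF OBJECT_ID('tempdb..#RegistryValues') IS NOT NULL DROP TABLE #RegistryValues;
CREATE TABLE #RegistryValues (ValueName SQL_VARIANT, ValueData SQL_VARIANT);

INSERT INTO #RegistryValues (ValueName, ValueData)
EXECUTE master.sys.xp_instance_regenumvalues
    N'HKEY_LOCAL_MACHINE',
    N'SOFTWARE\Microsoft\MSSQLSERVER\SQLServerAgent';

SELECT ValueName, ValueData FROM #RegistryValues;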

Now that all of the results are in one result set, you can work with it a bit easier.

Syntax

The syntax for these procedures is:
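Again in rough, positional form:

sys.xp_regenumvalues N'rootkey', N'key'
sys.xp_instance_regenumvalues N'rootkey', N'key'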

Using xp_regenumkeys / xp_instance_regenumkeys

These procedures will enumerate through all of the keys in a specified path, and return all of the keys in that path. Unlike xp_instance_regenumvalues, all of the keys are returned in one result set, though you will probably want to use INSERT / EXECUTE to put this into temporary storage so that you can work with it. An example of using these procedures is:
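For instance, to list the keys under the instance's root key (a sketch):

EXECUTE master.sys.xp_instance_regenumkeys
    N'HKEY_LOCAL_MACHINE',
    N'SOFTWARE\Microsoft\MSSQLSERVER';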

The syntax for these procedures is:
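In rough form:

sys.xp_regenumkeys N'rootkey', N'key'
sys.xp_instance_regenumkeys N'rootkey', N'key'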

Modifying the registry from within SQL Server

Up to this point, we have focused on retrieving data from the registry. What if you want to modify the registry? Read on… but be forewarned that the following procedures modify the registry, which means that they can also damage the registry, possibly rendering the server unusable. Use at your own risk!

xp_regwrite / xp_instance_regwrite

These procedures are used to create keys and write data into the registry. You can create up to 32 sub-keys at a time. In the following example, a new key “MyNewKey” will be added to the SQLServerAgent key, and the value “MyNewValue” will be added to this new key with the data “Now you see me!”. It will then read this value from the registry.
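A sketch of those two steps (REG_SZ is the registry string type):

EXECUTE master.sys.xp_instance_regwrite
    N'HKEY_LOCAL_MACHINE',
    N'SOFTWARE\Microsoft\MSSQLSERVER\SQLServerAgent\MyNewKey',
    N'MyNewValue',
    N'REG_SZ',
    N'Now you see me!';

EXECUTE master.sys.xp_instance_regread
    N'HKEY_LOCAL_MACHINE',
    N'SOFTWARE\Microsoft\MSSQLSERVER\SQLServerAgent\MyNewKey',
    N'MyNewValue';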

Syntax

The syntax for these procedures is:
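Roughly:

sys.xp_regwrite N'rootkey', N'key', N'value_name', N'type', N'value'
sys.xp_instance_regwrite N'rootkey', N'key', N'value_name', N'type', N'value'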

xp_regdeletevalue / xp_instance_regdeletevalue

These procedures are used to delete a specified value from the registry. In this example, the “MyNewValue” value will be deleted. The example first enumerates through all of the values in this key (just the one), deletes the value, enumerates through them again (since there are no values, there will be no result set), and then finally shows that the key is still present.
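A sketch of that sequence:

EXECUTE master.sys.xp_instance_regenumvalues
    N'HKEY_LOCAL_MACHINE',
    N'SOFTWARE\Microsoft\MSSQLSERVER\SQLServerAgent\MyNewKey';

EXECUTE master.sys.xp_instance_regdeletevalue
    N'HKEY_LOCAL_MACHINE',
    N'SOFTWARE\Microsoft\MSSQLSERVER\SQLServerAgent\MyNewKey',
    N'MyNewValue';

EXECUTE master.sys.xp_instance_regenumvalues
    N'HKEY_LOCAL_MACHINE',
    N'SOFTWARE\Microsoft\MSSQLSERVER\SQLServerAgent\MyNewKey';

EXECUTE master.sys.xp_instance_regenumkeys
    N'HKEY_LOCAL_MACHINE',
    N'SOFTWARE\Microsoft\MSSQLSERVER\SQLServerAgent';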

Syntax

The syntax for these procedures is:
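In the same rough form:

sys.xp_regdeletevalue N'rootkey', N'key', N'value_name'
sys.xp_instance_regdeletevalue N'rootkey', N'key', N'value_name'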

xp_regdeletekey / xp_instance_regdeletekey

These procedures are used to delete an entire key from the registry. In this example, the script will first add another new key under “MyNewKey”, and a value in that new key. The script then deletes both the “AnotherNewKey” (which deletes the value just added also) and “MyNewKey” keys and finally shows that both keys have been deleted. Note that in order to delete a key, it cannot have any sub-keys, which is why the script deletes “AnotherNewKey” first (try running the script first by commenting out the first xp_instance_regdeletekey to see that it must be empty).
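A sketch of that sequence (AnotherNewKey and its value are created with regwrite, then both keys are deleted, child key first):

EXECUTE master.sys.xp_instance_regwrite
    N'HKEY_LOCAL_MACHINE',
    N'SOFTWARE\Microsoft\MSSQLSERVER\SQLServerAgent\MyNewKey\AnotherNewKey',
    N'AnotherNewValue',
    N'REG_SZ',
    N'Now you see me again!';

-- A key with sub-keys cannot be deleted, so delete the child key first
EXECUTE master.sys.xp_instance_regdeletekey
    N'HKEY_LOCAL_MACHINE',
    N'SOFTWARE\Microsoft\MSSQLSERVER\SQLServerAgent\MyNewKey\AnotherNewKey';

EXECUTE master.sys.xp_instance_regdeletekey
    N'HKEY_LOCAL_MACHINE',
    N'SOFTWARE\Microsoft\MSSQLSERVER\SQLServerAgent\MyNewKey';

EXECUTE master.sys.xp_instance_regenumkeys
    N'HKEY_LOCAL_MACHINE',
    N'SOFTWARE\Microsoft\MSSQLSERVER\SQLServerAgent';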

Syntax

The syntax for these procedures is:
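Roughly:

sys.xp_regdeletekey N'rootkey', N'key'
sys.xp_instance_regdeletekey N'rootkey', N'key'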

xp_regaddmultistring / xp_instance_regaddmultistring

These procedures are used to add a string to a multi-string entry in the registry, or to create a multi-string registry entry. In this example, I’ll call the procedure twice. The first time will create the entry with one string in it, and the second time will add a second string to it. Then the example will show the results of this value.
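A sketch of those calls (MyMultiString is a hypothetical value name, created here directly under the SQLServerAgent key):

EXECUTE master.sys.xp_instance_regaddmultistring
    N'HKEY_LOCAL_MACHINE',
    N'SOFTWARE\Microsoft\MSSQLSERVER\SQLServerAgent',
    N'MyMultiString',
    N'First string';

EXECUTE master.sys.xp_instance_regaddmultistring
    N'HKEY_LOCAL_MACHINE',
    N'SOFTWARE\Microsoft\MSSQLSERVER\SQLServerAgent',
    N'MyMultiString',
    N'Second string';

EXECUTE master.sys.xp_instance_regread
    N'HKEY_LOCAL_MACHINE',
    N'SOFTWARE\Microsoft\MSSQLSERVER\SQLServerAgent',
    N'MyMultiString';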

Did you notice that the regread procedure got a little confused here? It returns two value columns (containing the name of the value), with the second one holding the data for the first, and the data column is NULL. If a third string is added, it still returns just these three columns. To see that we actually added these strings, we’ll have to use regedit.exe:

Registry2

Syntax

The syntax for these procedures is:
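Approximately:

sys.xp_regaddmultistring N'rootkey', N'key', N'value_name', N'string_to_add'
sys.xp_instance_regaddmultistring N'rootkey', N'key', N'value_name', N'string_to_add'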

xp_regremovemultistring / xp_instance_regremovemultistring

These procedures are used to remove a string from a multi-string entry. This example will remove the middle string.
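A sketch, assuming a third string is added the same way as above so that three strings exist before the removal:

EXECUTE master.sys.xp_instance_regaddmultistring
    N'HKEY_LOCAL_MACHINE',
    N'SOFTWARE\Microsoft\MSSQLSERVER\SQLServerAgent',
    N'MyMultiString',
    N'Third string';

-- Remove the middle string
EXECUTE master.sys.xp_instance_regremovemultistring
    N'HKEY_LOCAL_MACHINE',
    N'SOFTWARE\Microsoft\MSSQLSERVER\SQLServerAgent',
    N'MyMultiString',
    N'Second string';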

Examining the value now in regedit.exe, it can be seen that it removed all of the strings starting with the specified string… in other words, it removed the second and third strings. If we add the strings back in and then remove the third string, we can see that it again removes both the second and third strings. In testing with a fourth string, it appears that deleting any string other than the first removes all of the strings after the first, and deleting the first string deletes all of the strings. Well, these are undocumented procedures, so it’s not likely that this bug will ever be fixed.

Syntax

The syntax for these procedures is:
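In rough form:

sys.xp_regremovemultistring N'rootkey', N'key', N'value_name', N'string_to_remove'
sys.xp_instance_regremovemultistring N'rootkey', N'key', N'value_name', N'string_to_remove'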

And finally, let’s ensure that things are cleaned up:
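A cleanup sketch, matching the test value used in the examples above:

EXECUTE master.sys.xp_instance_regdeletevalue
    N'HKEY_LOCAL_MACHINE',
    N'SOFTWARE\Microsoft\MSSQLSERVER\SQLServerAgent',
    N'MyMultiString';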

Conclusion

With the exception of removing and reading the multi-string values, these extended stored procedures all work pretty well. We’ve been able to read registry values, and to enumerate through a list of keys and values. Keys and values can be created and deleted.

If you try to use these procedures on keys outside of the SQL Server keys, you may run into registry security problems where the SQL Server service account doesn’t have permissions to accomplish the task. On the internet, you will find articles telling you to add the service account to the local administrators group (and to restart the service). This is a VERY BAD IDEA – by doing this, the server can be completely compromised, since the service account will be able to do anything on the server. Instead, use regedt32.exe, which will allow you to modify the permissions on the key where you are having access issues.

I close by repeating my warning from above – be very careful when modifying the registry. It is possible to corrupt the registry to the point where the server will no longer function. To recover, an operating system reinstall will be necessary.

As a footnote on working with the instance-aware procedures: I recall seeing on the internet other keywords that can be substituted in the same way as MSSQLSERVER. One of them is SQLServerAgent. However, I can’t find any others right now. If you happen to know of other keywords that work, please leave a comment so that they can be added to this post.

This post is re-published from my original post on SQL Solutions Group.

In order for your SQL Server instance to run optimally, there are many SQL Server best practices that you need to follow. The SQL Health Check that you can have us perform on your instances looks at many of these. SQL Server 2016, which at the time of this writing is in a Release Candidate 0 state of readiness, has been changed to handle several of these best practices. This post will look at several of these changes.

Some of the SQL Server Best Practices

Instant File Initialization

The first change appears during the installation of SQL Server 2016 itself. One of the best practice items that I look for in my Health Check is that the SQL Server Database Engine service has the “Perform Volume Maintenance Tasks” security privilege. SQL Server 2016 now allows the setting of this privilege right in the install screen.

SSMS2016-PVMT

The benefit of having this privilege set is that it allows SQL Server to utilize “Instant File Initialization” (IFI). When IFI is not set, then any growth of a database’s data file will be zero-initialized… that is, the entire growth portion of the file is overwritten with zeros (note that database log files are always initialized – with a 0xC0 byte pattern – since this is needed for the crash recovery process). During the time that the database data file is being zero-initialized, no other activity can occur in the database. This setting not only affects growths of data files, but also affects database restores and adding new files to a filegroup. Kimberly Tripp has an excellent post that explains this in more detail, and the kind of performance impact that can be obtained by having IFI enabled.

However, configuring the instance to utilize IFI has a potential negative security side-effect. If you have just dropped a database and a growth occurs on another database, the growth could use the part of the drive that the dropped database was using. And although it is extremely unlikely, the pages could possibly line up. With advanced techniques, one could then read the data that was on those pages. Obviously, zero-initializing the pages would eliminate this problem. It is also obvious that a lot of circumstances have to line up to allow this potential security issue… but the chance of it occurring will never be zero with IFI enabled.

Because of the performance benefit, and the extremely low security risk, most SQL Server professionals recommend having this option set.

tempdb

Ahh, tempdb. In SQL Server, that database does so much… it’s where temporary tables and table variables are stored. It’s where row version activity is stored. It’s where running queries will spill out temporary workspace needed for the query. And on, and on, and on. On busy systems, having this database running optimally is of vital importance. And there are several best practices that need to be observed here.

First up is the number of data files needed for tempdb. On a busy system that is creating a lot of temporary objects (specifically temporary tables and table variables), the creation of these objects creates a hot-spot in one of the database file’s internal page types that stores information about the allocation of these objects. However, each database file has these internal pages. Frequently, this contention can be greatly reduced by simply adding additional database files to this database. The specific advice for how many files to have has changed over time, and there is disagreement over the exact number of files to have. What isn’t disagreed over is that additional files are needed. And once again, during the SQL Server 2016 installation, this can be set, as seen in the following screen shot:

SSMS2016-TempDB

This screen shot shows several of the changes that are now available in the installation process. First off is the number of data files. Note that the installation limits the number of data files to the lower of 8 or the number of CPUs presented to SQL Server (one of the methods of determining how many files to have).

The next settings that this screen shot shows are the new default size and autogrowth values. Technically, these changes are made to the model database; however, they apply to all new databases, including tempdb when it is created. The initial size of the data and log files is now 8 MB, and the default auto-growth of data and log files is now 64 MB.

The next setting available in this screen is the directory to place the data and log files in. Specifically, you can have tempdb data files in multiple directories. During the installation, these files are created in the directories in a round-robin fashion.

Trace Flags

There are three specific trace flags that are commonly set in SQL Server. If trace flag (TF) 1117 is enabled, then whenever a data file grows, all of the files in that filegroup also grow. In SQL Server 2016, this trace flag has been replaced with the AUTOGROW_ALL_FILES and AUTOGROW_SINGLE_FILE options of the ALTER DATABASE command, and setting TF 1117 no longer has any effect. Since it is set as part of the ALTER DATABASE command, it can be controlled at the database level, where TF 1117 was a setting across the entire instance. Furthermore, because this behavior is so valuable to have in tempdb, tempdb has it enabled by default.
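As a minimal sketch of the replacement (MyDatabase and the PRIMARY filegroup are just placeholders):

ALTER DATABASE MyDatabase
    MODIFY FILEGROUP [PRIMARY] AUTOGROW_ALL_FILES;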

TF 1118 controls whether an extent can have pages belonging to multiple objects (known as a mixed extent) or to a single object (known as a dedicated extent). Using mixed extents allows small tables to use the absolute minimum amount of storage space… which isn’t nearly the issue it was back in SQL Server 7.0, when this behavior was added. Using dedicated extents can also help reduce the contention when creating lots of objects very quickly. In SQL Server 2016, this trace flag has been replaced with the MIXED_PAGE_ALLOCATION option of the ALTER DATABASE SET command (and like TF 1117, TF 1118 no longer has any effect). Furthermore, in tempdb, this dedicated-extent behavior is enabled by default.
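And a sketch of the TF 1118 replacement (again with a placeholder database name):

ALTER DATABASE MyDatabase
    SET MIXED_PAGE_ALLOCATION OFF;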

The next trace flag to talk about is TF 4199. This trace flag controls whether changes to the query optimizer’s behavior are used or not. In prior versions of SQL Server, these changes were not used unless this TF was enabled. Starting in SQL Server 2016, most of these behaviors are enabled if the database is in the SQL Server 2016 compatibility level.

Reference: SQL Server 2016: What’s New in (the) Database Engine

This post is re-published from my original post on SQL Solutions Group.

Target. Home Depot. NASA. U.S. Army. Anthem. Wall Street Journal. MarketWired. Ashley Madison. What do they all have in common? They are all recent victims of cyber-attacks – several of these are confirmed to involve SQL Injection. Even though the exact method used to penetrate the other systems has not been released, experts believe that SQL Injection was involved in all of these cases. Take a look at http://codecurmudgeon.com/wp/sql-injection-hall-of-shame/ for places that are known victims of SQL Injection.

The part that is really astonishing is that we have known how to stop SQL Injection for more than a decade. Completely. Dead. In. Its. Tracks. Which means that the problems happening today are because of careless coding techniques, and as such, they can be prevented.

The “Funky query”

A few years ago, I was working for a company that developed and sold software. One day, my supervisor sends me an email titled “Funky query”. Hmm, that sounds interesting… I’ll open this up right away, instead of ignoring it until I finish what I’m currently working on (like I would usually do).

Wayne… one of our customer support engineers sent us this from the application error log of a customer. They were running a report. Any ideas on how this funky query came to be?

Query from error log (genericized… because I can’t use the real query):

SELECT * FROM sys.objects WHERE name = 'O'Malley'

Details from the error log:

Me: Umm, boss… Does this report allow users to enter in search criteria?

Boss: But of course!

Me: Well, I really hate to tell you this, but we have a SQL Injection problem.

And after a bit of back and forth where the developers were insisting that no way was there a SQL Injection problem, I sat down with the dev team lead and the boss and proved it to them. We created a dummy table in the database, went to the report criteria form, and I dropped the table.

Wayne: +1000

Development Team: -1000

Defining SQL Injection

Before we go into how SQL Injection occurs, let’s first examine what it is. SQL Injection is simply a code injection technique that allows SQL code that the user enters in the front-end to be run against the back-end database.

Please note a few things about this definition:

  1. SQL Server was not mentioned. Any and all database platforms are susceptible to SQL Injection. SQL Server. Oracle. PostgreSQL. DB2. It doesn’t matter – all database platforms are susceptible to SQL Injection.
  2. A generic front-end is mentioned. This is anything that connects to a back-end database, be it a web form or your favorite application. Anything that connects to a database can allow SQL Injection.
  3. Improperly written code within the database (such as a stored procedure) can allow SQL Injection. It’s all about how the query is created and run against the database.
  4. There are other forms of Injection attacks that can occur; SQL Injection is just one. Other forms include Cross-site scripting, Web Parameter Tampering, LDAP Injection and XPath Injection (as I write this, WordPress 4.3.1 is being released to fix two Cross-site scripting issues).

SQL Injection can happen in three different ways:

  1. Classic – this is when a user enters SQL code, and can immediately view the results on screen.
  2. Blind – this is essentially a classic attack, but the user does not see the results. Things may still happen, and the user may be able to send results to files that can be downloaded, but the results are not shown on screen.
  3. Second Order – the user supplied SQL code is actually stored in the database, and at a later point in time it is run.

Creating SQL Injection

SQL Injection happens when a SQL statement is created from the data supplied by the user, and it is run by the application or other process. Obviously the intention of the application is to only run the queries that it supplies. Consider a login form with user name and password data entry fields. Assume that the user enters “Mickey” as the user, and “1234” as the password. In this case, the application essentially performs:
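Something along these lines, using a hypothetical dbo.Users table and string concatenation:

DECLARE @UserName NVARCHAR(50) = N'Mickey',
        @Password NVARCHAR(50) = N'1234',
        @SQL      NVARCHAR(MAX);

SET @SQL = N'SELECT * FROM dbo.Users WHERE UserName = ''' + @UserName
         + N''' AND Password = ''' + @Password + N''';';

EXECUTE (@SQL);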

This will create the SQL statement:
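Continuing the hypothetical example:

SELECT * FROM dbo.Users WHERE UserName = 'Mickey' AND Password = '1234';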

And when we run this, we get the expected results. But now, a hacker becomes the user. Being wise to SQL Injection, the hacker enters the following string for the password:

1234' OR 1=1;--

This results in the following SQL Statement being generated:
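Still with the hypothetical dbo.Users table:

SELECT * FROM dbo.Users WHERE UserName = 'Mickey' AND Password = '1234' OR 1=1;--';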

Since every row matches the “OR 1=1” predicate, every username and password will be returned.

Hacker: +1

Developer: -1

Notice that I showed this attack with T-SQL code – this means that you can have SQL Injection vulnerabilities within your T-SQL code (such as stored procedures), as well as any SQL commands that the application might construct and send to the database. The application might be properly handling for SQL Injection, but if it is calling SQL code that is improperly coded, you can still be vulnerable to SQL Injection. Everything has to be done properly – all it takes is one point where SQL Injection is possible to become the latest victim.

What can a hacker do with a SQL Injection vulnerability?

At this point, the hacker knows that the application is susceptible to SQL Injection, and can now issue any SQL statement desired. Depending on the access level that the application has to connect to the database, the hacker can:

  1. Get a list of all of the databases in the instance.
  2. Get a list of all of the tables in the current database, or any other database.
  3. Query the data in any of the other tables that can be seen.
  4. Modify (insert / update / delete) data in any of the tables that can be seen.
  5. Create / modify / drop any object in any database on the instance.
  6. Create elevated permissions at the SQL Server instance level. This will allow the hacker to do anything they desire on this instance. Drop tables? Yes. Query any and all data? Yes.
  7. Create elevated permissions at the Operating System level.
  8. Install programs on the server.
  9. Access any server on the network, and perform all of the above on those servers.

Scary? Absolutely. However, as I mentioned before, we know how to prevent SQL Injection. And it turns out that it’s just not that hard. Instead of making SQL Statements that include data (either as directly supplied from the user, or from the database), pass that data to the SQL statement through parameters.

If we take the previous example, and modify it to use parameters, we would have:
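A parameterized sketch of the same hypothetical query, using sp_executesql:

DECLARE @UserName NVARCHAR(50) = N'Mickey',
        @Password NVARCHAR(50) = N'1234'' OR 1=1;--';

EXECUTE sys.sp_executesql
    N'SELECT * FROM dbo.Users WHERE UserName = @UserName AND Password = @Password;',
    N'@UserName NVARCHAR(50), @Password NVARCHAR(50)',
    @UserName = @UserName,
    @Password = @Password;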

What are the chances that you have a password of "1234' OR 1=1;--" in your system? Assuming that you don’t, this query will return zero rows. SQL Injection has been prevented.

So now you want an example of the difference between classic and blind SQL Injection? Well, I’m not going to show you an example… because this will normally come down to how the application handles the data being returned. In the “OR 1=1” example, if the application only shows the first record, then the hacker won’t see the results of the SQL Injection (unless they notice that it’s showing the wrong data). Or the application may have been coded to expect only 1 row, and if more are returned then it won’t display any.

Second Order SQL Injection

But how about a second order SQL Injection? Let me explain that with an example. Imagine a form for entering new users. The development team has learned about the SQL Injection on the forms, and they are now using parameters. When the hacker enters for the password:

1234' OR 1=1;--

The generated SQL statement is:
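Because the form is parameterized, the statement is essentially (still a hypothetical sketch):

INSERT INTO dbo.Users (UserName, Password)
VALUES (@UserName, @Password);   -- the malicious string is stored as plain data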

When the data is queried:
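A simple query against the hypothetical table:

SELECT UserName, Password FROM dbo.Users;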

We get the following results:

The data has been stored in the database, and we are now primed for a second order SQL Injection attack. Some other process comes along (for example, a month-end report), and it creates dynamic SQL by building the SQL statement from the data. And the second order SQL Injection has just been released…

What have we learned so far?

  1. SQL Injection can happen to any database system.
  2. SQL Injection can be within SQL code (stored procedures), or it can be in the statements passed to the database from the application.
  3. Parameters will prevent the SQL code from being executed immediately, thereby preventing Classic and Blind SQL Injection attacks.
  4. The SQL Injection attempt can still be stored in the database, priming yourself for a Second Order SQL Injection attack.

Preventing SQL Injection

Whenever you are creating a SQL statement, never build the statement by concatenating in data (either from the database, or from what the user may pass in). Instead, all SQL statements must use parameters. Even the month-end report.

Within SQL Server, have your queries reference the parameters (ColumnName = @VariableName). If you must use dynamic SQL to build your statement, use sp_executesql – it allows for the building of SQL statements with parameters, and then defining and passing those parameters in as the procedure is called.

In your application, there are several component choices. Some directly execute SQL, and others can use parameters. You want to always use the components that use parameters, and you must use those parameters.

Learning more

The Open Web Application Security Project (OWASP) is a community-driven, non-profit organization where experts analyze web application security issues and then compile a list of what the biggest issues are. For the last several years, the category “Injection” has been the biggest issue, and this includes SQL Injection.

This isn’t all that they do. They also have educational resources to not only instruct you on how these issues manifest themselves, but what to do about them. I strongly encourage everyone to take this resource back to your company, and to start identifying and addressing security issues in your code.

Another issue is that developers don’t really know about SQL Injection. In the workshops that I have given where this topic is presented, very few attendees have seen SQL Injection in a test environment. Most can barely give an adequate description of what it is. Those that have seen it are usually those who have been a victim of it. So I also recommend getting your company to arrange for SQL Injection training for the developers and DBAs. It would be good to send your QA team to this also.

This post is re-published from my original post on SQL Solutions Group.

In a prior post, I shared a script that will take a running trace and show you the XE events that it relates to, and what columns are available within those XE events. Specifically, this was for converting a deadlock trace into an XE session; however the process is the same for converting any trace into an XE session. In today’s post, we’ll compare the deadlock trace and the new XE by running both, creating a deadlock, and comparing the captured data. We’ll look at the data captured in the XE from both script and GUI, and look at a few other differences between running a trace and an XE session.

The first step is to grab the trace and XE scripts from the prior post. Modify both scripts to put the output files in an appropriate place on your system. Run both of the scripts to start the trace and to create the XE session. Next, start the XE session with the following script:
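Assuming the session was named Deadlocks (as in the prior post):

ALTER EVENT SESSION [Deadlocks] ON SERVER STATE = START;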

The next step is to create a deadlock. Open up a new query window, and run the following. Leave this query window open.
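A sketch of one way to set this up; the two tables are hypothetical and exist only to create the deadlock:

-- Query window 1
CREATE TABLE dbo.DeadlockTest1 (ID INT NOT NULL);
CREATE TABLE dbo.DeadlockTest2 (ID INT NOT NULL);
INSERT INTO dbo.DeadlockTest1 (ID) VALUES (1);
INSERT INTO dbo.DeadlockTest2 (ID) VALUES (1);

BEGIN TRANSACTION;
UPDATE dbo.DeadlockTest1 SET ID = 2;   -- take a lock on table 1 and hold it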

Next, open up a second query window, and run the following code in that window:
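In the second window (the second UPDATE will block, waiting on window 1):

-- Query window 2
BEGIN TRANSACTION;
UPDATE dbo.DeadlockTest2 SET ID = 2;   -- take a lock on table 2
UPDATE dbo.DeadlockTest1 SET ID = 3;   -- blocked by window 1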

Finally, return to the first query window and run the following code, at which point one of the statements in one of the query windows will be deadlocked:
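Back in window 1 (this completes the cycle, and one session is chosen as the deadlock victim):

-- Query window 1 again
UPDATE dbo.DeadlockTest2 SET ID = 3;   -- deadlock: window 2 already holds table 2

-- Afterwards: commit or roll back the surviving transaction in each window, then clean up
ROLLBACK TRANSACTION;
DROP TABLE dbo.DeadlockTest1, dbo.DeadlockTest2;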

Now that we’ve created a deadlock, let’s compare the trace output data to the XE output data. First, let’s grab the data from the trace file with this script, which selects all of the non-null columns from the table valued function (remember to change the filename/path as appropriate):
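A sketch of that query (the trace file path is an assumption; use the path from your trace script):

SELECT TextData, NTUserName, ClientProcessID, ApplicationName, LoginName, SPID,
       Duration, StartTime, EndTime, ObjectID, EventClass, Mode, Type
FROM sys.fn_trace_gettable(N'C:\Temp\DeadlockTrace.trc', DEFAULT);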

SSMS2014XE09-XE-Collected-Trace-Data

SSMS2014XE10-XE-Collected-Trace-Data

SSMS2014XE11-XE-Collected-Trace-Data

We now want to compare this to the data that was collected by the Extended Event session. In SSMS, expand the Management Tree, then the Extended Events tree, Session tree, and the Deadlocks tree:

SSMS2014XE07-XE-Session-target

The event file target can now be seen. Double-click the file target to view the data, and you can see the events that fired for the deadlock. Note that you can right-click on the grid header, and choose additional columns to put into the grid – for this XE session, I like to add the database_name, resource_description, resource_owner, resource_type and xml_report columns.

SSMS2014XE08-XE-Collected-Data

By selecting one of the rows, all of the data will be available in the lower grid in a Name/Value format.

Let’s compare the XE output to the trace output, and ensure that the data captured by the trace is present in the XE output. The first trace column is TextData. In that, we see multiple bits of data: SPID, lock resource, and the deadlock graph in XML format. In the XE, the resource_description column has the lock resource, the xml_report column has the deadlock graph in XML format, and the session_id column has the SPID. Skipping the BinaryData column, the next column is the NTUserName. Here we can see that this is collected in the XE nt_username column. A minor difference is that the XE includes the domain name, and the trace doesn’t.

The ClientProcessID column maps to the client_pid column. Likewise, the ApplicationName maps to client_app_name, LoginName maps to server_principal_name, SPID maps to session_id, Duration maps to duration, StartTime maps to timestamp, ObjectID maps to object_id. The EventClass, Mode and Type columns maps via friendly names to the name, mode and resource_type columns. The only columns that aren’t explicitly in the XE results are the BinaryData, EndDate and ServerName columns. The EndDate can be calculated from the timestamp and duration columns, and we could have selected the server_instance_name in “Global Fields (Actions)” tab when configuring the XE session. So here, we can see that everything needed from the trace is available in the XE session.

For the next step, one might want to load the data into a table, and viewing it in the viewer doesn’t allow for this. Or maybe you are working on a server prior to SQL Server 2012, and the GUI isn’t available for you to use. The following script can be used to query the data from the file target (again, remember to change the file name/path as necessary):
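A sketch of reading the event_file target (the file path/pattern is an assumption):

SELECT CONVERT(XML, event_data) AS event_data
FROM sys.fn_xe_file_target_read_file(N'C:\Temp\Deadlocks*.xel', NULL, NULL, NULL);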

SSMS2014XE12-XE-target-file-query-results

As can be seen from the output, the data that we most want to see is stored as XML (when working with Extended Events, all of the data is stored as XML) in the event_data column, so I have converted this column to XML. At this point, you can click the XML output, and it will be opened up in text, allowing you to see all of the data as it is stored in the XML output.

SSMS2014XE13-XE-target-file-event_data-results

From this point, it’s just a matter of modifying the query to return the columns that you are interested in.
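One possible shape for such a query (the file path and the microsecond duration unit are assumptions):

WITH XEData AS
(
    SELECT CONVERT(XML, event_data) AS event_data
    FROM sys.fn_xe_file_target_read_file(N'C:\Temp\Deadlocks*.xel', NULL, NULL, NULL)
)
SELECT  event_data.value('(event/@name)[1]', 'nvarchar(128)')                            AS event_name,
        event_data.value('(event/@timestamp)[1]', 'datetime2(3)')                        AS [timestamp],
        DATEADD(MICROSECOND,
                CONVERT(INT, ISNULL(event_data.value('(event/data[@name="duration"]/value)[1]', 'bigint'), 0)),
                event_data.value('(event/@timestamp)[1]', 'datetime2(3)'))               AS end_time,
        event_data.value('(event/action[@name="session_id"]/value)[1]', 'int')           AS session_id,
        event_data.value('(event/data[@name="resource_type"]/value)[1]', 'int')          AS resource_type_value,
        event_data.value('(event/data[@name="resource_type"]/text)[1]', 'nvarchar(60)')  AS resource_type_name,
        event_data.value('(event/data[@name="mode"]/value)[1]', 'int')                   AS mode_value,
        event_data.value('(event/data[@name="mode"]/text)[1]', 'nvarchar(60)')           AS mode_name,
        event_data.query('(event/data[@name="xml_report"]/value/deadlock)[1]')           AS xml_report
FROM XEData;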

This query extracts the data from the XE file target. It also calculates the end date, and displays both the internal and user-friendly names for the resource_type and mode columns – the internal values are what the trace was returning.

For a quick recap: in Part 1, you learned how to convert an existing trace into an XE session, identifying the columns and stepping through the GUI for creating the XE session. In part 2 you learned how to query both the trace and XE file target outputs, and then compared the two outputs and learned that all of the data in the trace output is available in the XE output.

This post is re-published from my original post on SQL Solutions Group.

In this brave, new world of Extended Events (XE, XEvents), I find myself with a mixture of scripts for troubleshooting issues – some use XE, and some use traces. We’ve all been told that XE is a much better system (it is much more lightweight, causing less of an issue with the server). In fact, it is so much better that Microsoft has deprecated SQL Trace and SQL Profiler, and in the future, one will not be able to run any traces at all. Believe it or not, this is a good thing!

It just so happens that today is the 67th installment of the monthly T-SQL Tuesday blogging event. T-SQL Tuesday, that wonderful monthly blogging party started by Adam Machanic where a selected host challenges the SQL Server universe to have a blog post about a specific topic. This wild frenzy of SQL Server blog posting occurs on the second Tuesday of each month. This month, it is being hosted by my friend (and the person with the highest amount of energy known to mankind) Jes Borland (b / t), and the topic that she has chosen is Extended Events. Her specific challenge to the SQL Server universe is:

I want to know (and others do, too) how you’ve solved problems with Extended Events. What sessions have you created? What unique way have you used predicates or targets? What challenges have you overcome?

The biggest challenge that I have is not a technical challenge… it’s a personal challenge: actually getting started with XE. I have so many scripts for doing traces, that I just immediately use them instead of the better XE system. I really need to wean myself off of using Profiler / traces. Therefore, I’ve decided to start converting my trace scripts into XE scripts, and I’ll share with you how I go about doing it. Today, I’m going to look at my favorite trace script – a trace to capture deadlock information.

First off, let’s start with the trace script:
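A condensed sketch of it (the file path is an assumption; the real script sets every column shown in the cross-reference below, using the column ids from sys.trace_columns):

DECLARE @TraceID INT, @maxfilesize BIGINT = 5, @on BIT = 1;

EXECUTE sp_trace_create @TraceID OUTPUT, 0, N'C:\Temp\DeadlockTrace', @maxfilesize, NULL;

-- Deadlock graph (148), Lock:Deadlock (25) and Lock:Deadlock Chain (59), with a few sample columns
EXECUTE sp_trace_setevent @TraceID, 148, 1,  @on;   -- TextData
EXECUTE sp_trace_setevent @TraceID, 148, 12, @on;   -- SPID
EXECUTE sp_trace_setevent @TraceID, 148, 14, @on;   -- StartTime
EXECUTE sp_trace_setevent @TraceID, 25,  1,  @on;
EXECUTE sp_trace_setevent @TraceID, 25,  12, @on;
EXECUTE sp_trace_setevent @TraceID, 25,  14, @on;
EXECUTE sp_trace_setevent @TraceID, 59,  1,  @on;
EXECUTE sp_trace_setevent @TraceID, 59,  12, @on;
EXECUTE sp_trace_setevent @TraceID, 59,  14, @on;

EXECUTE sp_trace_setstatus @TraceID, 1;             -- start the trace
SELECT @TraceID AS TraceID;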

From this script, you can see that the trace is collecting data from three events (Deadlock Graph (148), Lock:Deadlock (25) and Lock:Deadlock Chain (59)) and several columns for each event. The next step is to convert all of this information into XE events / actions. For this, I’ll modify the script in BOL (at https://msdn.microsoft.com/en-us/library/ff878264.aspx) to the following to return the XE events / actions for a specified running trace. Since the trace needs to be running, execute the above trace script to start the trace, then run the following code:
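A sketch of that modified script (set @TraceID to the id returned by the trace script above):

DECLARE @TraceID INT = 2;   -- hypothetical: use the TraceID returned above

SELECT  te.name            AS [Trace event],
        tc.name            AS [Trace column],
        xem.xe_event_name  AS [Extended Events event],
        xam.xe_action_name AS [Extended Events action]
FROM sys.fn_trace_geteventinfo(@TraceID) AS ti
JOIN sys.trace_events  AS te ON ti.eventid  = te.trace_event_id
JOIN sys.trace_columns AS tc ON ti.columnid = tc.trace_column_id
LEFT JOIN sys.trace_xe_event_map  AS xem ON te.trace_event_id  = xem.trace_event_id
LEFT JOIN sys.trace_xe_action_map AS xam ON tc.trace_column_id = xam.trace_column_id
ORDER BY te.name, tc.name;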

These results may contain NULL values in the “Extended Events action” column. A NULL value here means that there is not a corresponding event action for that column. For my deadlock trace, I get the following results:

SSMS2014XE00-TraceXExref

With this information, I can now jump into SSMS (2014) and use the New Session wizard to create a new XE session (Expand the Management and Extended Events nodes. Right click on Sessions and select New Session). I name the session “Deadlocks”, and click the “Events” page to select the events that I want to use.

SSMS2014XE01-NewXESession

Since all of the events that I’m looking for contain “Deadlock”, I search for this and 5 events are displayed. I double-click each of the three listed above (or select them and click the “>” button) to move them to the “Selected Events” grid.

SSMS2014XE02-SelectEvents

The next step is to configure the columns, so click the Configure button.

SSMS2014XE03-ConfigureEvents

Select the lock_deadlock event, and on the “Global Fields (Actions)” tab select the client_app_name, client_pid, database_name, nt_username, server_principal_name and session_id columns.

SSMS2014XE04-ConfigureEventColumns

Click on the “Event Fields” tab, and check the checkboxes for database_name and resource_description. Fields that show up in this tab with checkboxes have customizable actions, and don’t actually collect the information unless checked. Frequently this will be because these fields require an additional level of resource usage to gather this information.

SSMS2014XE05-ConfigureEventColumns

Select the lock_deadlock_chain event, and select the session_id and database_name columns on the “Global Fields (Actions)” tab. Click on the “Event Fields” tab and check the checkbox for database_name and resource_description. Finally, select the xml_deadlock_report event, and select the server_principal_name and session_id columns on the “Global Fields (Actions)” tab.

Finally, click on the Data Storage page, and add an event_file target type. The file name defaults to the name that you gave the session earlier. I set the maximum size to 250mb and 5 rollover files. If you select to script to a new query window (HIGHLY recommended), you will have a script with all of the DDL necessary to create this XE.

SSMS2014XE06-ConfigureDataStorage

At this point, there are two changes that I like to make. First, I like to make my scripts re-runnable without errors – if the XE session already exists, creating it again will generate an error – so I add an IF EXISTS check at the top and drop the XE session if it already exists. Secondly, I like to call xp_create_subdir to create the directory that the script points to, just in case the directory doesn’t exist. Note that xp_create_subdir succeeds if the directory already exists, so I don’t check whether the directory exists before executing this procedure.

My final XE Deadlock script looks like:
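In rough form (the file path and sizes are placeholders; the customizable database_name / resource_description fields would be enabled via SET options in the GUI-generated script and are left out of this sketch):

IF EXISTS (SELECT 1 FROM sys.server_event_sessions WHERE name = N'Deadlocks')
    DROP EVENT SESSION [Deadlocks] ON SERVER;

EXECUTE master.sys.xp_create_subdir N'C:\Temp';   -- make sure the target directory exists

CREATE EVENT SESSION [Deadlocks] ON SERVER
ADD EVENT sqlserver.lock_deadlock
    (ACTION (sqlserver.client_app_name, sqlserver.client_pid, sqlserver.database_name,
             sqlserver.nt_username, sqlserver.server_principal_name, sqlserver.session_id)),
ADD EVENT sqlserver.lock_deadlock_chain
    (ACTION (sqlserver.database_name, sqlserver.session_id)),
ADD EVENT sqlserver.xml_deadlock_report
    (ACTION (sqlserver.server_principal_name, sqlserver.session_id))
ADD TARGET package0.event_file
    (SET filename = N'C:\Temp\Deadlocks.xel', max_file_size = (250), max_rollover_files = (5));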

Now that you have this XE session scripted out, it can be easily installed on multiple servers. If you encounter a deadlock problem, you can easily start the XE session and let it run to trap your deadlocks. They will be persisted to a file dedicated for the deadlocks. You can use my Deadlock Shredder script at http://bit.ly/ShredDL to read the deadlocks from the file and shred the deadlock XML into a tabular output.

Note that the default system_health XE session also captures deadlocks. I like to have a dedicated session for just deadlocks. As lightweight as XE is, sometimes it may benefit a server to turn off the system_health session. Additionally, Jonathan Kehayias has a script that will take a running trace and completely script out an XE session for it. This script can be found at https://www.sqlskills.com/blogs/jonathan/converting-sql-trace-to-extended-events-in-sql-server-2012/. Even though this script is available, I like to figure things out for myself so that I can learn what is actually going on.

Did you catch the note above where I mentioned that the default system_health XE captures deadlocks? This means that if you are still enabling trace flags 1204 and/or 1222 to capture deadlock information in your error logs, you don’t need to do that anymore. Furthermore, by using this Deadlock XE, you can have the deadlocks persisted to a file where, in combination with my Deadlock Shredder script, it will be even easier to analyze the deadlocks than trying to figure it out from the captured information in the error logs.

I hope that this post will help someone with the conversion from using SQL Profiler to using XE instead. The process for converting other traces would be the same, so it can be easily adapted. I also hope that these deadlock scripts will be helpful to you!

In my next post, I will be comparing the results of the deadlock trace and the deadlock XE session.

And Jes – thanks for hosting this month! I’m really looking forward to seeing the roundup for this month.

This post is re-published from my original post on SQL Solutions Group.

I was recently reading this msdn article on Ghost Records, and it mentioned that you could get the number of ghost records on a page with DBCC DBTABLE… and it also mentioned that you need to be sure that you enable Trace Flag 3604 in order to see the results. So, two things immediately jumped out at me. First, I wanted to look at this to see where the ghost records were located. Secondly, I’ve just written a few articles (here, here, here and here) where I’ve been able to use the “WITH TABLERESULTS” option on the DBCC command to avoid using this trace flag and to provide automation for the process, and I wanted to see if that would work here also.

The good news is that “WITH TABLERESULTS” does indeed work with DBCC DBTABLE. The bad news is that I could not find the ghost record count in the results.

When I was looking for this information, I noted that the results meta-data are identical to the way DBCC PAGE has its output, so this means that the automation processes already developed will work for them. And as I was looking through the results, looking for a ghost record counter, I noticed two interesting fields:

Field                   Value
m_FormattedSectorSize   4096
m_ActualSectorSize      512

Hmm, this is showing me the disk sector size of each database file. After checking things on a few different systems, it looks like m_ActualSectorSize is the sector size of the disk that the database file is currently on, and m_FormattedSectorSize appears to be the sector size from when the database was created – it is copied from the model database, so it appears to reflect the disk that Microsoft created the model database on.

Since it’s a best practice to have the disk sector size (also known as the allocation unit size or block size) set to 64 KB (see this white paper), I decided to programmatically get this information. After digging through the Object and ParentObject columns, I developed the following script to get the current allocation unit size (sector size) for each drive:
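A sketch of that kind of script: it loops over the databases, captures the DBCC DBTABLE output with TABLERESULTS, and pulls out the sector-size fields (relating them back to a specific drive would also mean reading the file-name entries under the same FCB ParentObject, which is omitted here):

IF OBJECT_ID('tempdb..#DBCCResults') IS NOT NULL DROP TABLE #DBCCResults;
CREATE TABLE #DBCCResults
    (DatabaseName SYSNAME NULL, ParentObject VARCHAR(255), [Object] VARCHAR(255), Field VARCHAR(255), Value VARCHAR(255));

DECLARE @DatabaseName SYSNAME, @SQL NVARCHAR(MAX);

DECLARE DatabaseCursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT name FROM sys.databases WHERE state_desc = N'ONLINE';
OPEN DatabaseCursor;
FETCH NEXT FROM DatabaseCursor INTO @DatabaseName;
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @SQL = N'DBCC DBTABLE(' + QUOTENAME(@DatabaseName) + N') WITH TABLERESULTS;';

    INSERT INTO #DBCCResults (ParentObject, [Object], Field, Value)
    EXECUTE (@SQL);

    -- Tag the rows just captured with the database they came from
    UPDATE #DBCCResults SET DatabaseName = @DatabaseName WHERE DatabaseName IS NULL;

    FETCH NEXT FROM DatabaseCursor INTO @DatabaseName;
END;
CLOSE DatabaseCursor;
DEALLOCATE DatabaseCursor;

SELECT DatabaseName, ParentObject, Field, Value
FROM #DBCCResults
WHERE Field IN ('m_FormattedSectorSize', 'm_ActualSectorSize')
ORDER BY DatabaseName, ParentObject, Field;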

And here we have yet another way that a process can be automated by using “WITH TABLERESULTS” on a DBCC command. I think that this one is a particularly good example of the possibilities – to get this information you have to hit multiple parts of the DBCC results, and repeat that for each file in each database. Doing this with the 3604 trace flag, finding the appropriate piece, and then proceeding on to the next piece would be very time consuming to do manually.

Finally, a quick note: there are better ways of getting the disk sector size – since you can get it with WMI calls, you can get it with PowerShell (or even DOS), and there are command-line utilities that will get you this information as well. This is just a way to do it from within SQL Server. Note also that this only covers the drives that contain database files on this SQL Server instance – if you are looking at other drives, then this won’t work for you.

Take a look at the other fields that are available in DBTABLE – you just might find another item that you’d like to be able to retrieve.

Previous related posts:

Persisting DBCC Output
Automating DBCC Page
Automating DBCC Page, Part 2
Automating DBCC Page, Part 3

This post is re-published from my original post on SQL Solutions Group.

SQLSaturday SpeedPASSes

I’ve been working with running SQLSaturdays in the Richmond, VA area for several years now. It seems that every year, I end up blogging something new. Frequently, this ends up being a script of some kind to help me out. This post follows this theme by introducing a script for working with SQLSaturday SpeedPASSes.

The SpeedPASS for SQLSaturday is used for many things. It has the attendee’s admission ticket, lunch ticket, name badge and raffle tickets for all of the vendors. What most organizers do for their local SQLSaturday is to do whatever it takes to entice the attendee to pre-print their ticket. However, it seems that there are always people that forget, or didn’t realize that they should have done this. Last year, at our event, we had about 40% of the people needing their tickets printed out for them on the spot. This causes lines while we hunt up their SpeedPASS on the registration site, or look it up in an Excel document to get the link for the ones that have already been downloaded.

I mentioned getting the link from an Excel document. This is because PASS generates the SpeedPASS into a PDF file that has a GUID as its filename. Specifically, it is the “Invoice ID” guid in the attendee’s registration. The Excel document of the registrations that the site administrators can download has this, along with the attendee’s name. So we look up the name, find the invoice id, and match this to the PDF file for that person. Open it up, and print it out. One by one. Does this sound like a cursor to you also?

Changing things up

This year, our organizing committee decided that this was too much hassle, and that we were going to print out the SpeedPASSes for everyone ahead of time and to have it ready for them as they walk in the door. What we have accomplished is to shift the line from waiting to print out their SpeedPASS, to finding their SpeedPASS. However, we have hundreds of these SpeedPASSes to deal with. Now we have to get all of the PDFs, open them up (one-by-one), and print them out. After they have all been printed, they need to be sorted by last name. Manually.

Hello… this is insane. Surely, there has to be a better way. So I started looking.

A better way…

My good friend, Mr. Google, found this post by Kendal Van Dyke. This post has a PowerShell script that will download and merge all of the PDFs for selected attendees into one big PDF. This enables printing out all of the SpeedPASSes at once, instead of one-by-one. However, I have a couple of problems with this script in its current form. First, the instructions for how to get the information from the SQLSaturday admin site have changed (they did do a major web site change last year). Secondly, it downloads all of the PDFs one-by-one, and puts them into a temporary directory, where they are all merged together. In file-name order. Not alphabetically. This means that the manual sorting is still necessary. But hey – it’s PowerShell. Surely we can come up with a way to do this sorting for us!

So I decided to re-write this script to suit my needs. Kendal’s script downloads the SpeedPASS PDF files one-by-one. However, the admin site allows us to download all of them in one zip file. I like this approach better. I ended up making two major changes to the script. The first change requires pre-downloading and extracting all of the SpeedPASS files. The second change is to get them to merge alphabetically. Like Kendal’s script, this uses the PDFSharp assemblies. This requires using PowerShell 3.0 or higher.

The new instructions for using this script

  1. Download and save the PDFSharp Assemblies from http://pdfsharp.codeplex.com/releases.
    1. Unblock the zip file (right-click the file, select properties, and click “Unblock”. See Kendal’s post for pictures showing you how to do this), and save the files in the zip file to your desired location.
    2. Put this location in the $PdfSharpPath variable in the script.
  2. Log into the SQLSaturday admin site and navigate to Event Settings > Manage SpeedPASS.
    1. Click the button to “Generate All” SpeedPASSes.
    2. Click the button to “Download SpeedPASSs” and save to disk.
    3. Extract the files from the zip file.
    4. Put this location in the $SpeedPASSPDFPath variable in the script.
  3. Go to Event Settings > Manage Registrations.
    1. Click the “Export to Excel” button.
    2. Save to disk.
  4. Open the Excel file and Sort as desired (I used lastname / firstname).
    1. If you only want to do the speakers, filter for complimentary lunches or not.
  5. Copy the InvoiceID column from Excel to a text file and save the file.
    1. Put the complete filename for this file in the $SpeedPASSUrlPath variable in the script.
  6. Ensure that the following variables are set. Run the script to merge the PDF files into a single PDF.
    1. Set the $PdfSharpPath variable to the location of PDFSharp DLL (from #1 above).
    2. Set the $SpeedPASSPDFPath variable to the location where the extracted PDF files are (from #2 above).
    3. Set the $SpeedPASSUrlPath variable to the location of the text file with the InvoiceID values in it (from #5 above).
  7. Finally, open the merged PDF file. You will see that they are ordered by how you ordered the Excel spreadsheet. When you’re ready, print them out.

The PowerShell script.

Some of it is exactly how Kendal’s is, but the Merge-PDF2 function is a re-write of his Merge-PDF function to handle the differences from above.

There you have it. Working with SQLSaturday SpeedPASSes to generate one PDF for all selected SpeedPASSes sorted by name. Eazy-Peazy.

You might be wondering what is up with that picture up above. Yes, it’s a Dr. Who picture. My friend, Grant Fritchey, challenged folks to incorporate this picture into a technical post – read his challenge here. I hope that I did it justice.

While you may not need to worry about the physical location of a row very often, every so often the need comes up. In a few of my earlier posts (here), I have needed this information to prove a point. This post shows how to identify a row’s physical location. It will also show the various methods used to work with this value once returned.

Acquiring the physical location of a row

SQL Server 2008 introduced a new virtual system column: “%%physloc%%”. “%%physloc%%” returns the file_id, page_id and slot_id information for the current row, in a binary format. Thankfully, SQL Server also includes a couple of functions to split this binary data into a more useful format. Unfortunately, Microsoft has not documented either the column or the functions.

The first of these functions is “sys.fn_PhysLocFormatter”. This scalar function receives the %%physloc%% virtual column as an input. It splits the input apart into the file, page and slot information, and then returns the information in a column formatted in the form file_id:page_id:slot_id. This is the function that I used in the above posts.

The second function is “sys.fn_PhysLocCracker”. As a table-valued function, it is used either directly in the FROM clause or by using the cross apply operator. It also receives the %%physloc%% virtual column as an input, and returns the file_id, page_id and slot_id as columns in a table. In at least one of my previous posts, this would have been a better function to have used.

Example usage:

Let’s see some examples of using the virtual column and these functions. In the following script, we start off by creating a temporary table with 700 rows of data. The first query after that uses the sys.fn_PhysLocFormatter function, returning all of the physical location information about the row into one column. The subsequent query uses the sys.fn_PhysLocCracker function, returning the file_id, page_id and slot_id values from the physical location of the row. This allows these values to be used in subsequent processing – say maybe for automating DBCC Page?
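A sketch of that script (the temp table and its shape are placeholders):

IF OBJECT_ID('tempdb..#PhysLocDemo') IS NOT NULL DROP TABLE #PhysLocDemo;
CREATE TABLE #PhysLocDemo
    (RowID  INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
     Filler CHAR(100) NOT NULL DEFAULT 'X');

INSERT INTO #PhysLocDemo (Filler)
SELECT TOP (700) 'X'
FROM sys.all_columns AS ac1
CROSS JOIN sys.all_columns AS ac2;

-- Composite file:page:slot output
SELECT RowID, sys.fn_PhysLocFormatter(%%physloc%%) AS PhysicalLocation
FROM #PhysLocDemo;

-- Separate file_id / page_id / slot_id columns
SELECT d.RowID, plc.file_id, plc.page_id, plc.slot_id
FROM #PhysLocDemo AS d
CROSS APPLY sys.fn_PhysLocCracker(%%physloc%%) AS plc;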

This returns:

physloc_query_results

So there you go. %%physloc%% identifies the physical location of a row. Then the sys.fn_PhysLocFormatter function will decipher this and expose it in a composite form, while the sys.fn_PhysLocCracker function will decipher it and return the values as columns in a result set. You can use whichever one suits your needs.

Performance

When I ran these statements with the “Include Actual Execution Plans” option enabled, I saw:

physloc_execution_plan

The bottom statement has the infamous “Table Valued Function” operator in it. SQL Server uses this operator when dealing with a multi-statement table-valued function. If you happened to have read my recent post comparing functions, you will recall that these are the worst performing type of functions. So I decided to look at the code of these functions, to see if sys.fn_PhysLocCracker could be improved upon. Running sp_helptext on these functions returns the following code:

sys.fn_PhysLocFormatter

sys.fn_PhysLocCracker

It seems obvious to me that the multi-statement function was created by cloning the scalar function. Furthermore, this code can be improved upon and converted into an inline table-valued function. An inline table-valued function for this functionality would look like this:
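A sketch of what that inline version could look like (the function name is hypothetical; the byte positions follow the layout that sys.fn_PhysLocCracker decodes - page id in bytes 1-4, file id in bytes 5-6, slot in bytes 7-8, each little-endian):

CREATE FUNCTION dbo.fn_PhysLocCracker_Inline (@physical_locator BINARY(8))
RETURNS TABLE
AS
RETURN
    SELECT CONVERT(INT, CONVERT(BINARY(2), REVERSE(SUBSTRING(@physical_locator, 5, 2)))) AS file_id,
           CONVERT(INT, CONVERT(BINARY(4), REVERSE(SUBSTRING(@physical_locator, 1, 4)))) AS page_id,
           CONVERT(INT, CONVERT(BINARY(2), REVERSE(SUBSTRING(@physical_locator, 7, 2)))) AS slot_id;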

Now, since this is an undocumented function, I don’t expect that this function will ever be upgraded. But if you ever need it, you could create it yourself. Alternatively, you could just code it into the cross apply operator yourself, like so:
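One possible shape for that, reusing the same conversions inside the apply (I'm assuming %%physloc%% resolves against the outer table here):

SELECT d.RowID, plc.file_id, plc.page_id, plc.slot_id
FROM #PhysLocDemo AS d
CROSS APPLY (SELECT CONVERT(INT, CONVERT(BINARY(2), REVERSE(SUBSTRING(%%physloc%%, 5, 2)))) AS file_id,
                    CONVERT(INT, CONVERT(BINARY(4), REVERSE(SUBSTRING(%%physloc%%, 1, 4)))) AS page_id,
                    CONVERT(INT, CONVERT(BINARY(2), REVERSE(SUBSTRING(%%physloc%%, 7, 2)))) AS slot_id) AS plc;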

Which will return the same information, without the performance inhibiting multi-statement table-valued function being used.

In several of my last few blog posts, I’ve shared several methods of getting internal information from a database by using the DBCC PAGE command and utilizing the “WITH TABLERESULTS” option so that the process can be automated for further processing. This post will do the same, but in this case we’ll be using it to bust a common myth: that data in a clustered index is physically stored on disk in the order of the clustered index.

Busting this myth

To bust this myth, we’ll create a database, put a table with a clustered index into this database, and then we’ll add some rows in random order. Next, we will show that the rows are stored on the pages in logical order, and then we’ll take a deeper look at the page internals to see that the rows are not stored in physical order.

To start off with, let’s create a database and a table, and add a few rows to this table:
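A sketch of that setup (database and table names are placeholders; the values match the rows referenced later in the post):

USE [master];
IF DB_ID(N'PhysicalOrderDemo') IS NULL
    CREATE DATABASE PhysicalOrderDemo;
GO
USE PhysicalOrderDemo;
GO
CREATE TABLE dbo.PhysicalOrderTest
    (RowID    INT     NOT NULL CONSTRAINT PK_PhysicalOrderTest PRIMARY KEY CLUSTERED,
     RowValue CHAR(5) NOT NULL);

-- Odd rows first, then the even rows
INSERT INTO dbo.PhysicalOrderTest (RowID, RowValue) VALUES (1, 'Row01');
INSERT INTO dbo.PhysicalOrderTest (RowID, RowValue) VALUES (3, 'Row03');
INSERT INTO dbo.PhysicalOrderTest (RowID, RowValue) VALUES (5, 'Row05');
INSERT INTO dbo.PhysicalOrderTest (RowID, RowValue) VALUES (2, 'Row02');
INSERT INTO dbo.PhysicalOrderTest (RowID, RowValue) VALUES (4, 'Row04');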

What we have is a table with 5 rows. The table’s clustered key is the RowID integer column. The rows are inserted so that the odd rows are inserted first, followed by the even rows.

Determine row’s physical location

At this point, let’s look at where the system reports these rows to be at. To do this, we’ll utilize two undocumented system commands. The first is %%physloc%%, which returns the physical location in a hexadecimal format. The second is fn_PhysLocFormatter, which converts this into a format of FileID:PageID:SlotID. This is just simply added to the select clause, so the query is:
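So the query becomes (using the placeholder table from above):

SELECT RowID, RowValue, sys.fn_PhysLocFormatter(%%physloc%%) AS [File:Page:Slot]
FROM dbo.PhysicalOrderTest;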

This query produces these results:

DBCCPage3-1

From these results, you can see that the rows are all on the same page, and that the slot number in the slot array on that page increments along with the RowID value. Remember also that the slot array is zero-based, where the first slot is slot #0. This is the logical order.

How inserts change the slot the row is in

To see this changing around as the rows are inserted, just put the select statement (which I just introduced) after each of the prior insert commands and run the script to create the table and add the rows again. You will initially see RowID=1 put into slot #0, RowID=3 into slot #1 and RowID=5 into slot #2. When you then add RowID=2, it needs to be between RowID #1 and RowID #3, so #2 is now in slot #1, #3 moves to slot #2, and #5 is moved to slot #3. When you add RowID=4, it gets inserted into slot #3 and RowID #5 is again pushed down, to slot #4. The logical order follows what we are expecting:

DBCCPage3-3

Check the physical storage

In order to show that these are not physically stored in this order, we will need to crack this page and look internally at where the data is actually stored. We will accomplish this with this code:
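A sketch of that approach: capture the page's DBCC PAGE output (dump style 3) into a temp table, then pull out each slot's offset alongside the RowID stored in that slot (the exact ParentObject text is as I recall it from TABLERESULTS output):

IF OBJECT_ID('tempdb..#PageResults') IS NOT NULL DROP TABLE #PageResults;
CREATE TABLE #PageResults
    (ParentObject VARCHAR(255), [Object] VARCHAR(255), Field VARCHAR(255), Value VARCHAR(8000));

DECLARE @FileID INT, @PageID INT, @SQL NVARCHAR(500);

SELECT TOP (1) @FileID = plc.file_id, @PageID = plc.page_id
FROM dbo.PhysicalOrderTest
CROSS APPLY sys.fn_PhysLocCracker(%%physloc%%) AS plc;

SET @SQL = N'DBCC PAGE (' + QUOTENAME(DB_NAME(), '''') + N', '
         + CONVERT(NVARCHAR(10), @FileID) + N', '
         + CONVERT(NVARCHAR(10), @PageID) + N', 3) WITH TABLERESULTS, NO_INFOMSGS;';

INSERT INTO #PageResults (ParentObject, [Object], Field, Value)
EXECUTE (@SQL);

-- ParentObject is of the form "Slot 0 Offset 0x60 Length 21" for each row on the page
SELECT ParentObject, Field, Value
FROM #PageResults
WHERE Field = 'RowID'
ORDER BY Value;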

Which produces these results:

DBCCPage3-2

The row’s offset is where the row starts at on the page

As we look at these results, pay close attention to the Offset. This is where the row physically starts on the page. You can see that for RowID #2, this offset is higher than the offset for RowID #3… and even RowID #5. When the row was added, the data was added after the other existing rows on the page, and the offset where this data starts was entered into the slot array, after the remaining slot array entries were pushed down in order to maintain the correct logical order. We can see this happening once again when RowID #4 is inserted.

And, finally, let’s use DBCC PAGE to crack open this page and look at the raw data. For this, we want to use dump style 2:
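A sketch of that call, reusing the file and page ids obtained the same way as above:

DECLARE @FileID INT, @PageID INT, @SQL NVARCHAR(500);

SELECT TOP (1) @FileID = plc.file_id, @PageID = plc.page_id
FROM dbo.PhysicalOrderTest
CROSS APPLY sys.fn_PhysLocCracker(%%physloc%%) AS plc;

SET @SQL = N'DBCC PAGE (' + QUOTENAME(DB_NAME(), '''') + N', '
         + CONVERT(NVARCHAR(10), @FileID) + N', '
         + CONVERT(NVARCHAR(10), @PageID) + N', 2) WITH TABLERESULTS, NO_INFOMSGS;';

EXECUTE (@SQL);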

If you expand out the VALUE column, you can see it go from Row01 to Row03 to Row05, then to Row02 and Row04.

So there you go… the data is physically stored in the next available space on the page; however, the page’s slot array is in the clustered key order. Furthermore, we now know that we can look at the offset and determine the order on the page. And we also have yet another way to use DBCC PAGE in an automated manner to show this.

Previous related posts:

Persisting DBCC Output
Automating DBCC Page
Automating DBCC Page, Part 2

This post is re-published from my original post on SQL Solutions Group.

Today is yet another T-SQL Tuesday – the monthly blogging party started by Adam Machanic (b|t) to get everyone in the SQL Server blogosphere together and to blog about a topic. Today’s party is hosted by none other than Matt Gordon (b|t), and the topic that he has chosen for us to blog about is “Fixing old problems with Shiny New Toys”. Specifically, he says:

What I’d like to see from the blog responses for this T-SQL Tuesday is how you’ve used a “new” Microsoft data platform toy to fix an old problem.

I’ve decided that I’m going to turn this around. Sometimes, those Shiny New Toys cause problems. Let’s talk about one such case.

Recently, I was helping a client with an issue where when a job would run, the CPU on all processors would max out. This in turn would cause other performance issues on the server:

After a bit of investigation, I was able to determine that the statement causing this was a MERGE statement. The MERGE statement was a Shiny New Toy introduced in SQL 2008. This MERGE statement had both a WHEN MATCHED and a WHEN NOT MATCHED BY TARGET clause inside it.

What does this mean?

BOL defines the MERGE statement as:

Performs insert, update, or delete operations on a target table based on the results of a join with a source table.

This is performed with just one pass of the target table.

There are three possible operations that can be used within the MERGE statement. These are:

WHEN MATCHED – this allows you to perform an UPDATE or DELETE against the target table using the source table. There is a row existing in both tables for this match. If you were performing just the individual UPDATE or DELETE statement, you would use an INNER JOIN between the source and target table.

WHEN NOT MATCHED BY TARGET – this allows you to perform an INSERT of rows into the target table from rows in the source table that are not in the target table. If you were performing just the INSERT statement, you would drive it with a SELECT statement that performs a LEFT OUTER JOIN between the source and target tables, checking for a (not-null) column in the target table being NULL (signifying that there wasn’t a matching row).

WHEN NOT MATCHED BY SOURCE – this allows you to perform an UPDATE or DELETE against the target table when the target table has rows that the source table does not. If you were performing just the UPDATE or DELETE statement, you would use a RIGHT OUTER JOIN between the source and target tables… or how most people would write it, a LEFT OUTER JOIN between the target and source tables.

Keep in mind that my examples above are what you would do if you were separating the commands. Individually, these different conditions perform just fine with the MERGE statement. However, when the MERGE statement combines multiple match conditions, then think about what SQL has to do. See the chart at this link for when the different types of join conditions are used in the MERGE statement.

In this case, it’s doing both a WHEN MATCHED and a WHEN NOT MATCHED BY TARGET. In order to process the target table in just one pass while handling multiple match conditions, the MERGE statement frequently ends up performing worse, and it can end up using more resources in the process. I frequently find it better to just use the individual statements, as sketched below.
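A minimal sketch of the contrast, with hypothetical Source and Target tables:

-- The combined MERGE form
MERGE INTO dbo.Target AS t
USING dbo.Source AS s
    ON t.ID = s.ID
WHEN MATCHED THEN
    UPDATE SET t.SomeValue = s.SomeValue
WHEN NOT MATCHED BY TARGET THEN
    INSERT (ID, SomeValue) VALUES (s.ID, s.SomeValue);

-- The equivalent separated statements
UPDATE t
SET    t.SomeValue = s.SomeValue
FROM   dbo.Target AS t
       INNER JOIN dbo.Source AS s ON t.ID = s.ID;

INSERT INTO dbo.Target (ID, SomeValue)
SELECT s.ID, s.SomeValue
FROM   dbo.Source AS s
       LEFT OUTER JOIN dbo.Target AS t ON t.ID = s.ID
WHERE  t.ID IS NULL;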

By separating this out into the respective statements, each can be optimized to perform better. Which is what I did… an UPDATE statement followed by an INSERT statement. Not only did the job run faster overall, but this change also stopped the CPU from maxing out:

Ah, much better.

There are other issues with the MERGE statement. It has bugs… some of which can cause database corruption.

Here we have a Shiny New Toy (feature), supposed to make life easier, yet it causes problems. Until it can perform better (and the bugs are eliminated), I just don’t use it.

Beware the Shiny New Toys.