Thursday, August 16, 2012

NOLOCK!

Introduction

I see many developers writing SQL code and using NOLOCK as a "performance tool". I see NOLOCK spread liberally through scripts, applied to every single table as far as the eye can see. This technique is dangerous and I would like to share why.



Background

In order to understand the dangers of NOLOCK, I will start by giving a bit of background with regards to SQL Server storage, index structures and page splits.


SQL Server Storage

The fundamental unit of data storage in SQL Server is the page and pages are numbered contiguously from 0 to n. Disk I/O operations are performed at the page level i.e. SQL Server reads or writes whole data pages at a time. A page is 8Kb with 96 bytes reserved for the header. The header contains metadata about the page, and for the purposes of this article we are interested in m_pageId, m_nextPage and m_prevPage. These page numbers have a file number and a page number separated by a colon, for example 1:2870 is page number 2870 on file 1.










SQL Server Index Structure

Indexes store data on a page in order of the key columns of the index. If, for example, we created an index on an ID column with a type of INT the index would look something like this.








Page Splits

Page splits occur when data is inserted or updated and the target page for the operation does not have enough room for the new data. In this case a new page is created and the page headers (next page & prev page) are updated accordingly. This leads to index fragmentation where the physical order of the pages no longer represents the logical order of the index.

For example if we inserted a new row in our example above with an ID of 11 there would not be enough space on the page and we would end up with a split which looks something like this.



And similarly if we ran an update statement on the row with ID 7 and increased the size of the row (perhaps updating CompanyName from 'Belay' to 'EOH Microsoft Services') we would get another page split.


Notice how the physical order does not represent the logical order.


Reading Data

When data is read from the pages the order of the read is influenced by the use of NOLOCK. When a read without NOLOCK is issued the read occurs in logical order while a read with NOLOCK occurs in physical order. Please note: this order is influenced by other factors outside the scope of this article so the actual order may vary on a busy system but the reading data with and without NOLOCK is vastly different.


Without NOLOCK


With NOLOCK


We can demonstrate this with the following code:


USE AdventureWorks;
GO

SELECT TOP 5
        SalesOrderID,
        SalesOrderDetailID,
        OrderQty
FROM    Sales.SalesOrderDetail;

SELECT TOP 5
        SalesOrderID,
        SalesOrderDetailID,
        OrderQty
FROM    Sales.SalesOrderDetail WITH (NOLOCK);


which produces the following results on my system





The Dangers Of NOLOCK

The danger comes in when your read is happening at the same time as a page split. Due to you not issuing any locks while reading data an update or insert may occur while SQL Server is busy with your read. These updates and inserts may cause a page split on one of the pages that has just been read or is about to be read. If this page split causes a new page in front of or behind the read operation you may end up reading the same rows twice or you may miss rows completely. Data read with a NOLOCK is therefore possibly an estimate.

Allow me to demonstrate this.

We first setup a table to test this. If we build a clustered index on an UNIQUEIDENTIFIER (which is a GUID) we can simulate many page splits.



IF EXISTS(SELECT * FROM sys.tables WHERE name = 'AccountEntry')
BEGIN
      DROP TABLE AccountEntry;
END;

GO

CREATE TABLE AccountEntry
(ID UNIQUEIDENTIFIER PRIMARY KEY DEFAULT (NEWID()),
AccountID INT NOT NULL,
Amount DECIMAL(10, 2) NOT NULL);
GO

We also want to insert 10 rows with an amount of 10 for account number -1. This will give us an account balance of 100 for this particular account.



INSERT INTO AccountEntry (AccountID, Amount)
VALUES (-1,10), (-1,10), (-1,10), (-1,10), (-1,10),
       (-1,10), (-1,10), (-1,10), (-1,10), (-1,10);
GO


Now we want to generate many inserts into this table which will cause page splits to occur. I'm using a random number for the account ID, but note this random number is guaranteed to be greater than 0. This will ensure we are not changing the balance for our test account. Run the following code in a new window.



SET NOCOUNT ON;

WHILE 1 = 1
BEGIN
    INSERT INTO AccountEntry (AccountID, Amount)
    VALUES (CAST(RAND() * 10000 AS INT), 10);
END;
Finally, while the code above is running in a separate window, we run the following code. This code runs in a loop and checks the balance of our test account. This balance should always be 100. If we get a balance that is not 100 we print the balance to the screen and increase a counter. I want to see how long it will take to have 5 reads with the incorrect balance.


DECLARE @TotalAmount DECIMAL(10, 2),
        @InvalidCount SMALLINT = 0;


WHILE @InvalidCount < 5
BEGIN
    SELECT @TotalAmount = SUM(Amount)
    FROM AccountEntry WITH(NOLOCK)
    WHERE AccountID = -1;
   
    IF @TotalAmount <> 100
    BEGIN
        PRINT @TotalAmount;
        SET @InvalidCount = @InvalidCount + 1;
    END;
END;


And as we can see it is quite easy to select an incorrect balance using NOLOCK.



How Is NOLOCK Applied

There are 3 ways to apply NOLOCK, 2 of which are table hints and the other applies NOLOCK to your connection.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

SELECT * FROM Sales.SalesOrderDetail WITH (READUNCOMMITTED);

SELECT * FROM Sales.SalesOrderDetail WITH (NOLOCK);

When Can We Use NOLOCK

  • Tables that do not change i.e. look-up tables.
  • Databases that do not change i.e. warehouses and reporting systems that are loaded at set periods.
  • Data that is acceptable as an estimate i.e. trending graphs
The following two trending graphs have slightly different figures. These differences are not noticeable. 




And Now?

Now that we know why we shouldn't use NOLOCK, what do we do with that slow running, blocking query that NOLOCK magically fixed for us? We optimise of course.

1. Fine tune your code. Maybe it can be simplified or split up.
2. Read the execution plan(s).
3. Identify indexes that can be modified to support your query.
4. Identify indexes that you can create to support your query.
5. Go back to step 1.
6. Use an appropriate isolation level.


2 comments: