Fragmentation hurts performance

24-SEP-2013
You already know it, or at the very least, you have heard of it. And it is true: fragmentation hurts performance!

This article explains what fragmentation is, why it is bad for performance, what you can do to prevent or minimize fragmentation, what you can do to limit its negative effects, and how to remove fragmentation once it has occurred.

What is fragmentation?

Fragmentation is defined as any condition which causes more than the optimal amount of disk I/O to be performed in accessing a table, or causes the disk I/Os that are performed to take longer than they optimally would.

Optimal performance of SELECT queries occurs when data pages are as contiguous as possible within the database, and the data pages are packed as fully as possible.

What this means is that you get the best performance if, whenever multiple pages need to be read, those pages contain as little free space as possible and are located next to each other on the hard drive.

The storage stack


When you submit an SQL query, the query optimizer determines the execution plan and hands it over to the storage engine, which then executes it. The storage engine determines whether it needs to physically read data off the hard drive (whether it needs to perform I/O), and if so, in which order to read the data. Such requests trickle down through the operating system to the actual storage device.

Below is a visualization of all the layers from the query (bottom) to the storage device (top).

The Storage Stack
Level | Type           | Description                                        | Example 1            | Example 2
1     | Storage Device | The actual storage of the data                     | Hard Drive           | Solid State Drive
2     | Partition      | Partitioning of the disk at Operating System level | harddisk0\partition0 | harddisk0\partition1
3     | Volume         | Part of a partition, and/or a combination of partitions, presented as a logical storage volume. This includes any mirroring, striping and/or any other techniques used as part of RAID. | C: | D:
4     | Database       | The database                                       | master               | test
5     | File           | Basic container for SQL Server data storage. Every file is either a Rows Data file or a Log file. | C:\MSSQL\test.mdf | D:\MSSQL\test.ldf
6     | File Group     | Group of SQL Server files. This is the basic container for data objects. | PRIMARY | HISTORIC
7     | Extent         | Group of 8 data pages                              | mixed                | uniform
8     | Page           | Most detailed level of storage. A data page occupies 8 KB. | 1,143,3     | 1,153,3
9     | Table or Index | A table or index is assigned to a File Group       | Customers            | IX_Customers_name
10    | Column         | A column is part of a table row and possibly an index key | order_id      | order_date

The table could go even further, down to the individual rows with their individual column values, but for the purpose of explaining fragmentation and its effect on performance, those levels are irrelevant.

What is relevant, but not covered in this article, is that when a value in a (suitable) column does not fit on the row, it is stored on one or more other pages, depending on the size of the value.

1. Storage Device

Hard Drive

A hard drive is a storage device that uses magnetized disks, and a disk head that reads or writes data while the disks are spinning. A typical hard drive has one disk with two surfaces (top and bottom). Each surface has one disk head.

Typically, a hard drive is very fast at reading information that is physically close together on the disk, because such requests can be served with sequential reads. A hard drive is considerably slower when the next bit that needs to be read is physically far away; this requires so-called random reads.

In an ideal situation, a track of information is read, the disk head moves to the adjacent track, finds the next sector immediately, reads the rest of the track and moves on to the next adjacent track. This ideal situation describes sequential reads on sector-aligned disks.

When a random read occurs, the disk head has to move to a new location, stabilize, find the right sector that holds the information and then read it. There are typically hundreds of sectors per track on modern hard drives. In the worst case, random reads require the disk head to move after every individual sector read, which would be extremely slow. The further the disk head has to move, the slower it is.

Solid State Drive

A Solid State Drive uses semiconductor chips to store data. This means that it behaves completely differently when it comes to sequential and random reads: on an SSD, both are about equally fast.

2. Partition

A partition is a section of a disk.

Since hard drives only have one disk head per disk surface, you should avoid multiple partitions on the same disk, because moving the disk head from one partition to the other is relatively expensive.

If you currently have multiple partitions on the same disk, then joining them will probably improve performance, because it would reduce the average seek time.

Multiple partitions are not an issue if you are using Solid State Drives.

Tip 1: Create just one partition per hard drive.

3. Volume

A volume is a logical disk, and is comprised of one or more partitions. In other words, a volume can span multiple partitions.

From a performance point of view, one volume should not have multiple partitions on the same disk, because any space between those partitions constitutes fragmentation.

Striping the volume over multiple disks typically improves performance, and the performance typically improves further with every additional disk that can be dedicated to the volume.

Tip 2: Limit the volume to one partition per hard drive.
Tip 3: Consider striping the volume over multiple hard drives.

4. Database

From a performance point of view, a database declaration is only the construct within which the database files will be created. So at the database level, fragmentation plays no role.

5. File

An SQL Server File is a file that is stored on the file system, managed by the Operating System, and used by SQL Server to store (data) objects.

If the file is stored on a hard drive and it is fragmented, then it takes more time to read. How much more time depends not just on the amount of fragmentation, but also on where the data is actually stored. See Storage Device.

To give you an idea of the performance degradation: on my hard drive, a non-contiguous (read: fragmented) file that was spread over 4 times the minimal area (which a contiguous file would have occupied) performed about 2 percent slower.

If the file is created with its final size on a fragmented file system, you should expect only such small effects on performance. However, it can get much, much worse. If you created your database files too small and the database needs to autogrow (which is the default setting), then the file fragments get stored all over the disk, and full database scans will take a much bigger performance hit.

The worst case is when your disk is out of space and your database needs to grow. So you remove some files here and there, and then allow the database to grow. This last extension of the database file will be extremely fragmented. You can expect up to 32 times performance degradation: a query that would take 100 seconds in the nonfragmented part could now take up to 3200 seconds in the fragmented part. And I don't even think that is the upper limit for such fragmentation. This estimate is based on the fragmented database test in the Table or Index section.

Tip 4: Create the database file big enough so it never has to (auto)grow.
Tip 5: Before you create the database file, defragment the hard drive.
Tip 6: Don't fill up the hard drive. Make sure there is space left after creating the database file.
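
As a sketch of Tips 4 through 6, you would create the database with generously pre-sized files on a freshly defragmented drive. All names, paths and sizes below are illustrative, not a recommendation; adapt them to your environment.

CREATE DATABASE test
ON PRIMARY
  (NAME = test_data
  ,FILENAME = 'C:\MSSQL\test.mdf'
  ,SIZE = 50GB        -- sized so it should never have to (auto)grow
  ,FILEGROWTH = 1GB)  -- safety net only
LOG ON
  (NAME = test_log
  ,FILENAME = 'D:\MSSQL\test.ldf'
  ,SIZE = 10GB
  ,FILEGROWTH = 1GB)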

6. File Group

An SQL Server File Group is a logical data file, and has exclusive control over one or more SQL Server files. In other words, a file group can include multiple files.

If you create a file group with two equally sized files, table and index data will be spread equally over these two files. If the first file is three times as big as the second file, 75% of the table data will be stored in the first file and 25% in the second file. Because of that, if you will be managing multiple files in the file group, it is best to have equally sized files, because that results in the most even spread of data.

This data spreading is done at the Extent level, which is not very fine-grained. Because of that, striping is preferred over multiple files.

One of the advantages of a file group is that an individual file group can be marked as read-only. When you mark the file group as read-only, all objects that use that file group become read-only. Read-only tables and indexes have benefits over regular tables and indexes, because such tables don't require locks, and - provided you are running Enterprise Edition - enable Advanced Scanning. Advanced Scanning circumvents some fragmentation problems, because such scans do not follow the index's linked list of pages, and therefore don't suffer from pages that are out of order.

Tip 7: If you use multiple files in a file group, then make sure they are equally sized.
Tip 8: Mark the file group as read-only if all data in the file group is final.
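
For example (the file group name, file names and sizes are illustrative), the following statements add a file group with two equally sized files, and later mark it read-only once all of its data is final:

ALTER DATABASE test ADD FILEGROUP HISTORIC

ALTER DATABASE test ADD FILE
  (NAME = historic1, FILENAME = 'C:\MSSQL\historic1.ndf', SIZE = 4GB)
 ,(NAME = historic2, FILENAME = 'D:\MSSQL\historic2.ndf', SIZE = 4GB)
TO FILEGROUP HISTORIC

-- once all data in the file group is final:
ALTER DATABASE test MODIFY FILEGROUP HISTORIC READ_ONLY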

7. Extent

The file group is the lowest level that you can configure. If you put anything in it, the engine will determine where to put it within the file group. And for the engine, it all starts with Extents.

An Extent is a group of 8 contiguous database pages. So it is a data structure of 64 kilobytes.

An extent is either mixed or uniform. If it is uniform, it only contains pages for the same object (table or index). Typically, all pages of a table or index are stored in uniform extents, with the exception of the first few pages, which are stored in a mixed extent. A mixed extent allows pages of different objects.

For example, if you create a table with 100 MB of data (1600 extents of 64 KB each), then roughly 1599 uniform extents are created and used; the first few pages go into one newly created mixed extent, or into one or a few existing mixed extents.

If you create and populate a big table in a nonfragmented database, you have the best chance that the extents will be contiguous, and the pages within them as well.

You can use the DBCC SHOWCONTIG command to inspect the fragmentation of extents. The line that reports it is Extent Scan Fragmentation. This line indicates what percentage of extents is fragmented, so you want this value to be as low as possible.
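
For example, the report for a table named t (the table used in the tests later in this article) is produced like this:

DBCC SHOWCONTIG ('t')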

Here is example output of DBCC SHOWCONTIG for a table without any fragmentation. As you can see, the Extent Scan Fragmentation reports only 0.01% fragmentation.
DBCC SHOWCONTIG scanning 't' table...
Table: 't' (2105058535); index ID: 1, database ID: 6
TABLE level scan performed.
- Pages Scanned................................: 991384
- Extents Scanned..............................: 124403
- Extent Switches..............................: 124402
- Avg. Pages per Extent........................: 8.0
- Scan Density [Best Count:Actual Count].......: 99.61% [123923:124403]
- Logical Scan Fragmentation ..................: 0.37%
- Extent Scan Fragmentation ...................: 0.01%
- Avg. Bytes Free per Page.....................: 360.0
- Avg. Page Density (full).....................: 95.55%
DBCC execution completed. If DBCC printed error messages, contact your system administrator.

Below is example output of DBCC SHOWCONTIG for a table that is completely fragmented. As you can see, this time, Extent Scan Fragmentation reports 82.24%.
DBCC SHOWCONTIG scanning 't' table...
Table: 't' (2105058535); index ID: 1, database ID: 6
TABLE level scan performed.
- Pages Scanned................................: 991108
- Extents Scanned..............................: 124213
- Extent Switches..............................: 991106
- Avg. Pages per Extent........................: 8.0
- Scan Density [Best Count:Actual Count].......: 12.50% [123889:991107]
- Logical Scan Fragmentation ..................: 99.24%
- Extent Scan Fragmentation ...................: 82.24%
- Avg. Bytes Free per Page.....................: 2830.7
- Avg. Page Density (full).....................: 65.03%
DBCC execution completed. If DBCC printed error messages, contact your system administrator.

To get rid of extent fragmentation, you run DBCC DBREINDEX (= old syntax) or ALTER INDEX .. REBUILD (= new syntax).
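
For example, for the test table t, either of these rebuilds all of its indexes:

-- old syntax
DBCC DBREINDEX ('t')

-- new syntax
ALTER INDEX ALL ON t REBUILD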

If your table does not have a clustered index, it is called a heap. Please note that DBCC SHOWCONTIG results for a heap are meaningless, and there are no options to defragment a heap. The only way to defragment a heap is to (temporarily) turn it into a clustered table by adding a clustered index.

Tip 9: Rebuild the table or index if it is heavily fragmented

8. Page

A page is the lowest level object that is stored as a unit. So when a table row or index key is stored in this 8 kilobyte data structure, then (at some point) this entire page will be written to disk. If the engine needs to read a row or key, and it is not in the Buffer Cache, then the relevant page is read from disk.

Every page is managed within an extent. When it comes to fragmentation, you want the pages to be consecutive within an extent and over extents.

You can use the DBCC SHOWCONTIG command to inspect the fragmentation of pages. The line that reports it is Logical Scan Fragmentation. This line indicates what percentage of pages is fragmented, so you want this value to be as low as possible. For example output of DBCC SHOWCONTIG see Extent.

The most efficient way to get rid of page fragmentation, is to run DBCC DBREINDEX (= old syntax) or ALTER INDEX .. REBUILD (= new syntax).

Alternatively, you can use DBCC INDEXDEFRAG (= old syntax) or ALTER INDEX .. REORGANIZE (= new syntax) to remove the page fragmentation. Please note that this method will not defragment the extents.
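
For example, for the test table t again (the database name test is illustrative):

-- old syntax: defragment the clustered index (index id 1) of table t in database test
DBCC INDEXDEFRAG (test, 't', 1)

-- new syntax: reorganize all indexes on the table
ALTER INDEX ALL ON t REORGANIZE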

Another form of fragmentation is unused space. When a page becomes empty, it is freed from the extent. Empty spots within extents mean that I/O gets wasted on unused space when read-ahead reads are issued. The SQL Server documentation says that a read-ahead read can "read up to 64 contiguous pages (512KB) from one file" (which translates to 8 extents), so every empty spot in these 8 extents wastes 1.6% of the I/O associated with the read-ahead read.

The system stored procedure sp_spaceused allows you to find the amount of unused space within a table and its indexes. This unused space basically translates to reserved pages, i.e. empty spots within the extents.
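
For example, for the test table t, the call looks like this; the unused column in the output shows the reserved but unused space:

EXEC sp_spaceused 't'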

The remedy for unused pages is the same as for fragmented pages (see above).

Tip 9: Rebuild the table or index if it is heavily fragmented
Tip 10: Rebuild the table or index if there is a lot of unused space

9. Table or Index

A row or index key is always completely stored on one page. The only exception is data of very specific data types, such as text, varchar(max), etc.

This storage implementation is good for performance, because it means that at most one page has to be read to get the data.

It also means that if there is no room left for a new row or key, a page split will occur. In such a case, about half the rows or keys on the page remain on the existing page, a new page is reserved and included in the index's linked lists, and the other half of the rows or keys is moved to this new page.

A page split typically means fragmentation, because both the split page and the newly reserved page will have a lot of unused space. And unless the row/key is inserted at the very beginning or very end of the (clustered) index, the newly reserved page will not be "next to" the split page. In other words, page fragmentation and extent fragmentation are very likely to occur.
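
Here is a minimal sketch that makes this visible (the table name and sizes are illustrative): first fill a clustered index in key order using only even keys, then insert the odd keys in between, forcing a split of every full page.

CREATE TABLE split_demo
(id     int not null primary key clustered
,filler char(950) not null default ''  -- about 8 rows fit on a page
)

-- insert even keys in ascending order: pages fill up nicely
declare @i int
set @i=0
while @i<10000 begin
  insert into split_demo (id) values (@i)
  set @i=@i+2
end
DBCC SHOWCONTIG ('split_demo')  -- low Logical Scan Fragmentation, high page density

-- insert the odd keys in between: every full page has to split
set @i=1
while @i<10000 begin
  insert into split_demo (id) values (@i)
  set @i=@i+2
end
DBCC SHOWCONTIG ('split_demo')  -- much higher fragmentation, lower page density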

The way to get rid of the fragmentation is to rebuild or reorganize it, as explained in the previous section.

INSERTs can cause page splits. DELETEs don't reverse this; they simply create unused space. Both are forms of fragmentation, although page splits usually have the bigger negative impact on performance.

If your table is handling many transactions, then it is a balancing act between unused space and page splits. DBCC SHOWCONTIG shows not only the fragmentation of pages and extents, but also the average unused space within pages; you can find this under the label Avg. Bytes Free per Page. If the free space exceeds the (average) row size, then this unused space wastes I/O and memory; if you reindex/reorganize, you will reduce the unused space.

If you rebuild or reorganize the table regularly, you should use an appropriate fill factor. The fill factor determines how much free space each page will have after the rebuild/reorganization. A well chosen fill factor can prevent page splits for the most common INSERTs (and UPDATEs) between rebuilds, yet keeps the free space limited.
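
For example, a rebuild that leaves 10% free space on every page (the percentage is illustrative; choose it based on how many rows are inserted between rebuilds):

ALTER INDEX ALL ON t REBUILD WITH (FILLFACTOR = 90)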

Tip 10: Rebuild the table or index if there is a lot of unused space
Tip 11: Choose an appropriate Fill Factor if you reindex periodically

10. Column

A row or index key consists of columns to hold the actual data.

If the column uses a varying length data type, such as varchar or SPARSE columns, then the row may grow or shrink depending on the data. In such a case, UPDATEs can lead to page splits.

The cost of page splits is usually higher than the cost of unused space. Because of that, you can consider changing a varying length column data type to a fixed size data type, especially for columns with relatively small values (for example up to 20 or 30 bytes per value). For example, you could change a varchar(20) column into a char(20) column.
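
A sketch of such a change (the table and column names are hypothetical; note that the conversion itself rewrites rows and can cause splits, so rebuild the table afterwards):

ALTER TABLE Customers ALTER COLUMN zip_code char(10) NOT NULL  -- requires that the column holds no NULLs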

Note that from a performance point of view, you should never create varying length columns for values less than 5 bytes, because the equivalent fixed length data type will be more efficient.

As mentioned in the Table or Index section, if many rows or keys are inserted in "the middle" of the index, this undoubtedly leads to fragmentation as a result of page splits. The opposite is true as well. If the rows or keys are inserted at the beginning or end of the index, the split page will stay full, and the newly reserved page may not cause fragmentation.

Because of that, an ever increasing key such as an IDENTITY column or Transaction Date is preferred over some arbitrary key. A completely random key such as NEWID() is the worst choice for a key from a performance point of view, because it virtually ensures massive fragmentation.

If you are currently using the NEWID() function to create uniqueidentifier values, then you should consider replacing it with the NEWSEQUENTIALID() function, which is available in SQL Server 2005 and later. If you don't need a uniqueidentifier, then it is better to replace it with an INT IDENTITY or BIGINT IDENTITY.
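
For example (the table and column names are illustrative); note that NEWSEQUENTIALID() can only be used in a DEFAULT constraint:

CREATE TABLE orders
(order_id   uniqueidentifier not null
            constraint df_orders_id default NEWSEQUENTIALID()
            constraint pk_orders primary key clustered
,order_date datetime not null
)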

Tip 10: Rebuild the table or index if there is a lot of unused space
Tip 12: Consider changing (n)varchar columns to (n)char columns
Tip 13: Avoid NEWID() like the plague. If possible, replace it with IDENTITY

Real Life Performance Numbers

This last section shows some real life numbers, to illustrate the magnitude of the different expressions of fragmentation.

All results below are based on tables that occupy approximately 8 gigabytes of disk space. The reason for a database of such a decent size is to avoid trivial performance differences, but more importantly to eliminate most caching mechanisms.

Proper testing is hard, because in this day and age every possible layer has some kind of caching mechanism in place, from CPU to hard disk controller and everything in between.

The script below will create a 7931072 row table occupying 7961760 KB (plus or minus a few 8 KB pages).
CREATE TABLE t
(id     int not null identity(1,10) primary key clustered
,val    int not null
,filler char(950) null  -- pads the row to almost 1 KB, so about 8 rows fit on a page
)

insert into t (val) values (1)
declare @i int
set @i=95
while @i>0 begin
  set @i=@i-1
  -- doubles the table until it passes 100000 rows,
  -- then adds 100000 rows per iteration
  insert into t (val) select top 100000 val from t
  checkpoint
end
The following query is used to test the performance of a clustered index scan.

DBCC DROPCLEANBUFFERS  -- empty the Buffer Cache, so all pages are read from disk
set statistics io on
go
declare @start datetime
set @start=GETDATE()

select MIN(val)  -- val is not indexed, so this forces a full clustered index scan
from t

select DATEDIFF(ms,@start,getdate()) as elapsed_milliseconds
go
set statistics io off

On one machine, this database was created as close to the start (outer circle) of the disk as possible, in the middle (middle circle) and at the end (inner circle) of a 1 TB disk. None of the three databases has any fragmentation, so these are ideal circumstances. With 128 MB of memory available to SQL Server, these were the results of the query.

Clustered Index Scan Performance Relative To Location On Disk

Database location    | 8 GB Table Scan
Start / Outer circle | ~65 seconds
Middle               | ~66 seconds
End / Inner circle   | ~99 seconds

As you can see, on this particular disk, the database at the end of the disk is about 50% slower.

The table below shows the effect of different types of caching on a relatively modern Windows 8 system with 10 GB of available memory.

Effect of caching on performance

Description                 | Situation                                                                     | 8 GB Table Scan
Disk Controller caching (?) | 128 MB for SQL Server, other memory occupied, empty Buffer Cache, first run  | ~74 seconds
                            | 128 MB for SQL Server, other memory occupied, empty Buffer Cache, second run | ~72 seconds
                            | 128 MB for SQL Server, other memory occupied, empty Buffer Cache, third run  | ~71 seconds
                            | 128 MB for SQL Server, other memory occupied, empty Buffer Cache, fourth run | ~71 seconds
OS caching (?)              | 128 MB for SQL Server, other memory free, empty Buffer Cache, next run       | ~63 seconds
SQL Buffer Cache            | 8.5 GB for SQL Server, Buffer Cache not cleared, next run                    | ~54 seconds
                            | 8.5 GB for SQL Server, Buffer Cache not cleared, next run                    | ~2.2 seconds
                            | 8.5 GB for SQL Server, Buffer Cache not cleared, next run                    | ~8.6 seconds
                            | 8.5 GB for SQL Server, Buffer Cache not cleared, next run                    | ~3.0 seconds
                            | 8.5 GB for SQL Server, Buffer Cache not cleared, next run                    | ~1.4 seconds


The next table shows the differences between different hard drives.

Different Results On Different Hard Drives

Storage Device                           | 8 GB Table Scan
Relatively new desktop Hard Drive        | ~65 seconds
Relatively new laptop Hard Drive         | ~150 seconds
Relatively old desktop Hard Drive        | ~177 seconds
Relatively old desktop Solid State Drive | ~38 seconds

What this table shows is that there is a huge difference in performance simply because of better hardware, possibly in combination with better software, who knows. It shows that if you want to know what is best for your system, you should test on your system.

The table below demonstrates how much fragmentation can affect performance.

Fragmented vs Contiguous

Optimal situation
  Table size: ~8 GB; Rows: ~8 Mln
  Logical Scan Fragmentation: 0.37%; Extent Scan Fragmentation: 0.01%; Avg. Page Density (full): 95.55%
  DB 1, HD 1: ~1.1 minutes | DB 2, HD 1: ~1.1 minutes | DB 3, HD 1: ~1.7 minutes
  DB 1, HD 2: ~2.5 minutes | DB 1, HD 3: ~3.0 minutes | DB 1, SSD: ~0.6 minutes

Light and simple fragmentation
  Table size: ~8 GB; Rows: ~8 Mln
  Logical Scan Fragmentation: 50.12%; Extent Scan Fragmentation: 0.01%; Avg. Page Density (full): 95.55%
  DB 1, HD 1: ~1.3 minutes | DB 2, HD 1: ~1.3 minutes | DB 3, HD 1: ~2.0 minutes
  DB 1, HD 2: ~3.1 minutes | DB 1, HD 3: ~3.3 minutes | DB 1, SSD: ~0.6 minutes

Worst case row distribution
  Table size: ~8 GB; Rows: ~5.4 Mln
  Logical Scan Fragmentation: 99.22%; Extent Scan Fragmentation: 0.01%; Avg. Page Density (full): 65.05%
  DB 1, HD 1: ~46.9 minutes | DB 2, HD 1: ~48.9 minutes | DB 3, HD 1: ~54.2 minutes
  DB 1, HD 2: ~63.6 minutes | DB 1, HD 3: ~79.3 minutes | DB 1, SSD: ~79.7 minutes

There are two interesting things this table shows.

First of all, it shows the massive impact fragmentation can have. In the Worst case row distribution scenario, every inserted row had a random key. As you can see, the query that ran in about 65 seconds in the optimal situation now needs more than 46 minutes to run.

The second interesting part is that this particular Solid State Drive on this particular system suffered just as much from the fragmentation as the Hard Drive on the same system.

The table below shows how defragmentation affects performance.

Before and after defragmentation

Worst case page and extent distribution
  Table size: ~8 GB; Spread over: ~16 GB; Rows: ~5.4 Mln
  Logical Scan Fragmentation: 99.24%; Extent Scan Fragmentation: 82.24%; Avg. Page Density (full): 65.03%
  8 GB Table Scan: ~79.3 minutes

After index reorganization (DBCC INDEXDEFRAG)
  Table size: ~5.4 GB; Spread over: ~16 GB; Rows: ~5.4 Mln
  Logical Scan Fragmentation: 0.28%; Extent Scan Fragmentation: 83.24%; Avg. Page Density (full): 95.40%
  8 GB Table Scan: ~4.1 minutes

After rebuilding the index (DBCC DBREINDEX)
  Table size: ~5.4 GB; Spread over: ~5.4 GB; Rows: ~5.4 Mln
  Logical Scan Fragmentation: 0.01%; Extent Scan Fragmentation: 0.14%; Avg. Page Density (full): 95.55%
  8 GB Table Scan: ~1.7 minutes

The table demonstrates that DBCC INDEXDEFRAG (or ALTER INDEX .. REORGANIZE) only removes page fragmentation but doesn't reduce extent fragmentation. It also highlights the possible performance impact of such extent fragmentation.


Mail your comments to gertjans@xs4all.nl.