Using correlation to improve query performance

SQL Server doesn't keep statistics on the correlation between nonclustered indexes and the clustered index (with one exception, see DATE_CORRELATION_OPTIMIZATION in Books Online). Instead, the optimizer assumes it has a low correlation; it assumes that a range of nonclustered index values is scattered all over the clustered index.

This assumption affects the optimizer's decision whether or not to use the nonclustered index. If there is a high correlation, the optimizer will overestimate the cost of using the nonclustered index, which can cause it to disqualify the index from the query plan evaluation, resulting in a suboptimal query plan. The performance difference can be big, even by orders of magnitude.

SQL Server MVP Deep Dives

I have written an article about this topic end of 2008 for the book SQL Server MVP Deep Dives, chapter 46 to be exact. It explains what it means to have a high correlation with the clustered index, why the optimizer can misjudge such situations, how to determine the correlation for your situation, and how to optimize your queries accordingly.

Back to SQL Server main menu. Mail your comments to