28-NOV-2009
SQL Server doesn't keep statistics on the correlation between nonclustered indexes
and the clustered index (with one exception, see DATE_CORRELATION_OPTIMIZATION in Books Online).
Instead, the optimizer assumes it has a low correlation; it assumes that a range of
nonclustered index values is scattered all over the clustered index.
This assumption affects the optimizer's decision whether or not to use the nonclustered index.
If there is a high correlation, the optimizer will overestimate the cost of using the
nonclustered index, which can cause it to disqualify the index from the query plan evaluation,
resulting in a suboptimal query plan. The performance difference can be big, even by orders
of magnitude.
SQL Server MVP Deep Dives
I have written an article about this topic end of 2008 for the book
SQL Server MVP Deep Dives, chapter 46 to
be exact. It explains what it means to have a high correlation with the clustered index, why
the optimizer can misjudge such situations, how to determine the correlation for your situation,
and how to optimize your queries accordingly.