The purpose of the following query is to retrieve all OrderLines that have a corresponding Product.
One would normally not write such a query. However, it is not uncommon to have a view with such a join,
and to only select columns from one of the tables in the view. Such scenario comes down to the query below.
CREATE TABLE Products(product_id int PRIMARY KEY,filler char(7000))
CREATE TABLE OrderLines
(id int PRIMARY KEY
,product_id int NOT NULL REFERENCES Products(product_id)
)
Table Products is a wide table with 4 rows
Table OrderLines is a narrow table with 8 rows, and has a foreign key constraint on a column that does not allows NULLs,
and that column references Products.
SELECT OrderLines.*
FROM OrderLines
INNER JOIN Products
ON Products.product_id = Orderlines.product_id
SQL Server 7.0
So let's see what SQL Server 7.0 did:
It looks up the Products row for each OrderLines row.
Obviously, there is no need to access table Products, because the foreign key constraint guarantees that
all product_ids will match, and that the join to Products will not cause duplication of rows, since there is
always exactly one matching row.
SQL Server 2000
SQL Server 2000 knows what to do with this query!
It only accesses the OrderLines tables, and does not needlessly look up rows in the Products table.
Unfortunately, this optimization only seems to take place for single column Primary Key/Foreign Key relations!
There is a similar scenario to the one above. It starts the same. A SELECT that only selects from OrderLines,
but joins to Products. The difference is, that in this case, column OrderLines.product_id is allowed to be NULL.
CREATE TABLE Products(product_id int PRIMARY KEY,filler char(7000))
CREATE TABLE OrderLines
(id int PRIMARY KEY
,product_id int NULL REFERENCES Products(product_id)
)
Table Products is a wide table with 64 rows
Table OrderLines is a narrow table with 112 rows, and has a foreign key constraint on a column that allows NULLs.
SELECT OrderLines.*
FROM OrderLines
INNER JOIN Products
ON Products.product_id = Orderlines.product_id
So what does SQL Server 2000 do this time?
For this variation, it hasn't gotten any smarter than its predecessor.
SQL Server 2005
SQL Server 2005 does only slightly better:
It still accesses all rows of table Products.
The improvement is in the fact that each row in Products will be accessed no more than 1 time.
Optimal solution
Getting the optimal query plan requires a rewrite. The optimal solution would simply add the predicate
product_id IS NOT NULL
and scan the OrderLines table.
SELECT OrderLines.*
FROM OrderLines
WHERE product_id IS NOT NULL
Like the example where product_id does not allow NULLs, this query plan only accesses the OrderLines table.
We will see when SQL Server's optimizer is smart enough to do this rewrite. Maybe in 2008?