No actual columns selected from referenced table

06-JUL-2008, last improvement 08-AUG-2011
The purpose of the following query is to retrieve all OrderLines that have a corresponding Product.
One would normally not write such a query. However, it is not uncommon to have a view with such a join, and to only select columns from one of the tables in the view. Such scenario comes down to the query below.

CREATE TABLE Products(product_id int PRIMARY KEY,filler char(7000))

CREATE TABLE OrderLines
(id         int          PRIMARY KEY
,product_id int NOT NULL REFERENCES Products(product_id)
)
Table Products is a wide table with 4 rows
Table OrderLines is a narrow table with 8 rows, and has a foreign key constraint on a column that does not allows NULLs, and that column references Products.
SELECT OrderLines.*
FROM       OrderLines
INNER JOIN Products
  ON  Products.product_id = Orderlines.product_id

SQL Server 7.0

So let's see what SQL Server 7.0 did:
Loop Join

It looks up the Products row for each OrderLines row.

Obviously, there is no need to access table Products, because the foreign key constraint guarantees that all product_ids will match, and that the join to Products will not cause duplication of rows, since there is always exactly one matching row.

SQL Server 2000

SQL Server 2000 knows what to do with this query!
Product is not accessed

It only accesses the OrderLines tables, and does not needlessly look up rows in the Products table.

Unfortunately, this optimization only seems to take place for single column Primary Key/Foreign Key relations!

There is a similar scenario to the one above. It starts the same. A SELECT that only selects from OrderLines, but joins to Products. The difference is, that in this case, column OrderLines.product_id is allowed to be NULL.

CREATE TABLE Products(product_id int PRIMARY KEY,filler char(7000))

CREATE TABLE OrderLines
(id         int          PRIMARY KEY
,product_id int     NULL REFERENCES Products(product_id)
)
Table Products is a wide table with 64 rows
Table OrderLines is a narrow table with 112 rows, and has a foreign key constraint on a column that allows NULLs.
SELECT OrderLines.*
FROM       OrderLines
INNER JOIN Products
  ON  Products.product_id = Orderlines.product_id
So what does SQL Server 2000 do this time?
Once again: a Loop Join

For this variation, it hasn't gotten any smarter than its predecessor.

SQL Server 2005

SQL Server 2005 does only slightly better:
Sorted OrderLines, merge joined to Products

It still accesses all rows of table Products. The improvement is in the fact that each row in Products will be accessed no more than 1 time.

Optimal solution

Getting the optimal query plan requires a rewrite. The optimal solution would simply add the predicate
product_id IS NOT NULL
and scan the OrderLines table.
SELECT OrderLines.*
FROM   OrderLines
WHERE product_id IS NOT NULL
Filtered Clustered Index Scan

Like the example where product_id does not allow NULLs, this query plan only accesses the OrderLines table.

We will see when SQL Server's optimizer is smart enough to do this rewrite. Maybe in 2008?


Back to SQL Server optimizer improvements. Mail your comments to gertjans@xs4all.nl.