Date range scans

13-MAR-2013
If you have a table with a FROM_DATE and TO_DATE column, and you want to query all rows that overlap a @FROM until @TO period, then this typically results in a partial index scan, and worst case in a table scan. In other words, usually it is not very fast.

This article shows example implementations of the Relational Interval tree technique for date and for datetime data types. It is based on work by Kriegel, Pötke and Seidl (Managing Intervals Efficiently in Object-Relational Databases), Laurent Martin (A Static Relational Interval Tree) and Itzik Ben-Gan (Efficient Interval Management in Microsoft SQL Server).

Key points for implementing Relational Interval Tree

The implementation for a Relational Interval tree requires you to add a "node" column, which represents a fork in the Relational Interval tree. This "node" column is used to quickly locate rows within the desired range.

To make access to this "node" column fast, two indexes are required. One index is tailored for the left part of the tree, and one index for the right part of the tree.

To determine the relevant nodes for you date range two functions are needed. One function is tailored for the left and middle part of the tree, and one function for the right part of the tree.

After this setup, to efficiently select date ranges, your query needs slight changes to make use of the Relational Interval Tree. You will have to join your table with the leftAndMiddleNodes function and again join your table with the rightNodes function and UNION ALL these two sets into one result set.

For more information on how Relational Interval Trees work, see the references above.

Datetime ranges

Below is a setup for datetime ranges, with a precision of 1 second, and a date range from 1970-01-01 00:00:01 thru 2038-01-19 03:14:07 (68 years). If you need a different date range, simply replace '19700101' with your new lower bound in the functions and table definition.

Below is a table with a datetime range (from "lower" to "upper"). The constraints in the table definition are to make sure that "lower" and "upper" actually signify a range, and that both "lower" and "upper" don't contain milliseconds, but only dates and times up to (and including) the second.
CREATE TABLE dbo.IntervalDatetimesRIT
(
  id    INT      NOT NULL,
  node  AS datediff(second,convert(datetime,'19700101',112),upper)
           - datediff(second,convert(datetime,'19700101',112),upper)
           % POWER(2, FLOOR(LOG((datediff(second,convert(datetime,'19700101',112),lower) - 1)
                                 ^ datediff(second,convert(datetime,'19700101',112),upper))/log(2))) PERSISTED NOT NULL,
  lower DATETIME NOT NULL,
  upper DATETIME NOT NULL,
  CONSTRAINT PK_IntervalDatetimesRIT PRIMARY KEY(id),
  CONSTRAINT CHK_IntervalDatetimesRIT_upper_ge_lower CHECK(upper >= lower),
  CONSTRAINT CHK_IntervalDatetimesRIT_lower_ms CHECK(DATEPART(ms,lower)=0),
  CONSTRAINT CHK_IntervalDatetimesRIT_upper_ms CHECK(DATEPART(ms,upper)=0)
);

CREATE INDEX node_lower_13313 ON dbo.IntervalDatetimesRIT(node, lower);
CREATE INDEX node_upper_13313 ON dbo.IntervalDatetimesRIT(node, upper);
The indexes in the above declarations are essential. If you have ranges per something (for example datetime ranges per "my_group") then you may have to add this something ("my_group") as leading column to both indexes.

Next, the two functions that you'll need in your queries

CREATE FUNCTION dbo.leftAndMiddleNodes(@lower AS DATETIME, @upper AS DATETIME)
  RETURNS @T TABLE
  (
     minNode INT NOT NULL
    ,maxNode INT NOT NULL
  )
AS
BEGIN
  DECLARE @node AS INT = 1073741824;
  DECLARE @step AS INT = @node / 2;

  WHILE @step >= 1
  BEGIN
    IF DATEADD(second,@node,convert(datetime,'19700101',112)) > @lower
      SET @node -= @step;
    ELSE IF @lower > DATEADD(second,@node,convert(datetime,'19700101',112))
    BEGIN
      INSERT INTO @T VALUES(@node,@node);
      SET @node += @step;
    END
    ELSE
      BREAK;
    SET @step /= 2;
  END;
  
  INSERT INTO @T VALUES(DATEDIFF(second,convert(datetime,'19700101',112),@lower),DATEDIFF(second,convert(datetime,'19700101',112),@upper));

  RETURN;
END;
CREATE FUNCTION dbo.rightNodes(@lower AS DATETIME, @upper AS DATETIME)
  RETURNS @T TABLE
  (
    node INT NOT NULL PRIMARY KEY
  )
AS
BEGIN
  DECLARE @node AS INT = 1073741824;
  DECLARE @step AS INT = @node / 2;

  WHILE @step >= 1
  BEGIN
    IF @upper > DATEADD(second,@node,convert(datetime,'19700101',112))
      SET @node += @step
    ELSE IF DATEADD(second,@node,convert(datetime,'19700101',112)) > @upper
    BEGIN
      INSERT INTO @T(node) VALUES(@node);
      SET @node -= @step
    END
    ELSE
      BREAK;
    SET @step /= 2;
  END;

  RETURN;
END;
And finally, the queries with and without the Relational Interval Tree.
/* Original/classic/slow query
DECLARE @l AS DATETIME = '20000701 00:00:00', @u AS DATETIME = '20000704 14:25:18';

SELECT id
FROM dbo.IntervalDatetimesRIT
WHERE upper >= @l AND @u >= lower
*/

-- Optimized query, using the Relational Interval Tree
DECLARE @l AS DATETIME = '20000701 00:00:00', @u AS DATETIME = '20000704 14:25:18';

SELECT I.id
FROM dbo.IntervalDatetimesRIT AS I
JOIN dbo.leftAndMiddleNodes(@l, @u) AS L
  ON  I.node BETWEEN L.minNode AND L.maxNode
  AND I.upper >= @l

UNION ALL

SELECT I.id
FROM dbo.IntervalDatetimesRIT AS I
JOIN dbo.rightNodes(@l, @u) AS R
  ON  I.node = R.node
  AND @u >= I.lower

Date ranges

Below is a setup for date ranges, with a precision of 1 day, and a date range from 1910-04-16 thru 2089-09-17 (179 years). If you need a different date range, simply replace '19100415' with your new lower bound in the functions and table definition.

Below is a table with a date range (from "lower" to "upper"). The constraints in the table definition are to make sure that "lower" and "upper" actually signify a range.
CREATE TABLE dbo.IntervalDatesRIT
(
  id    INT NOT NULL,
  node  AS cast(datediff(day,convert(date,'19100415',112),upper)
                - datediff(day,convert(date,'19100415',112),upper)
                % POWER(2, FLOOR(LOG((datediff(day,convert(date,'19100415',112),lower) - 1)
                                      ^ datediff(day,convert(date,'19100415',112),upper))/log(2)))
                -32768 as smallint) PERSISTED NOT NULL,
  lower DATE NOT NULL,
  upper DATE NOT NULL,
  CONSTRAINT PK_IntervalDatesRIT PRIMARY KEY(id),
  CONSTRAINT CHK_IntervalDatesRIT_upper_ge_lower CHECK(upper >= lower)
);

CREATE INDEX node_lower_13310 ON dbo.IntervalDatesRIT(node, lower);
CREATE INDEX node_upper_13310 ON dbo.IntervalDatesRIT(node, upper);
The indexes in the above declarations are essential. If you have ranges per something (for example datetime ranges per "my_group") then you may have to add this something ("my_group") as leading column to both indexes.

Next, the two functions that you'll need in your queries

CREATE FUNCTION dbo.leftAndMiddleNodes(@lower AS DATE, @upper AS DATE)
  RETURNS @T TABLE
  (
     minNode SMALLINT NOT NULL
    ,maxNode SMALLINT NOT NULL
  )
AS
BEGIN
  DECLARE @node AS INT = 32768;
  DECLARE @step AS INT = @node / 2;

  WHILE @step >= 1
  BEGIN
    IF DATEADD(day,@node,convert(date,'19100415',112)) > @lower
      SET @node -= @step;
    ELSE IF @lower > DATEADD(day,@node,convert(date,'19100415',112))
    BEGIN
      INSERT INTO @T VALUES(@node-32768,@node-32768);
      SET @node += @step;
    END
    ELSE
      BREAK;
    SET @step /= 2;
  END;
  INSERT INTO @T VALUES(DATEDIFF(day,convert(date,'19100415',112),@lower)-32768,DATEDIFF(day,convert(date,'19100415',112),@upper)-32768);

  RETURN;
END;
CREATE FUNCTION dbo.rightNodes(@lower AS DATE, @upper AS DATE)
  RETURNS @T TABLE
  (
    node SMALLINT NOT NULL PRIMARY KEY
  )
AS
BEGIN
  DECLARE @node AS INT = 32768;
  DECLARE @step AS INT = @node / 2;

  WHILE @step >= 1
  BEGIN
    IF @upper > DATEADD(day,@node,convert(date,'19100415',112))
      SET @node += @step
    ELSE IF DATEADD(day,@node,convert(date,'19100415',112)) > @upper
    BEGIN
      INSERT INTO @T(node) VALUES(@node-32768);
      SET @node -= @step
    END
    ELSE
      BREAK;
    SET @step /= 2;
  END;

  RETURN;
END;
And finally, the queries with and without the Relational Interval Tree.
/* Original/classic/slow query
DECLARE @l AS DATE = '20000701', @u AS DATE = '20000704';

SELECT id
FROM dbo.IntervalDatesRIT
WHERE upper >= @l AND @u >= lower
*/

-- Optimized query, using the Relational Interval Tree
DECLARE @l AS DATE = '20000701', @u AS DATE = '20000704';

SELECT I.id
FROM dbo.IntervalDatesRIT AS I
JOIN dbo.leftAndMiddleNodes(@l, @u) AS L
  ON  I.node BETWEEN L.minNode AND L.maxNode
  AND I.upper >= @l

UNION ALL

SELECT I.id
FROM dbo.IntervalDatesRIT AS I
JOIN dbo.rightNodes(@l, @u) AS R
  ON  I.node = R.node
  AND @u >= I.lower


Back to SQL Server main menu. Mail your comments to gertjans@xs4all.nl.