Solving Department Change Time Periods with ROW_NUMBER() and CROSS APPLY in SQL Server: A Gaps-and-Islands Approach

Keywords: SQL Server | ROW_NUMBER() | CROSS APPLY | Gaps-and-Islands | Time Series Analysis

Abstract: This paper delves into the classic Gaps-and-Islands problem in SQL Server when handling employee department change histories. Through a detailed case study, it demonstrates how to combine the ROW_NUMBER() window function with CROSS APPLY operations to identify continuous time periods and generate start and end dates for each department. The article explains the core algorithm logic, including data sorting, group identification, and endpoint calculation, while providing complete executable code examples. This method avoids simple partitioning limitations and is suitable for complex time-series data analysis scenarios.

In database applications, handling continuous intervals in time-series data, known as the Gaps-and-Islands problem, is a common challenge. This paper explores how to use the ROW_NUMBER() window function and CROSS APPLY operations in SQL Server 2008 R2 and later versions to address employee department change time periods, based on a specific case study.

Problem Background and Data Example

Assume an EmployeeHistory table that records employee department and supervisor changes over time. Sample data is as follows:

EmployeeID Date      DepartmentID SupervisorID
10001      20130101  001          10009
10001      20130909  001          10019
10001      20131201  002          10018
10001      20140501  002          10017
10001      20141001  001          10015
10001      20141201  001          10014

Employee 10001 changes departments from 001 to 002 and back to 001. The goal is to output the start and end dates for each continuous department period:

EmployeeID DateStart DateEnd  DepartmentID 
10001      20130101  20131201 001
10001      20131201  20141001 002
10001      20141001  NULL     001

A simple ROW_NUMBER() OVER (PARTITION BY EmployeeID, DepartmentID ORDER BY Date) cannot directly solve this, as department IDs repeat.

Core Solution Overview

The best answer employs a multi-step approach, combining temporary tables, ROW_NUMBER(), and CROSS APPLY. Key steps include: data sorting and initialization, group identification calculation, end date determination, and result filtering.

Detailed Implementation Steps

First, create a test table and insert data:

CREATE TABLE Source
(
  EmployeeID int,
  DateStarted date,
  DepartmentID int
)

INSERT INTO Source
VALUES
(10001,'2013-01-01',001),
(10001,'2013-09-09',001),
(10001,'2013-12-01',002),
(10001,'2014-05-01',002),
(10001,'2014-10-01',001),
(10001,'2014-12-01',001)

Next, use ROW_NUMBER() to sort the data and initialize a temporary table:

SELECT *, 
  ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY DateStarted) AS EntryRank,
  newid() as GroupKey,
  CAST(NULL AS date) AS EndDate
INTO #RankedData
FROM Source
;

Here, EntryRank numbers each employee record in chronological order, GroupKey is used for subsequent grouping, and EndDate is initialized as NULL.

Group Identification Calculation

Use CROSS APPLY and NOT EXISTS conditions to determine the start group key for each record:

UPDATE #RankedData
SET GroupKey = beginDate.GroupKey
FROM #RankedData sup
  CROSS APPLY 
  (
    SELECT TOP 1 GroupKey
    FROM #RankedData sub 
    WHERE sub.EmployeeID = sup.EmployeeID AND
      sub.DepartmentID = sup.DepartmentID AND
      NOT EXISTS 
        (
          SELECT * 
          FROM #RankedData bot 
          WHERE bot.EmployeeID = sup.EmployeeID AND
            bot.EntryRank BETWEEN sub.EntryRank AND sup.EntryRank AND
            bot.DepartmentID <> sup.DepartmentID
        )
      ORDER BY DateStarted ASC
    ) beginDate (GroupKey);

This step ensures that records within the same continuous time period share the same GroupKey, achieved by checking for department changes in between.

End Date Determination

Calculate the end date for each time period:

UPDATE #RankedData
SET EndDate = nextGroup.DateStarted
FROM #RankedData sup
  CROSS APPLY 
  (
    SELECT TOP 1 DateStarted
    FROM #RankedData sub
    WHERE sub.EmployeeID = sup.EmployeeID AND
      sub.DepartmentID <> sup.DepartmentID AND
      sub.EntryRank > sup.EntryRank
    ORDER BY EntryRank ASC
  ) nextGroup (DateStarted);

Here, EndDate is set to the start date of the next record with a different department, with the last period as NULL.

Result Filtering and Output

Finally, filter the first record of each group for output:

SELECT * FROM 
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY GroupKey ORDER BY EntryRank ASC) AS GroupRank FROM #RankedData
) FinalRanking
WHERE GroupRank = 1
ORDER BY EntryRank;

The output matches the expected results, showing continuous time periods for each department.

Supplementary References and Optimizations

Other answers mention using self-join methods, such as:

;WITH x 
 AS (SELECT *, 
            Row_number() 
              OVER( 
                partition BY employeeid 
                ORDER BY datestart) rn 
     FROM   employeehistory) 
SELECT * 
FROM   x x1 
   LEFT OUTER JOIN x x2 
                ON x1.rn = x2.rn + 1

This method identifies change points by joining adjacent rows but may require additional processing to aggregate time periods. In contrast, the main solution is clearer and more extensible.

Performance and Applicability Analysis

This solution is suitable for SQL Server 2008 and later, leveraging the efficiency of window functions and CROSS APPLY. For large datasets, consider indexing optimizations, such as creating indexes on EmployeeID and DateStarted. Time complexity primarily depends on sorting and join operations, typically O(n log n).

Conclusion

By combining ROW_NUMBER() and CROSS APPLY, the Gaps-and-Islands problem can be effectively solved, especially in complex time-series scenarios like department changes. This approach not only provides accurate start and end date calculations but also demonstrates the flexible application of SQL Server's advanced features. In practice, adjustments can be made based on specific needs, such as handling multiple employees or adding more filter conditions.

Copyright Notice: All rights in this article are reserved by the operators of DevGex. Reasonable sharing and citation are welcome; any reproduction, excerpting, or re-publication without prior permission is prohibited.