Multiple Approaches to Retrieve the Top Row per Group in SQL

Oct 29, 2025 · Programming · 11 views · 7.8

Keywords: SQL_Group_Query | Window_Function | ROW_NUMBER | CROSS_APPLY | Database_Design

Abstract: This technical paper comprehensively analyzes various methods for retrieving the first row from each group in SQL, with emphasis on ROW_NUMBER() window function, CROSS APPLY operator, and TOP WITH TIES approach. Through detailed code examples and performance comparisons, it provides practical guidance for selecting optimal solutions in different scenarios. The paper also discusses database normalization trade-offs and implementation considerations.

Problem Context and Requirements

In database querying, there is frequent need to retrieve the first row from each group of data. This requirement is particularly common in scenarios such as status tracking and historical record queries. Taking a document status log table as an example, which records status change history for each document, we need to obtain the latest status record for each document.

Core Solution: ROW_NUMBER() Window Function

The ROW_NUMBER() window function is the most commonly used and flexible method for retrieving the first row per group. This function assigns unique sequential numbers to rows within each partition, and by filtering for rows with number 1, we can obtain the first row of each partition.

WITH cte AS (
   SELECT *,
         ROW_NUMBER() OVER (PARTITION BY DocumentID ORDER BY DateCreated DESC) AS rn
   FROM DocumentStatusLogs
)
SELECT *
FROM cte
WHERE rn = 1

The above code first creates a Common Table Expression (CTE) using the ROW_NUMBER() function partitioned by DocumentID and ordered by DateCreated in descending order. ROW_NUMBER() assigns sequential numbers starting from 1 to rows within each partition, with the most recent date record receiving number 1. The outer query filters for the first row of each partition using WHERE rn = 1.

Alternative Approach: CROSS APPLY Operator

CROSS APPLY is a SQL Server-specific operator that can be used to correlate subqueries, particularly suitable for retrieving the first row per group.

SELECT d.DocumentID, ds.Status, ds.DateCreated 
FROM Documents AS d 
CROSS APPLY 
    (SELECT TOP 1 Status, DateCreated
     FROM DocumentStatusLogs 
     WHERE DocumentID = d.DocumentID
     ORDER BY DateCreated DESC) AS ds

This method starts from the parent Documents table and uses CROSS APPLY to retrieve the latest status record for each document. The subquery uses a combination of TOP 1 and ORDER BY to obtain the most recent record for each document.

Concise Approach: TOP WITH TIES

TOP WITH TIES combined with ROW_NUMBER() provides more concise syntax:

SELECT TOP 1 WITH TIES
   DocumentID,
   Status,
   DateCreated
FROM DocumentStatusLogs
ORDER BY ROW_NUMBER() OVER (PARTITION BY DocumentID ORDER BY DateCreated DESC)

This approach leverages the characteristic of TOP WITH TIES, which returns all tied rows when there are ties in the ORDER BY clause. By using the sequence number generated by ROW_NUMBER() as the ordering basis, it ensures that only the first row of each group is returned.

Performance Comparison and Applicable Scenarios

The ROW_NUMBER() method typically offers good performance when handling large datasets, especially when multiple columns need to be included in the results. The CROSS APPLY method is efficient when correlating with parent table queries but is limited to SQL Server environments. TOP WITH TIES has concise syntax but poorer readability in complex queries.

For cases with tied values, ROW_NUMBER() arbitrarily selects one row, while DENSE_RANK() or RANK() can handle tied situations. If all tied first rows need to be returned, DENSE_RANK() should be used:

WITH cte AS (
   SELECT *,
         DENSE_RANK() OVER (PARTITION BY DocumentID ORDER BY DateCreated DESC) AS dr
   FROM DocumentStatusLogs
)
SELECT *
FROM cte
WHERE dr = 1

Database Design Considerations

From a database normalization perspective, storing status history separately in a log table is a reasonable design, as it completely records status change history. If frequent queries for the latest status are needed, consider adding a CurrentStatus field in the parent table for denormalization, but this requires additional maintenance logic (such as triggers) to ensure data consistency.

Example of denormalized parent table:

CREATE TABLE Documents (
   DocumentID INT PRIMARY KEY,
   Title VARCHAR(255),
   Content TEXT,
   DateCreated DATETIME,
   CurrentStatus VARCHAR(50)
)

Practical Implementation Recommendations

When selecting specific implementation methods, factors such as database system version, data volume size, and query frequency should be considered. For SQL Server 2005 and later versions, the ROW_NUMBER() method is recommended due to its powerful functionality and good performance. CROSS APPLY is a good choice when correlation with parent table queries is needed. For simple first-row queries, TOP WITH TIES provides the most concise syntax.

Regardless of the chosen method, it is recommended to establish appropriate indexes on DocumentID and DateCreated columns to improve query performance. For status log tables, a composite index (DocumentID, DateCreated DESC) can significantly optimize the performance of first-row per group queries.

Copyright Notice: All rights in this article are reserved by the operators of DevGex. Reasonable sharing and citation are welcome; any reproduction, excerpting, or re-publication without prior permission is prohibited.