Keywords: Oracle Database | String Concatenation | SYS_CONNECT_BY_PATH | ROW_NUMBER | LISTAGG Function
Abstract: This article provides an in-depth exploration of various methods to achieve multi-row string concatenation in Oracle databases without using stored procedures. It focuses on the hierarchical query approach based on ROW_NUMBER and SYS_CONNECT_BY_PATH, detailing its implementation principles, performance characteristics, and applicable scenarios. The paper compares the advantages and disadvantages of LISTAGG and WM_CONCAT functions, offering complete code examples and performance optimization recommendations. It also discusses strategies for handling string length limitations, providing comprehensive technical references for developers implementing efficient data aggregation in practical projects.
Technical Background and Problem Definition
In database application development, there is often a need to concatenate multiple rows of data into comma-separated strings based on specific groupings. This requirement is particularly common in scenarios such as report generation, data export, and front-end display. Oracle Database provides multiple implementation methods, but the selection strategies vary significantly across different versions and scenarios.
Core Solution: Hierarchical Query Aggregation
The hierarchical query approach based on ROW_NUMBER window function and SYS_CONNECT_BY_PATH is a classic solution for string concatenation. The advantage of this method lies in its good compatibility, suitable for all versions from Oracle 10g onwards. Its core concept involves simulating tree structure traversal to gradually build concatenated strings.
The specific implementation code is as follows:
SELECT question_id,
LTRIM(MAX(SYS_CONNECT_BY_PATH(element_id,','))
KEEP (DENSE_RANK LAST ORDER BY curr),',') AS elements
FROM (SELECT question_id,
element_id,
ROW_NUMBER() OVER (PARTITION BY question_id ORDER BY element_id) AS curr,
ROW_NUMBER() OVER (PARTITION BY question_id ORDER BY element_id) -1 AS prev
FROM your_table)
GROUP BY question_id
CONNECT BY prev = PRIOR curr AND question_id = PRIOR question_id
START WITH curr = 1;Code analysis: First, the inner query generates sequential numbers for element_id within each question_id group, where curr represents the current row number and prev represents the previous row number. The outer query uses CONNECT BY to build hierarchical relationships, accumulating strings through the SYS_CONNECT_BY_PATH function during traversal. The LTRIM function is used to remove extra commas at the beginning of the result string.
Alternative Solution Comparison
The LISTAGG function is an official string aggregation function introduced in Oracle 11gR2, with concise and clear syntax:
SELECT question_id,
LISTAGG(element_id, ',') WITHIN GROUP (ORDER BY element_id)
FROM your_table
GROUP BY question_id;The advantages of this function include high execution efficiency and strong code readability. However, it has limitations when handling extremely long strings, with VARCHAR2 type supporting a maximum of 4000 characters. Oracle 12cR2 version added the ON OVERFLOW clause to address this issue.
Although the WM_CONCAT function was available in some older versions, it is an unsupported internal function that has been removed in Oracle 12c and is not recommended for production environments.
Performance Optimization and Best Practices
In practical applications, data volume size and performance requirements must be considered. For aggregation operations with large data volumes, it is recommended to: establish appropriate indexes on element_id to improve sorting efficiency; pre-evaluate data scale and adopt corresponding handling strategies for situations that may exceed the 4000-character limit.
Referencing optimization ideas from stored procedure scenarios, performance can be significantly improved by reducing database round trips. Although this article discusses non-stored procedure solutions, this optimization philosophy is equally applicable to other database operation scenarios.
Application Scenario Expansion
String concatenation technology is not only applicable to simple ID splicing but can also be extended to more complex business scenarios. For example, when generating CSV format reports, multiple fields can be aggregated; when building dynamic SQL statements, table names or condition lists can be aggregated.
It is important to note that when parsing concatenated strings in applications, the uniqueness and consistency of separators should be ensured to avoid parsing errors. For data containing special characters, escape handling mechanisms also need to be considered.