Technical Implementation of Converting Comma-Separated Strings into Individual Rows in SQL Server

Nov 03, 2025 · Programming · 17 views · 7.8

Keywords: SQL Server | String Splitting | Recursive CTE | Comma Separated | Data Normalization

Abstract: This paper comprehensively examines multiple technical approaches for splitting comma-separated strings into individual rows in SQL Server 2008. It provides in-depth analysis of recursive CTE implementation principles and compares alternative methods including XML parsing and Tally table approaches. Through complete code examples and performance analysis, it offers practical solutions for handling denormalized data storage scenarios while discussing applicability and limitations of each method.

Problem Context and Challenges

Storing comma-separated strings in relational databases is generally considered a violation of normalization principles. However, in practical application scenarios, particularly in legacy systems or specific business requirements, this data storage approach persists. The core challenge addressed in this paper is how to effectively split comma-separated values stored in single fields into independent row records within SQL Server 2008 environment.

Detailed Analysis of Recursive CTE Method

Recursive Common Table Expressions (CTE) provide an elegant solution for string splitting. This method employs recursive iteration to progressively extract each element from the string.

WITH tmp(SomeID, OtherID, DataItem, String) AS
(
    SELECT
        SomeID,
        OtherID,
        LEFT(String, CHARINDEX(',', String + ',') - 1),
        STUFF(String, 1, CHARINDEX(',', String + ','), '')
    FROM Testdata
    UNION all

    SELECT
        SomeID,
        OtherID,
        LEFT(String, CHARINDEX(',', String + ',') - 1),
        STUFF(String, 1, CHARINDEX(',', String + ','), '')
    FROM tmp
    WHERE
        String > ''
)
SELECT
    SomeID,
    OtherID,
    DataItem
FROM tmp
ORDER BY SomeID;

The implementation mechanism comprises two critical components: the anchor query handles initial processing and extracts the first element, while the recursive query continuously processes the remaining string until all elements are extracted. The CHARINDEX function locates delimiter positions, while LEFT and STUFF functions collaborate to handle string truncation and remainder processing.

Comparative Analysis of Alternative Technical Approaches

XML Parsing Method

The XML approach converts comma-separated strings into XML format and utilizes XPath queries to extract individual elements:

SELECT A.OtherID,  
     Split.a.value('.', 'VARCHAR(100)') AS Data  
FROM  
(
    SELECT OtherID,  
        CAST ('<M>' + REPLACE(Data, ',', '</M><M>') + '</M>' AS XML) AS Data  
    FROM Table1
) AS A CROSS APPLY Data.nodes ('/M') AS Split(a);

This method performs well with medium-length strings but may encounter limitations when processing extremely long strings or special characters.

Tally Table Function Method

For scenarios requiring high-performance processing, the Tally table method offers an optimized solution:

CREATE FUNCTION [dbo].[DelimitedSplit8K]
        (@pString VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
 RETURN
WITH E1(N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
               SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
               SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1),
E2(N) AS (SELECT 1 FROM E1 a, E1 b),
E4(N) AS (SELECT 1 FROM E2 a, E2 b),
cteTally(N) AS (SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4),
cteStart(N1) AS (SELECT 1 UNION ALL SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter),
cteLen(N1,L1) AS(SELECT s.N1, ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000) FROM cteStart s)
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
       Item = SUBSTRING(@pString, l.N1, l.L1) FROM cteLen l;

Performance Considerations and Best Practices

The recursive CTE method in SQL Server 2008 defaults to a recursion depth limit of 100 levels. For strings containing more than 100 elements, explicit MAXRECURSION option setting is required. In practical applications, it's recommended to select appropriate methods based on data characteristics: recursive CTE suits medium-complexity splitting requirements, Tally table method fits high-performance scenarios, while XML method shows advantages in processing structured data.

Improvements in Modern SQL Server Versions

Starting from SQL Server 2016, Microsoft introduced the built-in STRING_SPLIT function, significantly simplifying string splitting operations:

SELECT OtherID, cs.Value AS SplitData
FROM yourtable
CROSS APPLY STRING_SPLIT(Data, ',') cs

This built-in function demonstrates substantial improvements in both performance and usability, recommended as the primary choice in systems supporting this version.

Application Scenarios and Limitations

The technologies discussed in this paper primarily apply to the following scenarios: legacy system maintenance, rapid prototyping development, and temporary solutions for specific business requirements. It's important to note that, from a long-term perspective, normalizing data into appropriate relational models remains the recommended best practice.

Copyright Notice: All rights in this article are reserved by the operators of DevGex. Reasonable sharing and citation are welcome; any reproduction, excerpting, or re-publication without prior permission is prohibited.