T-SQL String Splitting Implementation Methods in SQL Server 2008 R2

Abstract: This article provides a comprehensive analysis of various technical approaches for implementing string splitting in SQL Server 2008 R2 environments. It focuses on user-defined functions based on WHILE loops, which demonstrate excellent compatibility and stability. Alternative solutions using number tables and recursive CTEs are also discussed, along with the built-in STRING_SPLIT function introduced in SQL Server 2016. Through complete code examples and performance comparisons, the article offers practical string splitting solutions for users of different SQL Server versions.

Introduction

String splitting is a common and essential requirement in database development, particularly when dealing with comma-separated list data that needs to be broken down into individual values. SQL Server 2008 R2, as a widely used database version, lacks built-in string splitting functions, necessitating developer-implemented solutions.

WHILE Loop Split Function Implementation

The user-defined function based on WHILE loops represents the most classic and stable string splitting method. This approach centers on iterating through the string, identifying delimiter positions, and extracting substrings sequentially.

CREATE FUNCTION dbo.splitstring ( @stringToSplit VARCHAR(MAX) )
RETURNS
 @returnList TABLE ([Name] [nvarchar] (500))
AS
BEGIN
 DECLARE @name NVARCHAR(255)
 DECLARE @pos INT

 WHILE CHARINDEX(',', @stringToSplit) > 0
 BEGIN
  SELECT @pos  = CHARINDEX(',', @stringToSplit)  
  SELECT @name = SUBSTRING(@stringToSplit, 1, @pos-1)

  INSERT INTO @returnList 
  SELECT @name

  SELECT @stringToSplit = SUBSTRING(@stringToSplit, @pos+1, LEN(@stringToSplit)-@pos)
 END

 INSERT INTO @returnList
 SELECT @stringToSplit

 RETURN
END

The function operates by first declaring necessary variables, including @name for storing the current substring and @pos for recording the delimiter position. Within the WHILE loop, the CHARINDEX function locates the comma position, followed by the SUBSTRING function extracting the substring preceding the comma. Each iteration inserts the extracted substring into the return table and updates the original string to the remaining portion. After the loop completes, the final substring is inserted into the return table.

Utilizing this function is straightforward:

SELECT * FROM dbo.splitstring('91,12,65,78,56,789')

This method's advantage lies in its clear code logic, ease of comprehension, and maintenance. For most application scenarios, its performance adequately meets requirements. However, with extremely long strings, WHILE loop performance may become a bottleneck.

Set-Based Approach Using Number Tables

As an alternative to WHILE loops, the set-based method utilizing number tables offers superior performance. This technique generates number sequences using system tables and implements string splitting through set operations.

CREATE FUNCTION dbo.SplitString
(
  @List     nvarchar(max),
  @Delim    nvarchar(255)
)
RETURNS TABLE
AS
  RETURN ( SELECT [Value] FROM 
  ( 
    SELECT [Value] = LTRIM(RTRIM(SUBSTRING(@List, [Number],
      CHARINDEX(@Delim, @List + @Delim, [Number]) - [Number])))
    FROM (SELECT Number = ROW_NUMBER() OVER (ORDER BY name)
      FROM sys.all_columns) AS x WHERE Number <= LEN(@List)
      AND SUBSTRING(@Delim + @List, [Number], DATALENGTH(@Delim)/2) = @Delim
    ) AS y
  );
GO

This function's core mechanism involves generating number sequences via the ROW_NUMBER() function, calculating each substring's start position and length through combinations of SUBSTRING and CHARINDEX functions. This approach eliminates loop operations, delivering better performance with large datasets.

However, this method has a limitation: the split string length cannot exceed the row count in the sys.all_columns table. In SQL Server 2017's model database, this constraint is approximately 9,980 characters. For longer strings, dedicated number tables must be created.

Recursive CTE Implementation

For environments where system tables cannot be used, recursive CTEs provide another solution:

CREATE FUNCTION dbo.SplitString
(
  @List     nvarchar(max),
  @Delim    nvarchar(255)
)
RETURNS TABLE WITH SCHEMABINDING
AS
   RETURN ( WITH n(n) AS (SELECT 1 UNION ALL SELECT n+1 
       FROM n WHERE n <= LEN(@List))
       SELECT [Value] = SUBSTRING(@List, n, 
       CHARINDEX(@Delim, @List + @Delim, n) - n)
       FROM n WHERE n <= LEN(@List)
      AND SUBSTRING(@Delim + @List, n, DATALENGTH(@Delim)/2) = @Delim
   );
GO

The recursive CTE method generates number sequences recursively, avoiding external table dependencies. However, recursion depth limitations must be considered, as SQL Server defaults to a maximum recursion depth of 100. For strings exceeding 100 characters, OPTION (MAXRECURSION 0) or specific maximum recursion values must be added to external queries.

SQL Server 2016 STRING_SPLIT Function

Starting with SQL Server 2016, Microsoft introduced the built-in STRING_SPLIT function, significantly simplifying string splitting operations:

SELECT * FROM STRING_SPLIT('a,b', ',')

The STRING_SPLIT function usage is extremely simple, requiring only the string to split and the delimiter. The function returns a single-column table containing all split substrings. In SQL Server 2022 and later versions, ordinal output column enabling is supported:

SELECT * FROM STRING_SPLIT('Lorem ipsum dolor sit amet.', ' ', 1)

Using STRING_SPLIT requires a database compatibility level of at least 130. Compatibility level can be checked and modified using:

-- Check compatibility level
SELECT compatibility_level FROM sys.databases WHERE name = 'YourDatabaseName';

-- Modify compatibility level
ALTER DATABASE YourDatabaseName SET COMPATIBILITY_LEVEL = 130;

Performance Comparison and Selection Recommendations

Different methods exhibit significant performance variations:

WHILE Loop Method: Simple, understandable code suitable for small to medium data volumes and infrequent processing scenarios
Number Table Method: Excellent performance, ideal for large datasets but constrained by number table size
Recursive CTE Method: No external dependencies but limited by recursion depth
STRING_SPLIT: Best performance but restricted to SQL Server 2016 and later versions

In practical projects, method selection should consider database version, data volume, performance requirements, and maintenance costs. For SQL Server 2008 R2 users, number table-based implementations are recommended, with WHILE loop methods suitable for lower performance requirements.

Practical Application Scenarios

String splitting functions have extensive applications in database development:

-- Parse product tags
SELECT ProductId, Name, value
FROM Product
CROSS APPLY STRING_SPLIT(Tags, ',');

-- Count tag usage frequency
SELECT value as tag, COUNT(*) AS [number_of_articles]
FROM Product
CROSS APPLY STRING_SPLIT(Tags, ',')
GROUP BY value
HAVING COUNT(*) > 2
ORDER BY COUNT(*) DESC;

These application scenarios demonstrate the crucial role of string splitting in data parsing, statistical analysis, and query optimization.

Conclusion

String splitting constitutes a fundamental and vital functionality in SQL Server development. Within SQL Server 2008 R2 environments, user-defined functions based on WHILE loops provide stable and reliable solutions. With SQL Server version upgrades, number table-based methods and built-in STRING_SPLIT functions offer superior performance options. Developers should select the most appropriate string splitting approach based on specific project requirements and environmental constraints.

Regardless of the chosen method, boundary case handling—such as empty strings, consecutive delimiters, and string length limitations—must be addressed. Proper error handling and performance optimization are essential for ensuring stable string splitting functionality.

Copyright Notice: All rights in this article are reserved by the operators of DevGex. Reasonable sharing and citation are welcome; any reproduction, excerpting, or re-publication without prior permission is prohibited.