Technical Research on Splitting Delimiter-Separated Values into Multiple Rows in SQL

Nov 22, 2025 · Programming

Keywords: SQL splitting | delimiter processing | multiple row conversion | MySQL techniques | data normalization

Abstract: This article examines how to split a delimiter-separated column value into multiple rows in MySQL. It covers two approaches, one that joins against a predefined numbers table and one that builds a temporary number sequence inline, and explains how nested SUBSTRING_INDEX calls extract each element, how the join condition limits the number of rows generated per record, and what to consider for performance. Code examples show how the technique applies to data normalization and ETL processing.

Introduction

In database design and data processing, a single column often stores several values joined by a delimiter, and those values need to be split into one row each. The requirement is common in data normalization, log analysis, and ETL processing. This article explains how to perform the split in MySQL.

Problem Description and Requirements Analysis

Consider the following data table structure:

id | name
1  | a,b,c
2  | b

The expected output result is:

id | name
1  | a
1  | b
1  | c
2  | b

The core challenge is to split each comma-separated string into separate rows while keeping every element associated with the id of its source row.
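To make the queries in this article reproducible, the sample data can be set up as follows. This is only a sketch: the column types and the primary key are assumptions, since the article shows the table contents but not its definition.

```sql
-- Assumed definition of the sample table; only the column names come from the article.
CREATE TABLE tablename (
  id   INT NOT NULL PRIMARY KEY,
  name VARCHAR(255)            -- comma-separated list of values
);

-- The two sample rows from the problem description.
INSERT INTO tablename (id, name) VALUES
  (1, 'a,b,c'),
  (2, 'b');
```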

Solution Based on Numbers Table

If the database already has a numbers table (a table with a single integer column n holding 1, 2, 3, and so on), the split can be done with one join:

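select
  tablename.id,
  -- inner call keeps the first n elements; outer call keeps the last of those
  SUBSTRING_INDEX(SUBSTRING_INDEX(tablename.name, ',', numbers.n), ',', -1) as name
from
  numbers inner join tablename
  -- length difference = number of commas, so n ranges over 1 .. commas + 1
  on CHAR_LENGTH(tablename.name)
     - CHAR_LENGTH(REPLACE(tablename.name, ',', '')) >= numbers.n - 1
order by
  tablename.id, numbers.n;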
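The query assumes that a table named numbers with an integer column n starting at 1 already exists. If it does not, one possible way to create and fill it is sketched below. The helper table digits and the upper bound of 1000 are assumptions chosen for illustration, not part of the original solution.

```sql
-- Numbers table used by the split query; n must start at 1.
CREATE TABLE numbers (
  n INT UNSIGNED NOT NULL PRIMARY KEY
);

-- Hypothetical helper table holding the digits 0..9.
CREATE TABLE digits (
  d TINYINT UNSIGNED NOT NULL PRIMARY KEY
);
INSERT INTO digits (d) VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9);

-- Combine three digits into 0..999, then shift by one to get 1..1000.
INSERT INTO numbers (n)
SELECT 1 + ones.d + 10 * tens.d + 100 * hundreds.d
FROM digits AS ones
CROSS JOIN digits AS tens
CROSS JOIN digits AS hundreds;

-- The helper is no longer needed once numbers is populated.
DROP TABLE digits;
```

The bound only needs to be at least as large as the highest number of elements stored in any single value.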

Technical Principle Analysis

The core technical points of this solution include:

Nested Usage of SUBSTRING_INDEX Function: The inner SUBSTRING_INDEX(name, ',', n) returns everything before the nth delimiter, that is, the first n elements. The outer call, with a count of -1, returns everything after the last delimiter of that result, which is the nth element. For 'a,b,c' and n = 2, the inner call yields 'a,b' and the outer call yields 'b'.

Join Condition Optimization: The difference between the length of the original string and its length after the delimiters are removed equals the number of delimiters in the string. The join condition keeps only the sequence numbers n that do not exceed that count plus one, which is the number of elements, so each source row produces exactly one output row per element.

Role of Numbers Table: The numbers table supplies a continuous integer sequence that starts at 1. Each n corresponds to the position of one element to extract, so the table must contain at least as many numbers as the largest element count in the data.
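The three points above can be checked in isolation on a literal string. The following sketch evaluates each building block for 'a,b,c' with n = 2; the column aliases are made up for readability.

```sql
SELECT
  -- inner call: everything before the 2nd comma
  SUBSTRING_INDEX('a,b,c', ',', 2) AS first_two,                                -- 'a,b'
  -- outer call: everything after the last comma of the inner result
  SUBSTRING_INDEX(SUBSTRING_INDEX('a,b,c', ',', 2), ',', -1) AS second_element, -- 'b'
  -- 5 characters with commas, 3 without: the difference is the comma count
  CHAR_LENGTH('a,b,c') - CHAR_LENGTH(REPLACE('a,b,c', ',', '')) AS comma_count; -- 2
```

With a comma count of 2, the join condition admits n = 1, 2, and 3, which matches the three elements of the string.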

Alternative Solution Using Temporary Number Sequence

In the absence of a predefined numbers table, UNION ALL can be used to create a temporary number sequence:

select
  tablename.id,
  SUBSTRING_INDEX(SUBSTRING_INDEX(tablename.name, ',', numbers.n), ',', -1) as name
from
  -- inline sequence 1..5: values with more than five elements are truncated
  (select 1 as n union all
   select 2 union all select 3 union all
   select 4 union all select 5) as numbers inner join tablename
  on CHAR_LENGTH(tablename.name)
     - CHAR_LENGTH(REPLACE(tablename.name, ',', '')) >= numbers.n - 1
order by
  tablename.id, numbers.n;
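On MySQL 8.0 and later, which support recursive common table expressions, the inline sequence does not have to be written out by hand. The variant below is a sketch under that version assumption, and it is a different way of producing the sequence than the UNION ALL subquery shown above. The upper bound of 50 is an arbitrary illustrative value.

```sql
-- Requires MySQL 8.0+ (recursive CTE support).
WITH RECURSIVE numbers (n) AS (
  SELECT 1
  UNION ALL
  SELECT n + 1 FROM numbers WHERE n < 50   -- 50 = assumed maximum elements per value
)
SELECT
  tablename.id,
  SUBSTRING_INDEX(SUBSTRING_INDEX(tablename.name, ',', numbers.n), ',', -1) AS name
FROM numbers
INNER JOIN tablename
  ON CHAR_LENGTH(tablename.name)
     - CHAR_LENGTH(REPLACE(tablename.name, ',', '')) >= numbers.n - 1
ORDER BY tablename.id, numbers.n;
```

Within this statement the CTE named numbers takes precedence over any real table of the same name, so the rest of the query is unchanged.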

Solution Comparison and Selection

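The two solutions use the same query logic and return the same result as long as the sequence covers the largest element count in the data. They differ in performance and maintainability:

Predefined Numbers Table Solution: Suitable when splitting is performed frequently. The table is created once, can be indexed on n, and is reused by every query instead of being rebuilt each time.

Temporary Number Sequence Solution: Suitable for occasional splitting, since it avoids creating an additional table. The maximum number of elements must be estimated in advance: any element beyond the end of the sequence (the sixth and later in the example above) is silently dropped.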
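Rather than guessing the maximum number of splits, it can be measured from the data with the same length arithmetic used in the join condition. A minimal sketch, where the alias max_elements is made up for illustration:

```sql
-- Largest number of elements in any value: comma count + 1.
-- Rows where name is NULL are ignored by MAX.
SELECT
  MAX(CHAR_LENGTH(name) - CHAR_LENGTH(REPLACE(name, ',', '')) + 1) AS max_elements
FROM tablename;
```

For the sample data this returns 3, so a sequence of 1 to 3 would already be sufficient. Whichever sequence is used must reach at least this value.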

Extended Applications and Optimization

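The same technique extends to more complex scenarios:

Multi-character Delimiter Processing: SUBSTRING_INDEX accepts a delimiter string of any length, so the delimiter argument can simply be replaced. The join condition must change as well: the length difference has to be divided by the length of the delimiter to obtain the delimiter count.

Performance Optimization Strategies: The join condition applies CHAR_LENGTH and REPLACE to the name column, so it cannot use an index on that column and may be re-evaluated for each candidate pair of rows. For large tables, computing the delimiter count once per row, for example in a derived table or a stored column, avoids repeated string length calculations.

Error Handling Mechanisms: Boundary cases need attention in practice. A NULL value makes the join condition NULL, so the row disappears from the result; an empty string produces one row with an empty name; a leading or trailing delimiter produces an empty element; whitespace around delimiters is preserved; and a delimiter that appears inside a value (escaped or quoted) is still treated as a separator.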
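The following sketch combines two of the extensions above: a multi-character delimiter and basic cleanup of boundary cases. The delimiter '::' is an assumed example, and the choice to trim spaces and discard empty elements is one possible policy rather than a requirement. It does not handle escaped or quoted delimiters.

```sql
-- Assumes values such as 'a:: b::c' and a numbers table with n starting at 1.
SELECT
  t.id,
  -- TRIM removes leading and trailing spaces from each element
  TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(t.name, '::', numbers.n), '::', -1)) AS name
FROM numbers
INNER JOIN tablename AS t
  -- divide the length difference by the delimiter length to get the delimiter count
  ON (CHAR_LENGTH(t.name) - CHAR_LENGTH(REPLACE(t.name, '::', '')))
     / CHAR_LENGTH('::') >= numbers.n - 1
-- discard empty elements caused by empty strings or leading/trailing delimiters
WHERE TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(t.name, '::', numbers.n), '::', -1)) <> ''
ORDER BY t.id, numbers.n;
```

Rows whose name is NULL still drop out of the result because the join condition evaluates to NULL for them. If such rows must be reported, they need to be selected separately.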

Practical Application Scenarios

Delimiter splitting technology has important application value in the following scenarios:

Data Normalization: Converting non-first normal form data into forms that comply with relational database standards.

Log Analysis: Processing log records containing multiple event information to facilitate subsequent statistical analysis.

Data Integration: Handling heterogeneous data formats from different data sources during ETL processes.
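For the data normalization scenario, the split result is typically written to a child table rather than just displayed. The sketch below shows one way this might look; the table tablename_items, its column types, and the pos column are hypothetical names introduced for illustration.

```sql
-- Hypothetical child table holding one element per row.
CREATE TABLE tablename_items (
  id   INT          NOT NULL,   -- id of the source row in tablename
  pos  INT UNSIGNED NOT NULL,   -- 1-based position of the element in the original string
  name VARCHAR(255) NOT NULL,
  PRIMARY KEY (id, pos)
);

-- Fill it with the split query, assuming a numbers table with n starting at 1.
INSERT INTO tablename_items (id, pos, name)
SELECT
  tablename.id,
  numbers.n,
  SUBSTRING_INDEX(SUBSTRING_INDEX(tablename.name, ',', numbers.n), ',', -1)
FROM numbers
INNER JOIN tablename
  ON CHAR_LENGTH(tablename.name)
     - CHAR_LENGTH(REPLACE(tablename.name, ',', '')) >= numbers.n - 1;
```

Storing the position alongside each element keeps the original ordering recoverable after the comma-separated column is retired.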

Conclusion

This article described how to split delimiter-separated values into multiple rows in MySQL using nested SUBSTRING_INDEX calls joined against a number sequence, supplied either by a predefined numbers table or by an inline UNION ALL subquery. The numbers table suits repeated use, while the inline sequence suits one-off queries where the maximum element count is known. With attention to NULL values, empty elements, and delimiter length, the technique is a practical tool for data normalization and ETL work.

Copyright Notice: All rights in this article are reserved by the operators of DevGex. Reasonable sharing and citation are welcome; any reproduction, excerpting, or re-publication without prior permission is prohibited.