Multiple Approaches to String Splitting in Oracle PL/SQL

Keywords: Oracle | PL/SQL | String_Splitting | Pipelined_Functions | REGEXP_SUBSTR

Abstract: This paper provides an in-depth exploration of various techniques for string splitting in Oracle PL/SQL. It focuses on custom pipelined function implementations, detailing core algorithms and code structures. The study compares alternative methods including REGEXP_SUBSTR regular expressions and APEX utility functions, offering comprehensive technical guidance for different string splitting scenarios through complete code examples and performance analysis.

Technical Background of String Splitting

String splitting represents a fundamental requirement in database development. Particularly in data normalization processes, ETL workflows, and report generation scenarios, there is often a need to separate strings containing multiple tokens based on specific delimiters and process the resulting segments as individual records.

Custom Pipelined Function Implementation

Leveraging Oracle's type system and pipelined function capabilities, we can construct an efficient custom splitting function. First, we define the string collection type:

CREATE OR REPLACE TYPE split_tbl AS TABLE OF VARCHAR2(32767);

Next, we implement the core splitting function using classical string processing algorithms:

CREATE OR REPLACE FUNCTION split(
    p_list VARCHAR2,
    p_del VARCHAR2 := ','
) RETURN split_tbl PIPELINED
IS
    l_idx PLS_INTEGER;
    l_list VARCHAR2(32767) := p_list;
    l_value VARCHAR2(32767);
BEGIN
    LOOP
        l_idx := INSTR(l_list, p_del);
        IF l_idx > 0 THEN
            PIPE ROW(SUBSTR(l_list, 1, l_idx - 1));
            l_list := SUBSTR(l_list, l_idx + LENGTH(p_del));
        ELSE
            PIPE ROW(l_list);
            EXIT;
        END IF;
    END LOOP;
    RETURN;
END split;

Function Usage Examples

The function can be used directly in SQL queries or invoked within other PL/SQL programs:

-- Usage in SQL queries
SELECT * FROM TABLE(split('foo,bar,zoo'));

-- Usage in PL/SQL programs
DECLARE
    tokens split_tbl;
BEGIN
    tokens := split('apple,banana,orange');
    FOR i IN tokens.FIRST..tokens.LAST LOOP
        DBMS_OUTPUT.PUT_LINE(tokens(i));
    END LOOP;
END;

Alternative Implementation Approaches

Beyond custom functions, collection-based implementations offer another viable approach:

CREATE OR REPLACE FUNCTION split2(
    list IN VARCHAR2,
    delimiter IN VARCHAR2 DEFAULT ','
) RETURN split_tbl AS
    splitted split_tbl := split_tbl();
    i PLS_INTEGER := 0;
    list_ VARCHAR2(32767) := list;
BEGIN
    LOOP
        i := INSTR(list_, delimiter);
        IF i > 0 THEN
            splitted.EXTEND(1);
            splitted(splitted.LAST) := SUBSTR(list_, 1, i - 1);
            list_ := SUBSTR(list_, i + LENGTH(delimiter));
        ELSE
            splitted.EXTEND(1);
            splitted(splitted.LAST) := list_;
            RETURN splitted;
        END IF;
    END LOOP;
END;

Regular Expression Methodology

For Oracle 10g and later versions, string splitting can be achieved using REGEXP_SUBSTR combined with hierarchical queries:

SELECT REGEXP_SUBSTR('SMITH,ALLEN,WARD,JONES', '[^,]+', 1, LEVEL) AS token
FROM DUAL
CONNECT BY REGEXP_SUBSTR('SMITH,ALLEN,WARD,JONES', '[^,]+', 1, LEVEL) IS NOT NULL;

APEX Utility Package Solutions

Within APEX environments, built-in string processing functions provide convenient alternatives:

-- Newer APEX versions
SELECT * FROM TABLE(APEX_STRING.SPLIT('value1,value2,value3', ','));

-- Legacy APEX versions (deprecated)
APEX_UTIL.STRING_TO_TABLE('delimited,string');

Performance Analysis and Selection Guidelines

Custom pipelined functions demonstrate superior performance when handling large datasets, especially in scenarios requiring repeated usage. While regular expression methods offer syntactic simplicity, they may encounter performance limitations with lengthy strings. APEX solutions provide optimal integration within APEX application environments.

Practical Application Scenarios

Data migration projects frequently require processing CSV format data imports:

DECLARE
    csv_data VARCHAR2(4000) := 'John,Doe,30,Engineer';
    tokens split_tbl;
BEGIN
    tokens := split(csv_data);
    INSERT INTO employees (first_name, last_name, age, position)
    VALUES (tokens(1), tokens(2), TO_NUMBER(tokens(3)), tokens(4));
END;

Appropriate selection of string splitting methodologies significantly enhances data processing efficiency and code maintainability.

Copyright Notice: All rights in this article are reserved by the operators of DevGex. Reasonable sharing and citation are welcome; any reproduction, excerpting, or re-publication without prior permission is prohibited.