Keywords: Oracle | PL/SQL | String_Splitting | Pipelined_Functions | REGEXP_SUBSTR
Abstract: This paper provides an in-depth exploration of various techniques for string splitting in Oracle PL/SQL. It focuses on custom pipelined function implementations, detailing core algorithms and code structures. The study compares alternative methods including REGEXP_SUBSTR regular expressions and APEX utility functions, offering comprehensive technical guidance for different string splitting scenarios through complete code examples and performance analysis.
Technical Background of String Splitting
String splitting represents a fundamental requirement in database development. Particularly in data normalization processes, ETL workflows, and report generation scenarios, there is often a need to separate strings containing multiple tokens based on specific delimiters and process the resulting segments as individual records.
Custom Pipelined Function Implementation
Leveraging Oracle's type system and pipelined function capabilities, we can construct an efficient custom splitting function. First, we define the string collection type:
CREATE OR REPLACE TYPE split_tbl AS TABLE OF VARCHAR2(32767);Next, we implement the core splitting function using classical string processing algorithms:
CREATE OR REPLACE FUNCTION split(
p_list VARCHAR2,
p_del VARCHAR2 := ','
) RETURN split_tbl PIPELINED
IS
l_idx PLS_INTEGER;
l_list VARCHAR2(32767) := p_list;
l_value VARCHAR2(32767);
BEGIN
LOOP
l_idx := INSTR(l_list, p_del);
IF l_idx > 0 THEN
PIPE ROW(SUBSTR(l_list, 1, l_idx - 1));
l_list := SUBSTR(l_list, l_idx + LENGTH(p_del));
ELSE
PIPE ROW(l_list);
EXIT;
END IF;
END LOOP;
RETURN;
END split;Function Usage Examples
The function can be used directly in SQL queries or invoked within other PL/SQL programs:
-- Usage in SQL queries
SELECT * FROM TABLE(split('foo,bar,zoo'));
-- Usage in PL/SQL programs
DECLARE
tokens split_tbl;
BEGIN
tokens := split('apple,banana,orange');
FOR i IN tokens.FIRST..tokens.LAST LOOP
DBMS_OUTPUT.PUT_LINE(tokens(i));
END LOOP;
END;Alternative Implementation Approaches
Beyond custom functions, collection-based implementations offer another viable approach:
CREATE OR REPLACE FUNCTION split2(
list IN VARCHAR2,
delimiter IN VARCHAR2 DEFAULT ','
) RETURN split_tbl AS
splitted split_tbl := split_tbl();
i PLS_INTEGER := 0;
list_ VARCHAR2(32767) := list;
BEGIN
LOOP
i := INSTR(list_, delimiter);
IF i > 0 THEN
splitted.EXTEND(1);
splitted(splitted.LAST) := SUBSTR(list_, 1, i - 1);
list_ := SUBSTR(list_, i + LENGTH(delimiter));
ELSE
splitted.EXTEND(1);
splitted(splitted.LAST) := list_;
RETURN splitted;
END IF;
END LOOP;
END;Regular Expression Methodology
For Oracle 10g and later versions, string splitting can be achieved using REGEXP_SUBSTR combined with hierarchical queries:
SELECT REGEXP_SUBSTR('SMITH,ALLEN,WARD,JONES', '[^,]+', 1, LEVEL) AS token
FROM DUAL
CONNECT BY REGEXP_SUBSTR('SMITH,ALLEN,WARD,JONES', '[^,]+', 1, LEVEL) IS NOT NULL;APEX Utility Package Solutions
Within APEX environments, built-in string processing functions provide convenient alternatives:
-- Newer APEX versions
SELECT * FROM TABLE(APEX_STRING.SPLIT('value1,value2,value3', ','));
-- Legacy APEX versions (deprecated)
APEX_UTIL.STRING_TO_TABLE('delimited,string');Performance Analysis and Selection Guidelines
Custom pipelined functions demonstrate superior performance when handling large datasets, especially in scenarios requiring repeated usage. While regular expression methods offer syntactic simplicity, they may encounter performance limitations with lengthy strings. APEX solutions provide optimal integration within APEX application environments.
Practical Application Scenarios
Data migration projects frequently require processing CSV format data imports:
DECLARE
csv_data VARCHAR2(4000) := 'John,Doe,30,Engineer';
tokens split_tbl;
BEGIN
tokens := split(csv_data);
INSERT INTO employees (first_name, last_name, age, position)
VALUES (tokens(1), tokens(2), TO_NUMBER(tokens(3)), tokens(4));
END;Appropriate selection of string splitting methodologies significantly enhances data processing efficiency and code maintainability.