Extracting Numbers from Strings in SQL: Implementation Methods

Nov 14, 2025 · Programming · 12 views · 7.8

Keywords: SQL Server | String Processing | Number Extraction | User-Defined Function | PATINDEX Function

Abstract: This technical article provides a comprehensive analysis of various methods for extracting pure numeric values from alphanumeric strings in SQL Server. Focusing on the user-defined function (UDF) approach as the primary solution, the article examines the core implementation using PATINDEX and STUFF functions in iterative loops. Alternative subquery-based methods are compared, and extended scenarios for handling multiple number groups are discussed. Complete code examples, performance analysis, and best practices are included to offer database developers practical string processing solutions.

Problem Background and Requirements Analysis

In real-world database applications, there is often a need to extract pure numeric portions from strings containing both letters and numbers. The user-provided example data illustrates three typical mixed string formats:

string 1: 003Preliminary Examination Plan
string 2: Coordination005
string 3: Balance1000sheet

The expected output should be:

string 1: 003
string 2: 005
string 3: 1000

This requirement is common in data processing, report generation, and data cleaning scenarios, particularly when dealing with unstructured text data.

Core Solution: User-Defined Function

Based on the best answer solution, we create a user-defined function (UDF) to implement number extraction. The core logic involves iteratively identifying and removing all non-numeric characters.

CREATE FUNCTION dbo.udf_GetNumeric
(
  @strAlphaNumeric VARCHAR(256)
)
RETURNS VARCHAR(256)
AS
BEGIN
  DECLARE @intAlpha INT
  SET @intAlpha = PATINDEX('%[^0-9]%', @strAlphaNumeric)
  BEGIN
    WHILE @intAlpha > 0
    BEGIN
      SET @strAlphaNumeric = STUFF(@strAlphaNumeric, @intAlpha, 1, '' )
      SET @intAlpha = PATINDEX('%[^0-9]%', @strAlphaNumeric )
    END
  END
  RETURN ISNULL(@strAlphaNumeric, 0)
END
GO

Function Logic Detailed Explanation

The execution flow of this function can be divided into several key steps:

  1. Initialization: Use PATINDEX function to find the position of the first non-numeric character
  2. Loop Processing: Continuously remove all non-numeric characters through WHILE loop
  3. Character Removal: Use STUFF function to replace non-numeric characters with empty strings
  4. Position Update: Recalculate the position of the next non-numeric character after each removal
  5. Result Return: Return the processed pure numeric string

Function Usage Example

After creating the function, it can be directly called in queries:

SELECT dbo.udf_GetNumeric(column_name) as extracted_numbers
FROM table_name

Alternative Solution Analysis

Besides the UDF approach, there exists an alternative implementation based on subqueries. This method achieves number extraction through nested queries and string function combinations:

SELECT LEFT(subsrt, PATINDEX('%[^0-9]%', subsrt + 't') - 1) as extracted_numbers
FROM (
    SELECT subsrt = SUBSTRING(string, pos, LEN(string))
    FROM (
        SELECT string, pos = PATINDEX('%[0-9]%', string)
        FROM @temp
    ) d
) t

Solution Comparison

<table> <tr><th>Solution</th><th>Advantages</th><th>Disadvantages</th><th>Applicable Scenarios</th></tr> <tr><td>UDF Function</td><td>High reusability, easy maintenance</td><td>Requires creating function object</td><td>Production environments with frequent use</td></tr> <tr><td>Subquery Solution</td><td>No additional objects needed</td><td>Higher code complexity</td><td>One-time queries or temporary use</td></tr>

Extended Scenario: Multiple Number Group Extraction

The reference article mentions a more complex scenario: extracting multiple number groups from a single string. For example, the string "abcd 12345 efghij 678910" contains two number groups: 12345 and 678910.

For such scenarios, specialized string splitting functions combined with CROSS APPLY can be used:

SELECT s.SomeID, ps.Item as extracted_number
FROM #Something s
CROSS APPLY dbo.PatternSplitCM(s.SomeValue, '%[0-9]%') ps
WHERE ps.Matched = 1

Multi-Column Output Implementation

If multiple number groups need to be output to different columns, PIVOT or conditional aggregation can be used:

SELECT SomeID,
       MAX(CASE WHEN ItemNumber = 1 THEN Item END) as Number1,
       MAX(CASE WHEN ItemNumber = 2 THEN Item END) as Number2,
       MAX(CASE WHEN ItemNumber = 3 THEN Item END) as Number3
FROM #Something s
CROSS APPLY dbo.PatternSplitCM(s.SomeValue, '%[0-9]%') ps
WHERE ps.Matched = 1
GROUP BY SomeID

Performance Optimization Recommendations

When processing large volumes of data, the performance of number extraction operations is crucial. Here are several optimization suggestions:

  1. Avoid direct function usage in WHERE clauses: This can cause full table scans and impact query performance
  2. Consider using computed columns: For data that doesn't change frequently, create computed columns to store extraction results
  3. Use more efficient string functions: In some cases, TRY_CONVERT or TRY_CAST may provide better performance
  4. Batch processing: For large datasets, consider using batch processing techniques

Error Handling and Edge Cases

In practical applications, various edge cases need to be considered to ensure function robustness:

Practical Application Case

Assume we have a product information table where product codes mix letters and numbers:

CREATE TABLE Products (
    ProductID int,
    ProductCode varchar(50),
    ProductName varchar(100)
)

INSERT INTO Products VALUES 
(1, 'P003Laptop', 'High-performance Laptop'),
(2, 'MON005', 'Monitor Device'),
(3, 'KB1000Wireless', 'Wireless Keyboard')

Using our created UDF function to extract numeric parts from product codes:

SELECT 
    ProductID,
    ProductCode,
    dbo.udf_GetNumeric(ProductCode) as NumericPart,
    ProductName
FROM Products

The execution results will display the pure numeric portion of each product code, facilitating subsequent data analysis and processing.

Summary and Best Practices

Extracting numbers from strings is a common requirement in SQL data processing. Through the methods introduced in this article, developers can choose appropriate implementation solutions based on specific scenarios:

  1. For simple single-group number extraction, UDF functions provide optimal reusability and maintainability
  2. For temporary requirements, subquery-based solutions are more convenient
  3. For complex multiple-group number extraction, string splitting functions combined with conditional aggregation are needed

In practical applications, it's recommended to comprehensively consider data volume, performance requirements, and maintenance costs to select the most suitable technical solution. Meanwhile, thorough testing and error handling are key factors in ensuring stable operation of the solution.

Copyright Notice: All rights in this article are reserved by the operators of DevGex. Reasonable sharing and citation are welcome; any reproduction, excerpting, or re-publication without prior permission is prohibited.