Extracting Numbers from Strings in C: Implementation and Optimization Based on the strtol Function

Dec 08, 2025 · Programming

Keywords: C Programming | String Processing | Number Extraction | strtol Function | sscanf Function

Abstract: This article explores several methods for extracting numbers from strings in C, with a focus on the strtol function. By comparing the strtol and sscanf approaches, it explains how number detection, conversion, and error handling work, and provides complete code examples along with performance suggestions. It also discusses practical issues such as negative numbers, boundary conditions, and memory safety.

Technical Background and Challenges of Number Extraction

In C programming practice, extracting numbers from strings containing mixed characters is a common yet challenging task. Taking the string "ab234cid*(s349*(20kd" as an example, we need to extract the number sequences 234, 349, and 20. The complexity of this problem lies in: numbers may appear at any position in the string, numbers may have variable lengths, strings may contain various non-numeric characters, and special cases like signs need to be considered.

Core Solution Based on strtol

The strtol (string to long) function in the C standard library converts the initial portion of a string to a long value; its prototype is declared in the <stdlib.h> header. It skips leading whitespace, accepts an optional sign, consumes as many digits as are valid in the given base, and reports through its second argument where parsing stopped.
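As a small illustration of that behavior, the sketch below wraps a single strtol call. The helper name parse_prefix and its rest parameter are hypothetical and exist only for this example.

```c
#include <stdlib.h>

/* Hypothetical helper: parse a decimal integer at the start of s.
   On return, *rest points at the first character strtol did not consume.
   If no digits were found, *rest == s and the result is 0. */
long parse_prefix(const char *s, const char **rest) {
    char *end;
    long value = strtol(s, &end, 10);  /* skips leading whitespace, accepts a sign */
    *rest = end;
    return value;
}
```

Calling parse_prefix("  42abc", &rest) should return 42 and leave rest pointing at "abc"; calling it on "abc" returns 0 with rest equal to the input pointer, which is how a caller can tell that nothing was converted.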

Below is the complete implementation code based on strtol:

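#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

int main(void) {
    const char *str = "ab234cid*(s349*(20kd";
    const char *p = str;

    while (*p) {
        // Cast to unsigned char: passing a negative char to isdigit() is undefined
        unsigned char c = (unsigned char)*p;
        unsigned char next = (unsigned char)*(p + 1);

        // Detect a digit, or a sign immediately followed by a digit
        if (isdigit(c) || ((c == '-' || c == '+') && isdigit(next))) {
            char *endptr;
            long val = strtol(p, &endptr, 10);

            // Verify that at least one character was converted
            if (p != endptr) {
                printf("Extracted number: %ld\n", val);
                p = endptr;  // Move to the position after the number
            } else {
                p++;  // Defensive: never stall on the same character
            }
        } else {
            p++;  // Move to the next character
        }
    }

    return 0;
}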

In-depth Analysis of Implementation Mechanism

The core logic of the above code can be divided into three key steps:

  1. Number Detection Phase: Use the isdigit() function to detect if the current character is a digit (0-9). To handle signed numbers, the code also checks if the current character is '-' or '+' and the next character is a digit. This design ensures that both negative numbers (e.g., "-123") and positive numbers (e.g., "+456") are correctly identified.
  2. Number Conversion Phase: When the start of a number is detected, call strtol(p, &endptr, 10). Here p points to the position where the number begins, endptr receives the address of the first character that was not consumed, and 10 selects decimal conversion. The function parses consecutive digit characters until it meets a character that is not a digit.
  3. Pointer Update Phase: After the call, compare p and endptr. If they differ, at least one character was converted, so the value is reported and p is set to endptr, which skips the processed digits and avoids scanning them again.
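The three steps above can also be packaged as a reusable function that collects values into a caller-supplied array instead of printing them. This is a sketch only; the name extract_longs and its out/cap parameters are assumptions made for illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: store up to cap integers found in s into out
   and return how many were stored. */
size_t extract_longs(const char *s, long *out, size_t cap) {
    size_t count = 0;
    const char *p = s;

    while (*p != '\0' && count < cap) {
        unsigned char c = (unsigned char)p[0];
        unsigned char next = (unsigned char)p[1];  /* at worst the terminator */

        /* Step 1: detection */
        if (isdigit(c) || ((c == '-' || c == '+') && isdigit(next))) {
            char *end;
            long val = strtol(p, &end, 10);        /* Step 2: conversion */
            if (end != p) {                        /* Step 3: pointer update */
                out[count++] = val;
                p = end;
            } else {
                p++;
            }
        } else {
            p++;
        }
    }
    return count;
}
```

For the sample string "ab234cid*(s349*(20kd" with an array of eight elements, the function should return 3 and fill the first three slots with 234, 349, and 20.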

Error Handling and Boundary Conditions

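In practical applications, several boundary conditions need attention. strtol signals an out-of-range value by returning LONG_MAX or LONG_MIN and setting errno to ERANGE. When no digits are found, endptr is left equal to the start pointer. Arguments to isdigit should be cast to unsigned char, because passing a negative char value is undefined behavior.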
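One boundary condition that is easy to overlook is overflow. The sketch below shows one way to check for it using errno; the helper parse_long_checked and its return-code convention are hypothetical, not part of the standard library.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper.
   Returns  1 and stores the value in *out on success,
            0 if no digits were found at s,
           -1 if the value does not fit in a long.
   *rest is set to the first unconsumed character in every case. */
int parse_long_checked(const char *s, long *out, const char **rest) {
    char *end;

    errno = 0;                       /* strtol only sets errno, never clears it */
    long val = strtol(s, &end, 10);

    if (end == s) {                  /* nothing was converted */
        *rest = s;
        return 0;
    }
    *rest = end;                     /* digits are consumed even on overflow */
    if (errno == ERANGE) {           /* val was clamped to LONG_MAX or LONG_MIN */
        return -1;
    }
    *out = val;
    return 1;
}
```

Resetting errno before the call matters because a stale ERANGE from earlier code would otherwise be mistaken for an overflow in this conversion.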

Alternative Approach: Implementation Based on sscanf

In addition to the strtol approach, the sscanf function with scan sets can also be used for number extraction. This method uses %*[^0123456789] in the format string to skip all non-digit characters:

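#include <stdio.h>

int main(void) {
    const char* s = "ab234cid*(s349*(20kd";
    int total_n = 0;  // Offset of the unprocessed part of s
    int n;            // Characters consumed by the current call
    int value;

    // sscanf returns 1 only when %d was assigned; %*[...] and %n are not counted
    while (1 == sscanf(s + total_n, "%*[^0123456789]%d%n", &value, &n)) {
        total_n += n;
        printf("Extracted number: %d\n", value);
    }

    return 0;
}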

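The principle of this method is: %*[^0123456789] matches and discards a run of non-digit characters, %d reads the integer, and %n records the number of characters consumed by this call. Accumulating total_n tracks the position already processed. The format has limitations. A scan set must match at least one character, so the call fails, and the loop ends, as soon as the remaining input begins with a digit. Sign characters are discarded along with other non-digits, so "-5" is read as 5. The behavior of %d is undefined when the value does not fit in an int.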
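A possible variant that also accepts input beginning with a digit splits the skip and the read into two sscanf calls. This is a sketch under that design assumption; extract_ints_sscanf and its out/cap parameters are hypothetical names.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: store up to cap non-negative integers found in s
   into out and return how many were stored. */
size_t extract_ints_sscanf(const char *s, int *out, size_t cap) {
    size_t count = 0;

    while (*s != '\0' && count < cap) {
        int skipped = 0;
        /* If s already starts with a digit, the scan set matches nothing,
           sscanf stops before %n, and skipped stays 0. */
        sscanf(s, "%*[^0123456789]%n", &skipped);
        s += skipped;

        int value;
        int used = 0;
        if (sscanf(s, "%d%n", &value, &used) != 1) {
            break;  /* end of string: no more digits */
        }
        out[count++] = value;
        s += used;
    }
    return count;
}
```

Like the original loop, this version ignores signs and does not detect values too large for an int, so it suits only simple, trusted input.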

Solution Comparison and Selection Recommendations

Both approaches have their advantages and disadvantages:

<table>
<tr> <th>Comparison Dimension</th> <th>strtol Approach</th> <th>sscanf Approach</th> </tr>
<tr> <td>Performance</td> <td>Higher, direct character pointer manipulation</td> <td>Lower, involves format parsing overhead</td> </tr>
<tr> <td>Flexibility</td> <td>High, fine-grained control over conversion process</td> <td>Medium, depends on format strings</td> </tr>
<tr> <td>Error Handling</td> <td>Comprehensive, provides overflow detection</td> <td>Limited, less error information</td> </tr>
<tr> <td>Code Readability</td> <td>Medium, requires understanding pointer operations</td> <td>High, format strings are intuitive</td> </tr>
</table>

For most application scenarios, the strtol approach is recommended as it offers better performance and more comprehensive error handling mechanisms. The sscanf approach should only be considered for rapid prototyping or handling simple formats.

Extended Applications and Optimization Suggestions

Based on the core number extraction logic, applications can be further extended:

  1. Floating-point Number Extraction: Use the strtod function instead of strtol to extract floating-point numbers. The number detection logic needs adjustment to recognize decimal points and fractional parts.
  2. Batch Processing Optimization: When processing large volumes of strings, consider using a state machine to optimize the scanning process and reduce function call overhead.
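  3. Multi-base Support: The third parameter of strtol specifies the base (2 to 36, or 0 to infer it from a 0x or 0 prefix), making it possible to extract numbers in bases such as binary, octal, and hexadecimal. For bases above 10, the detection logic must also accept the letters that are valid digits in that base.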
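The floating-point extension in item 1 might look like the sketch below. The function name extract_doubles and its detection rule (an optional sign, an optional decimal point, then a digit) are assumptions chosen for illustration; other rules are possible.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: store up to cap floating-point values found in s
   into out and return how many were stored. */
size_t extract_doubles(const char *s, double *out, size_t cap) {
    size_t count = 0;
    const char *p = s;

    while (*p != '\0' && count < cap) {
        /* Look ahead: optional sign, optional '.', then at least one digit */
        const char *q = p;
        if (*q == '-' || *q == '+') q++;
        if (*q == '.') q++;

        if (isdigit((unsigned char)*q)) {
            char *end;
            double val = strtod(p, &end);
            if (end != p) {
                out[count++] = val;
                p = end;      /* skip the characters strtod consumed */
                continue;
            }
        }
        p++;                  /* not the start of a number */
    }
    return count;
}
```

Note that strtod accepts more than plain decimals: exponents such as 1e5 and hexadecimal forms such as 0x1A are also consumed, and the decimal-point character depends on the current locale. A stricter detector would be needed if those forms should be rejected.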

By deeply understanding the working principles of the strtol function and flexibly applying string processing techniques, developers can efficiently solve various number extraction problems, enhancing the robustness and performance of C programs.

Copyright Notice: All rights in this article are reserved by the operators of DevGex. Reasonable sharing and citation are welcome; any reproduction, excerpting, or re-publication without prior permission is prohibited.