Natural Sorting Algorithm: Correctly Sorting Strings with Numbers in Python

Dec 06, 2025 · Programming · 11 views · 7.8

Keywords: Python | natural sorting | regex

Abstract: This article delves into the method of natural sorting (human sorting) for strings containing numbers in Python. By analyzing the core mechanisms of regex splitting and type conversion, it explains in detail how to achieve sorting by numerical value rather than lexicographical order. Complete code implementations for integers and floats are provided, along with discussions on performance optimization and practical applications.

Problem Background and Challenges

In programming practice, sorting strings that contain numbers is a common yet often overlooked issue. When using Python's built-in sort() method, strings are sorted lexicographically, which can lead to counterintuitive results. For example, given a list of strings ["something1", "something12", "something17", "something2", "something25", "something29"], calling sort() directly yields ["something1", "something12", "something17", "something2", "something25", "something29"], where "something12" comes before "something2" because the character '1' is lexicographically smaller than '2', ignoring that the number 12 is greater than 2.

Core Principles of Natural Sorting

Natural sorting, also known as human sorting, aims to mimic how humans sort mixed text and numbers. The core idea is to split the string into text segments and numeric segments, converting numbers to numerical types for comparison rather than treating them as strings. This ensures numbers are sorted by their numerical value, while text parts are still handled lexicographically.

Key steps in implementing natural sorting include:

  1. Splitting the String: Use regular expressions to split the string into an alternating sequence of text and numbers. For example, "something12" is split into ["something", "12"].
  2. Type Conversion: Convert numeric segments to integers or floats for numerical comparison.
  3. Custom Sorting Key: Utilize the key parameter of Python's sort() method to return a list as the sorting basis.

Implementation for Integer Sorting

Below is an efficient implementation for integers, based on Toothy's optimized version:

import re

def atoi(text):
    return int(text) if text.isdigit() else text

def natural_keys(text):
    return [atoi(c) for c in re.split(r'(\d+)', text)]

alist = [
    "something1",
    "something12",
    "something17",
    "something2",
    "something25",
    "something29"
]
alist.sort(key=natural_keys)
print(alist)  # Output: ['something1', 'something2', 'something12', 'something17', 'something25', 'something29']

Code Analysis:

Extension for Floating-Point Sorting

For strings containing floating-point numbers, adjust the regex to match float characters and modify the type conversion function accordingly:

import re

def atof(text):
    try:
        retval = float(text)
    except ValueError:
        retval = text
    return retval

def natural_keys(text):
    return [atof(c) for c in re.split(r'[+-]?([0-9]+(?:[.][0-9]*)?|[.][0-9]+)', text)]

alist = [
    "something1",
    "something2",
    "something1.0",
    "something1.25",
    "something1.105"
]
alist.sort(key=natural_keys)
print(alist)  # Output: ['something1', 'something1.0', 'something1.105', 'something1.25', 'something2']

Key Improvements:

Performance Analysis and Optimization

The performance of natural sorting primarily depends on the overhead of regex splitting and type conversion. Toothy's implementation significantly improves speed by simplifying logic and using isdigit() checks compared to the original version. In practical applications, for large datasets, consider precompiling regex patterns or using caching mechanisms for further optimization.

For example, precompiling regex patterns:

import re
int_pattern = re.compile(r'(\d+)')
float_pattern = re.compile(r'[+-]?([0-9]+(?:[.][0-9]*)?|[.][0-9]+)')
# Use pattern.split(text) in functions

Application Scenarios and Considerations

Natural sorting is widely used in file naming (e.g., "file1.txt", "file10.txt"), version number sorting (e.g., "v1.2", "v1.10"), and data cleaning scenarios. When implementing, note:

Through this in-depth analysis, readers should master the core techniques of natural sorting and apply them flexibly in real-world projects to improve data processing quality and user experience.

Copyright Notice: All rights in this article are reserved by the operators of DevGex. Reasonable sharing and citation are welcome; any reproduction, excerpting, or re-publication without prior permission is prohibited.