Keywords: Python | natural sort | string sorting | natsort | regular expressions
Abstract: This article explores the implementation of natural sorting for strings in Python. It begins by introducing the concept of natural sorting and the limitations of the built-in sorted() function. It then details the use of the natsort library for robust natural sorting, along with custom solutions based on regular expressions. Advanced features such as case-insensitive sorting and the os_sorted function are discussed. The article explains core concepts in an accessible way, using code examples to illustrate points, and recommends the natsort library for handling complex cases.
Introduction
Natural sorting orders strings by treating embedded runs of digits as numbers instead of comparing them character by character. Python's built-in sorted() function compares strings lexicographically, which produces unnatural orderings such as placing "Elm11" before "Elm2".
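A quick check makes the problem concrete. The list below is an illustrative sample (assumed for this example, not taken from a real project), and the snippet uses only the built-in sorted().

```python
# Illustrative sample data (assumed for this example).
names = ["Elm2", "Elm11", "Elm1", "Elm10"]

# Built-in sorting compares strings character by character,
# so "1" < "2" decides the order before the full number is read.
lexicographic = sorted(names)
print(lexicographic)  # ['Elm1', 'Elm10', 'Elm11', 'Elm2']
```

A reader would normally expect "Elm2" to come before "Elm10" and "Elm11"; that expectation is what natural sorting restores.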
Using the natsort Library
The natsort library provides a robust solution for natural sorting. Key functions include natsorted() and natsort_keygen().
>>> from natsort import natsorted, ns
>>> x = ['Elm11', 'Elm12', 'Elm2', 'elm0', 'elm1', 'elm10', 'elm13', 'elm9']
>>> natsorted(x, key=lambda y: y.lower())
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
Case-insensitive sorting can also be requested through the alg parameter instead of a key function.
>>> natsorted(x, alg=ns.IGNORECASE)
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
Custom Implementations
If you prefer a custom solution, you can use regular expressions to split strings and convert numeric parts.
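The splitting step is easier to follow when looked at in isolation. The short sketch below shows what re.split returns when the pattern contains a capturing group; the sample string is made up for illustration.

```python
import re

# A capturing group makes re.split keep the matched digit runs in the result.
parts = re.split(r"([0-9]+)", "Elm11b")
print(parts)  # ['Elm', '11', 'b']

# Converting the digit runs to int lets them compare numerically.
key = [int(p) if p.isdigit() else p.lower() for p in parts]
print(key)  # ['elm', 11, 'b']
```

Because the result always starts with a text segment (possibly an empty string) and then alternates between digit runs and text, keys built this way compare strings with strings and integers with integers at each position.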
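import re
def natural_sort_key(s):
    # Split into alternating text and digit runs; digit runs become integers.
    return [int(text) if text.isdigit() else text.lower()
            for text in re.split(r'([0-9]+)', s)]
# Example input (illustrative).
original_list = ['Elm11', 'Elm2', 'elm1', 'elm10']
sorted_list = sorted(original_list, key=natural_sort_key)
# sorted_list is now ['elm1', 'Elm2', 'elm10', 'Elm11']
Advanced Features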
Since version 7.1.0, natsort has included os_sorted(), which orders paths the way the operating system's file browser does.
>>> from natsort import os_sorted
>>> os_sorted(list_of_paths)
Conclusion
For most use cases, the natsort library is recommended due to its generality and ease of use. Custom functions are suitable for simple cases but may not handle all edge conditions.
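One such edge condition is decimal numbers. The sketch below repeats a regex-based key like the one from the custom implementation and applies it to hypothetical labels, chosen only to show the behavior.

```python
import re

def natural_sort_key(s):
    # Same idea as the custom key in this article: digit runs become integers.
    return [int(t) if t.isdigit() else t.lower()
            for t in re.split(r"([0-9]+)", s)]

# Hypothetical labels containing decimal numbers.
labels = ["item1.10", "item1.5"]

# The key sees "1.10" as the integers 1 and 10, so it sorts after "1.5",
# even though 1.10 < 1.5 when the values are read as decimals.
result = sorted(labels, key=natural_sort_key)
print(result)  # ['item1.5', 'item1.10']
```

This ordering is reasonable for version-like strings but not for decimal values. Handling both would require extending the regular expression, whereas natsort provides algorithm flags such as ns.FLOAT for this kind of input.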