Efficient Methods for Retrieving Immediate Subdirectories in Python: A Comprehensive Performance Analysis

Nov 25, 2025 · Programming

Keywords: Python | Directory_Traversal | Performance_Optimization | File_System | os.scandir

Abstract: This article provides an in-depth exploration of various methods for obtaining immediate subdirectories in Python, with a focus on performance comparisons among os.scandir(), os.listdir(), os.walk(), glob, and pathlib. Using benchmark data, it demonstrates the significant efficiency advantages of os.scandir() while discussing the appropriate use cases and considerations for each approach. The article includes complete code examples and practical recommendations to help developers select the most suitable directory traversal solution.

Introduction

Retrieving immediate subdirectories from a specified directory is a common requirement in file system operations. Whether for batch file processing, directory structure analysis, or automated script writing, efficient access to subdirectory lists is essential. Python, as a powerful programming language, offers multiple approaches to achieve this functionality, with significant differences in performance and applicability among these methods.

Core Method Comparison

The Python standard library provides various methods for obtaining subdirectories, each with distinct characteristics and suitable scenarios. Below we analyze several primary approaches in detail.

The os.scandir() Method

os.scandir(), introduced in Python 3.5, is an efficient directory traversal function that returns an iterator of os.DirEntry objects rather than a plain list of names, which significantly improves directory scanning performance. The basic usage is as follows:

import os

path = "/example/path"
list_subfolders_with_paths = [f.path for f in os.scandir(path) if f.is_dir()]

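The main advantage of this method is that each os.DirEntry object carries the file type information gathered during the directory scan, so is_dir() usually needs no additional system call. The name and path attributes also expose the entry name and its full path directly.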
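The os.DirEntry objects yielded by os.scandir() can also be consumed inside a with block, which has been supported since Python 3.6 and closes the underlying directory handle promptly. The following is a minimal sketch under that assumption; the function name immediate_subdirs and the choice to expose follow_symlinks are illustrative and not part of the original benchmark.

```python
import os
import tempfile


def immediate_subdirs(path, follow_symlinks=True):
    """Return (name, full_path) pairs for the immediate subdirectories of path."""
    # The context manager closes the scandir iterator even if an error occurs.
    with os.scandir(path) as entries:
        return [
            (entry.name, entry.path)
            for entry in entries
            # is_dir() normally reuses type information gathered during the scan.
            if entry.is_dir(follow_symlinks=follow_symlinks)
        ]


if __name__ == "__main__":
    # Hypothetical demo tree: two directories and one regular file.
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "alpha"))
        os.mkdir(os.path.join(root, "beta"))
        open(os.path.join(root, "notes.txt"), "w").close()
        print(sorted(name for name, _ in immediate_subdirs(root)))
```

Passing follow_symlinks=False excludes symbolic links that point to directories, which may matter when links should not be treated as real subdirectories.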

os.listdir() with Filtering

The traditional os.listdir() method combined with directory checking can also achieve subdirectory retrieval:

import os

def get_immediate_subdirectories(a_dir):
    return [name for name in os.listdir(a_dir)
            if os.path.isdir(os.path.join(a_dir, name))]

While this approach is intuitive and easy to understand, it suffers from relatively lower performance due to additional system calls required to determine whether each file is a directory.

The os.walk() Method

os.walk() is typically used for recursive directory tree traversal but can be adapted for immediate subdirectory retrieval by limiting traversal depth:

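import os

path = "/example/path"
list_subfolders_with_paths = []
for root, dirs, files in os.walk(path):
    # The first iteration covers the top-level directory itself.
    for name in dirs:
        list_subfolders_with_paths.append(os.path.join(root, name))
    break  # stop before descending into subdirectories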

Although powerful, this method is overly heavyweight for scenarios requiring only immediate subdirectories.
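Because only the first step of the walk is needed, the loop-and-break pattern can be condensed with next(). This is a sketch rather than the benchmarked code; the function name subdirs_via_walk is an assumption made for illustration.

```python
import os
import tempfile


def subdirs_via_walk(path):
    """Return full paths of immediate subdirectories using only the first os.walk() step."""
    # os.walk() yields nothing for a missing or unreadable top directory,
    # so a default triple is supplied to avoid StopIteration.
    root, dirnames, _filenames = next(os.walk(path), (path, [], []))
    return [os.path.join(root, name) for name in dirnames]


if __name__ == "__main__":
    # Hypothetical demo tree with one nested level.
    with tempfile.TemporaryDirectory() as top:
        os.makedirs(os.path.join(top, "outer", "inner"))
        print(subdirs_via_walk(top))
```

The nested directory is not reported, since the generator is never advanced past the top level.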

The glob Module Approach

The glob module provides functionality based on Unix-style path pattern matching:

from glob import glob

path = "/example/path"
paths = glob(path + '/*/')

This method features concise syntax, but note that every returned path ends with a trailing path separator.
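The trailing separator can be stripped so that the results line up with those of the other methods. The sketch below shows one way to do this; the function name subdirs_via_glob is an assumption. It also illustrates a second behavior worth knowing: the * wildcard does not match names beginning with a dot, so hidden directories are left out.

```python
import glob
import os
import tempfile


def subdirs_via_glob(path):
    """Return immediate subdirectory paths found by glob, without trailing separators."""
    # glob.escape() protects special characters such as [ or * in the base path.
    pattern = os.path.join(glob.escape(path), "*", "")
    # Each match ends with a path separator; strip it for consistency with
    # the os.scandir() and os.walk() results.
    return [match.rstrip(os.sep) for match in glob.glob(pattern)]


if __name__ == "__main__":
    # Hypothetical demo tree: one visible and one hidden directory.
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "visible"))
        os.mkdir(os.path.join(root, ".hidden"))
        print([os.path.basename(p) for p in subdirs_via_glob(root)])
```

If hidden directories must be included, os.scandir() is the simpler choice, since it reports every entry regardless of its name.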

The pathlib Module Approach

Introduced in Python 3.4, pathlib offers object-oriented path operations:

from pathlib import Path

path = "/example/path"
p = Path(path)
list_subfolders_with_paths = [x for x in p.iterdir() if x.is_dir()]

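This approach provides excellent code readability but is slower, since a Path object is created for every entry and, in the Python 3.8 release used for the tests in this article, is_dir() issues a separate stat() call for each one.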

Performance Testing and Analysis

To objectively compare the performance of the various methods, we conducted detailed benchmark tests. The testing environment was a Windows 7 x64 system running Python 3.8.1, and the test directory contained 440 subdirectories.

Testing Code

import os
import pathlib
import timeit
import glob

path = r"<example_path>"

def test_scandir():
    list_subfolders_with_paths = [f.path for f in os.scandir(path) if f.is_dir()]

def test_listdir():
    list_subfolders_with_paths = [os.path.join(path, f) for f in os.listdir(path) if os.path.isdir(os.path.join(path, f))]

def test_walk():
    list_subfolders_with_paths = []
    for root, dirs, files in os.walk(path):
        for name in dirs:
            list_subfolders_with_paths.append(os.path.join(root, name))
        break

def test_glob():
    list_subfolders_with_paths = glob.glob(path + '/*/')

def test_listdir_filter():
    list_subfolders_with_paths = list(filter(os.path.isdir, [os.path.join(path, f) for f in os.listdir(path)]))

def test_pathlib():
    p = pathlib.Path(path)
    list_subfolders_with_paths = [x for x in p.iterdir() if x.is_dir()]

print(f"Scandir:          {timeit.timeit(test_scandir, number=1000):.3f}")
print(f"Listdir:          {timeit.timeit(test_listdir, number=1000):.3f}")
print(f"Walk:             {timeit.timeit(test_walk, number=1000):.3f}")
print(f"Glob:             {timeit.timeit(test_glob, number=1000):.3f}")
print(f"Listdir (filter): {timeit.timeit(test_listdir_filter, number=1000):.3f}")
print(f"Pathlib:          {timeit.timeit(test_pathlib, number=1000):.3f}")

Performance Results

The test results reveal significant performance differences among the methods. In this test, os.scandir() outperformed the other approaches by factors ranging from 3 to 37, a substantial performance advantage.
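Timings like these depend heavily on the operating system, file system, and Python version, so it is worth re-measuring locally. The following is a hedged sketch of a self-contained harness that builds a temporary tree and reports each method relative to os.scandir(). The 440 subdirectories mirror the test described above; the 50 extra files, the helper names, and the use of timeit.repeat are assumptions for illustration. No expected timings are given because they will vary.

```python
import glob
import os
import pathlib
import tempfile
import timeit


def via_scandir(path):
    with os.scandir(path) as entries:
        return [e.path for e in entries if e.is_dir()]


def via_listdir(path):
    return [os.path.join(path, n) for n in os.listdir(path)
            if os.path.isdir(os.path.join(path, n))]


def via_walk(path):
    root, dirs, _ = next(os.walk(path), (path, [], []))
    return [os.path.join(root, d) for d in dirs]


def via_glob(path):
    return glob.glob(os.path.join(glob.escape(path), "*", ""))


def via_pathlib(path):
    return [str(p) for p in pathlib.Path(path).iterdir() if p.is_dir()]


METHODS = {
    "scandir": via_scandir,
    "listdir": via_listdir,
    "walk": via_walk,
    "glob": via_glob,
    "pathlib": via_pathlib,
}


def make_test_tree(root, n_dirs=440, n_files=50):
    """Create n_dirs empty subdirectories and n_files empty files under root."""
    for i in range(n_dirs):
        os.mkdir(os.path.join(root, f"dir_{i:04d}"))
    for i in range(n_files):
        open(os.path.join(root, f"file_{i:04d}.txt"), "w").close()


def benchmark(path, number=100, repeat=5):
    """Return {method name: best total seconds for `number` calls}."""
    return {
        name: min(timeit.repeat(lambda f=func: f(path), number=number, repeat=repeat))
        for name, func in METHODS.items()
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        make_test_tree(root)
        results = benchmark(root)
        baseline = results["scandir"]
        for name, seconds in sorted(results.items(), key=lambda kv: kv[1]):
            print(f"{name:<10} {seconds:.4f}s  ({seconds / baseline:.1f}x scandir)")
```

Taking the minimum over several repeats reduces noise from other processes, which is a common convention when using timeit.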

Technical Principle Analysis

Performance Advantages of os.scandir()

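The superior performance of os.scandir() stems from how it obtains file type information. The underlying operating system calls (FindFirstFile/FindNextFile on Windows, readdir on POSIX systems) usually report whether an entry is a directory as part of the listing itself, and os.DirEntry caches that result, so is_dir() rarely requires a further system call.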

Performance Bottlenecks in Other Methods

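In contrast, the other methods pay additional costs per entry. os.listdir() returns only names, so every os.path.isdir() check is a separate stat() call. os.walk() builds complete lists of both directory and file names before yielding its first result. glob adds pattern matching on top of the directory listing, and pathlib constructs a Path object for each entry.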

Practical Application Recommendations

Selection Criteria

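When choosing a method for subdirectory retrieval, consider the Python version that must be supported, the size of the directories involved, whether names or full paths are needed, and how much code readability matters relative to raw speed.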

Best Practices

Based on performance test results and technical analysis, we recommend the following best practices:

  1. Always prioritize os.scandir() in Python 3.5 and above
  2. Use f.name instead of f.path when directory names rather than full paths are needed
  3. For large-scale directory traversal, consider using generator expressions to reduce memory usage
  4. In scenarios requiring natural sorting, employ specialized sorting functions
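Recommendations 2 to 4 can be combined in a short sketch: a generator that yields directory names lazily, plus a natural sort key so that "dir2" sorts before "dir10". The names iter_subdir_names and natural_key are hypothetical helpers introduced here for illustration; dedicated third-party sorting libraries exist but are not assumed.

```python
import os
import re
import tempfile


def iter_subdir_names(path):
    """Lazily yield the names of the immediate subdirectories of path."""
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir():
                yield entry.name  # name only; use entry.path for the full path


def natural_key(name):
    """Split a name into text and number chunks for human-friendly ordering."""
    # re.split() with a capturing group alternates text (even index)
    # and digit runs (odd index), so the chunk types always line up.
    parts = re.split(r"(\d+)", name)
    return [int(part) if index % 2 else part.lower()
            for index, part in enumerate(parts)]


if __name__ == "__main__":
    # Hypothetical demo tree with numbered directory names.
    with tempfile.TemporaryDirectory() as root:
        for name in ("dir10", "dir2", "dir1"):
            os.mkdir(os.path.join(root, name))
        print(sorted(iter_subdir_names(root), key=natural_key))
```

Sorting necessarily materializes the full list, so the memory benefit of the generator applies only when entries are processed one at a time without ordering.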

Comparison with Other Systems

Examining similar functionality in Unix systems reveals interesting parallels. Common Unix commands include:

ls -d */
find /var/foo -maxdepth 1 -mindepth 1 -type d

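These commands can also differ in performance, with find often the faster of the two on large directories: find filters entries by type while reading the directory, whereas ls -d */ relies on the shell expanding the glob before ls examines each match. This mirrors the Python performance test results.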

Conclusion

Through comprehensive performance testing and technical analysis, we can draw a clear conclusion: os.scandir() is the optimal choice for retrieving immediate subdirectories in Python. It not only significantly outperforms other methods but also provides rich file information interfaces. While other methods retain value for backward compatibility or specific functional requirements, os.scandir() should be the preferred solution for most scenarios.

As the Python ecosystem continues to evolve, we anticipate the emergence of more optimized file system operation interfaces. However, for the present, mastering and appropriately utilizing the efficient os.scandir() tool will substantially enhance the performance of file processing-related applications.
