A Comprehensive Guide to Recursively Retrieving All Files in a Directory Using MATLAB

Dec 04, 2025 · Programming

Keywords: MATLAB | recursive file retrieval | directory traversal | dir function | getAllFiles | dirPlus

Abstract: This article explores methods for recursively obtaining all files under a specific directory in MATLAB. It begins with the basic usage of MATLAB's built-in dir function and the recursive search capability introduced in R2016b, where the **/*.m pattern retrieves all .m files across subdirectories. It then details the implementation of a custom recursive function, getAllFiles, which collects all file paths by traversing the directory structure, distinguishing files from folders, excluding the special directories (. and ..), and calling itself recursively. The article also discusses advanced features of third-party tools such as dirPlus.m, including regular expression filtering and custom validation functions, which address more complex file screening needs. Finally, practical code examples show how to apply these methods to batch file processing, helping readers choose the most suitable implementation for their requirements.

Core Methods for Recursive File Retrieval in MATLAB

When performing file system operations in MATLAB, there is often a need to obtain lists of files from specific directories and all their subdirectories. This requirement is particularly common in scenarios such as batch data processing and automated script writing. This article systematically introduces three main implementation approaches: direct use of MATLAB's built-in functions, writing custom recursive functions, and applying third-party enhanced tools.

Recursive Search Capability of the Built-in dir Function

Since MATLAB R2016b, the dir function has included recursive search capabilities. By using the ** wildcard in search patterns, matching files from all subdirectories can be retrieved at once. For example, to obtain all .m files in the current folder and all its subfolders, simply execute:

dirData = dir('**/*.m');

This method is concise and efficient, and it is especially suitable when filtering by file extension is all that is needed. The returned dirData is a structure array with the fields name, folder, date, bytes, isdir, and datenum. Note that the ** wildcard stands for zero or more folder levels and must form a complete path component: according to the MATLAB documentation, the characters next to it must be file separators, so it cannot be combined with other characters inside a folder or file name.
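As a minimal sketch of how the returned structure array might be used, the following builds full paths from the folder and name fields and drops any folders whose names happen to match the pattern. The variable name fullPaths is illustrative only, and the snippet assumes R2016b or later:

```matlab
% Recursively list .m files under the current folder (R2016b or later).
dirData = dir('**/*.m');

% A folder whose name ends in .m would also match, so keep files only.
dirData = dirData(~[dirData.isdir]);

% Combine the folder and name fields into one full path per file.
fullPaths = arrayfun(@(d) fullfile(d.folder, d.name), dirData, ...
                     'UniformOutput', false);

fprintf('Found %d .m files\n', numel(fullPaths));
```

Using arrayfun here keeps the snippet well behaved when no files match, since an empty structure array simply yields an empty cell array.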

Implementation of the Custom Recursive Function getAllFiles

For more complex file filtering requirements, or to implement recursive search in older MATLAB versions, custom functions are necessary. Below is a complete implementation of the getAllFiles function:

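function fileList = getAllFiles(dirName)
  % Return a column cell array with the full paths of all files under dirName.
  dirData = dir(dirName);                 % contents of the current directory
  dirIndex = [dirData.isdir];             % logical mask: true for folders
  fileList = {dirData(~dirIndex).name}';  % names of the files at this level
  if ~isempty(fileList)
    % Prepend the directory to each file name
    fileList = cellfun(@(x) fullfile(dirName,x), fileList, 'UniformOutput', false);
  end
  subDirs = {dirData(dirIndex).name};          % names of the subdirectories
  validIndex = ~ismember(subDirs,{'.','..'});  % skip the . and .. entries
  for iDir = find(validIndex)
    nextDir = fullfile(dirName,subDirs{iDir});
    fileList = [fileList; getAllFiles(nextDir)];  %#ok<AGROW> append the subtree's files
  end
end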

The execution flow of this function can be divided into four key steps: first, obtaining current directory information and distinguishing files from folders; then adding full paths to the file list; next filtering valid subdirectories (excluding . and ..); and finally recursively calling itself for each subdirectory. This depth-first traversal algorithm ensures access to all terminal nodes of the directory tree.

Advanced File Filtering and the dirPlus Tool

When filtering by file attributes (such as size or modification time) or by content is required, third-party tools such as dirPlus.m provide more powerful capabilities. The tool is available on both MathWorks File Exchange and GitHub, and its main enhancements include regular expression filtering and custom validation functions.

For example, to obtain all .mat files larger than 1MB in the D:\dic directory, one could write a validation function checking file size combined with the regular expression .*\.mat$ for filtering. This flexibility makes it an ideal choice for complex file management tasks.
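The same kind of filtering can be sketched without dirPlus, using only the built-in dir (R2016b or later) together with regexp and the bytes field. This is not dirPlus's interface; it is a stand-in that shows the idea of combining a name pattern with an attribute check, and the variable names rootDir, minBytes, and files are chosen here for illustration:

```matlab
% Assumed inputs: the folder from the example above and a 1 MB threshold.
rootDir  = 'D:\dic';
minBytes = 1024^2;

% List everything below rootDir, then keep files only.
listing = dir(fullfile(rootDir, '**', '*'));
listing = listing(~[listing.isdir]);

% Name filter: regular expression applied to each file name.
nameOk = ~cellfun(@isempty, regexp({listing.name}, '\.mat$', 'once'));

% Attribute filter: file size in bytes.
sizeOk = [listing.bytes] > minBytes;

% Build full paths for the entries that pass both filters.
keep  = listing(nameOk & sizeOk);
files = arrayfun(@(d) fullfile(d.folder, d.name), keep, ...
                 'UniformOutput', false);
```

Other attributes can be checked the same way, for example by comparing the datenum field against a cutoff date.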

Practical Applications and Performance Considerations

In practical applications, the choice of method depends on specific requirements:

  1. Simple extension-based filtering: use built-in dir('**/*.ext')
  2. Cross-version compatibility or complex traversal: use the getAllFiles custom function
  3. Advanced filtering needs: adopt third-party tools like dirPlus

Regarding performance, for directory trees containing large numbers of files, the recursive approach may incur significant memory overhead, because the file list is enlarged and copied each time a subdirectory's results are appended. Possible optimizations include replacing the recursion with an explicit loop, delaying path concatenation, and processing files in batches. When handling tens of thousands of files, such design choices can noticeably improve execution efficiency.
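One way the loop-based idea could look is sketched below: an explicit stack of pending folders replaces the recursive calls, and the per-folder results are concatenated only once at the end. The variable names (pending, chunks) and the choice of the current folder as the starting point are assumptions for illustration, and no performance gain has been measured here:

```matlab
rootDir = pwd;         % assumption: start from the current folder
pending = {rootDir};   % stack of folders still to visit
chunks  = {};          % one cell array of file paths per visited folder

while ~isempty(pending)
  % Pop the most recently added folder.
  current = pending{end};
  pending(end) = [];

  entries = dir(current);
  isDir = [entries.isdir];

  % Collect the full paths of the files in this folder.
  names = {entries(~isDir).name}';
  chunks{end+1} = cellfun(@(n) fullfile(current, n), names, ...
                          'UniformOutput', false); %#ok<SAGROW>

  % Push the subfolders, skipping the . and .. entries.
  subDirs = {entries(isDir).name};
  subDirs = subDirs(~ismember(subDirs, {'.', '..'}));
  for k = 1:numel(subDirs)
    pending{end+1} = fullfile(current, subDirs{k}); %#ok<SAGROW>
  end
end

% Concatenate once instead of growing the list for every folder.
fileList = vertcat(chunks{:});
```

The files come back in a different order than with the recursive getAllFiles, since folders are visited in stack order, so sort the result if the order matters.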

Code Examples and Best Practices

The following example demonstrates how to combine file retrieval with subsequent processing:

% Get all files under D:\dic
fileList = getAllFiles('D:\dic');

% Process each file iteratively
% (processTextFile and processCSVFile are placeholders for user-defined functions)
for i = 1:numel(fileList)
    [~, ~, ext] = fileparts(fileList{i});
    if strcmpi(ext, '.txt')        % compare extensions case-insensitively
        processTextFile(fileList{i});
    elseif strcmpi(ext, '.csv')
        processCSVFile(fileList{i});
    end
end

Recommended best practices include: always using the fullfile function to build cross-platform compatible paths, preallocating arrays before loops to improve performance, and adding exception handling mechanisms for directories with restricted access. For production environment code, consider incorporating progress indicators and logging functionality.
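The preallocation and exception handling advice might be combined as in the sketch below. The file list is a hypothetical placeholder containing a path that is expected not to exist, and fileread merely stands in for a real processing step:

```matlab
% Hypothetical file list; in practice it might come from getAllFiles or dir.
fileList = {fullfile(tempdir, 'missing_example_file.txt')};

% Preallocate the result flags before the loop.
succeeded = false(numel(fileList), 1);

for k = 1:numel(fileList)
  try
    % fileread throws an error if the file cannot be opened.
    contents = fileread(fileList{k}); %#ok<NASGU>
    succeeded(k) = true;
  catch err
    % Log the failure to standard error and continue with the next file.
    fprintf(2, 'Skipping %s: %s\n', fileList{k}, err.message);
  end
end

fprintf('%d of %d files processed\n', nnz(succeeded), numel(fileList));
```

Catching errors per file means one unreadable file does not abort the whole batch, and the succeeded flags record which files may need to be retried.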

Conclusion and Extensions

MATLAB offers multi-level solutions for recursive file retrieval, ranging from simple built-in features to fully custom implementations. Understanding the principles and applicable scenarios of these methods enables developers to choose optimal solutions based on project requirements. As MATLAB versions continue to evolve, file operation APIs may become further enriched, but the core algorithmic concepts of recursive traversal will maintain their value. For particularly complex file system operations, combining Java or .NET interfaces for lower-level control can also be considered.

Copyright Notice: All rights in this article are reserved by the operators of DevGex. Reasonable sharing and citation are welcome; any reproduction, excerpting, or re-publication without prior permission is prohibited.