Keywords: Python | Linux | FileSystem | DiskSpace | VFS
Abstract: This article explores methods to determine the file system partition containing a given file or directory in Linux using Python and retrieve usage statistics such as total size and free space. Focusing on the `df` command as the primary solution, it also covers the `os.statvfs` system call and the `shutil.disk_usage` function for Python 3.3+, with code examples and in-depth analysis of their pros and cons.
Introduction
In Linux environments, managing files and directories often requires knowledge of the underlying file system partition and its resource usage, such as total capacity and available space. This is crucial for automation scripts and system monitoring. This article details several Python-based approaches to achieve this efficiently.
Methodology Overview
Three methods are discussed: the external `df` command, which reports the partition (device) name, the mount point, and usage statistics; `os.statvfs`, Python's interface to the POSIX `statvfs` call, which returns statistics only; and `shutil.disk_usage`, available in Python 3.3+, which wraps the same statistics in a simpler interface. Each method has its use cases and limitations.
Using the `df` Command for a Complete Solution
As the recommended solution, the `df` command prints the device, size, usage, and mount point of the file system that contains a given path. By invoking it via Python's `subprocess` module, one can parse the output to extract the device name, mount point, and usage statistics. The example below runs `df` on a path and parses the first data line into a dictionary of these values.
import subprocess


def get_filesystem_info(filename):
    """
    Retrieve partition information and statistics for the file system
    containing the given file.
    """
    try:
        # Invoke df: -P keeps each file system on one line (POSIX format),
        # -k forces 1 KiB blocks regardless of environment settings
        process = subprocess.Popen(['df', '-P', '-k', filename], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        output, error = process.communicate()
        if process.returncode != 0:
            raise RuntimeError(f"df command failed: {error.decode('utf-8', errors='ignore')}")
        # Parse output
        lines = output.decode('utf-8').strip().split('\n')
        if len(lines) < 2:
            raise ValueError("Unexpected df output format, cannot parse.")
        data_line = lines[1]  # Skip header
        # Split into at most 6 fields so a mount point containing spaces stays intact
        parts = data_line.split(None, 5)
        if len(parts) >= 6:
            device = parts[0]
            # Sizes are reported in 1K blocks because of -k
            total_blocks = int(parts[1])
            used_blocks = int(parts[2])
            available_blocks = int(parts[3])
            use_percentage = parts[4]
            mount_point = parts[5]
            block_size = 1024  # 1K blocks
            total_bytes = total_blocks * block_size
            used_bytes = used_blocks * block_size
            available_bytes = available_blocks * block_size
            return {
                'device': device,
                'mount_point': mount_point,
                'total_bytes': total_bytes,
                'used_bytes': used_bytes,
                'available_bytes': available_bytes,
                'use_percentage': use_percentage
            }
        else:
            raise ValueError("Insufficient data in df output for full parsing.")
    except Exception as e:
        # Report the problem and signal failure to the caller with None
        print(f"Error: {e}")
        return None


# Example usage
if __name__ == "__main__":
    info = get_filesystem_info("/home/foo/bar/baz")
    if info:
        print(f"Device: {info['device']}, Mount Point: {info['mount_point']}, Total Size: {info['total_bytes']} bytes, Available Space: {info['available_bytes']} bytes")
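This code calls `df` on the given path and parses the first data line of its output. The parsing is deliberately simple. `df` output can vary between implementations and locales, and some versions wrap a long device name onto a separate line unless the POSIX `-P` option is given, so production code should request a stable format and validate the header before trusting the columns. The byte counts rely on 1 KiB blocks, which is the GNU `df` default unless environment variables such as `POSIXLY_CORRECT` or `BLOCKSIZE` override it; the `-k` option forces 1 KiB blocks, and `-B` selects other sizes.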
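The header check mentioned above can be sketched as a small parser that is kept separate from the subprocess call, which also makes it easy to test. This is a minimal sketch, assuming the text comes from `df -P -k <path>` run in the C/POSIX locale, where the header begins with `Filesystem` followed by a column such as `1024-blocks`. The function name `parse_df_posix` and the sample text are hypothetical and chosen for illustration only.

```python
import re


def parse_df_posix(text):
    """Parse the output of `df -P -k <path>` (header plus one data line)."""
    lines = text.strip().splitlines()
    if len(lines) != 2:
        raise ValueError(f"expected a header and one data line, got {len(lines)} line(s)")
    header, data = lines

    # Assumption: POSIX-format header such as
    # "Filesystem 1024-blocks Used Available Capacity Mounted on".
    # The block size is read from the header instead of being hard-coded.
    match = re.match(r"Filesystem\s+(\d+)-blocks\s", header)
    if match is None:
        raise ValueError(f"unrecognized df header: {header!r}")
    block_size = int(match.group(1))

    # Split into at most 6 fields so a mount point with spaces stays whole.
    fields = data.split(None, 5)
    if len(fields) != 6:
        raise ValueError(f"unrecognized df data line: {data!r}")
    device, total, used, available, capacity, mount_point = fields

    return {
        "device": device,
        "mount_point": mount_point,
        "total_bytes": int(total) * block_size,
        "used_bytes": int(used) * block_size,
        "available_bytes": int(available) * block_size,
        "use_percentage": capacity,
    }


# Made-up sample text for illustration; not captured from a real system.
SAMPLE = (
    "Filesystem     1024-blocks     Used Available Capacity Mounted on\n"
    "/dev/sdb1         10000000  4000000   6000000      40% /mnt/my data\n"
)

if __name__ == "__main__":
    print(parse_df_posix(SAMPLE)["mount_point"])
```

Reading the block size from the header means the same parser works if `df` reports 512-byte blocks, and any layout it does not recognize raises `ValueError` instead of silently producing wrong numbers.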
Using `os.statvfs` for Usage Statistics
If only usage statistics are needed without partition details, the `os.statvfs` system call provides a direct Python interface. Below is a code example to compute total and available space.
import os


def get_statvfs_info(path):
    """
    Use os.statvfs to retrieve file system statistics.
    """
    try:
        stat = os.statvfs(path)
        frsize = stat.f_frsize
        total_blocks = stat.f_blocks
        free_blocks = stat.f_bfree
        available_blocks = stat.f_bavail
        total_bytes = frsize * total_blocks
        free_bytes = frsize * free_blocks
        available_bytes = frsize * available_blocks
        return {
            'total_bytes': total_bytes,
            'free_bytes': free_bytes,
            'available_bytes': available_bytes,
            'block_size': frsize
        }
    except OSError as e:
        print(f"Cannot access path {path}: {e}")
        return None


# Example usage
if __name__ == "__main__":
    info = get_statvfs_info("/home/foo/bar/baz")
    if info:
        print(f"Total Size: {info['total_bytes']} bytes, Actual Free: {info['free_bytes']} bytes, User Available: {info['available_bytes']} bytes")
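This method works on any Unix-like system that provides `statvfs`, but it does not report the device name or mount point. All block counts are expressed in units of `f_frsize` (the fragment size), which varies by file system, so the counts must be multiplied by `f_frsize` rather than by an assumed constant. `f_bfree` counts every free block, including those reserved for the superuser, while `f_bavail` counts only the blocks available to unprivileged processes, which is why "actual free" and "user available" can differ.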
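Because `os.statvfs` reports no device name, one way to recover it without `df` is to walk up from the path to its mount point and then look that mount point up in the kernel's mount table. The following is a minimal sketch, assuming a Linux system where `/proc/self/mounts` is readable. The helper names `find_mount_point`, `parse_mounts`, and `describe_path` are illustrative and not part of the standard library. Note that `os.path.ismount` cannot reliably detect bind mounts within the same file system, so the result should be treated as a best effort.

```python
import os


def find_mount_point(path):
    """Walk up from `path` until a directory that is a mount point is reached."""
    path = os.path.realpath(path)
    while not os.path.ismount(path):
        path = os.path.dirname(path)
    return path


def parse_mounts(lines):
    """Map mount point -> device from lines in /proc/self/mounts format."""
    table = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            # Spaces inside mount paths are stored as the octal escape \040.
            mount_point = fields[1].replace("\\040", " ")
            # Later entries overwrite earlier ones mounted on the same directory.
            table[mount_point] = fields[0]
    return table


def describe_path(path, mounts_file="/proc/self/mounts"):
    """Combine mount point, device (if listed), and statvfs statistics."""
    mount_point = find_mount_point(path)
    with open(mounts_file) as f:
        device = parse_mounts(f).get(mount_point)  # None if not listed
    st = os.statvfs(mount_point)
    return {
        "device": device,
        "mount_point": mount_point,
        "total_bytes": st.f_blocks * st.f_frsize,
        "available_bytes": st.f_bavail * st.f_frsize,
    }


if __name__ == "__main__":
    print(describe_path("/"))
```

This keeps everything in-process, at the cost of depending on a Linux-specific file; on other Unix-like systems the mount table would have to come from a different source.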
Python 3.3+ `shutil.disk_usage` Function
For Python 3.3 and later, the standard library offers `shutil.disk_usage`, a higher-level wrapper for usage statistics. Here's an example.
import shutil


def get_disk_usage_info(path):
    """
    Use shutil.disk_usage to get disk usage information.
    """
    try:
        total, used, free = shutil.disk_usage(path)
        return {
            'total_bytes': total,
            'used_bytes': used,
            'free_bytes': free
        }
    except OSError as e:
        print(f"Failed to get disk usage: {e}")
        return None


# Example usage
if __name__ == "__main__":
    info = get_disk_usage_info("/home/foo/bar/baz")
    if info:
        print(f"Total Size: {info['total_bytes']} bytes, Used: {info['used_bytes']} bytes, Free: {info['free_bytes']} bytes")
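On Unix, `shutil.disk_usage` is implemented on top of `os.statvfs`, and it returns a named tuple of byte values (`total`, `used`, `free`), which gives a cleaner interface. Its `free` value is derived from `f_bavail`, so it reflects the space available to unprivileged users, and `used + free` can be less than `total` on file systems that reserve blocks for root. Like `os.statvfs`, it does not report the device or mount point, and it requires Python 3.3 or later.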
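The relationship between the two interfaces can be made concrete by recomputing the same three values from `os.statvfs`. This is a sketch of how the figures relate on Unix-like systems, not a replacement for `shutil.disk_usage`; the function name `usage_from_statvfs` is hypothetical.

```python
import os
import shutil


def usage_from_statvfs(path):
    """Recompute (total, used, free) in bytes from os.statvfs fields."""
    st = os.statvfs(path)
    total = st.f_blocks * st.f_frsize
    # "used" is measured against all free blocks, including reserved ones
    used = (st.f_blocks - st.f_bfree) * st.f_frsize
    # "free" counts only blocks available to unprivileged users
    free = st.f_bavail * st.f_frsize
    return total, used, free


if __name__ == "__main__":
    total, used, free = usage_from_statvfs("/")
    reference = shutil.disk_usage("/")
    # The totals should agree; used and free can drift slightly between
    # the two calls if files are written in the meantime.
    print("total matches:", total == reference.total)
    # Space that is neither used nor available to ordinary users
    print("reserved bytes:", total - used - free)
```

The difference `total - used - free` corresponds to blocks that are free but reserved, which explains why the three values do not always add up.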
Discussion and Comparison
Each method has trade-offs. Using `df` gives the device and mount point along with the statistics, but it depends on an external command and requires careful parsing because the output format can vary. `os.statvfs` calls into the operating system directly and needs no external process, but it is available only on Unix-like systems and does not identify the partition. `shutil.disk_usage` has the simplest interface and also works on Windows, but it requires Python 3.3+ and is likewise limited to statistics.
In practice, choose based on needs: use `df` when the device name and mount point are required, and use `os.statvfs` or `shutil.disk_usage` when only the numbers matter. Performance-wise, the latter two avoid the cost of spawning a subprocess and parsing text, so they are usually faster, which matters when many paths are polled repeatedly.
Conclusion
This article outlines multiple Python-based approaches to retrieve file system partition and usage statistics in Linux. The `df` command offers a complete solution for device-aware scenarios, while `os.statvfs` and `shutil.disk_usage` cater to simplified statistical needs. Select the appropriate method based on application requirements, considering portability and robustness.