Keywords: Python packaging | non-code files | setup.py | package_data | MANIFEST.in
Abstract: This article provides a comprehensive exploration of two primary methods for effectively integrating non-code files (such as license files, configuration files, etc.) in Python project packaging: using the package_data parameter in setuptools and creating a MANIFEST.in file. It details the applicable scenarios, configuration specifics, and practical examples for each approach, helping developers choose the most suitable file inclusion strategy based on project requirements. Through comparative analysis, the article also reveals the different behaviors of these methods in source distribution and installation processes, offering thorough technical guidance for Python packaging.
Core Challenges in Including Non-Code Files in Python Packaging
In Python project development, in addition to source code files (with .py extensions), it is often necessary to include various non-code files, such as license files (license.txt), configuration files, data files, and documentation. These files are crucial for the completeness and functionality of the project, but when using standard distutils or setuptools for packaging, only Python module files are included by default, causing non-code files to be ignored during distribution and installation.
Using the package_data Parameter in setuptools
The most direct and powerful method is to specify non-code files for inclusion via the package_data parameter in setuptools. setuptools is an enhanced version of distutils, offering more flexible file management capabilities. Here is a complete configuration example:
from setuptools import setup, find_packages

setup(
    name='your_project_name',
    version='0.1',
    description='A description.',
    packages=find_packages(exclude=['ez_setup', 'tests', 'tests.*']),
    package_data={'': ['license.txt']},
    include_package_data=True,
    install_requires=[],
)
In this configuration, the package_data parameter is a dictionary where keys are package names (an empty string denotes all packages) and values are lists of file patterns. Patterns are resolved relative to the directory of the package they are listed under. For example, {'': ['license.txt']} indicates that a file named license.txt should be included from every package directory that contains one. File patterns support wildcards, such as *.txt or path/to/resources/*.txt, allowing flexible matching of multiple files.
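The include_package_data=True setting is related but separate: it tells setuptools to also install data files that sit inside a package directory and are selected by MANIFEST.in (or by a version-control file-finder plugin). Files matched by package_data are picked up when building and installing (and, with current setuptools versions, in the sdist as well) whether or not this flag is set. Both mechanisms apply only to files located within the package directory structure, and both preserve each file's path relative to its package after installation.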
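The pattern matching described above can be illustrated with a small sketch. It is not the setuptools implementation; it only assumes that each pattern is a glob resolved relative to the package directory. The helper name expand_package_data and the throwaway layout built in demo are hypothetical.

```python
import glob
import os
import tempfile


def expand_package_data(package_dir, patterns):
    """Return files under package_dir matched by package_data-style patterns.

    Illustration only: each pattern is treated as a glob relative to the
    package directory, and only regular files are kept.
    """
    matched = set()
    for pattern in patterns:
        for path in glob.glob(os.path.join(package_dir, pattern)):
            if os.path.isfile(path):
                # Report paths relative to the package, with forward slashes.
                rel = os.path.relpath(path, package_dir)
                matched.add(rel.replace(os.sep, "/"))
    return sorted(matched)


def demo():
    """Build a temporary package layout and expand two patterns against it."""
    with tempfile.TemporaryDirectory() as root:
        pkg = os.path.join(root, "mypackage")
        os.makedirs(os.path.join(pkg, "data"))
        for rel in ("__init__.py", "license.txt",
                    "data/config.json", "data/notes.md"):
            with open(os.path.join(pkg, *rel.split("/")), "w") as fh:
                fh.write("")
        return expand_package_data(pkg, ["license.txt", "data/*.json"])


if __name__ == "__main__":
    print(demo())
```

In this sketch data/notes.md is left out because no pattern matches it, which mirrors how an unlisted file would be omitted from the built package.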
Alternative Approach Using MANIFEST.in Files
For more complex file inclusion needs, especially when files are located outside the package (e.g., in the project root directory), using a MANIFEST.in file is another effective method. MANIFEST.in is a template file used to explicitly specify which files should be included in the source distribution (sdist). Its basic syntax includes directives like include and recursive-include. For example:
include requirements.txt
recursive-include data *
This example includes the requirements.txt file from the root directory and recursively includes all contents of the data directory. MANIFEST.in provides finer-grained control, making it suitable for managing large numbers of non-standard files or directories.
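It is important to note that using MANIFEST.in alone only ensures files are included in the source distribution package. For files that live inside a package directory, setting include_package_data=True in setup() is still required to have them copied into the package directory within site-packages during installation. Files outside any package, such as requirements.txt in the project root, are shipped in the sdist but are not installed into site-packages by this mechanism. Combining MANIFEST.in with include_package_data=True therefore addresses both distribution and installation requirements for in-package files.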
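To make the include and recursive-include directives more concrete, the following sketch emulates their file selection in plain Python. It is a simplification under stated assumptions: exclusion directives, global-include, graft, and prune are ignored, and the helper name apply_manifest is hypothetical rather than part of setuptools.

```python
import fnmatch
import glob
import os
import tempfile


def apply_manifest(root, lines):
    """Return the files selected by include / recursive-include lines.

    Simplified emulation: comments and all other directives are skipped.
    """
    selected = set()
    for line in lines:
        parts = line.split()
        if not parts or parts[0].startswith("#"):
            continue
        if parts[0] == "include":
            # include <pattern>...: patterns are relative to the project root.
            for pattern in parts[1:]:
                for path in glob.glob(os.path.join(root, pattern)):
                    if os.path.isfile(path):
                        selected.add(os.path.relpath(path, root))
        elif parts[0] == "recursive-include" and len(parts) >= 3:
            # recursive-include <dir> <pattern>...: match names at any depth.
            top = os.path.join(root, parts[1])
            for dirpath, _dirs, files in os.walk(top):
                for name in files:
                    if any(fnmatch.fnmatch(name, p) for p in parts[2:]):
                        full = os.path.join(dirpath, name)
                        selected.add(os.path.relpath(full, root))
    return sorted(p.replace(os.sep, "/") for p in selected)


def demo():
    """Apply the two example directives to a temporary project tree."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "data", "sub"))
        os.makedirs(os.path.join(root, "src"))
        for rel in ("requirements.txt", "data/a.csv",
                    "data/sub/b.bin", "src/x.py"):
            with open(os.path.join(root, *rel.split("/")), "w") as fh:
                fh.write("")
        return apply_manifest(
            root, ["include requirements.txt", "recursive-include data *"]
        )


if __name__ == "__main__":
    print(demo())
```

Here src/x.py is not selected because neither directive names it; in a real project, Python modules reach the sdist through the packages setting rather than through MANIFEST.in.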
Method Comparison and Best Practices
package_data and MANIFEST.in each have advantages and are suited to different scenarios:
- package_data: More suitable for files located inside packages, with simple and direct configuration that automatically handles file paths. It is a native feature of setuptools and integrates well with package discovery (e.g., find_packages).
- MANIFEST.in: Offers more flexible file selection mechanisms, supporting wildcards and recursive inclusion, particularly ideal for managing files outside packages or complex directory structures. However, it requires maintaining an additional configuration file.
In practical projects, it is recommended to choose the appropriate method based on file location and project structure. For most standard projects, using package_data is the preferred choice; for projects requiring inclusion of numerous external resources or documentation, combining MANIFEST.in may be more effective. Avoid circumventing limitations by renaming file extensions (e.g., changing .txt to .py), as this can break file semantics and potentially cause runtime errors.
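The location-based rule of thumb above can be written down as a short helper. This is only a sketch: it assumes a flat layout in which packages sit directly under the project root and are recognizable by an __init__.py file, and the name suggest_mechanism is hypothetical.

```python
import os
import tempfile


def suggest_mechanism(project_root, relative_path):
    """Suggest 'package_data' for files under a top-level package directory
    (one containing __init__.py), otherwise 'MANIFEST.in'."""
    parts = relative_path.replace("\\", "/").split("/")
    if len(parts) > 1:
        top = os.path.join(project_root, parts[0])
        if os.path.isfile(os.path.join(top, "__init__.py")):
            return "package_data"
    # Root-level files and files in non-package directories land here.
    return "MANIFEST.in"


def demo():
    """Classify two files in a temporary project tree."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "mypackage", "data"))
        for rel in ("mypackage/__init__.py",
                    "mypackage/data/config.json", "LICENSE.txt"):
            with open(os.path.join(root, *rel.split("/")), "w") as fh:
                fh.write("")
        return {
            rel: suggest_mechanism(root, rel)
            for rel in ("mypackage/data/config.json", "LICENSE.txt")
        }


if __name__ == "__main__":
    print(demo())
```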
Practical Examples and Considerations
Assume a project structure as follows:
project/
├── mypackage/
│   ├── __init__.py
│   ├── module.py
│   └── data/
│       └── config.json
├── LICENSE.txt
└── README.md
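To include data/config.json in the installed package, the following setup.py configuration can be used. LICENSE.txt sits in the project root, outside any package, so package_data patterns (which are resolved relative to each package directory) cannot match it; it has to be listed in MANIFEST.in instead:
from setuptools import setup, find_packages

setup(
    name='project',
    packages=find_packages(),
    # Patterns are relative to the mypackage/ directory.
    package_data={
        'mypackage': ['data/*.json'],
    },
    include_package_data=True,
)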
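Alternatively, the files can be listed in MANIFEST.in. With include_package_data=True kept in setup.py, the JSON files under mypackage/ are installed as well, while LICENSE.txt is shipped in the source distribution only: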
include LICENSE.txt
recursive-include mypackage/data *.json
When testing packaging, it is essential to verify that files are correctly included. Use python setup.py sdist (or python -m build --sdist with the build package, since invoking setup.py directly is deprecated in current setuptools) to create a source distribution package, then inspect the contents of the generated .tar.gz file. Installation testing can be performed via pip install . in a virtual environment to confirm that in-package files appear in the appropriate locations within site-packages.
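Inspecting the archive can also be scripted. The sketch below uses only the standard tarfile module to report which expected files are missing from an sdist; the helper name missing_from_sdist and the archive built in demo are hypothetical, and the code assumes the usual sdist layout with a single top-level directory such as project-0.1/.

```python
import io
import os
import tarfile
import tempfile


def missing_from_sdist(sdist_path, expected):
    """Return the expected relative paths that are absent from the archive.

    The first path component (for example project-0.1/) is dropped before
    comparing, on the assumption that the sdist wraps everything in it.
    """
    with tarfile.open(sdist_path, "r:gz") as archive:
        members = {
            name.split("/", 1)[1]
            for name in archive.getnames()
            if "/" in name
        }
    return sorted(set(expected) - members)


def demo():
    """Create a tiny stand-in archive and check it for two expected files."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "project-0.1.tar.gz")
        with tarfile.open(path, "w:gz") as archive:
            for name in ("project-0.1/LICENSE.txt",
                         "project-0.1/mypackage/__init__.py"):
                info = tarfile.TarInfo(name)
                info.size = 0
                archive.addfile(info, io.BytesIO(b""))
        return missing_from_sdist(
            path, ["LICENSE.txt", "mypackage/data/config.json"]
        )


if __name__ == "__main__":
    print(demo())
```

Against a real build, the same function could be pointed at the file under dist/ with the list of non-code files the project is supposed to ship.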
Conclusion and Extended Resources
Effectively managing non-code files is a critical aspect of Python project packaging. By appropriately using package_data and MANIFEST.in, developers can ensure the completeness and portability of project distribution packages. For more advanced needs, such as dynamic file inclusion or custom build processes, refer to setuptools extension mechanisms and community best practices. Staying updated with developments in the Python packaging ecosystem (e.g., modern tools like flit and poetry) also helps optimize project maintenance experiences.