Configuring the Default Cache Directory in Hugging Face Transformers: Methods and Best Practices

Dec 07, 2025 · Programming

Keywords: Hugging Face Transformers | cache directory | HF_HOME

Abstract: This article provides a comprehensive guide on configuring the default cache directory in Hugging Face Transformers. It primarily focuses on using the environment variable HF_HOME or directly specifying the cache_dir parameter in code, replacing the deprecated TRANSFORMERS_CACHE. The analysis further explores the priority rules for cache directories and their impact on other Hugging Face libraries, supported by practical code examples and system-level configuration recommendations.

1. Problem Background and Solution Overview

By default, the Hugging Face Transformers library caches downloaded files in the user's home directory, typically under ~/.cache/huggingface/hub. When the disk holding that directory runs short of space, the default location has to be changed. There are two main ways to do this: set a global default through environment variables, or pass the cache_dir parameter in code. The HF_HOME environment variable is the current best practice, because it applies not only to Transformers but also to other Hugging Face libraries such as datasets, so a single setting manages all of their caches.

2. Using Environment Variables to Configure Cache Directory

Setting an environment variable changes the default cache directory globally. TRANSFORMERS_CACHE was commonly used for this in the past, but according to the Transformers documentation it is deprecated and will be removed in v5. Use HF_HOME instead. The specific setup methods are as follows:

In Python code, set the environment variable before importing the Transformers library:

import os
os.environ['HF_HOME'] = '/blabla/cache/'
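
The ordering requirement above (set the variable first, import afterwards) is easy to break in larger scripts. The following is a minimal sketch of a guard for it, not part of any Hugging Face API: the helper name configure_hf_home is hypothetical, and the check on sys.modules assumes that the libraries read HF_HOME when they are first imported.

```python
import os
import sys
import warnings
from pathlib import Path


def configure_hf_home(cache_root):
    """Hypothetical helper: point HF_HOME at cache_root and create the directory."""
    # Assumption: the libraries read HF_HOME at import time, so a late call
    # would have no effect on an already-imported library.
    already_loaded = [m for m in ("transformers", "huggingface_hub") if m in sys.modules]
    if already_loaded:
        warnings.warn(
            f"{', '.join(already_loaded)} already imported; HF_HOME may be ignored"
        )
    path = Path(cache_root).expanduser().resolve()
    path.mkdir(parents=True, exist_ok=True)  # create the directory up front
    os.environ["HF_HOME"] = str(path)
    return path
```

Calling configure_hf_home('/blabla/cache/') at the very top of a script, before any import of transformers, has the same effect as the two lines above, with the added warning if the call comes too late.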

In a Bash environment, set it via the command line:

export HF_HOME=/blabla/cache/

The advantage of this method is its global scope: once HF_HOME is set, Hugging Face libraries store their caches under the specified directory, with downloaded models going into its hub subdirectory. According to the documentation, the cache directory is resolved in the following order of priority: first HUGGINGFACE_HUB_CACHE or TRANSFORMERS_CACHE, then HF_HOME, and finally XDG_CACHE_HOME + /huggingface. The default value of HF_HOME is $XDG_CACHE_HOME/huggingface, and XDG_CACHE_HOME is typically ~/.cache, which gives the common default root ~/.cache/huggingface.
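
The priority rules can be illustrated with a small sketch that mirrors them in plain Python. This is an illustration of the documented order, not the libraries' actual implementation: the function name resolve_hub_cache is hypothetical, the relative order of HUGGINGFACE_HUB_CACHE and TRANSFORMERS_CACHE is an assumption, and the hub subdirectory is appended only in the HF_HOME and XDG_CACHE_HOME cases, matching the default path ~/.cache/huggingface/hub.

```python
import os
from pathlib import Path


def resolve_hub_cache(env=None):
    """Hypothetical sketch: return the model cache directory implied by env."""
    env = os.environ if env is None else env

    # 1. Explicit cache variables win (their relative order is an assumption).
    for name in ("HUGGINGFACE_HUB_CACHE", "TRANSFORMERS_CACHE"):
        if env.get(name):
            return Path(env[name]).expanduser()

    # 2. HF_HOME is the root; models live in its "hub" subdirectory.
    if env.get("HF_HOME"):
        return Path(env["HF_HOME"]).expanduser() / "hub"

    # 3. Fall back to XDG_CACHE_HOME (typically ~/.cache) + /huggingface.
    xdg = env.get("XDG_CACHE_HOME") or "~/.cache"
    return Path(xdg).expanduser() / "huggingface" / "hub"
```

Passing a plain dictionary instead of os.environ makes it easy to check how a given combination of variables would resolve without touching the real environment.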

3. Directly Specifying Cache Directory in Code

In addition to environment variables, the cache directory can be specified on each call to from_pretrained through the cache_dir parameter. This offers greater flexibility, because different models or tasks can use different cache directories. Here is an example:

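from transformers import AutoTokenizer, AutoModelForMaskedLM

# Both calls download into (and later read from) new_cache_dir/
tokenizer = AutoTokenizer.from_pretrained("roberta-base", cache_dir="new_cache_dir/")
model = AutoModelForMaskedLM.from_pretrained("roberta-base", cache_dir="new_cache_dir/")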

Here cache_dir is set to "new_cache_dir/", so the model and tokenizer files are cached in that directory. This gives fine-grained control and suits cases where different models must be stored in different locations. If the same cache directory is needed everywhere, however, an environment variable is more convenient, since cache_dir has to be repeated on every call.
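
As a sketch of the per-model scenario, the cache directory can be derived from the model name so that each model gets its own folder. The helper names cache_dir_for and load_tokenizer and the base directory model_caches are hypothetical, chosen only for illustration; the one library call used is from_pretrained with cache_dir, as in the example above.

```python
from pathlib import Path


def cache_dir_for(model_id, base="model_caches"):
    """Hypothetical helper: one cache folder per model under base."""
    # Replace "/" so that names like "org/model" map to a single folder.
    return Path(base) / model_id.replace("/", "--")


def load_tokenizer(model_id, base="model_caches"):
    """Load a tokenizer, caching its files in the model's own folder."""
    # Imported lazily so the path helper works without transformers installed.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(
        model_id, cache_dir=str(cache_dir_for(model_id, base))
    )
```

With this layout, load_tokenizer("roberta-base") would cache its files under model_caches/roberta-base, and removing one model's cache is a matter of deleting its folder.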

4. System-Level Configuration and Recommendations

As a fallback for cases where the environment variables are not set, consider using a symbolic link at the system level. Make the default ~/.cache/huggingface path a link that points to the desired cache directory, so that caches still land in the right place when HF_HOME is not configured. In Bash, use the following command:

ln -s /path/to/cache/directory ~/.cache/huggingface

Before running this command, make sure that ~/.cache/huggingface does not already exist, or move it aside first. If it exists as a directory, ln creates the link inside it instead of replacing it. The advantage of this method is its robustness: even if the environment variables are not set correctly in some session, caches are still stored in the expected location. It complements rather than replaces HF_HOME, which remains the preferred setting because it also covers other Hugging Face libraries such as datasets and keeps all of their caches in one place.
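
The same link can be created from Python with the precondition checked explicitly. This is a minimal sketch using only the standard library; the function name link_default_cache is hypothetical, and it deliberately refuses to touch an existing directory rather than moving it.

```python
from pathlib import Path


def link_default_cache(target_dir, default_dir="~/.cache/huggingface"):
    """Hypothetical helper: make default_dir a symlink to target_dir."""
    target = Path(target_dir).expanduser().resolve()
    link = Path(default_dir).expanduser()

    if link.is_symlink():
        if link.resolve() == target:
            return link  # already linked to the right place
        raise FileExistsError(f"{link} already links elsewhere")
    if link.exists():
        # Mirrors the precondition above: move the old cache away first.
        raise FileExistsError(f"{link} exists; move it before linking")

    target.mkdir(parents=True, exist_ok=True)
    link.parent.mkdir(parents=True, exist_ok=True)
    link.symlink_to(target, target_is_directory=True)
    return link
```

Calling link_default_cache("/path/to/cache/directory") corresponds to the ln -s command above, and calling it a second time with the same target is a no-op.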

Copyright Notice: All rights in this article are reserved by the operators of DevGex. Reasonable sharing and citation are welcome; any reproduction, excerpting, or re-publication without prior permission is prohibited.