Keywords: GitHub | command line download | curl | wget | API authentication
Abstract: This article explains how to download one or more specific files from a GitHub repository on the command line without cloning the whole repository. Following the highest-rated Stack Overflow answer, it shows how to use curl and wget with GitHub raw-file links for public repositories, and with the GitHub REST API and an access token for private repositories. It also collects practical tips from other answers, such as appending the ?raw=true parameter to a file URL in the newer web interface. A brief look at how Git stores files and how the API serves them explains why these methods work, making the article useful for developers and system administrators.
Introduction
In software development, there is often a need to retrieve specific files from a GitHub repository without cloning the entire project. This saves time and bandwidth, especially when only a few files are required. This article systematically explains how to achieve this via the command line, based on high-scoring answers from Stack Overflow.
GitHub File Storage Mechanism
GitHub uses Git as its version control system. In the .git directory, Git stores file contents as blob objects, which are reachable only through tree objects (directory listings) and commit objects (snapshots). The Git transfer protocol is designed to exchange commits and the objects they reference in bulk, not to serve an individual file by path, so plain Git offers no simple way to fetch a single file from GitHub. It is therefore easier to obtain a file's raw content over HTTP, through GitHub's web interface or its API.
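To see the object layering described above, the following sketch builds a throwaway local repository and asks Git for the type of each object on the way from a commit to a file. It assumes git is installed; the function name demo_git_objects and the file hello.txt are hypothetical, chosen only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical demo: show that a file's contents live in a blob that is
# reachable only through a commit and its tree. Requires git.
demo_git_objects() {
  local dir
  dir=$(mktemp -d) || return 1
  git init -q "$dir"
  printf 'hello\n' > "$dir/hello.txt"
  git -C "$dir" add hello.txt
  # Inline identity and signing settings so the demo works without global config.
  git -C "$dir" -c user.name=demo -c user.email=demo@example.com \
    -c commit.gpgsign=false commit -q -m 'add hello.txt'
  git -C "$dir" cat-file -t HEAD             # the commit object
  git -C "$dir" cat-file -t 'HEAD^{tree}'    # the root tree it points to
  git -C "$dir" cat-file -t HEAD:hello.txt   # the blob found via that tree
  git -C "$dir" cat-file -p HEAD:hello.txt   # the blob's raw contents
  rm -rf "$dir"
}
```

Running demo_git_objects prints commit, tree, blob, and finally hello: Git resolves the path hello.txt only by walking from the commit through its tree, which is the lookup a server has to perform before it can return a single file.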
Methods for Downloading Files from Public Repositories
For public repositories, the simplest method is to use the raw file link. On the GitHub file page, click the "Raw" button in the top-right corner of the file view to get the raw URL. For example, the URL might be: https://github.com/username/repository/raw/master/path/to/file. Here, master can be replaced with any branch name, tag, or commit hash; note that newer repositories usually name their default branch main rather than master.
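Because the raw URL follows a fixed pattern, it can be assembled from its parts. The sketch below (the function names raw_url and raw_content_url are hypothetical) prints both the github.com form shown above and the equivalent URL on raw.githubusercontent.com, the host that github.com redirects raw links to. It does no URL-encoding, so it assumes paths without spaces or other special characters.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: print raw-file URLs for OWNER REPO REF PATH.
# REF may be a branch name, tag, or commit hash. No URL-encoding is done.
raw_url() {
  local owner=$1 repo=$2 ref=$3 relpath=$4
  printf 'https://github.com/%s/%s/raw/%s/%s\n' "$owner" "$repo" "$ref" "$relpath"
}

# Same file, addressed directly on the host that github.com redirects to.
raw_content_url() {
  local owner=$1 repo=$2 ref=$3 relpath=$4
  printf 'https://raw.githubusercontent.com/%s/%s/%s/%s\n' \
    "$owner" "$repo" "$ref" "$relpath"
}
```

For example, raw_url username repository master path/to/file prints the same URL as the example above, and the output of either function can be passed straight to curl or wget.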
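Use command-line tools such as curl or wget to download the file:
curl -L -o filename https://github.com/username/repository/raw/master/path/to/file
or
wget -O filename https://github.com/username/repository/raw/master/path/to/file
These commands save the file locally under the name given by curl's -o or wget's -O option. Because github.com answers raw links with a redirect to raw.githubusercontent.com, curl needs -L to follow it; wget follows redirects by default. Adding -f to curl makes it exit with an error on an HTTP failure such as 404 instead of saving the error page as the file.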
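In a script, it helps to make a failed download visible. The helper below is a sketch (the name fetch_file is hypothetical) that wraps curl with -f, so an HTTP error such as 404 yields a non-zero exit status instead of a saved error page, and with -L, so the github.com redirect is followed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fetch_file URL OUTPUT
#   -f  fail on HTTP errors rather than writing the error page to OUTPUT
#   -sS stay quiet on success but still report errors on stderr
#   -L  follow redirects (github.com/.../raw/... -> raw.githubusercontent.com)
fetch_file() {
  local url=$1 out=$2
  if ! curl -fsSL -o "$out" "$url"; then
    echo "download failed: $url" >&2
    return 1
  fi
}
```

A typical call is fetch_file https://github.com/username/repository/raw/master/path/to/file filename || exit 1, which stops the surrounding script as soon as a download fails.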
Handling Private Repositories
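For private repositories, authentication is required. First, create a personal access token in your GitHub account settings (under Developer settings) that can read the repository's contents, e.g., the repo scope for a classic token or read access to Contents for a fine-grained token. Then, use the GitHub API to download the file: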
curl \
-H 'Authorization: token YOUR_TOKEN' \
-H 'Accept: application/vnd.github.v3.raw' \
-O \
-L 'https://api.github.com/repos/owner/repo/contents/path/to/file'
Here, -H sets HTTP headers: the Authorization header carries the token, and the Accept header requests the raw data format. The -O option makes curl save the output under the remote filename, and -L follows redirects. The API endpoint format is /repos/:owner/:repo/contents/:path.
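The same request can be wrapped in a reusable function. The sketch below assumes the token is stored in an environment variable named GITHUB_TOKEN rather than typed into the command (our choice, not a GitHub requirement). It adds -f so that authentication or permission failures such as 401 or 404 are reported as errors, and lets a hypothetical GITHUB_API variable override the API base URL; gh_api_fetch is likewise a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: gh_api_fetch OWNER REPO PATH [OUTPUT]
# Assumes a personal access token in $GITHUB_TOKEN; OUTPUT defaults to the
# last component of PATH. GITHUB_API (assumed name) overrides the base URL.
gh_api_fetch() {
  local owner=$1 repo=$2 relpath=$3
  local out=${4:-${relpath##*/}}
  local api=${GITHUB_API:-https://api.github.com}
  if [ -z "${GITHUB_TOKEN:-}" ]; then
    echo "GITHUB_TOKEN is not set" >&2
    return 1
  fi
  # The raw media type returns the file body instead of JSON metadata.
  curl -fsSL \
    -H "Authorization: token ${GITHUB_TOKEN}" \
    -H 'Accept: application/vnd.github.v3.raw' \
    -o "$out" \
    "${api}/repos/${owner}/${repo}/contents/${relpath}"
}
```

After export GITHUB_TOKEN=..., the call gh_api_fetch owner repo path/to/file saves ./file. Keeping the token in the environment avoids writing it into scripts, although it can still appear in the process list while curl runs. The contents endpoint also accepts a ref query parameter to select a branch, tag, or commit.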
Additional Tips
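Other answers add practical variations. For example, in the GitHub interface introduced around June 2020, appending the ?raw=true parameter to the regular file (blob) URL downloads the file directly:
wget -O file 'https://github.com/username/repository/blob/master/path/to/file?raw=true'
Quoting the URL keeps the shell from treating ? as a glob character, and -O sets the local filename; without it, wget may keep the ?raw=true suffix in the saved name. This shortcut is convenient, but it relies on the behavior of the web interface rather than the API, so it may not work in all cases. Additionally, a short script can automate downloading multiple files by looping over their paths.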
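Such a script might look like the following sketch. The function name fetch_many and its argument order are hypothetical. It assumes a base URL that ends at the ref, such as https://raw.githubusercontent.com/username/repository/master, keeps each file's relative path under a destination directory, and continues past failures while reporting them through its exit status.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fetch_many BASE_URL DEST_DIR PATH...
# Downloads BASE_URL/PATH to DEST_DIR/PATH for each PATH, creating directories.
fetch_many() {
  local base=$1 dest=$2 relpath status=0
  shift 2
  for relpath in "$@"; do
    mkdir -p "$dest/$(dirname "$relpath")"
    if ! curl -fsSL -o "$dest/$relpath" "$base/$relpath"; then
      echo "failed: $relpath" >&2
      status=1   # remember the failure but keep downloading the rest
    fi
  done
  return "$status"
}
```

For example, fetch_many https://raw.githubusercontent.com/username/repository/master downloads README.md path/to/file fetches two files into ./downloads. Continuing after a failure trades fail-fast behavior for a complete report of which paths could not be retrieved.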
In-Depth Analysis
GitHub's raw file links are served from raw.githubusercontent.com, which looks up the file at the requested branch, tag, or commit and returns its contents unchanged. The API method is more flexible, supporting authentication and a choice of media types: requesting application/vnd.github.v3.raw returns the raw file content, whereas the default media type returns a JSON object describing the file, with the contents base64-encoded in its content field.
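In practice, it is recommended to prioritize the API method for its stability and rich features. For simple needs with public files, the raw link method is sufficient. Note that rate limits apply, especially to API calls: unauthenticated REST API requests are limited to 60 per hour per IP address, and requests authenticated with a personal access token to 5,000 per hour. Heavy use of raw links may also be throttled.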
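To keep an eye on the API quota while downloading, curl's -D option can save the response headers alongside the file, and GitHub's REST API reports the remaining quota in the x-ratelimit-remaining header. The sketch below (ratelimit_remaining is a hypothetical name) extracts that value from a saved header file. It takes the last occurrence, because with -L the file may contain headers from several responses.

```shell
#!/usr/bin/env bash
# Hypothetical helper: ratelimit_remaining HEADER_FILE
# Prints the last x-ratelimit-remaining value found in a file written by curl -D.
ratelimit_remaining() {
  awk -F': *' '
    tolower($1) == "x-ratelimit-remaining" {
      value = $2
      sub(/\r$/, "", value)   # HTTP header lines end in CR LF
    }
    END { print value }
  ' "$1"
}
```

In use, add -D headers.txt to the authenticated curl command from the private-repository section, then run echo "$(ratelimit_remaining headers.txt) API requests left" to see how much of the hourly quota remains.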
Conclusion
Downloading specific files from GitHub via the command line is an efficient workflow. Public repositories can use raw links with curl or wget, while private repositories require API calls authenticated with an access token. Together with the tips from other answers, these methods cover most single-file and multi-file download needs. Developers should choose the method that fits their scenario: raw links for quick public downloads, and the API for private repositories and scripted workflows.