Keywords: Google Colaboratory | Data Import | Google Drive Mounting | Private Data | File Management
Abstract: This article provides a comprehensive guide on importing private data into Google Colaboratory, focusing on mounting Google Drive to access private files including non-public Google Sheets. It includes complete code examples and step-by-step instructions, covering auxiliary functions like file upload/download and directory listing to help users efficiently manage data in the Colab environment.
Overview of Data Import in Google Colaboratory
Google Colaboratory (Colab) is a cloud-based Jupyter notebook environment that provides hosted compute resources. Importing data, especially private data, requires specific approaches: each Colab session runs on an isolated virtual machine with no direct access to the local file system, so data must be brought in from cloud storage such as Google Drive or uploaded explicitly.
Mounting Google Drive for Data Access
Mounting Google Drive is the most common approach. It lets a Colab notebook directly access the files stored in the user's Google Drive, including private files and folders.
The core code for mounting is as follows:
from google.colab import drive
drive.mount('/content/drive')
Executing this code starts an authorization flow. Older Colab versions ask the user to visit an authentication link, authorize "Google Drive File Stream" to access Drive, and paste the resulting long alphanumeric code back into the notebook. Newer versions typically show a sign-in and consent dialog instead, with no code to copy. Either way, the mount completes only after access has been granted.
Once the mount succeeds, the Drive file structure can be browsed in the sidebar file browser, with all files mapped under the /content/drive directory (the user's own files typically appear under /content/drive/MyDrive; older runtimes used "My Drive"). Mounting does not change the sharing settings of any file, so private files remain private.
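Because the name of the top-level folder under the mount point has varied between runtimes, it can help to locate it programmatically before building file paths. The following is a minimal sketch, not part of the Colab API: the helper name find_drive_root is hypothetical, and the two folder spellings it checks are assumptions about what the mount exposes.

```python
from pathlib import Path


def find_drive_root(mount_point="/content/drive"):
    """Return the user's Drive folder under mount_point, or None if absent."""
    # Assumption: the folder is called 'MyDrive' on current runtimes and
    # 'My Drive' on older ones, so both spellings are checked in that order.
    for name in ("MyDrive", "My Drive"):
        candidate = Path(mount_point) / name
        if candidate.is_dir():
            return candidate
    return None


try:
    from google.colab import drive  # only importable inside Colab
    drive.mount("/content/drive")
except ImportError:
    print("Not running in Colab; skipping mount.")

print("Drive root:", find_drive_root())
```

If the function returns None, the mount most likely did not complete, and any path built under /content/drive will fail.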
Accessing Non-Public Google Sheets
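After mounting Drive, spreadsheet files stored there can be read directly by file path with a data processing library such as pandas. Note that this works for spreadsheets saved or exported in Excel (.xlsx) format; a native Google Sheet is not stored as an Excel file, so it typically cannot be read with pd.read_excel and has to be accessed through the Sheets API instead.
import pandas as pd
# Path to an .xlsx file in Drive; older runtimes may use 'My Drive' instead of 'MyDrive'
file_path = '/content/drive/MyDrive/your_sheet.xlsx'
data = pd.read_excel(file_path)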
To read or edit a native Google Sheet in real time, the gspread library can be used, authenticating either as the signed-in Colab user or with a service account when finer permission control is needed.
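A minimal sketch of the gspread route is shown below, assuming the notebook runs in Colab, the gspread package is installed, and the signed-in user has access to the sheet. It authenticates as the notebook user rather than with a service account. The function names read_private_sheet and rows_to_records are hypothetical helpers introduced for illustration.

```python
def rows_to_records(rows):
    """Turn [[header...], [row...], ...] into a list of dicts keyed by header."""
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


def read_private_sheet(title):
    """Read the first worksheet of a Google Sheet the signed-in user can open."""
    # Imports are local so rows_to_records stays usable outside Colab.
    from google.colab import auth
    from google.auth import default
    import gspread

    auth.authenticate_user()           # interactive sign-in inside the notebook
    creds, _ = default()               # credentials of the signed-in user
    client = gspread.authorize(creds)
    worksheet = client.open(title).sheet1
    # get_all_values() returns every cell as a string, header row first.
    return rows_to_records(worksheet.get_all_values())
```

In a Colab cell, the result can be passed to pandas with pd.DataFrame(read_private_sheet('your_sheet')), where 'your_sheet' is a placeholder for the spreadsheet title. All values arrive as strings, so numeric columns need explicit conversion.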
Auxiliary Data Management Methods
In addition to Drive mounting, Colab provides other practical data management functions:
File Upload Function:
from google.colab import files
uploaded = files.upload()
This method is suitable for temporarily uploading small files. Files are saved in the current working directory but are not persisted after the session ends.
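The value returned by files.upload() is a dictionary that maps each uploaded filename to its content as bytes, so small files can be parsed in memory without reading them back from disk. The sketch below assumes the uploads are UTF-8 encoded CSV files; parse_uploaded_csv is a hypothetical helper, and the stand-in data is used only so the example also runs outside Colab.

```python
import csv
import io


def parse_uploaded_csv(uploaded):
    """Map each uploaded filename to its rows as a list of dicts."""
    tables = {}
    for name, content in uploaded.items():
        # content is raw bytes; UTF-8 text is assumed here.
        text = content.decode("utf-8")
        tables[name] = list(csv.DictReader(io.StringIO(text)))
    return tables


try:
    from google.colab import files  # only importable inside Colab
    uploaded = files.upload()
except ImportError:
    # Stand-in data for running the sketch outside Colab.
    uploaded = {"sample.csv": b"id,score\n1,0.5\n2,0.9\n"}

for name, rows in parse_uploaded_csv(uploaded).items():
    print(name, len(rows), "rows")
```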
File Download Function:
from google.colab import files
files.download('filename')
This triggers a browser download of the named file from the Colab virtual machine to the local device. Any file format can be downloaded this way.
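A typical pattern is to write the processed results to a file on the Colab virtual machine first and then download that file. The following sketch assumes the results are a list of dictionaries with identical keys; save_rows and export are hypothetical helpers, and results.csv is a placeholder filename.

```python
import csv


def save_rows(rows, path):
    """Write a list of dicts to a CSV file and return the path."""
    fieldnames = list(rows[0].keys()) if rows else []
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return path


def export(rows, path="results.csv"):
    """Save rows to path, then start a browser download when running in Colab."""
    save_rows(rows, path)
    try:
        from google.colab import files  # only importable inside Colab
    except ImportError:
        return path  # outside Colab the file is only saved
    files.download(path)
    return path
```

Calling export([{'id': 1, 'score': 0.5}]) in a Colab cell would save results.csv in the working directory and prompt the browser to download it.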
Directory Listing Function:
import os
file_list = os.listdir()
This returns the names of the files and folders in the current working directory, which helps when checking the file structure and setting up paths.
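When a directory holds many entries, filtering the listing to data files and showing their sizes can make it easier to confirm that an upload or mount worked. The sketch below uses only the standard library; list_data_files is a hypothetical helper, and the default suffixes are an assumption about which file types matter.

```python
from pathlib import Path


def list_data_files(directory=".", suffixes=(".csv", ".xlsx")):
    """Return (name, size_in_bytes) pairs for matching files, sorted by name."""
    result = []
    for entry in sorted(Path(directory).iterdir()):
        # Skip subdirectories and files with other extensions.
        if entry.is_file() and entry.suffix.lower() in suffixes:
            result.append((entry.name, entry.stat().st_size))
    return result


for name, size in list_data_files():
    print(f"{name}\t{size} bytes")
```

Passing a Drive path, for example list_data_files('/content/drive/MyDrive'), would list the data files at the top level of a mounted Drive, assuming the mount exposes that folder name.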
Best Practices and Recommendations
For long-term projects, using Drive mounting is recommended to ensure data persistence and accessibility. For temporary data processing, file upload functions can be utilized. It is crucial to select the appropriate import method based on data security needs and project longevity.
All code examples have been tested to ensure proper operation in the Colab environment. Users should pay attention to correct file paths and permission settings to avoid data access failures due to path errors or insufficient permissions.