Keywords: MySQL | CSV Import | Data Migration | LOAD DATA INFILE | Large Dataset Processing
Abstract: This article provides a comprehensive guide on importing large CSV files (e.g., containing 1.4 million rows) into MySQL Workbench. It analyzes common issues like file path errors and field delimiters, offering complete LOAD DATA INFILE syntax solutions, including proper use of the ENCLOSED BY clause. GUI import methods are introduced as alternatives, with an in-depth analysis of MySQL's data import mechanisms and performance optimization strategies.
Problem Background and Challenges
When dealing with large-scale datasets, traditional spreadsheet software like Excel has a hard row limit (1,048,576 rows per worksheet), making it impractical to handle CSV files containing 1.4 million rows of data. MySQL Workbench, as a professional database management tool, provides robust data import capabilities that effectively address such large-scale data processing requirements.
Core Solution: LOAD DATA INFILE Command
MySQL's LOAD DATA INFILE command is the most efficient method for importing large CSV files. Below is the complete syntax structure and parameter explanation:
LOAD DATA LOCAL INFILE '/full/path/to/model.csv'
INTO TABLE test.dummy
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n';
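If the CSV file begins with a header row of column names, that row would otherwise be imported as data. The standard IGNORE clause skips it:

```sql
LOAD DATA LOCAL INFILE '/full/path/to/model.csv'
INTO TABLE test.dummy
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;  -- skip the header row at the top of the CSV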
Key Parameter Details
File Path Specification: The absolute path to the CSV file must be provided to avoid file not found errors caused by relative paths. On Windows systems, the path format is C:\Users\username\Documents\model.csv, while on Linux/macOS it's /home/username/documents/model.csv.
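Note that inside MySQL string literals the backslash is an escape character, so a Windows path must either double its backslashes or use forward slashes, which MySQL also accepts on Windows:

```sql
-- Backslashes must be doubled inside the string literal:
LOAD DATA LOCAL INFILE 'C:\\Users\\username\\Documents\\model.csv'
INTO TABLE test.dummy
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n';
-- Forward slashes also work on Windows: 'C:/Users/username/Documents/model.csv'
```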
ENCLOSED BY Clause: This is the crucial parameter for handling fields containing commas. When field values in the CSV file contain commas themselves, quotation marks are used to enclose the field values. ENCLOSED BY '"' ensures MySQL correctly identifies field boundaries and prevents data parsing errors.
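For instance, a row such as `405,"Smith, John",yes,INS,CELL` parses into five fields only when the quote character is declared. When only some fields in the file are quoted, the OPTIONALLY keyword handles both cases:

```sql
LOAD DATA LOCAL INFILE '/full/path/to/model.csv'
INTO TABLE test.dummy
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'  -- quotes are honored where present, not required elsewhere
LINES TERMINATED BY '\n';
```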
Table Structure Design and Data Mapping
Before importing, ensure the target table structure corresponds to the CSV file columns:
CREATE TABLE test.dummy (
ServiceAreaCodes VARCHAR(50),
PhoneNumbers VARCHAR(20),
Preferences TEXT,
Opstyp VARCHAR(10),
PhoneTyp VARCHAR(10)
);
The column name mapping relationship is: "Service Area Code" → ServiceAreaCodes, "Phone Numbers" → PhoneNumbers, and so on. Pay attention to column name differences and appropriate data type selection.
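When the mapping between CSV columns and table columns should be explicit rather than positional by default, a column list can be appended to the statement (the column names below are those of the table defined above):

```sql
LOAD DATA LOCAL INFILE '/full/path/to/model.csv'
INTO TABLE test.dummy
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(ServiceAreaCodes, PhoneNumbers, Preferences, Opstyp, PhoneTyp);
```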
Alternative Approach: GUI Import Method
For smaller datasets, MySQL Workbench's graphical interface import functionality can be used:
- Open a query window and execute SELECT * FROM dummy
- Select the "Import" option from the menu bar
- Click the "Apply" button below the result grid
This method is suitable for smaller data volumes that don't require complex field processing, though its performance may be inferior to the command-line approach when handling 1.4 million rows.
Error Troubleshooting and Optimization Recommendations
Common Error Analysis:
- model.CSV file not found: Typically caused by an incorrect file path or file permission issues
- Data truncation errors: Insufficient column lengths or data type mismatches in the target table
- Character encoding issues: Ensure the CSV file and database use the same character set (UTF-8 recommended)
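Encoding mismatches can be resolved at import time: the CHARACTER SET clause tells MySQL how to interpret the file's bytes (utf8mb4 is assumed here):

```sql
LOAD DATA LOCAL INFILE '/full/path/to/model.csv'
INTO TABLE test.dummy
CHARACTER SET utf8mb4  -- interpret the file's contents as UTF-8
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n';
```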
Performance Optimization Strategies:
- Disable indexes and foreign key constraints before import, and rebuild them after completion
- Use the LOCAL keyword to read files from the client side, avoiding server file permission issues (this requires local_infile to be enabled)
- For extremely large datasets, consider batch importing or using professional ETL tools
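The first of these recommendations can be sketched as session settings wrapped around the import (a sketch only; the actual gain depends on the storage engine and schema, and sql_log_bin requires extra privileges):

```sql
SET SESSION unique_checks = 0;       -- defer unique-index checks during the bulk load
SET SESSION foreign_key_checks = 0;  -- defer foreign-key validation during the bulk load

LOAD DATA LOCAL INFILE '/full/path/to/model.csv'
INTO TABLE test.dummy
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n';

SET SESSION unique_checks = 1;       -- restore normal checking afterwards
SET SESSION foreign_key_checks = 1;
```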
In-depth Technical Principle Analysis
MySQL's LOAD DATA INFILE command employs a streaming processing mechanism that reads CSV files line by line and directly inserts data into the database, avoiding memory overflow risks. Compared to traditional INSERT statements, this method reduces SQL parsing overhead and network transmission latency, making it particularly suitable for processing million-scale large datasets.
Precise configuration of field terminators and line terminators ensures accurate data parsing, while the ENCLOSED BY parameter provides flexibility in handling complex field content, forming the technical foundation for reliable data import operations.
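One practical consequence of terminator configuration: CSV files produced on Windows typically end lines with \r\n, and importing such a file with LINES TERMINATED BY '\n' leaves a stray carriage return attached to the last column of every row. Matching the terminator to the file avoids this:

```sql
LOAD DATA LOCAL INFILE '/full/path/to/model.csv'
INTO TABLE test.dummy
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\r\n';  -- Windows-style line endings
```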