Keywords: C# | Excel Reading | OLEDB | OpenXML | LINQ Queries
Abstract: This article surveys several ways to read Excel files from C# programs. It starts with the traditional OLEDB providers, which open a workbook through an ADO.NET connection string, load worksheet data into a DataSet, and allow LINQ queries over the result. It then covers the two parsing styles of the OpenXML SDK: the DOM approach, which offers strongly typed access and suits small files, and the SAX approach, which streams the file so that large workbooks can be read without exhausting memory. Code examples illustrate each approach and the trade-offs between them.
Introduction
In modern software development, working with Microsoft Excel files is a common requirement. Many business scenarios call for extracting data from Excel worksheets and processing it automatically. Exporting a worksheet to text and parsing the result is feasible, but it involves manual steps and is inefficient. This article systematically introduces several ways to read Excel files directly from C#.
Reading Excel Using OLEDB Providers
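OLEDB (Object Linking and Embedding Database) provides a standardized way to access Excel files directly. With the Microsoft.Jet.OLEDB.4.0 or Microsoft.ACE.OLEDB.12.0 provider, each worksheet can be queried as if it were a database table. Jet 4.0 reads only the legacy .xls format and is available only to 32-bit processes; ACE 12.0 also reads .xlsx files and must be installed separately (it ships with Office and with the Access Database Engine redistributable).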
The basic implementation steps are as follows: first, construct a connection string specifying the Excel file path and version information; then use OleDbDataAdapter to execute SQL queries; finally, populate the results into a DataSet.
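// Requires: using System.Data; using System.Data.OleDb; using System.IO;
var fileName = Path.Combine(Directory.GetCurrentDirectory(), "fileNameHere");
// Jet 4.0 with "Excel 8.0" reads legacy .xls workbooks only.
var connectionString = string.Format("Provider=Microsoft.Jet.OLEDB.4.0; data source={0}; Extended Properties=Excel 8.0;", fileName);
// The trailing $ tells the provider that the name refers to a worksheet.
var adapter = new OleDbDataAdapter("SELECT * FROM [workSheetNameHere$]", connectionString);
var ds = new DataSet();
// Fill opens the connection, runs the query, and closes the connection again.
adapter.Fill(ds, "anyNameHere");
DataTable data = ds.Tables["anyNameHere"];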
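The snippet above hard-codes the Jet provider, which only fits .xls files. As a minimal sketch, the connection string can instead be derived from the file extension. The helper name BuildExcelConnectionString and the sample path are hypothetical; the HDR and IMEX settings are commonly used Extended Properties, shown here as one reasonable configuration rather than a required one.

```csharp
using System;
using System.IO;

// Prints the connection string for a hypothetical .xlsx path.
Console.WriteLine(BuildExcelConnectionString(@"C:\data\contacts.xlsx", hasHeaderRow: true));

// Hypothetical helper (not part of any library): chooses the provider and
// Extended Properties from the file extension.
static string BuildExcelConnectionString(string path, bool hasHeaderRow)
{
    string extension = Path.GetExtension(path).ToLowerInvariant();
    // HDR=YES treats the first row as column names instead of data.
    string hdr = hasHeaderRow ? "YES" : "NO";

    switch (extension)
    {
        case ".xls":
            // Legacy binary workbook, read through the 32-bit-only Jet provider.
            return $"Provider=Microsoft.Jet.OLEDB.4.0;Data Source={path};" +
                   $"Extended Properties=\"Excel 8.0;HDR={hdr};IMEX=1\";";
        case ".xlsx":
            // Open XML workbook, read through the ACE provider.
            return $"Provider=Microsoft.ACE.OLEDB.12.0;Data Source={path};" +
                   $"Extended Properties=\"Excel 12.0 Xml;HDR={hdr};IMEX=1\";";
        default:
            throw new NotSupportedException($"Unsupported Excel extension: '{extension}'.");
    }
}
```

Extended Properties must be wrapped in quotes as soon as it carries more than one setting, which is why the helper escapes them. IMEX=1 asks the provider to read columns with mixed content as text, which tends to avoid values silently turning into nulls.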
LINQ Integration and Data Transformation
Calling AsEnumerable() on a DataTable exposes its rows as an IEnumerable<DataRow>, so LINQ (Language Integrated Query) can be used to filter and transform them. This approach is particularly suitable when business objects must be constructed from Excel data. The column names passed to Field<T> must match the worksheet's header row exactly.
// Requires: using System.Data; using System.Linq;
var rows = ds.Tables["anyNameHere"].AsEnumerable();
// Field<string> returns null for empty cells, so test for null as well as "".
var query = rows.Where(x => !string.IsNullOrEmpty(x.Field<string>("Phone Number"))).Select(x =>
    new MyContact
    {
        firstName = x.Field<string>("First Name"),
        lastName = x.Field<string>("Last Name"),
        phoneNumber = x.Field<string>("Phone Number"),
    });
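The filtering behaviour can be tried without an Excel file or an OLEDB provider. The following sketch builds an in-memory DataTable as a stand-in for the table that adapter.Fill would produce; the rows are invented for illustration, and an anonymous type replaces MyContact so that the example is self-contained.

```csharp
using System;
using System.Data;
using System.Linq;

// Stand-in for the worksheet data; the rows are made up.
var table = new DataTable("anyNameHere");
table.Columns.Add("First Name", typeof(string));
table.Columns.Add("Last Name", typeof(string));
table.Columns.Add("Phone Number", typeof(string));
table.Rows.Add("Ada", "Lovelace", "555-0100");
table.Rows.Add("Alan", "Turing", DBNull.Value); // an empty cell arrives as DBNull
table.Rows.Add("Grace", "Hopper", "");          // an empty string is also possible

var contacts = table.AsEnumerable()
    // Field<string> maps DBNull to null, so one check covers both empty cases.
    .Where(row => !string.IsNullOrWhiteSpace(row.Field<string>("Phone Number")))
    .Select(row => new
    {
        FirstName = row.Field<string>("First Name"),
        LastName = row.Field<string>("Last Name"),
        PhoneNumber = row.Field<string>("Phone Number"),
    })
    .ToList();

foreach (var contact in contacts)
{
    Console.WriteLine($"{contact.FirstName} {contact.LastName}: {contact.PhoneNumber}");
}
```

Only the first row survives the filter, because the other two have no usable phone number. Materializing the query with ToList() also means the rows are read once rather than on every enumeration.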
OpenXML SDK Methods
For scenarios requiring finer control or handling large Excel files, the Microsoft OpenXML SDK offers two distinct parsing strategies.
DOM Parsing Method
The DOM (Document Object Model) method loads the entire worksheet into memory and exposes it through strongly typed classes. The code is intuitive, but memory use grows with file size, so the method is best suited to small and medium-sized files.
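// Requires: using System; using System.Linq;
//           using DocumentFormat.OpenXml.Packaging; using DocumentFormat.OpenXml.Spreadsheet;
// fileName is the path to an .xlsx workbook; false opens it read-only.
using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(fileName, false))
{
    WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart
        ?? throw new InvalidOperationException("The file has no workbook part.");
    WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
    SheetData sheetData = worksheetPart.Worksheet.Elements<SheetData>().First();
    foreach (Row r in sheetData.Elements<Row>())
    {
        foreach (Cell c in r.Elements<Cell>())
        {
            // CellValue is the stored value; for shared-string cells it is an index, not the text.
            string? text = c.CellValue?.Text;
            Console.Write(text + " ");
        }
    }
}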
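One practical detail of the DOM loop above is that text cells normally do not store their text inline: the cell holds an index into the workbook's shared string table. The following sketch shows one way to resolve that index. The helper name GetCellText is hypothetical, the file path is taken from the command line, and inline strings and date formatting are deliberately left out.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

// Usage: pass the path of an .xlsx file as the only argument.
if (args.Length == 1)
{
    using (SpreadsheetDocument document = SpreadsheetDocument.Open(args[0], false))
    {
        WorkbookPart workbookPart = document.WorkbookPart
            ?? throw new InvalidOperationException("The file has no workbook part.");
        WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();

        foreach (Cell cell in worksheetPart.Worksheet.Descendants<Cell>())
        {
            Console.WriteLine($"{cell.CellReference?.Value}: {GetCellText(cell, workbookPart)}");
        }
    }
}

// Hypothetical helper: returns the display text for shared-string cells
// and the raw stored value for everything else.
static string GetCellText(Cell cell, WorkbookPart workbookPart)
{
    string raw = cell.CellValue?.Text ?? string.Empty;

    bool isSharedString = cell.DataType is not null
        && cell.DataType.Value == CellValues.SharedString;
    if (!isSharedString)
    {
        return raw; // numbers and formula results are stored directly
    }

    SharedStringTable? table = workbookPart.SharedStringTablePart?.SharedStringTable;
    if (table is null || !int.TryParse(raw, out int index))
    {
        return raw;
    }

    // The raw value is a zero-based position in the shared string table.
    return table.Elements<SharedStringItem>().ElementAtOrDefault(index)?.InnerText ?? raw;
}
```

Looking up entries with ElementAtOrDefault walks the table on every call, which is acceptable for small files; for larger ones, copying the table into an array once would be the more economical choice.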
SAX Parsing Method
The SAX (Simple API for XML) method reads the file as a stream and processes its content element by element. Because only the current element is held in memory, this approach has a low memory footprint and is particularly suitable for large Excel files of hundreds of megabytes.
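// Requires: using System; using System.Linq; using DocumentFormat.OpenXml;
//           using DocumentFormat.OpenXml.Packaging; using DocumentFormat.OpenXml.Spreadsheet;
// fileName is the path to an .xlsx workbook; false opens it read-only.
using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(fileName, false))
{
    WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart
        ?? throw new InvalidOperationException("The file has no workbook part.");
    WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
    // The reader moves forward only, one element at a time.
    using (OpenXmlReader reader = OpenXmlReader.Create(worksheetPart))
    {
        while (reader.Read())
        {
            // Elements are reported at both their start and end; handle the start only.
            if (reader.ElementType == typeof(CellValue) && reader.IsStartElement)
            {
                string text = reader.GetText();
                Console.Write(text + " ");
            }
        }
    }
}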
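Reading only CellValue elements loses the cell address and data type. A common variation, sketched below, keeps the streaming loop but loads each Cell element individually with LoadCurrentElement, so that only one cell is materialized at a time. The function name StreamCells and its tuple shape are hypothetical, and the file path is taken from the command line.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

// Usage: pass the path of an .xlsx file as the only argument.
if (args.Length == 1)
{
    using (SpreadsheetDocument document = SpreadsheetDocument.Open(args[0], false))
    {
        WorkbookPart workbookPart = document.WorkbookPart
            ?? throw new InvalidOperationException("The file has no workbook part.");

        foreach (var (reference, rawValue) in StreamCells(workbookPart.WorksheetParts.First()))
        {
            Console.WriteLine($"{reference}: {rawValue}");
        }
    }
}

// Hypothetical helper: lazily yields (cell reference, raw stored value) pairs.
static IEnumerable<(string Reference, string RawValue)> StreamCells(WorksheetPart worksheetPart)
{
    using (OpenXmlReader reader = OpenXmlReader.Create(worksheetPart))
    {
        while (reader.Read())
        {
            // Skip everything except the start of a cell element.
            if (reader.ElementType != typeof(Cell) || !reader.IsStartElement)
            {
                continue;
            }

            // Builds a small DOM object for this one cell only.
            if (reader.LoadCurrentElement() is Cell cell)
            {
                yield return (cell.CellReference?.Value ?? string.Empty,
                              cell.CellValue?.Text ?? string.Empty);
            }
        }
    }
}
```

Because the function is an iterator, the caller pulls cells one at a time and can stop early without reading the rest of the worksheet. The raw value of a text cell is still a shared-string index and has to be resolved separately.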
Performance Comparison and Application Scenarios
The OLEDB method suits rapid prototyping and simple data extraction: the code is concise and the learning curve is gentle, but it depends on a matching provider being installed on the machine. The OpenXML SDK offers richer functionality and finer control, although it reads only the Open XML formats such as .xlsx and not legacy .xls files. Within the SDK, the DOM method is appropriate for complex operations on Excel files, while the SAX method is the better choice for extremely large files.
In actual projects, the choice of method depends on specific requirements: file size, performance demands, functional complexity, and the development team's familiarity are all important considerations.
Best Practice Recommendations
Regardless of the chosen method, pay attention to error handling, resource release, and performance. For the OLEDB method, make sure that the connection string matches the file format and the installed provider and that the process has permission to read the file; for the OpenXML method, choose between DOM and SAX according to file size to avoid memory problems.
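For the OLEDB route, resource release and error handling can be combined in one small wrapper. The sketch below is one possible shape, not a prescribed one: the name ReadSheet is hypothetical, and the connection string and sheet name are taken from the command line. It needs the System.Data.OleDb types, which are built into .NET Framework and available as a Windows-only package on newer .NET versions.

```csharp
using System;
using System.Data;
using System.Data.OleDb;

// Usage: pass a connection string and a worksheet name as arguments.
if (args.Length == 2)
{
    DataTable table = ReadSheet(args[0], args[1]);
    Console.WriteLine($"{table.Rows.Count} rows read from sheet {args[1]}.");
}

// Hypothetical helper: loads one worksheet into a DataTable and disposes
// the connection and adapter even when the query fails.
static DataTable ReadSheet(string connectionString, string sheetName)
{
    // The name is concatenated into SQL text, so reject bracket characters.
    if (string.IsNullOrWhiteSpace(sheetName) || sheetName.IndexOfAny(new[] { '[', ']' }) >= 0)
    {
        throw new ArgumentException("Invalid worksheet name.", nameof(sheetName));
    }

    var table = new DataTable(sheetName);
    try
    {
        using (var connection = new OleDbConnection(connectionString))
        using (var adapter = new OleDbDataAdapter($"SELECT * FROM [{sheetName}$]", connection))
        {
            // Fill opens and closes the connection on its own.
            adapter.Fill(table);
        }
    }
    catch (OleDbException ex)
    {
        // Typical causes: missing worksheet, locked file, wrong Extended Properties.
        throw new InvalidOperationException($"Could not read worksheet '{sheetName}'.", ex);
    }

    return table;
}
```

Wrapping the provider exception keeps OLEDB-specific types out of the calling code while preserving the original error as the inner exception for logging.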
In addition, validate and convert data types while reading Excel data so that the business logic stays robust. Production applications should also include logging, monitoring, and exception handling.
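As a minimal sketch of such validation, the raw cell text can be parsed with the invariant culture, so that the result does not depend on the regional settings of the machine. The helper names are hypothetical. The date helper assumes the cell holds an Excel date serial number in the default 1900 date system, which DateTime.FromOADate interprets correctly for dates from March 1900 onward.

```csharp
using System;
using System.Globalization;

// Sample calls with made-up raw values.
Console.WriteLine(TryReadDecimal("1.5E-3", out decimal amount)
    ? amount.ToString(CultureInfo.InvariantCulture)
    : "invalid number");
Console.WriteLine(TryReadDate("not a date", out DateTime parsed)
    ? parsed.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)
    : "invalid date");

// Hypothetical helper: parses a raw numeric cell value, including
// exponent notation, without depending on the current culture.
static bool TryReadDecimal(string? raw, out decimal value) =>
    decimal.TryParse(raw, NumberStyles.Float, CultureInfo.InvariantCulture, out value);

// Hypothetical helper: converts a raw date serial number to a DateTime.
static bool TryReadDate(string? raw, out DateTime value)
{
    value = default;
    if (!double.TryParse(raw, NumberStyles.Float, CultureInfo.InvariantCulture, out double serial))
    {
        return false;
    }

    try
    {
        value = DateTime.FromOADate(serial);
        return true;
    }
    catch (ArgumentException)
    {
        // The serial number lies outside the range FromOADate accepts.
        return false;
    }
}
```

Returning a bool instead of throwing lets the caller decide whether a bad cell should be skipped, logged, or treated as a fatal error.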