Keywords: EPPlus | Excel data reading | C# reflection | strongly-typed conversion | data integration
Abstract: This article explores in detail how to use the EPPlus library in C# to read table data from Excel files and convert it into strongly-typed object collections. By analyzing best-practice code, it covers identifying table headers, handling data type conversions (particularly the challenge of numbers stored as double in Excel), and using reflection for dynamic property mapping. The content spans from basic file operations to advanced data transformation, providing reusable extension methods and test examples to help developers efficiently manage Excel data integration tasks.
In modern software development, reading structured data from Excel files is a common requirement, especially in scenarios like data import, report generation, or system integration. EPPlus, as a powerful .NET library, offers rich APIs for manipulating Excel files, supporting reading, writing, and formatting. However, directly mapping Excel table data to strongly-typed object collections is not a built-in feature, requiring developers to implement it using reflection and custom logic. Based on a typical technical Q&A case, this article delves into how to efficiently read Excel tables with EPPlus and store data in List<T>, where T is a user-defined data class.
Problem Context and Core Challenges
The user's goal is to read an Excel table containing a header row (e.g., Id, Name, Gender) and multiple data rows. The data from each column needs to be categorized based on the header and stored in a List<ExcelData>, where ExcelData is a class with corresponding properties (e.g., Id, Name, Gender). This allows direct data access via property names, such as outputting data.Id + data.Name + data.Gender in a loop. The core challenges are: EPPlus does not provide a native method to convert tables directly to objects, and numbers in Excel are typically stored as double, requiring proper type conversion to avoid data loss or errors.
Solution Overview
The best answer provides a generic extension method ConvertTableToObjects<T> that dynamically maps Excel table data to objects of a specified type T using reflection. The method first obtains the table's address range, extracts cell data, groups it by rows, and determines column names and data types based on the header row. Then, it iterates through data rows, assigning values according to property names and types, with special handling for double to other types (e.g., int, DateTime). Finally, it returns an IEnumerable<T> collection ready for further operations.
Detailed Code Implementation
Below is a simplified code example illustrating the logic. For clarity, error handling and edge cases are omitted; refer to the original answer for the complete code.
public static IEnumerable<T> ConvertTableToObjects<T>(this ExcelTable table) where T : new()
{
// Get the start and end positions of the table
var start = table.Address.Start;
var end = table.Address.End;
var cells = new List<ExcelRangeBase>();
// Collect all cell data
for (var r = start.Row; r <= end.Row; r++)
for (var c = start.Column; c <= end.Column; c++)
cells.Add(table.WorkSheet.Cells[r, c]);
// Group by row, assuming the first row is the header
var groups = cells.GroupBy(cell => cell.Start.Row).ToList();
var colnames = groups.First()
.Select((hcell, idx) => new { Name = hcell.Value.ToString(), index = idx })
.Where(o => typeof(T).GetProperties().Select(p => p.Name).Contains(o.Name))
.ToList();
// Skip the header row and process data rows
var rowvalues = groups.Skip(1).Select(cg => cg.Select(c => c.Value).ToList());
return rowvalues.Select(row =>
{
var obj = new T();
colnames.ForEach(col =>
{
var prop = typeof(T).GetProperty(col.Name);
var val = row[col.index];
if (val is double) // Handle Excel numbers
{
var unboxedVal = (double)val;
if (prop.PropertyType == typeof(int))
prop.SetValue(obj, (int)unboxedVal);
else if (prop.PropertyType == typeof(DateTime))
prop.SetValue(obj, DateTime.FromOADate(unboxedVal));
else
prop.SetValue(obj, unboxedVal);
}
else
prop.SetValue(obj, val);
});
return obj;
});
}
This method extends the ExcelTable class, allowing any table to be easily converted to an object collection. For example, with an ExcelData class, use it as: var dataList = worksheet.Tables.First().ConvertTableToObjects<ExcelData>().ToList();.
Practical Application and Testing
To validate the method's effectiveness, a simple test can be written. Assuming the Excel file contains a table as described earlier, test code might look like:
[TestMethod]
public void TestTableConversion()
{
var file = new FileInfo(@"C:\ExcelFile.xlsx");
using (var package = new ExcelPackage(file))
{
var worksheet = package.Workbook.Worksheets[1];
var table = worksheet.Tables.First();
var result = table.ConvertTableToObjects<ExcelData>().ToList();
foreach (var item in result)
Console.WriteLine($"{item.Id}{item.Name}{item.Gender}");
// Output: 1JohnMale, 2MariaFemale, 3DanielUnknown
}
}
This test demonstrates loading an Excel from a file, extracting the table, and converting the data. The output should match expectations, proving the method's correctness.
Supplementary References and Optimization Tips
Beyond the best answer, other responses provide basic methods, such as directly iterating cells (Answer 1), but lack type safety and structured output. In real projects, it's advisable to incorporate error handling (e.g., for null values or invalid types) and performance optimizations (like caching reflection results). Additionally, for large Excel datasets, consider streaming or asynchronous operations to avoid memory issues.
Conclusion
Reading Excel tables and converting them to object collections with EPPlus is an efficient data processing approach, especially suited for scenarios requiring strong typing. The method introduced here leverages reflection and extension methods, offering a flexible and reusable solution. Developers can adapt the code based on specific needs, such as supporting more complex data types or custom mapping rules. Overall, mastering these techniques will significantly enhance efficiency and code quality in handling Excel data.