Keywords: Apache POI | Large Excel Writing | SXSSF Streaming API | Performance Optimization | Java Data Processing
Abstract: This article examines key performance issues when using the Apache POI library to write large result sets to Excel files. By analyzing a common error case—repeatedly calling the Workbook.write() method within an inner loop, which causes abnormal file growth and memory waste—it delves into POI's operational mechanisms. The article further introduces SXSSF (Streaming API) as an optimization solution, efficiently handling millions of records by setting memory window sizes and compressing temporary files. Core insights include proper management of workbook write timing, understanding POI's memory model, and leveraging SXSSF for low-memory large-data exports. These techniques are of practical value for Java developers converting JDBC result sets to Excel.
Problem Background and Common Error Analysis
In Java development, using the Apache POI library to export large database result sets (ResultSet) to Excel files is a common task. However, when data volumes reach thousands to millions of records, developers often encounter performance bottlenecks and file corruption issues. A typical error involves calling the Workbook.write() method within an inner loop while writing each row of data. Semantically, Workbook.write() serializes the entire workbook content to an output stream; it is not incremental. Each call therefore redundantly re-emits all previously added rows, causing quadratic (O(n²)) file growth. For instance, with n rows and m columns, the erroneous code executes n * m write operations instead of one. This wastes CPU and memory resources and may produce an invalid Excel file, as in reported cases where opening the file displays only the first row of data.
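The quadratic blow-up can be illustrated without POI at all: if the entire workbook is re-serialized after each of n rows, the bytes written sum to n(n+1)/2 times the per-row size. The following is a minimal sketch with hypothetical sizes (class and method names are illustrative, not part of any POI API):

```java
public class QuadraticWriteDemo {
    // Simulate "workbook.write() inside the row loop": the i-th call
    // re-serializes all i rows added so far, so total output grows
    // quadratically with the number of rows.
    static long totalBytesWritten(int rows, int bytesPerRow) {
        long total = 0;
        for (int i = 1; i <= rows; i++) {
            total += (long) i * bytesPerRow; // i-th write re-emits all i rows
        }
        return total; // equals rows * (rows + 1) / 2 * bytesPerRow
    }

    public static void main(String[] args) {
        int rows = 10_000, bytesPerRow = 100;
        long wrong = totalBytesWritten(rows, bytesPerRow); // write() per row
        long right = (long) rows * bytesPerRow;            // single write()
        System.out.println("repeated writes: " + wrong + " bytes"); // ~5 GB
        System.out.println("single write:    " + right + " bytes"); // ~1 MB
    }
}
```

Even at a modest 10,000 rows of 100 bytes each, the misplaced write() multiplies roughly 1 MB of real data into about 5 GB of output, which explains both the abnormal file sizes and the corruption symptoms.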
POI Workflow and Memory Management
Apache POI offers two main APIs for Excel files: HSSF for legacy .xls format and XSSF/SXSSF for .xlsx format. The standard XSSF API builds an in-memory DOM structure of the entire workbook, suitable for small datasets. For large result sets, this can lead to OutOfMemoryError. POI's Workbook class follows a single-write principle: after adding all data, call write() once to serialize the complete content to a file. In the erroneous code, repeated writes in the inner loop disrupt this mechanism, generating redundant data streams. The correct approach is to move wb.write(bos) outside the loop, executing it only once after data population to ensure file integrity.
Optimization Solution: Using SXSSF Streaming API
For large datasets, Apache POI version 3.8 and above provides SXSSF (Streaming Usermodel API) as an efficient solution. SXSSF can optionally wrap an existing XSSFWorkbook as a template, and controls how many rows are cached in memory via a sliding window, set either through the SXSSFWorkbook constructor or per sheet (e.g., setRandomAccessWindowSize(100)). Rows that fall out of the window are flushed to temporary files on disk, significantly reducing memory usage. The following code example demonstrates basic SXSSF usage:
import java.io.FileOutputStream;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;

import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.xssf.streaming.SXSSFSheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

public class LargeExcelWriter {
    public void writeLargeResultSet(ResultSet rs, String outputPath) throws Exception {
        // Create SXSSF workbook; compress temporary files to save disk space
        SXSSFWorkbook wb = new SXSSFWorkbook();
        wb.setCompressTempFiles(true);
        try {
            SXSSFSheet sheet = wb.createSheet("Data");
            sheet.setRandomAccessWindowSize(100); // Keep only 100 rows in memory

            // Write header row from result set metadata
            Row headerRow = sheet.createRow(0);
            ResultSetMetaData metaData = rs.getMetaData();
            int columnCount = metaData.getColumnCount();
            for (int i = 0; i < columnCount; i++) {
                Cell cell = headerRow.createCell(i);
                cell.setCellValue(metaData.getColumnLabel(i + 1)); // JDBC columns are 1-based
            }

            // Stream data rows; rows outside the window are flushed to temp files
            int rowIndex = 1;
            while (rs.next()) {
                Row row = sheet.createRow(rowIndex++);
                for (int col = 0; col < columnCount; col++) {
                    Cell cell = row.createCell(col);
                    cell.setCellValue(rs.getString(col + 1)); // Assumes string data; adapt per column type
                }
            }

            // Serialize the workbook to disk exactly once, after all rows are added
            try (FileOutputStream out = new FileOutputStream(outputPath)) {
                wb.write(out);
            }
        } finally {
            wb.dispose(); // Delete the temporary files backing the stream
        }
    }
}

This method avoids memory bottlenecks through streaming and is suitable for exporting millions of records. Key configurations include setRandomAccessWindowSize() to control memory usage, setCompressTempFiles(true) to shrink the temporary files on disk, and a dispose() call in a finally block to guarantee the temporary files are released.
Performance Comparison and Best Practices
In performance tests, the erroneous method (repeated writes) causes abnormal file growth and long runtimes, while SXSSF maintains stable memory usage and write time that grows linearly with row count. For example, when exporting 1 million rows × 10 columns, the erroneous method can generate multi-GB output, whereas SXSSF keeps memory usage to a few hundred MB or less. Best practices include always calling write() outside loops, using SXSSF for large datasets, monitoring JVM memory usage, and considering paginated or asynchronous exports to improve user experience. Developers should also consult the official POI documentation to ensure dependency compatibility (e.g., poi-ooxml, xmlbeans).
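The pagination mentioned above reduces to ceiling-division arithmetic over the row count. The helper below is a hypothetical illustration (the names pageCount and pageStart are assumptions, not a POI or JDBC API), of the kind that might drive per-page SQL LIMIT/OFFSET queries:

```java
public class ExportPaginator {
    // Number of pages needed to export totalRows in chunks of pageSize rows
    static int pageCount(long totalRows, int pageSize) {
        if (pageSize <= 0) throw new IllegalArgumentException("pageSize must be positive");
        return (int) ((totalRows + pageSize - 1) / pageSize); // ceiling division
    }

    // Start row (0-based, inclusive) of a given page, e.g. for an OFFSET clause
    static long pageStart(int page, int pageSize) {
        return (long) page * pageSize;
    }

    public static void main(String[] args) {
        System.out.println(pageCount(1_000_000, 50_000)); // 20 pages
        System.out.println(pageStart(2, 50_000));         // rows 100000..149999
    }
}
```

Each page can then be fetched and appended to the same SXSSF sheet, keeping both database and JVM memory pressure bounded regardless of total export size.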
Conclusion
Efficiently writing large Excel files requires a deep understanding of Apache POI's API semantics and memory model. Avoiding Workbook.write() calls in inner loops is key to preventing performance issues, and adopting SXSSF streaming API significantly enhances processing capabilities. By properly managing write timing and leveraging optimized configurations, developers can reliably achieve large-scale data conversion from JDBC result sets to Excel, meeting enterprise application demands.