流式生成Excel檔案

你最紅發表於2019-03-04

當我們要匯出資料庫資料到Excel檔案時,如果資料量特別大,那麼可能需要耗費較多記憶體造成OOM。即使沒有OOM,也有可能因為生成Excel檔案的時間太久導致請求超時。這時候就需要POI的SXSSF(org.apache.poi.xssf.streaming)功能了。

Excel兩種格式

  • Excel 97(-2007) file format

  • Excel 2007 OOXML (.xlsx) file format

HSSF is the POI Project`s pure Java implementation of the Excel `97(-2007) file format. XSSF is the POI Project`s pure Java implementation of the Excel 2007 OOXML (.xlsx) file format.

HSSF and XSSF provides ways to read spreadsheets create, modify, read and write XLS spreadsheets. They provide:

  • low level structures for those with special needs
  • an eventmodel api for efficient read-only access
  • a full usermodel api for creating, reading and modifying XLS files

Since 3.8-beta3, POI provides a low-memory footprint SXSSF API built on top of XSSF.

SXSSF is an API-compatible streaming extension of XSSF to be used when very large spreadsheets have to be produced, and heap space is limited. SXSSF achieves its low memory footprint by limiting access to the rows that are within a sliding window, while XSSF gives access to all rows in the document. Older rows that are no longer in the window become inaccessible, as they are written to the disk.

In auto-flush mode the size of the access window can be specified, to hold a certain number of rows in memory. When that value is reached, the creation of an additional row causes the row with the lowest index to to be removed from the access window and written to disk. Or, the window size can be set to grow dynamically; it can be trimmed periodically by an explicit call to flushRows(int keepRows) as needed.

Due to the streaming nature of the implementation, there are the following limitations when compared to XSSF:

  • Only a limited number of rows are accessible at a point in time.
  • Sheet.clone() is not supported.
  • Formula evaluation is not supported

SXSSF

SXSSF是如何減小記憶體消耗的呢?它通過將資料寫到臨時檔案來減少記憶體使用,降低發生OOM錯誤的概率。

// turn off auto-flushing and accumulate all rows in memory
SXSSFWorkbook wb = new SXSSFWorkbook(-1); 
複製程式碼

你也可以在構造方法裡,指定-1來關閉自動寫入資料到檔案,將所有資料內容保持在記憶體裡。

雖然這裡處理了記憶體OOM的問題,但是還是必須將全部資料寫到一個臨時檔案之後才能響應請求,請求超時的問題沒有解決。

流式生成

Excel 2007 OOXML (.xlsx) 檔案格式其實本質上是一個zip檔案,我們可以把.xlsx檔案字尾名改為.zip,然後解壓:

$ mv output.xlsx output.zip
$ unzip output.zip
$ tree output/
output/
├── [Content_Types].xml
├── _rels
├── docProps
│   ├── app.xml
│   └── core.xml
└── xl
    ├── _rels
    │   └── workbook.xml.rels
    ├── sharedStrings.xml
    ├── styles.xml
    ├── workbook.xml
    └── worksheets
        └── sheet1.xml

5 directories, 8 files
複製程式碼

我們可以看到這個Excel檔案解壓後包含了上面那些檔案,其中styles是我們定義的樣式格式(包括字型、文字大小、顏色、居中等屬性),worksheets目錄下是我們的資料內容。

通過具體分析資料格式,我們可以自己控制xlsx檔案的寫入過程,將資料直接寫到響應流上而非臨時檔案就可以完美解決請求超時的問題。

示例程式碼

XSSFWorkbook wb = new XSSFWorkbook()
XSSFCellStyle headerStyle = genHeaderStyle(wb)
sheets.each { sheet ->
    def xssfSheet = wb.createSheet(sheet.name)
    sheet.setXSSFSheet(xssfSheet)
    sheet.setHeaderStyle(headerStyle)
}
File template = genTemplateFile(wb)
ZipOutputStream zos = new ZipOutputStream(responseStream);
ZipFile templateZip = new ZipFile(template);
Enumeration<ZipEntry> templateEntries = templateZip.entries();
try {
  while (templateEntries.hasMoreElements()) {
    // copy all template content to the ZipOutputStream zos
    // except the sheet itself
  }
  zos.putNextEntry(new ZipEntry(sheetName)); // now the sheet
  OutputStreamWriter sheetOut = new OutputStreamWriter(zos, "UTF-8");
  try {
    sheetOut.write("<?xml version="1.0" encoding="UTF-8"?>");
    sheetOut.write("<worksheet><sheetData>");
    // write the content – rows and cells
    sheetOut.write("</sheetData></worksheet>");
  } finally { sheetOut.close(); }
} finally { zos.close(); }
複製程式碼

其中,template包含了一些索引資訊,比如建了哪些樣式、幾個sheet等,這些資訊是放到ZIP檔案的最前面的,最後才是sheet內容資料。

我的部落格原文地址
blog.yu000hong.com/2018/07/24/…

相關文章