java 讀 大檔案excel 記憶體溢位 解決
Office軟體一直是一個誨譽參半的軟體,廣大普通計算機使用者用Office來滿足日常辦公需求,於是就產生了很多生產資料和文件,需要和企業單位的專用辦公系統對接,而Office的解析工作一直是程式設計師非常頭痛的問題,經常招致程式設計師的謾罵,也被譽為是微軟最爛的發明之一。POI的誕生解決了Excel的解析難題(POI即“討厭的電子表格”,確實很討厭,我也很討厭Excel),但如果用不好POI,也會導致程式出現一些BUG,例如記憶體溢位,假空行,公式等等問題。下面介紹一種解決POI讀取Excel記憶體溢位的問題。
POI讀取Excel有兩種模式,一種是使用者模式,一種是SAX模式,將xlsx格式的文件轉換成CVS格式後再進行處理使用者模式相信大家都很清楚,也是POI常用的方式,使用者模式API介面豐富,我們可以很容易的使用POI的API讀取Excel,但使用者模式消耗的記憶體很大,當遇到很多sheet、大資料網格、假空行、公式等問題時,很容易導致記憶體溢位。POI官方推薦解決記憶體溢位的方式使用CVS格式解析,我們不可能手工將Excel檔案轉換成CVS格式再上傳,這樣做太麻煩了,好再POI給出了xlsx轉換CVS的例子,基於這個例子我進行了一下改造,即可解決使用者模式讀取Excel記憶體溢位的問題。下面附上程式碼:
maven
<!-- poi office -->
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi</artifactId>
<version>3.15</version>
</dependency>
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-ooxml</artifactId>
<version>3.15</version>
</dependency>
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-ooxml-schemas</artifactId>
<version>3.15</version>
</dependency>
java
package com.thinkgem.jeesite.common.utils.excel;
import org.apache.poi.openxml4j.exceptions.OpenXML4JException;
/* ====================================================================
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==================================================================== */
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackageAccess;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.util.SAXHelper;
import org.apache.poi.xssf.eventusermodel.ReadOnlySharedStringsTable;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler;
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.SheetContentsHandler;
import org.apache.poi.xssf.extractor.XSSFEventBasedExcelExtractor;
import org.apache.poi.xssf.model.StylesTable;
import org.apache.poi.xssf.usermodel.XSSFComment;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
/**
* A rudimentary XLSX -> CSV processor modeled on the
* POI sample program XLS2CSVmra from the package
* org.apache.poi.hssf.eventusermodel.examples.
* As with the HSSF version, this tries to spot missing
* rows and cells, and output empty entries for them.
* <p/>
* Data sheets are read using a SAX parser to keep the
* memory footprint relatively small, so this should be
* able to read enormous workbooks. The styles table and
* the shared-string table must be kept in memory. The
* standard POI styles table class is used, but a custom
* (read-only) class is used for the shared string table
* because the standard POI SharedStringsTable grows very
* quickly with the number of unique strings.
* <p/>
* For a more advanced implementation of SAX event parsing
* of XLSX files, see {@link XSSFEventBasedExcelExtractor}
* and {@link XSSFSheetXMLHandler}. Note that for many cases,
* it may be possible to simply use those with a custom
* {@link SheetContentsHandler} and no SAX code needed of
* your own!
*/
/**
*
* @author lhy
*
*/
public class XLSX2CSV {
/**
* Uses the XSSF Event SAX helpers to do most of the work
* of parsing the Sheet XML, and outputs the contents
* as a (basic) CSV.
*/
private List<String[]> rows = new ArrayList<String[]>();
private final OPCPackage xlsxPackage;
/**
* Number of columns to read starting with leftmost
*/
private int minColumns;
/**
* Destination for data
*/
private class SheetToCSV implements SheetContentsHandler {
private String[] record;
private int minColumns;
private int thisColumn = 0;
public SheetToCSV(int minColumns) {
super();
this.minColumns = minColumns;
}
@Override
public void startRow(int rowNum) {
record=new String[this.minColumns];
// System.out.println("################################:"+rowNum);
}
@Override
public void endRow(int rowNum) {
thisColumn=0;
rows.add(this.record);
//System.out.println("**********************************");
}
@Override
public void cell(String cellReference, String formattedValue, XSSFComment comment) {
if(thisColumn<this.minColumns)
record[thisColumn]=formattedValue;
thisColumn++;
//System.out.print(formattedValue);
//System.out.println("&&&&&&&&&&&&&&&&&&&&&&&&");
}
@Override
public void headerFooter(String text, boolean isHeader, String tagName) {
// Skip, no headers or footers in CSV
}
}
/**
* Creates a new XLSX -> CSV converter
*
* @param pkg The XLSX package to process
* @param output The PrintStream to output the CSV to
* @param minColumns The minimum number of columns to output, or -1 for no minimum
*/
public XLSX2CSV(OPCPackage pkg, int minColumns) {
this.xlsxPackage = pkg;
this.minColumns = minColumns;
}
/**
* Parses and shows the content of one sheet
* using the specified styles and shared-strings tables.
*
* @param styles
* @param strings
* @param sheetInputStream
*/
public void processSheet(StylesTable styles, ReadOnlySharedStringsTable strings, SheetContentsHandler sheetHandler,InputStream sheetInputStream)
throws IOException, ParserConfigurationException, SAXException {
DataFormatter formatter = new DataFormatter();
InputSource sheetSource = new InputSource(sheetInputStream);
try {
XMLReader sheetParser = SAXHelper.newXMLReader();
ContentHandler handler = new XSSFSheetXMLHandler(styles, null, strings, sheetHandler, formatter, false);
sheetParser.setContentHandler(handler);
sheetParser.parse(sheetSource);
} catch (ParserConfigurationException e) {
throw new RuntimeException("SAX parser appears to be broken - " + e.getMessage());
}
}
/**
* Initiates the processing of the XLS workbook file to CSV.
*
* @throws IOException
* @throws OpenXML4JException
* @throws ParserConfigurationException
* @throws SAXException
*/
public List<String[]> process() throws IOException, OpenXML4JException, ParserConfigurationException, SAXException {
ReadOnlySharedStringsTable strings = new ReadOnlySharedStringsTable(this.xlsxPackage);
XSSFReader xssfReader = new XSSFReader(this.xlsxPackage);
StylesTable styles = xssfReader.getStylesTable();
XSSFReader.SheetIterator iter = (XSSFReader.SheetIterator) xssfReader.getSheetsData();
int index = 0;
while (iter.hasNext()) {
InputStream stream = iter.next();
String sheetName = iter.getSheetName();
//this.output.println();
//this.output.println(sheetName + " [index=" + index + "]:");
processSheet(styles, strings, new SheetToCSV(this.minColumns), stream);
stream.close();
++index;
}
return this.rows;
}
/**
* 得到excel的記錄
* @param excelPath
* @param minColumns 輸出多少列
* @return
* @throws Exception
*/
public static List<String[]> getRecords(String excelPath,int minColumns) throws Exception{
File xlsxFile = new File(excelPath);
if (!xlsxFile.exists()) {
System.err.println("Not found or not a file: " + xlsxFile.getPath());
return null;
}
// The package open is instantaneous, as it should be.
OPCPackage p = OPCPackage.open(xlsxFile.getPath(), PackageAccess.READ);
XLSX2CSV xlsx2csv = new XLSX2CSV(p, minColumns);
List<String[]>list=xlsx2csv.process();
p.close();
return list;
}
public static void main(String[] args) throws Exception {
/* File xlsxFile = new File("/home/lhy/QA資料20170516210605.xlsx");
if (!xlsxFile.exists()) {
System.err.println("Not found or not a file: " + xlsxFile.getPath());
return;
}
// The package open is instantaneous, as it should be.
OPCPackage p = OPCPackage.open(xlsxFile.getPath(), PackageAccess.READ);
XLSX2CSV xlsx2csv = new XLSX2CSV(p, 4);*/
List<String[]>list=getRecords("/home/lhy/d3a65e38583640eaa6e81343311f6d38.xls",4);
// p.close();
for(int i=0;i<list.size();i++)
{
System.out.println("******************:"+i);
for(String a:list.get(i))
{
System.out.println(a);
System.out.println("------------------------");
}
System.out.println("#####################");
}
}
}
相關文章
- java 匯出 excel 最佳實踐,java 大檔案 excel 避免OOM(記憶體溢位) excel 工具框架JavaExcelOOM記憶體溢位框架
- 解決SqlServer執行指令碼,檔案過大,記憶體溢位問題SQLServer指令碼記憶體溢位
- Java棧溢位|記憶體洩漏|記憶體溢位Java記憶體溢位
- java向excel 寫入海量資料記憶體溢位問題 解決JavaExcel記憶體溢位
- Java記憶體溢位Java記憶體溢位
- 阿里大佬講解Java記憶體溢位示例(堆溢位、棧溢位)阿里Java記憶體溢位
- Windows Tomcat 記憶體溢位解決方法WindowsTomcat記憶體溢位
- jvm記憶體設定及記憶體溢位、解決方案JVM記憶體溢位
- tomcat記憶體溢位:PermGen space解決方法Tomcat記憶體溢位
- Jmeter記憶體溢位:java.lang.OutOfMemoryError: Java heap space解決思路JMeter記憶體溢位JavaError
- Java程式碼執行記憶體溢位詳解及解決方案Java記憶體溢位
- 記憶體溢位記憶體溢位
- tomcat伺服器記憶體溢位解決方法Tomcat伺服器記憶體溢位
- Java EasyExcel 匯出報記憶體溢位如何解決JavaExcel記憶體溢位
- JAVA記憶體區域與記憶體溢位異常Java記憶體溢位
- [Java基礎]記憶體洩漏和記憶體溢位Java記憶體溢位
- java記憶體溢位和記憶體洩漏的區別Java記憶體溢位
- 記憶體溢位和記憶體洩露記憶體溢位記憶體洩露
- JVM(2)-Java記憶體區域與記憶體溢位異常JVMJava記憶體溢位
- 【記憶體洩漏和記憶體溢位】JavaScript之深入淺出理解記憶體洩漏和記憶體溢位記憶體溢位JavaScript
- vue專案編譯node記憶體溢位Vue編譯記憶體溢位
- JVM——記憶體洩漏與記憶體溢位JVM記憶體溢位
- 手動寫java記憶體溢位 java.lang.StackOverflowErrorJava記憶體溢位Error
- Java記憶體溢位OutOfMemoryError的產生與排查Java記憶體溢位Error
- Java服務假死後續之記憶體溢位Java記憶體溢位
- 記一次網頁記憶體溢位分析及解決實踐網頁記憶體溢位
- JavaScript之記憶體溢位和記憶體洩漏JavaScript記憶體溢位
- eclipse中啟動專案報記憶體溢位問題通過修改配置解決Eclipse記憶體溢位
- POI實現大資料EXCLE匯入匯出,解決記憶體溢位問題大資料記憶體溢位
- 直播平臺搭建,Java 記憶體溢位的排查方法Java記憶體溢位
- Java記憶體區域與記憶體溢位異常(JVM學習系列1)Java記憶體溢位JVM
- JVM學習-02-Java記憶體區域與記憶體溢位異常JVMJava記憶體溢位
- Laravel 使用 laravel-excel擴充套件包(maatwebsite/excel)匯入報記憶體溢位LaravelExcel套件Web記憶體溢位
- return new物件造成溢位記憶體物件記憶體
- mybatis-plus getOne 記憶體溢位MyBatis記憶體溢位
- 手動寫java記憶體溢位 java.lang.OutOfMemoryError: PermGen spaceJava記憶體溢位Error
- JVM 發生記憶體溢位的 8 種原因、及解決辦法JVM記憶體溢位
- Java虛擬機器01——Java記憶體資料區域和記憶體溢位異常Java虛擬機記憶體溢位
- 深入理解Java虛擬機器-Java記憶體區域與記憶體溢位異常Java虛擬機記憶體溢位