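Preface
For a while I kept needing to convert Word documents to PDF, or PDF to Word. I figured this should be doable in Java, so I spent some time writing a file conversion tool.
Source code: weloe/FileConversion (github.com)
The main feature is conversion between Word and PDF files, as follows:
- PDF to Word
- PDF to image
- Word to image
- Word to HTML
- Word to PDF
Implementation
It relies mainly on two libraries: pdfbox (Apache PDFBox | A Java PDF Library) and spire.doc (Free Spire.Doc for Java | 100% free Java Word component, e-iceblue.cn).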
pom.xml
<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <url>http://repo.e-iceblue.cn/repository/maven-public/</url>
    </repository>
</repositories>
<properties>
    <maven.compiler.source>8</maven.compiler.source>
    <maven.compiler.target>8</maven.compiler.target>
</properties>
<dependencies>
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>pdfbox</artifactId>
        <version>2.0.4</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.13.2</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.doc.free</artifactId>
        <version>3.9.0</version>
    </dependency>
</dependencies>
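Strategy interface
public interface FileConversion {
    // True if this implementation handles the given conversion method name, e.g. "pdf2word".
    boolean isSupport(String s);

    // Converts the file at pathName; dirAndFileName is the output path without an extension.
    // Returns the path of the generated file.
    String convert(String pathName, String dirAndFileName) throws Exception;
}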
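To illustrate how an interface like this is typically consumed, here is a minimal sketch of a dispatcher that walks a list of strategies and runs the first one whose isSupport matches. The names ConversionDispatchSketch, FakeConversion, STRATEGIES and dispatch are hypothetical, and the stand-in strategies only build an output path; the actual dispatcher in the linked repository may look different.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical dispatcher; all names in this sketch are illustrative only.
final class ConversionDispatchSketch {

    /** Same shape as the strategy interface above. */
    interface FileConversion {
        boolean isSupport(String s);
        String convert(String pathName, String dirAndFileName) throws Exception;
    }

    /** Stand-in strategy that only builds the output path (no real conversion). */
    static final class FakeConversion implements FileConversion {
        private final String method;
        private final String suffix;

        FakeConversion(String method, String suffix) {
            this.method = method;
            this.suffix = suffix;
        }

        @Override
        public boolean isSupport(String s) {
            return method.equals(s);
        }

        @Override
        public String convert(String pathName, String dirAndFileName) {
            return dirAndFileName + suffix;
        }
    }

    // In a real tool this list would hold PDF2Word, Word2PDF and so on.
    static final List<FileConversion> STRATEGIES = Arrays.asList(
            new FakeConversion("pdf2word", ".doc"),
            new FakeConversion("word2pdf", ".pdf"));

    /** Runs the first strategy that supports the requested method. */
    static String dispatch(String method, String pathName, String dirAndFileName) throws Exception {
        for (FileConversion strategy : STRATEGIES) {
            if (strategy.isSupport(method)) {
                return strategy.convert(pathName, dirAndFileName);
            }
        }
        throw new IllegalArgumentException("Unsupported conversion method: " + method);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(dispatch("pdf2word", "in.pdf", "out")); // prints out.doc
    }
}
```

With this layout, supporting a new conversion only means adding another implementation to the list; the dispatching code stays unchanged.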
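PDF-to-image implementation
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;

public class PDF2Image implements FileConversion {
    private String suffix = ".jpg";
    public static final int DEFAULT_DPI = 150;

    @Override
    public boolean isSupport(String s) {
        return "pdf2image".equals(s);
    }

    @Override
    public String convert(String pathName, String dirAndFileName) throws Exception {
        String outPath = dirAndFileName + suffix;
        if (Files.exists(Paths.get(outPath))) {
            throw new RuntimeException(outPath + " file already exists");
        }
        pdf2multiImage(pathName, outPath, DEFAULT_DPI);
        return outPath;
    }

    /**
     * Converts a PDF to an image.
     * Each page of a multi-page PDF is rendered to its own image; the pages are then
     * combined into a single image by ImageUtil.yPic.
     *
     * @param pdfFile path of the PDF file
     * @param outPath output path of the image
     * @param dpi     rendering resolution; higher values give a sharper image but a longer conversion
     * @throws IOException if the PDF cannot be read or rendered
     */
    public void pdf2multiImage(String pdfFile, String outPath, int dpi) throws IOException {
        if (dpi <= 0) {
            // Fall back to the default of 150 when no valid DPI is given
            dpi = DEFAULT_DPI;
        }
        // Errors now propagate to convert() instead of being printed and ignored,
        // so a failed conversion is no longer reported as a success.
        try (PDDocument pdf = PDDocument.load(new File(pdfFile))) {
            // One renderer is enough for the whole document
            PDFRenderer renderer = new PDFRenderer(pdf);
            int actSize = pdf.getNumberOfPages();
            List<BufferedImage> picList = new ArrayList<>();
            for (int i = 0; i < actSize; i++) {
                picList.add(renderer.renderImageWithDPI(i, dpi, ImageType.RGB));
            }
            // Combine the page images (ImageUtil is a project helper that is not shown here)
            ImageUtil.yPic(picList, outPath);
        }
    }
}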
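The ImageUtil.yPic helper used above is not shown in this post. As a rough idea of what combining page images vertically can look like, here is a minimal sketch using only the JDK. The names ImageStitchSketch, stitchVertically and writeJpeg are hypothetical, and the behavior (white background, pages left-aligned, canvas as wide as the widest page) is an assumption, not necessarily what the author's helper does.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import javax.imageio.ImageIO;

// Hypothetical stand-in for a vertical image-stitching helper.
final class ImageStitchSketch {

    /** Stacks the pages top to bottom on one canvas as wide as the widest page. */
    static BufferedImage stitchVertically(List<BufferedImage> pages) {
        if (pages.isEmpty()) {
            throw new IllegalArgumentException("No pages to combine");
        }
        int width = 0;
        int height = 0;
        for (BufferedImage page : pages) {
            width = Math.max(width, page.getWidth());
            height += page.getHeight();
        }
        BufferedImage combined = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = combined.createGraphics();
        try {
            // Fill with white so narrower pages do not leave black margins
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, width, height);
            int y = 0;
            for (BufferedImage page : pages) {
                g.drawImage(page, 0, y, null);
                y += page.getHeight();
            }
        } finally {
            g.dispose();
        }
        return combined;
    }

    /** Writes the image as JPEG; ImageIO.write returns false when no writer is found. */
    static void writeJpeg(BufferedImage image, String outPath) throws IOException {
        if (!ImageIO.write(image, "jpg", new File(outPath))) {
            throw new IOException("No JPEG writer available for " + outPath);
        }
    }

    public static void main(String[] args) {
        BufferedImage first = new BufferedImage(100, 50, BufferedImage.TYPE_INT_RGB);
        BufferedImage second = new BufferedImage(80, 30, BufferedImage.TYPE_INT_RGB);
        BufferedImage combined = stitchVertically(Arrays.asList(first, second));
        System.out.println(combined.getWidth() + "x" + combined.getHeight()); // prints 100x80
    }
}
```

Keep in mind that the combined image holds every page in memory at once, so long documents rendered at a high DPI can need a lot of heap.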
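PDF-to-Word implementation
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class PDF2Word implements FileConversion {
    private String suffix = ".doc";

    @Override
    public boolean isSupport(String s) {
        return "pdf2word".equals(s);
    }

    /**
     * Extracts the text of a PDF and writes it to a .doc file.
     *
     * @param pathName       path of the source PDF
     * @param dirAndFileName output path without an extension
     * @return path of the generated file
     * @throws IOException if reading the PDF or writing the output fails
     */
    @Override
    public String convert(String pathName, String dirAndFileName) throws Exception {
        String outPath = dirAndFileName + suffix;
        if (Files.exists(Paths.get(outPath))) {
            throw new RuntimeException(outPath + " file already exists");
        }
        pdf2word(pathName, outPath);
        return outPath;
    }

    // PDFTextStripper extracts plain text only, so the resulting .doc holds the text
    // without the original layout or images.
    private void pdf2word(String pathName, String outPath) throws IOException {
        // try-with-resources closes the document and the writer even if extraction fails
        try (PDDocument doc = PDDocument.load(new File(pathName))) {
            // Create the output file (createFile is a project helper that is not shown here)
            createFile(Paths.get(outPath));
            try (Writer writer = new OutputStreamWriter(
                    new FileOutputStream(outPath), StandardCharsets.UTF_8)) {
                PDFTextStripper stripper = new PDFTextStripper();
                stripper.setSortByPosition(true);            // order text by its position on the page
                stripper.setStartPage(1);                    // first page to convert
                stripper.setEndPage(doc.getNumberOfPages()); // last page to convert
                stripper.writeText(doc, writer);
            }
        }
    }
}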
Word to HTML
import java.nio.file.Files;
import java.nio.file.Paths;
import com.spire.doc.Document;
import com.spire.doc.FileFormat;

public class Word2HTML implements FileConversion {
    private String suffix = ".html";

    @Override
    public boolean isSupport(String s) {
        return "word2html".equals(s);
    }

    @Override
    public String convert(String pathName, String dirAndFileName) {
        String outPath = dirAndFileName + suffix;
        if (Files.exists(Paths.get(outPath))) {
            throw new RuntimeException(outPath + " file already exists");
        }
        // Load the Word document and save it as HTML
        Document doc = new Document();
        doc.loadFromFile(pathName);
        doc.saveToFile(outPath, FileFormat.Html);
        doc.dispose();
        return outPath;
    }
}
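Word to image
import java.awt.image.BufferedImage;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import com.spire.doc.Document;
import com.spire.doc.documents.ImageType;

public class Word2Image implements FileConversion {
    private String suffix = ".jpg";

    @Override
    public boolean isSupport(String s) {
        return "word2image".equals(s);
    }

    @Override
    public String convert(String pathName, String dirAndFileName) throws Exception {
        String outPath = dirAndFileName + suffix;
        if (Files.exists(Paths.get(outPath))) {
            throw new RuntimeException(outPath + " file already exists");
        }
        Document doc = new Document();
        // Load the document
        doc.loadFromFile(pathName);
        // Number of pages in the document, which is also the number of images to generate
        int pageCount = doc.getPageCount();
        // The first argument (start page) and third argument (image type) are fixed;
        // the second is the number of pages to render
        BufferedImage[] image = doc.saveToImages(0, pageCount, ImageType.Bitmap);
        // Release the document, as the other Word converters do
        doc.dispose();
        // Combine the page images (ImageUtil is a project helper that is not shown here)
        List<BufferedImage> imageList = Arrays.asList(image);
        ImageUtil.yPic(imageList, outPath);
        return outPath;
    }
}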
Word to PDF
import java.nio.file.Files;
import java.nio.file.Paths;
import com.spire.doc.Document;
import com.spire.doc.FileFormat;

public class Word2PDF implements FileConversion {
    private String suffix = ".pdf";

    @Override
    public boolean isSupport(String s) {
        return "word2pdf".equals(s);
    }

    @Override
    public String convert(String pathName, String dirAndFileName) throws Exception {
        String outPath = dirAndFileName + suffix;
        if (Files.exists(Paths.get(outPath))) {
            throw new RuntimeException(outPath + " file already exists");
        }
        // Load the Word document (the input format is fixed to Docx here)
        Document document = new Document();
        document.loadFromFile(pathName, FileFormat.Docx);
        // Save the result as PDF
        document.saveToFile(outPath, FileFormat.PDF);
        document.close();
        return outPath;
    }
}
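Every converter above repeats the same step: append the suffix, then refuse to overwrite an existing file. If that repetition becomes a nuisance, one option is to move the check into a shared helper. The sketch below is only an illustration of that idea; OutputPathSketch and requireNewOutputPath are hypothetical names and are not part of the original project.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper that factors out the repeated output-path check.
final class OutputPathSketch {

    /** Builds dirAndFileName + suffix and refuses to overwrite an existing file. */
    static String requireNewOutputPath(String dirAndFileName, String suffix) {
        String outPath = dirAndFileName + suffix;
        if (Files.exists(Paths.get(outPath))) {
            throw new IllegalStateException(outPath + " file already exists");
        }
        return outPath;
    }

    public static void main(String[] args) throws IOException {
        // Use a temporary directory so the demo does not touch real files
        Path dir = Files.createTempDirectory("conversion-sketch");
        String base = dir.resolve("report").toString();

        // First call succeeds because nothing exists yet
        System.out.println(requireNewOutputPath(base, ".pdf"));

        // After the file is created, the same call is rejected
        Files.createFile(Paths.get(base + ".pdf"));
        try {
            requireNewOutputPath(base, ".pdf");
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Each convert method could then start with a single call such as requireNewOutputPath(dirAndFileName, suffix) instead of repeating the check.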
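Usage
Enter the conversion method, the file path, and the output path. If the output path is entered as 'null', the result is written to the same directory as the input file, with the same name and a different extension.
Available conversion methods: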
- pdf2word
- pdf2image
- word2html
- word2image
- word2pdf
Example input:
pdf2word D:\test\testFile.pdf null
Console output:
Conversion method: pdf2word File: D:\test\testFile.pdf
Conversion succeeded! File path: D:\test\testFile.doc
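The 'null' rule for the output path can be sketched as follows. This is a minimal illustration, not the tool's actual code: OutputBaseSketch and resolveOutputBase are hypothetical names, and the sketch assumes that the three inputs are separated by whitespace (so paths contain no spaces) and that an explicit output path is given without an extension, since each converter appends its own suffix.

```java
// Hypothetical sketch of how the 'null' output path could be resolved.
final class OutputBaseSketch {

    /** Returns the output path without an extension, following the 'null' rule. */
    static String resolveOutputBase(String inputPath, String outputArg) {
        if (!"null".equals(outputArg)) {
            return outputArg;
        }
        // Accept both Windows and Unix separators
        int lastSeparator = Math.max(inputPath.lastIndexOf('/'), inputPath.lastIndexOf('\\'));
        int lastDot = inputPath.lastIndexOf('.');
        // Strip the extension only when the dot is inside the file name
        // and is not the leading dot of a hidden file
        if (lastDot > lastSeparator + 1) {
            return inputPath.substring(0, lastDot);
        }
        return inputPath;
    }

    public static void main(String[] args) {
        String line = "pdf2word D:\\test\\testFile.pdf null";
        String[] parts = line.trim().split("\\s+");
        String method = parts[0];
        String base = resolveOutputBase(parts[1], parts[2]);
        System.out.println(method + " -> " + base + ".doc"); // prints pdf2word -> D:\test\testFile.doc
    }
}
```

Resolving the base path once, before dispatching, keeps each converter free of path handling: it only appends its suffix to the value it receives.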