Python Programming from 0 to 1 (Hands-on: Extracting Word Tables and Saving Them to Excel)

Posted by weixin_33831196 on 2018-03-28

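Today a requirement came up out of the blue: extract the tables from Word documents downloaded from the statistics bureau's website and put them into Excel workbooks, ready for the next step of data analysis.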

1. Import the extension libraries

# -*- coding: utf-8 -*-
import glob  # finds the .docx files to convert

import docx  # python-docx: reads Word .docx files
import xlwt  # writes legacy Excel .xls files

2. Read the tables from the Word document

def readdoc(filename):
    """Return every table in the document as a list of rows of cell text."""
    doc = docx.Document(filename)
    tables = []
    for table in doc.tables:
        table_temp = []
        for row in table.rows:
            row_temp = []
            for cell in row.cells:
                row_temp.append(cell.text)
            table_temp.append(row_temp)
        tables.append(table_temp)
    return tables
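
To make the return value concrete, here is a small sketch that runs the same nested loop over stand-in objects rather than a real document. The `SimpleNamespace` stand-ins, the helper names `tables_to_lists` and `fake_table`, and the sample values are assumptions for illustration only; they merely mimic the `tables`, `rows`, `cells`, and `text` attributes used in step 2.

```python
from types import SimpleNamespace


def tables_to_lists(doc):
    """Collect cell text as tables -> rows -> cells, like the loop in step 2."""
    return [
        [[cell.text for cell in row.cells] for row in table.rows]
        for table in doc.tables
    ]


def fake_table(rows):
    """Hypothetical stand-in for a table: wraps plain strings in objects."""
    return SimpleNamespace(
        rows=[
            SimpleNamespace(cells=[SimpleNamespace(text=t) for t in row])
            for row in rows
        ]
    )


# Made-up sample data: two tables of different sizes.
fake_doc = SimpleNamespace(
    tables=[
        fake_table([["Region", "2017"], ["East", "120"]]),
        fake_table([["Total"], ["120"]]),
    ]
)

nested = tables_to_lists(fake_doc)
print(nested)
```

The result is a list with one entry per table; each entry is a list of rows, and each row is a list of strings. Every value is text, so numbers may need converting before analysis.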

3. Write to an Excel file

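def writeExcel(tables,filename):
    """Write each table to its own sheet and save next to the source file."""
    workbook = xlwt.Workbook(encoding='utf-8')
    for sheet_index, table in enumerate(tables):
        worksheet = workbook.add_sheet('sheet' + str(sheet_index))
        # enumerate gives the true position; list.index() would return the
        # first match and misplace repeated rows or cell values.
        for r, row in enumerate(table):
            for c, cell in enumerate(row):
                print(r, c, cell)  # progress output
                worksheet.write(r, c, cell)
    # Swap the extension: "report.docx" -> "report.xls"
    workbook.save(filename.rsplit(".", 1)[0] + ".xls")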
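
One detail of the write loop is worth illustrating: how the row and column positions are obtained. `list.index` returns the position of the first equal element, so positions derived from it collide when a row contains repeated values, whereas `enumerate` yields each position directly. The sample row below is made up for the demonstration.

```python
# A made-up row; repeated and empty values are common in statistical tables.
row = ["12", "", "12", ""]

# list.index always reports the first matching position.
by_index = [row.index(cell) for cell in row]

# enumerate reports the actual position of each element.
by_enumerate = [c for c, _ in enumerate(row)]

print(by_index)      # [0, 1, 0, 1]
print(by_enumerate)  # [0, 1, 2, 3]
```

With positions taken from `index`, the third and fourth cells would be written to columns 0 and 1 again, leaving columns 2 and 3 empty. In xlwt, writing twice to the same cell raises an error unless the sheet was created with `cell_overwrite_ok=True`, in which case the later value silently replaces the earlier one.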

4. Walk through all .docx files in the directory and generate an Excel file with the same name for each

# Convert every .docx file in the jtdoc directory
filenames = glob.glob("jtdoc/*.docx")
for filename in filenames:
    tables = readdoc(filename)
    writeExcel(tables,filename)
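
Each output file takes its name from the input path, with the extension swapped for `.xls`. As a minimal sketch of one way to derive that name without assuming a fixed extension length, `pathlib` can replace the suffix; the helper name `xls_path_for` and the sample file names are hypothetical.

```python
from pathlib import Path


def xls_path_for(docx_path):
    """Return the sibling .xls path for a Word file (hypothetical helper)."""
    return Path(docx_path).with_suffix(".xls")


# Only the last suffix is replaced, so dots elsewhere in the name survive.
print(xls_path_for("jtdoc/population.docx"))
print(xls_path_for("jtdoc/survey.2017.docx"))
```

The helper returns a `Path` object; converting it with `str(...)` before passing it to `workbook.save` is the conservative choice.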
