Environment
Python: 3.4.4
Preparing the XML file
First, create an XML file named countries.xml. Its content is taken from the official Python documentation.
<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>
Preparing the Python file
Create a file named test_DOM.py to parse the XML file.
#!/usr/bin/python
# -*- coding: UTF-8 -*-

import xml.dom.minidom

# Parse the whole file into an in-memory DOM tree.
DOMTree = xml.dom.minidom.parse("countries.xml")
collection = DOMTree.documentElement

# The root element <data> has no "data" attribute, so this branch is skipped.
if collection.hasAttribute("data"):
    print("Root element : %s" % collection.getAttribute("data"))

# Collect every <country> element below the root.
countries = collection.getElementsByTagName("country")

for country in countries:
    print("*****Country*****")
    if country.hasAttribute("name"):
        print("Name: %s" % country.getAttribute("name"))

    # Each of these elements holds one text node; .data is its text.
    rank = country.getElementsByTagName('rank')[0]
    print("Rank: %s" % rank.childNodes[0].data)
    year = country.getElementsByTagName('year')[0]
    print("Year: %s" % year.childNodes[0].data)
    gdppc = country.getElementsByTagName('gdppc')[0]
    print("Gdppc: %s" % gdppc.childNodes[0].data)

    # A country may have any number of <neighbor> elements.
    neighbors = country.getElementsByTagName('neighbor')
    for neighbor in neighbors:
        print("Neighbor:", neighbor.getAttribute("name"), neighbor.getAttribute("direction"))
Output
>python test_DOM.py
*****Country*****
Name: Liechtenstein
Rank: 1
Year: 2008
Gdppc: 141100
Neighbor: Austria E
Neighbor: Switzerland W
*****Country*****
Name: Singapore
Rank: 4
Year: 2011
Gdppc: 59900
Neighbor: Malaysia N
*****Country*****
Name: Panama
Rank: 68
Year: 2011
Gdppc: 13600
Neighbor: Costa Rica W
Neighbor: Colombia E
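Notes
DOM (Document Object Model)
DOM is a cross-language API from the W3C for reading and modifying XML documents.
When a DOM parser parses an XML document, it reads the entire document in one pass and stores all of its elements in a tree structure in memory. You can then read or modify that tree, and you can also write the modified tree back to an XML file.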
See: https://docs.python.org/2/library/xml.dom.html
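The read, modify, write-back cycle described above can be sketched as follows. This is a minimal illustration, not part of the original script: the XML string is a hypothetical cut-down version of countries.xml, and the "updated" attribute is invented for the example.

```python
from xml.dom.minidom import parseString

# Hypothetical in-memory sample, a cut-down version of countries.xml.
XML_TEXT = '<data><country name="Panama"><rank>68</rank></country></data>'

# The whole document is loaded into a tree in memory.
dom = parseString(XML_TEXT)

# Modify the text node inside <rank>.
rank = dom.getElementsByTagName("rank")[0]
rank.childNodes[0].data = "69"

# Add an attribute to the <country> element ("updated" is made up here).
country = dom.getElementsByTagName("country")[0]
country.setAttribute("updated", "yes")

# Serialize the modified tree back to XML text.
result = dom.documentElement.toxml()
print(result)
```

To save the result to disk, one option is to open a file for writing and pass it to dom.writexml().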
DOMTree = xml.dom.minidom.parse("countries.xml")
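Uses the xml.dom.minidom parser to open countries.xml and returns a Document object, which is the tree structure. The Document object represents the entire XML document, including its elements, attributes, processing instructions, comments, and so on.
See: https://docs.python.org/2/library/xml.dom.minidom.html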
Return a Document from the given input. filename_or_file may be either a file name, or a file-like object. parser, if given, must be a SAX2 parser object. This function will change the document handler of the parser and activate namespace support; other parser configuration (like setting an entity resolver) must have been done in advance.
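As the quoted documentation says, parse() also accepts a file-like object instead of a file name. A small sketch of that case, where io.BytesIO and the sample bytes are stand-ins chosen for this example rather than anything from the original script:

```python
import io
from xml.dom.minidom import parse

# Hypothetical stand-in for an open file: an in-memory byte stream.
xml_bytes = b'<?xml version="1.0"?><data><country name="Singapore"/></data>'
stream = io.BytesIO(xml_bytes)

# Same function as before, but given a file-like object instead of a path.
doc = parse(stream)
root_name = doc.documentElement.tagName
print(root_name)
```

This can be handy when the XML comes from somewhere other than the local disk, for example a network response that has already been read into memory.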
collection = DOMTree.documentElement
Returns the root element of DOMTree.
Document.documentElement The one and only root element of the document.
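To see how the root element differs from the Document object itself, here is a short sketch. The XML string, including its leading comment, is a made-up sample: the comment becomes a child of the Document, while documentElement still points at the single root element.

```python
from xml.dom.minidom import parseString

# Hypothetical sample with a comment placed before the root element.
doc = parseString('<!-- header comment --><data><country name="Panama"/></data>')
root = doc.documentElement

# The Document has two children here: the comment and the root element.
child_types = [node.nodeType for node in doc.childNodes]

print(root.tagName)
print(child_types == [doc.COMMENT_NODE, doc.ELEMENT_NODE])
print(root.parentNode is doc)
```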
rank = country.getElementsByTagName('rank')[0]
Searches downward from country for all element nodes whose tag name is "rank", and assigns the first node found to rank.
Document.getElementsByTagName(tagName) Search for all descendants (direct children, children’s children, etc.) with a particular element type name.
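Because the search covers all descendants and not only direct children, it can return more nodes than expected. The following sketch contrasts the two, using a hypothetical document in which one neighbor element sits directly under the root:

```python
from xml.dom.minidom import parseString

# Hypothetical sample: one <neighbor> nested in a country, one directly under <data>.
doc = parseString(
    '<data>'
    '<country name="Panama"><neighbor name="Colombia"/></country>'
    '<neighbor name="Stray"/>'
    '</data>'
)
root = doc.documentElement

# Matches at any depth below the root.
all_neighbors = root.getElementsByTagName("neighbor")

# Only elements that are direct children of the root.
direct_neighbors = [
    node for node in root.childNodes
    if node.nodeType == node.ELEMENT_NODE and node.tagName == "neighbor"
]

print(len(all_neighbors), len(direct_neighbors))
```

Note also that when nothing matches, the returned list is empty, so indexing it with [0] as in the script raises an IndexError.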
collection.getAttribute("data")
Gets and returns the value of the "data" attribute of collection. If collection has no "data" attribute, an empty string is returned.
Element.getAttribute(name) Return the value of the attribute named by name as a string. If no such attribute exists, an empty string is returned, as if the attribute had no value.
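Since a missing attribute and an attribute that is present but empty both yield an empty string, getAttribute() alone cannot tell them apart; hasAttribute() can. A minimal sketch, in which the "note" attribute is invented for the example:

```python
from xml.dom.minidom import parseString

# Hypothetical sample: "note" exists but is empty; <data> has no attributes at all.
root = parseString('<data><country name="Panama" note=""/></data>').documentElement
country = root.getElementsByTagName("country")[0]

missing = root.getAttribute("data")    # attribute does not exist
empty = country.getAttribute("note")   # attribute exists with an empty value

print(repr(missing), repr(empty))
print(root.hasAttribute("data"), country.hasAttribute("note"))
```

This is why the script checks hasAttribute() before printing each attribute value.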