Inside MSXML Performance (3) (Repost)

Posted by worldblog on 2007-12-12

MSXML Features

Next, let's examine some important scenarios associated with the Document Object Model (DOM), including loading, saving, walking a DOM tree, and creating a new DOM tree in memory.

DOM

The MSXML Document Object Model ("Microsoft.XMLDOM," CLSID_DOMDocument, IID_IXMLDOMDocument) is the starting point for all XML processing within the MSXML parser. The fastest way to load an XML document is to use the default "rental" threading model (which means the DOM document can be used by only one thread at a time; it doesn't matter which thread) with validateOnParse, resolveExternals, and preserveWhiteSpace all disabled:

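  // Create the DOM document (JScript; needs a Windows host that provides ActiveXObject).
  var doc = new ActiveXObject("Microsoft.XMLDOM");
  // Switch off the optional work that slows down loading.
  doc.validateOnParse = false;
  doc.resolveExternals = false;
  doc.preserveWhiteSpace = false;
  doc.load("test.xml");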

Working Set

When using the DOM, the first metric to consider is the working set. Memory is used to load Msxml.dll and the other .dll files on which it depends. Some of these other .dll files are "delay loaded," which means the working set won't be affected until that .dll is used. MSXML is a COM DLL, so you typically use the standard COM APIs (CoInitialize and CoCreateInstance) to create a new XML document object. The minimum working set for a simple Visual C++ 6.0 command-line application that uses COM is about one megabyte. (This includes the following .dll files: Ntdll.dll, Kernel32.dll, Ole32.dll, Rpcrt4.dll, Advapi32.dll, Gdi32.dll, User32.dll, and Oleaut32.dll.) The first call to CoCreateInstance for an IXMLDOMDocument object loads Msxml.dll and Shlwapi.dll, which adds another 745 KB on top of this. Once all the .dll files are loaded, each new IXMLDOMDocument object takes only about 8 KB.
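As a rough back-of-the-envelope check, the figures quoted above can be added up to estimate the startup working set for a process that holds a given number of empty documents. This is only a sketch: the function name is hypothetical, 1 KB is assumed to be 1,024 bytes, and "about one megabyte" is treated as exactly 1,024 KB.

```javascript
// Rough startup working-set estimate built from the figures quoted in the text.
// Assumptions (not from the article): 1 KB = 1024 bytes, and the "about one
// megabyte" COM baseline is taken as exactly 1024 KB.
const KB = 1024;

function baselineWorkingSetBytes(documentCount) {
  const comAppBytes = 1024 * KB;   // minimal COM command-line application
  const msxmlDllBytes = 745 * KB;  // Msxml.dll and Shlwapi.dll on first CoCreateInstance
  const perDocumentBytes = 8 * KB; // each empty IXMLDOMDocument object
  return comAppBytes + msxmlDllBytes + perDocumentBytes * documentCount;
}

console.log(baselineWorkingSetBytes(1)); // 1819648
```

The estimate covers empty documents only; the memory taken by loaded XML data comes on top of it.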

The memory used by the XML data loaded into an XML document is anywhere from one to four times the size of the XML file on disk, depending on the "tagginess" of the data being loaded and whether the file was already in a Unicode format on disk. The following is a very rough formula for estimating the memory required for a given XML document:

ws = 32(n+t) + 12t + 50u + 2w;

The following table describes the parts of the formula:

| Part | Description |
|------|-------------|
| ws | The working set in bytes. |
| n | The number of element and attribute nodes in the tree. Each element, attribute, attribute value, and text content has one node (for example, <element attribute="value">text</element> = four nodes). |
| t | The number of text nodes. |
| u | The number of unique element and attribute names. |
| w | The number of Unicode characters in text content (including attribute values). Note that loading single-byte ANSI text into memory doubles its size, because all text is stored as Unicode characters, which are two bytes each. |

This assumes you do not set the preserveWhiteSpace flag; when you do, more nodes are created to preserve the white space between elements, which uses more memory.
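The formula above can be written as a small helper. This is a minimal sketch: the function name and the input counts are hypothetical, chosen only to show the arithmetic, and the result is only as rough as the formula itself.

```javascript
// Estimate the working set in bytes using the article's rough formula:
//   ws = 32(n + t) + 12t + 50u + 2w
// n: element and attribute nodes, t: text nodes,
// u: unique element and attribute names, w: Unicode characters of text content.
function estimateWorkingSetBytes(n, t, u, w) {
  return 32 * (n + t) + 12 * t + 50 * u + 2 * w;
}

// Illustrative counts only; they do not describe any of the sample files.
console.log(estimateWorkingSetBytes(1000, 400, 20, 5000)); // 60600
```

Because every term is linear, doubling all four counts doubles the estimate.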

For the sample data above, we see the following working set numbers (not including the initial startup working set):

| Sample | Working set (bytes) | Ratio to file size |
|--------|---------------------|--------------------|
| Ado.xml | 4,689,920 | 2.16 |
| Hamlet.xml | 704,512 | 1.25 |
| Ot.xml | 10,720,000 | 1.39 |
| Northwind.xml | 249,856 | 0.51 |

An element-heavy XML document containing a lot of white space between elements and stored in Unicode can actually be smaller in memory than on disk. Files that have a more balanced ratio of elements to text content, such as Hamlet.xml and Ot.xml, end up at about 1.25 to 1.5 times the UCS-2 file size when in memory. Files that are very data-dense, such as Ado.xml, end up more than twice the disk-file size when loaded into memory.
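This section does not list the on-disk file sizes, but they can be approximated by dividing each working set by its ratio. The sketch below does that with the table's numbers; the function and field names are hypothetical, and because the ratios are rounded to two decimals the results are approximate.

```javascript
// Working set (bytes) and ratio to file size, copied from the table above.
const samples = [
  { name: "Ado.xml", workingSetBytes: 4689920, ratio: 2.16 },
  { name: "Hamlet.xml", workingSetBytes: 704512, ratio: 1.25 },
  { name: "Ot.xml", workingSetBytes: 10720000, ratio: 1.39 },
  { name: "Northwind.xml", workingSetBytes: 249856, ratio: 0.51 },
];

// ratio = workingSet / fileSize, so fileSize is roughly workingSet / ratio.
function approximateFileSizeBytes(sample) {
  return Math.round(sample.workingSetBytes / sample.ratio);
}

for (const sample of samples) {
  console.log(sample.name, approximateFileSizeBytes(sample));
}
```

A ratio below 1, as for Northwind.xml, means the document takes less space in memory than on disk.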

Megabytes Per Second

For the megabytes-per-second metric, I loaded each sample file 10 times in a loop on a Pentium II 450-MHz dual-processor computer running Windows 2000, measured the load times, and averaged the results.

| Sample | Load time (milliseconds) | MB/second | Nodes/second |
|--------|--------------------------|-----------|--------------|
| Ado.xml | 677 | 3.2 | 184,909 |
| Hamlet.xml | 104 | 5.3 | 116,432 |
| Ot.xml | 1063 | 7.2 | 111,682 |
| Northwind.xml | 62 | 7.8 | 103,887 |

The table also shows nodes per second. Notice how this correlates inversely with megabytes per second: the more nodes processed per buffer of input data, the lower the absolute throughput. Conversely, the more compact the nodes are (as in Ado.xml), the higher the nodes per second.
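The measurement described above (load repeatedly, average the time, then derive throughput) can be sketched as follows. This is not the author's test harness: `loadOnce` is a hypothetical callback that performs one complete document load, 1 MB is assumed to be 1,000,000 bytes, and the numbers in the usage lines are illustrative rather than taken from the table.

```javascript
// Average the wall-clock time of `loadOnce` over `runs` iterations.
// `loadOnce` is a hypothetical callback that performs one complete document load.
function averageLoadTimeMs(loadOnce, runs) {
  const start = Date.now();
  for (let i = 0; i < runs; i++) {
    loadOnce();
  }
  return (Date.now() - start) / runs;
}

// Assumption: 1 MB = 1,000,000 bytes.
function megabytesPerSecond(fileSizeBytes, loadTimeMs) {
  return (fileSizeBytes / 1e6) / (loadTimeMs / 1000);
}

function nodesPerSecond(nodeCount, loadTimeMs) {
  return nodeCount / (loadTimeMs / 1000);
}

// Illustrative numbers only: a 2,000,000-byte file with 125,000 nodes loaded in 500 ms.
console.log(megabytesPerSecond(2000000, 500)); // 4
console.log(nodesPerSecond(125000, 500));      // 250000
```

For a fixed load time, a file with fewer bytes per node scores lower on megabytes per second and higher on nodes per second, which is the pattern Ado.xml shows.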

Attributes vs. Elements

You could conclude from this that attribute-heavy formats (such as that of Ado.xml) deliver more data per second than element-heavy formats. But this alone is not a reason to switch everything to attributes. There are many other factors to consider when deciding between attributes and elements.


Source: ITPUB Blog, link: http://blog.itpub.net/10752043/viewspace-991663/. If you repost this article, please credit the source; otherwise legal liability will be pursued.
