Inside MSXML Performance (MSXML Performance Analysis) (repost)

Posted by worldblog on 2007-12-12

Chris Lovett
Microsoft Corporation

February 21, 2000

Download the code for this article (1.17MB)

Contents

Metrics
MSXML Features
Working Set
Megabytes Per Second
Attributes vs. Elements
The First Walk Working Set Delta
createNode Overhead
Walk vs. selectSingleNode
Save
Namespaces
Free-Threaded Documents
Delayed Memory Release
Virtual Memory
IDispatch
Scripting
The Dreaded "//" Operator
Prune the Search Tree
Cross-Threading
Summary

I definitely got the message from your online comments that we need more "novice-level" material and some real XML applications. However, this article was already in the pipeline, and it is intended for the advanced XML developer. (After all, this column is called "Extreme XML"!) Accordingly, this article assumes you are familiar with XML and the Microsoft XML Parser (MSXML) in particular. See the for more information.


So, you're designing your XML-based application and you need to know what kind of performance to expect from your XML server. Obviously, this depends a lot on what processing you plan to do. It is hard to generalize, because there are so many variables—such as the size of the XML documents, the amount of script code required to process the documents, the amount of output generated, and so on.


For example, major variables that can affect the performance of MSXML include:


·  The kind of XML data

·  The ratio of tags to text

·  The ratio of attributes to elements

·  The amount of discarded white space


To illustrate some of these variables, I'll use four sample data files. A snippet from each file is shown below:

Ado.xml

This sample file is a persistently saved ADO Recordset and is extremely attribute-heavy. Each attribute value is short, with little wasted white space, making it a data-dense document.

  phone='408 286-2428' address='22 Cleveland Av. #14' city='San Jose' state='CA'

  zip='95128' contract='True' name='systypes' id='4' uid='1' type='S ' userstat='0'

  sysstat='113' indexdel='0' schema_ver='1' refdate='1900-01-01T00:00:00'

  crdate='1996-04-03T03:38:57.387000000' version='0' deltrig='0' instrig='0'

  updtrig='0' seltrig='0' category='0' cache='0'/>

Hamlet.xml

This sample file consists of Shakespeare's play "Hamlet." The file is a well-balanced combination of text and element markup, with no attributes.

SCENE I.  Elsinore. A platform before the castle.

FRANCISCO at his post. Enter to him BERNARDO

BERNARDO

Who's there?

Ot.xml

This sample file consists of the entire Old Testament. Each tag is only one or two characters, which reduces the tag-to-text ratio.


The First Book of Moses, Called GENESIS.

Genesis

Chapter 1

1

In the beginning God created the heaven and the earth.

...

Northwind.xml

This sample file contains a portion of the Northwind database that ships with Microsoft Access. It uses elements instead of attributes, has a high tag-to-text ratio, and contains a lot of extra white space.

 

  10326

  11/10/94

  C/ Araquil, 67

 

...

Another major factor is whether the original file is stored as UCS-2. For most XML documents in English, UTF-8 is half the size of UCS-2 because the Latin characters compress down to a single byte in UTF-8. But this is not true for all languages. For some Asian languages, UTF-8 is actually larger than UCS-2, because it can expand to three bytes per character in the worst case. To be fair, the best format to use for measuring performance is UCS-2 on disk so that the numbers are more globally meaningful.
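The size difference between the two encodings is easy to check directly. The following is a minimal Python sketch, not part of the original article; the sample strings are made up for illustration, and `utf-16-le` is used as a stand-in for UCS-2, which is an assumption that holds only for characters in the Basic Multilingual Plane.

```python
def encoded_sizes(text: str) -> dict:
    """Return the byte length of `text` in UTF-8 and in a 2-byte-per-unit encoding."""
    return {
        "utf8": len(text.encode("utf-8")),
        # utf-16-le writes no byte-order mark, so every BMP character costs
        # exactly 2 bytes, which matches UCS-2 for these samples.
        "ucs2": len(text.encode("utf-16-le")),
    }


# Hypothetical fragments: one Latin-only, one with CJK names and text.
latin = "<city>San Jose</city>"
cjk = "<城市>東京都</城市>"

print("latin:", encoded_sizes(latin))  # UTF-8 is half the size of UCS-2
print("cjk:  ", encoded_sizes(cjk))    # UTF-8 is larger than UCS-2
```

For the Latin fragment every character fits in one UTF-8 byte, while each CJK character in the second fragment takes three UTF-8 bytes against two in UCS-2.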


The following table shows the UCS-2 file sizes, number of unique names, number of elements and attributes, number of text nodes, and amount of text content (in Unicode characters) for each of our sample files. It also shows a "tagginess factor," which is the ratio of element and attribute name characters to the rest of the file.


Sample         File size (bytes)  Unique names  Elements and attributes  Text nodes  Text content (characters)  Tagginess (percentage)
Ado.xml        2,171,812          53            63,722                   61,462      3,890                      18.7
Hamlet.xml     559,260            17            6,637                    5,472       170,545                    5.9
Ot.xml         7,663,624          12            71,417                   47,302      3,236,900                  1.4
Northwind.xml  488,140            12            3,680                    2,761       31,155                     6.0
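Metrics of this kind can be gathered with a short script. The sketch below is an illustration in Python using the standard `xml.dom.minidom` parser rather than MSXML, so it is not how the article's numbers were produced. The function names, the sample string, and the counting rules (each name counted once per occurrence, whitespace-only text ignored, attribute values not counted as text nodes) are all assumptions. The final line checks the tagginess of Ado.xml from the byte counts the article quotes for that file.

```python
from xml.dom import minidom


def tagginess_pct(name_bytes: int, file_bytes: int) -> float:
    """Share of the file taken up by element and attribute names, in percent."""
    return 100.0 * name_bytes / file_bytes


def collect_metrics(xml_text: str) -> dict:
    """Count names, text nodes and text characters in a small XML string."""
    doc = minidom.parseString(xml_text)
    names = []        # one entry per element or attribute occurrence
    text_nodes = 0
    text_chars = 0
    stack = [doc.documentElement]
    while stack:
        node = stack.pop()
        if node.nodeType == node.ELEMENT_NODE:
            names.append(node.tagName)
            attrs = node.attributes
            names.extend(attrs.item(i).name for i in range(attrs.length))
            stack.extend(node.childNodes)
        elif node.nodeType == node.TEXT_NODE and node.data.strip():
            # Assumption: whitespace-only text is treated as discarded.
            text_nodes += 1
            text_chars += len(node.data)
    name_chars = sum(len(n) for n in names)
    return {
        "elements_and_attributes": len(names),
        "unique_names": len(set(names)),
        "text_nodes": text_nodes,
        "text_chars": text_chars,
        # Assumption: names are counted once per occurrence; end tags are ignored.
        "tagginess_pct": tagginess_pct(name_chars, len(xml_text)),
    }


# Hypothetical one-row document, loosely modeled on the Ado.xml attributes.
sample = "<row id='4' uid='1'><name>systypes</name></row>"
print(collect_metrics(sample))

# Ado.xml: 407,148 bytes of names in a 2,171,812-byte file.
print(round(tagginess_pct(407_148, 2_171_812), 1))
```

Because both counts in the ratio are character counts of the same text, the result does not depend on whether the file is stored at one or two bytes per character.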

The number of unique names is interesting because MSXML "atomizes" element and attribute names, meaning it creates only one string object for each unique name and points to that object from each element or attribute that shares the same name. This is important because the names of elements and attributes are typically highly repetitive. For example, the Ado.xml sample actually contains 63,722 element and attribute names, which consume a total of 407,148 bytes of the overall file size. This is a tag-to-file size ratio of over 18 percent! But among all these names there are only 53 unique ones. So instead of using 407 KB of memory to store them, they can be stored in just a few kilobytes.
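The idea of atomizing names can be sketched in a few lines of Python. This is only an illustration of the general technique (a table that hands back one shared string object per unique name); the `NameTable` class and its methods are hypothetical and say nothing about how MSXML implements it internally.

```python
class NameTable:
    """Illustrative atom table: one shared string object per unique name."""

    def __init__(self):
        self._atoms = {}

    def atomize(self, name: str) -> str:
        # setdefault stores the first object seen for a name and returns
        # that same object for every later request with an equal name.
        return self._atoms.setdefault(name, name)

    def __len__(self) -> int:
        return len(self._atoms)


table = NameTable()

# Build the names at run time so each occurrence starts out as its own object,
# much as a parser would see them while scanning a file.
raw_names = ["".join(["ph", "one"]) for _ in range(1000)]
raw_names += ["".join(["ci", "ty"]) for _ in range(1000)]

# Each "node" keeps only a reference to the shared atom.
node_names = [table.atomize(n) for n in raw_names]

print(len(node_names), "name references,", len(table), "stored strings")
```

Every node ends up pointing at one of two string objects, so the storage for names grows with the number of unique names rather than with the number of elements and attributes.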


Source: ITPUB blog, link: http://blog.itpub.net/10752043/viewspace-991660/. If you repost this, please credit the source; otherwise legal action may be pursued.
