Inside MSXML Performance (MSXML Performance Analysis) (6) (Reposted)

Posted by worldblog on 2007-12-12

Validation

Validation compares the types of elements in an XML document against a Document Type Definition (DTD) or XML Schema. For example, the DTD may say that all "Customer" elements must contain a child "Name" element. Take a look at the DTD for Hamlet.xml (hamletdtd.htm) and the XML Schema for Hamlet.xml.

Validation is another huge area for performance analysis, but I only have time for a brief mention today. Validation is expensive for several reasons. First, it involves loading a separate file (the DTD or XML Schema) and compiling it. Second, it requires state machinery to perform the validation itself. Third, when the schema also includes data type information, every typed value has to be validated as well. For example, if an XML element or attribute is typed as an integer, its text has to be parsed to see whether it is a valid integer.
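To make that extra work concrete, here is a minimal sketch in plain JavaScript (runnable under Node.js, not MSXML) of two kinds of checks a validator performs on top of parsing: a content-model rule like the "Customer must contain Name" example, and a data type check that re-parses element text as an integer. The element tree, the `rules` table, and the `validate` function are all hypothetical; a real DTD or XML Schema validator first loads and compiles the schema and is far more complete.

```javascript
// Hypothetical element tree: { name, text, children }. Plain objects stand
// in for a parsed document so the sketch runs without MSXML.
const doc = {
  name: "Customers",
  text: "",
  children: [
    { name: "Customer", text: "", children: [
      { name: "Name", text: "Alfreds", children: [] },
      { name: "Orders", text: "12", children: [] },
    ] },
    { name: "Customer", text: "", children: [
      { name: "Orders", text: "twelve", children: [] },
    ] },
  ],
};

// Hypothetical "compiled schema": a required child (content-model rule)
// and a data type, keyed by element name.
const rules = {
  Customer: { requiredChild: "Name" },
  Orders: { type: "integer" },
};

// Visit every element and apply whatever rules exist for it.
function validate(node, errors = []) {
  const rule = rules[node.name] || {};
  // Structural check: is the required child element present?
  if (rule.requiredChild &&
      !node.children.some((c) => c.name === rule.requiredChild)) {
    errors.push(`${node.name} is missing required child ${rule.requiredChild}`);
  }
  // Data type check: the text has to be parsed again to prove it is an integer.
  if (rule.type === "integer" && !/^[+-]?\d+$/.test(node.text.trim())) {
    errors.push(`${node.name} value "${node.text}" is not a valid integer`);
  }
  for (const child of node.children) validate(child, errors);
  return errors;
}

console.log(validate(doc));
```

It reports two errors: the second Customer has no Name child, and "twelve" is not an integer. Note that every element is visited again after parsing, and typed text is scanned a second time.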

The following table compares load times without validation, with DTD validation, with XML Schema validation, and with XML Schema validation plus data type checking.

Sample         No validation (ms)   DTD (ms)   Schema (ms)   Schema + data types (ms)
Ado.xml                       662      2,230         2,167                      3,064
Hamlet.xml                    106        215           220                        N/A
Ot.xml                      1,069      2,168         2,193                        N/A
Northwind.xml                  64        123           127                        N/A
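Dividing each validated time by the unvalidated load time gives the slowdown factor for each sample. The sketch below simply does that arithmetic on the table's numbers; the `timings` object and the `slowdown` function are illustrative helpers, not part of any API.

```javascript
// Load times in milliseconds, copied from the table above.
// A missing entry corresponds to N/A in the table.
const timings = {
  "Ado.xml":       { none: 662,  dtd: 2230, schema: 2167, schemaTypes: 3064 },
  "Hamlet.xml":    { none: 106,  dtd: 215,  schema: 220 },
  "Ot.xml":        { none: 1069, dtd: 2168, schema: 2193 },
  "Northwind.xml": { none: 64,   dtd: 123,  schema: 127 },
};

// Slowdown factor = validated load time / load time without validation.
// Returns null when the table has no measurement (N/A).
function slowdown(file, kind) {
  const t = timings[file];
  return t[kind] === undefined ? null : t[kind] / t.none;
}

for (const file of Object.keys(timings)) {
  const row = ["dtd", "schema", "schemaTypes"].map((kind) => {
    const f = slowdown(file, kind);
    return f === null ? "N/A" : f.toFixed(2) + "x";
  });
  console.log(file.padEnd(14), row.join("  "));
}
```

For DTD and schema validation the factors run from about 1.9 (Northwind.xml) to about 3.4 (Ado.xml).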

The bottom line is to expect validation to double or triple the time it takes to load your documents, and data type checking can add more (Ado.xml took about 4.6 times as long as loading without validation). New in the MSXML January 2000 Web Release is a SchemaCollection, which allows you to load an XML Schema once and then share it across your documents for validation. This will be discussed in a future article.
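The saving from loading a schema once comes from a familiar pattern: pay the load-and-compile cost the first time a schema is needed, then reuse the compiled result. The sketch below shows only that caching pattern in plain JavaScript; `compileSchema`, `SchemaCache`, and the URIs are hypothetical, and this is not the MSXML SchemaCollection API.

```javascript
// Hypothetical stand-in for the expensive step: loading a schema file and
// compiling it into validation state. Here it only counts how often it runs.
let compileCount = 0;
function compileSchema(schemaUri) {
  compileCount++;
  return { schemaUri, compiledAt: compileCount };
}

// Load each schema once, keyed by namespace URI, then share the result.
class SchemaCache {
  constructor() {
    this.schemas = new Map();
  }
  get(namespaceUri, schemaUri) {
    if (!this.schemas.has(namespaceUri)) {
      this.schemas.set(namespaceUri, compileSchema(schemaUri));
    }
    return this.schemas.get(namespaceUri);
  }
}

const cache = new SchemaCache();
// Validating 100 documents against the same schema compiles it only once.
for (let i = 0; i < 100; i++) {
  cache.get("urn:example-hamlet", "hamlet.xsd");
}
console.log(compileCount); // 1
```

A different namespace would trigger one more compile; repeated lookups of the same namespace never do.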

XSL can be a big performance win over using code to generate "transformed" reports from an XML document. For example, suppose you want to print out all of Hamlet's speeches in the sample Hamlet.xml. You might use selectNodes to find all the speeches by Hamlet, then use another selectNodes call to iterate through the lines of each of those speeches, as follows:

function Method1(doc)
{
  // Find every SPEECH whose SPEAKER is HAMLET.
  var speeches = doc.selectNodes("/PLAY/ACT/SCENE/SPEECH[SPEAKER='HAMLET']");
  var s = speeches.nextNode();
  var out = "";
  while (s)
  {
    // Concatenate the text of every LINE in this speech.
    var lines = s.selectNodes("LINE");
    var line = lines.nextNode();
    while (line)
    {
      out += line.text;
      line = lines.nextNode();
    }
    out += "\n"; // separator between speeches (assumed value)
    s = speeches.nextNode();
  }
  return out;
}

This works, but it takes about 1,500 milliseconds. A better way to tackle this problem is to use XSL. The following XSL style sheet (or template) does exactly the same thing:

You can then write the following simpler script code that uses this template:

function Method2(doc)
{
  // Load the XSL style sheet synchronously, then apply it to the document.
  var xsl = new ActiveXObject("Microsoft.XMLDOM");
  xsl.async = false;
  xsl.load("hamlet.xsl");
  return doc.transformNode(xsl);
}

This takes only 203 milliseconds, more than seven times faster. This is a rather compelling reason to use XSL. In addition, it is easier to update the XSL template than to rewrite your code every time you want a different report.

The problem is that XSL is very powerful. You have a lot of rope with which to hang yourself, so to speak. XSL has a rich expression language that can be used to walk all over the document in any order. It is highly recursive, and the MSXML parser includes script support for added extensibility. Using all these features with reckless abandon will result in slow XSL style sheets. The following sections describe a few specific traps to watch out for.

Scripting

It is convenient to call script from within an XSL style sheet, and it is a great extensibility mechanism. But as always, there is a catch. Script code is slow. For purposes of illustration, imagine that we wrote the following style sheet instead of the one shown previously:

  <xsl:eval>this.text</xsl:eval>

This produces the same result, but it takes 266 milliseconds instead of 203 milliseconds, a whopping 31 percent slower. The more frequently your xsl:eval statements are executed, the slower the performance becomes. For purposes of illustration only, let's move the xsl:eval inside the inner for-each loop:

  <xsl:eval>this.text</xsl:eval>

This one takes 516 milliseconds, about two and a half times as long as the script-free version. The bottom line is to be careful with script code in XSL.
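The figures in this section are easiest to compare when each is measured against the same baseline, the 203-millisecond script-free transform. A quick sketch of that arithmetic (the constant names are just labels for the timings reported in the text):

```javascript
// Timings in milliseconds, as reported in the text.
const baselineMs = 203;       // XSL style sheet with no script
const evalPerSpeechMs = 266;  // xsl:eval once per speech
const evalPerLineMs = 516;    // xsl:eval once per LINE (inner for-each)

// Extra time relative to the baseline, as a percentage.
function percentSlower(ms) {
  return ((ms - baselineMs) / baselineMs) * 100;
}

console.log(percentSlower(evalPerSpeechMs).toFixed(0) + "% slower"); // 31% slower
console.log((evalPerLineMs / baselineMs).toFixed(2) + "x the baseline"); // 2.54x the baseline
```

Moving the same xsl:eval from once per speech to once per line raises the overhead from about 31 percent to about 154 percent, consistent with the point that the more often script runs, the slower the transform.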

The Dreaded "//" Operator

Watch out for the "//" operator. This little operator walks the entire subtree looking for matches. Developers use it more than they should just because they are too lazy to type in the full path. (I catch myself using it all the time, too.) For example, try switching the select statement in the previous example to the following:

The time it takes to perform the selection jumps from 203 milliseconds to 234 milliseconds. My laziness just cost me a 15 percent tax.
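Why does "//" cost more? A descendant search cannot stop at the SPEECH level, because a matching element could be nested at any depth, so it also has to look inside every SPEECH at its LINE children. The sketch below models this on a made-up, simplified play tree in plain JavaScript and counts how many elements each strategy examines. `buildPlay`, `selectPath`, and `selectDescendants` are hypothetical helpers, not how MSXML implements XPath, and visit counts do not translate directly into the milliseconds measured above.

```javascript
// Hypothetical, simplified tree: { name, speaker?, children }.
// 5 acts x 4 scenes x 10 speeches, each speech holding 6 LINE elements.
function buildPlay() {
  const speech = (i) => ({
    name: "SPEECH",
    speaker: i % 3 === 0 ? "HAMLET" : "HORATIO",
    children: Array.from({ length: 6 }, () => ({ name: "LINE", children: [] })),
  });
  const scene = () => ({
    name: "SCENE",
    children: Array.from({ length: 10 }, (_, i) => speech(i)),
  });
  const act = () => ({ name: "ACT", children: Array.from({ length: 4 }, scene) });
  return { name: "PLAY", children: Array.from({ length: 5 }, act) };
}

// Like /PLAY/ACT/SCENE/SPEECH: at each step, look only at the children
// of the nodes matched by the previous step.
function selectPath(root, steps, stats) {
  let current = [root];
  for (const step of steps) {
    const next = [];
    for (const node of current) {
      for (const child of node.children) {
        stats.visited++;
        if (child.name === step) next.push(child);
      }
    }
    current = next;
  }
  return current;
}

// Like //SPEECH: examine every descendant, since a match could be at any depth.
function selectDescendants(node, name, stats, out = []) {
  for (const child of node.children) {
    stats.visited++;
    if (child.name === name) out.push(child);
    selectDescendants(child, name, stats, out);
  }
  return out;
}

const play = buildPlay();
const pathStats = { visited: 0 };
const descStats = { visited: 0 };
const isHamlet = (s) => s.speaker === "HAMLET";
const byPath = selectPath(play, ["ACT", "SCENE", "SPEECH"], pathStats).filter(isHamlet);
const byDesc = selectDescendants(play, "SPEECH", descStats).filter(isHamlet);
console.log(byPath.length, pathStats.visited); // 80 225
console.log(byDesc.length, descStats.visited); // 80 1425
```

Both strategies find the same 80 speeches; the descendant search simply examines the 1,200 LINE elements that the explicit path never touches.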

Prune the Search Tree

If there's anything you can do to "prune" the search tree, by all means do it. For example, suppose you were reporting all speeches by Bernardo from Hamlet.xml. All of Bernardo's speeches happen to be in Act I. If you already knew this, you could skip searching Act II through Act V entirely. The following shows what the new select statement would look like:

select="/PLAY/ACT[TITLE='ACT I']/SCENE/SPEECH[SPEAKER='BERNARDO']"

This chops the time down from 141 milliseconds to 125 milliseconds, a healthy 11 percent improvement.
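A similar model shows what pruning buys: once an ACT fails the title test, none of its scenes or speeches are examined. The sketch below uses a made-up play in which, as in the text, Bernardo speaks only in Act I. `buildPlay` and `findSpeeches` are hypothetical helpers in plain JavaScript, and the counts describe this toy tree, not MSXML timings.

```javascript
// Hypothetical, simplified play: acts carry a title, speeches a speaker.
// As in the text, Bernardo speaks only in Act I.
const ROMAN = ["I", "II", "III", "IV", "V"];

function buildPlay() {
  return {
    name: "PLAY",
    children: ROMAN.map((r) => ({
      name: "ACT",
      title: "ACT " + r,
      children: Array.from({ length: 4 }, () => ({
        name: "SCENE",
        children: Array.from({ length: 10 }, (_, i) => ({
          name: "SPEECH",
          speaker: r === "I" && i === 0 ? "BERNARDO" : "HAMLET",
          children: [],
        })),
      })),
    })),
  };
}

// Walk PLAY/ACT/SCENE/SPEECH, counting every element examined.
// If actFilter is given, acts that fail it are skipped without descending,
// much like the [TITLE='ACT I'] predicate in the select statement.
function findSpeeches(play, speaker, actFilter) {
  let visited = 0;
  let found = 0;
  for (const act of play.children) {
    visited++;
    if (actFilter && !actFilter(act)) continue; // pruned subtree
    for (const scene of act.children) {
      visited++;
      for (const speech of scene.children) {
        visited++;
        if (speech.speaker === speaker) found++;
      }
    }
  }
  return { found, visited };
}

const play = buildPlay();
console.log(findSpeeches(play, "BERNARDO"));
// { found: 4, visited: 225 }
console.log(findSpeeches(play, "BERNARDO", (act) => act.title === "ACT I"));
// { found: 4, visited: 49 }
```

The pruned walk still checks all five acts, but descends into only one of them.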

Cross-Threading Models

Previously, the transformNode and transformNodeToObject methods required the style sheet and the document being transformed to use the same threading model. In the MSXML January 2000 Web Release, you can use free-threaded style sheets with rental-threaded documents, and vice versa. This means you can get the performance benefit of rental documents while also sharing free-threaded style sheets across threads.

From the ITPUB blog. Link: http://blog.itpub.net/10752043/viewspace-991690/. Please credit the source when reposting; otherwise legal liability will be pursued.
