XML進階:Level 1 – XML簡介

幾何魔君阿基米德發表於2017-09-21

By Rob Sheldon, 2014/03/26 (首次發表於: 2012/09/20)

關於系列

本文屬於進階系列:XML進階

自2003年以來,XML一直是SQL標準的一部分,許多動態管理檢視也返回XML資料,所以,對於每一個資料庫管理員而言,同樣是至關重要的。現在,這個行業更多地用於由文件標記定義的資料,資料庫開發人員和資料庫管理員比以往任何時候都更瞭解XML技術和使用它的方式。 在這一系列文章中,羅伯特·謝爾頓(Robert Sheldon)將盡力把複雜的東西簡單化。 注意:這一系列的文章已經可作為電子書獲取。


可擴充套件標記語言(XML)首先在SQL Server的SQL Server 2000發行版中提供支援。然而,與XML相關的功能僅限於專注於對映關係資料和XML資料的資料管理功能。例如,SQL Server 2000中新增了FOR XML子句,可以將關係查詢結果作為XML返回。

但是到新增了XML資料型別的SQL Server 2005釋出時,支援XML就變得有意義起來。 XML資料型別允許將XML文件本身儲存在使用該型別配置的列和變數中。資料型別還支援一組可用於檢索和修改XML文件中特定元件的方法。

為了充分利用SQL Server支援的與XML相關的功能,你可能會發現對XML本身有一個基本的瞭解是很有用的。為此,XML進階系列的第一部分將介紹XML是什麼,並對構成XML文件的各種元件進行介紹。

XML概述

和超文字標記語言(HTML)相似,XML也是一種標記語言,它使用標籤來描述和這些標籤相關聯的資料的性質。使XML可擴充套件的是它的自描述性質,也就是建立特定於XML文件中包含的資料值的標籤。在HTML中,這些標籤是預定義的。 (通過XML元件,XML的可擴充套件性將變得更加清晰。)

儘管XML具有可擴充套件性,但它仍然是一種標準化語言,必須符合由全球資訊網聯盟(W3C)定義的特定格式規則集。由於這種標準化與用於顯示資料的HTML不同,這種語言被廣泛地用於傳輸和儲存資料。

XML可以輕鬆地在異構系統之間共享資料,而不管硬體,作業系統或應用程式型別如何,XML的普遍應用意味著資料可以更少的進行人為干預處理。同時可以控制資料的描述、排列和顯示方式。

XML 元件

構成XML文件的主要元件和支配這些元件使用的規則通常非常簡單,但為了使XML文件能夠被SQL Server XML解析器正確處理,你必須嚴格遵守一些規則。

XML文件中主要包含兩種型別的資訊:要儲存的資料和描述資料的標籤。 標籤由一組尖括號(<>)組成,它們包含描述與標籤相關聯的資料的描述性詞或複合詞(無空格)。 正是由於這些標籤的自描述性質,XML通常被認為是元語言。

每個離散的儲存資料片段都被封裝在開始標籤和結束標籤中,如下例所示:

<Person>John Doe</Person>

在這種情況下,開始標籤是<Person>,結束標籤是</ Person>。 請注意,正斜槓位於結束標記中的標記描述之前。 正斜槓必須在所有結束標籤之前,但標籤的語言必須與開頭標籤相同,上面的示例是Person。 我可以選擇一個除Person之外的名稱,包括與人無關的名稱,但是一個好的做法是始終提供最能描述開啟和關閉標籤中包含的資料的標籤名稱。 在這種情況下,標籤描述了一個人的名字John Doe,因此標籤名稱為<Person>

同樣的,標籤和封閉的資料代表一個單一的元素。 但是,元素並不總是包含資料。 一個空的元素可以以兩種方式之一呈現。 第一個是通過指定開啟和關閉標籤,但不包括資料,如下例所示:

<Person></Person> 

表示一個空白元素的另一種方法是隻使用一組括號,但仍包括正斜槓:

<Person />

再者,只有當元素不包含值時,才能使用這個方法。 以後你在XML進階系列中可能看到模式需要一個沒有值的元素。 在這種情況下,可以使用縮寫格式來表示元素的兩個標籤。

無論元素是否包含值,只要使用兩個標籤,開始和關閉標籤必須完全匹配,精確到大小寫(除了結束標籤中的正斜槓)。 例如,以下元素在SQL Server XML解析器中生成錯誤,因為兩個標籤不能匹配:

<person> John Doe </ Person>

開始標籤中的描述性文字都是小寫的; 然而,結束標籤中的描述性詞應以大寫字母開頭。 開始和結束標籤必須匹配,以被認為是適當的或者格式良好的XML。

但是你可以將元素嵌入到彼此中。 在以下示例中,我將<Person>元素的兩個例項嵌入到<People>元素中:

<People>
<Person>John Doe</Person>
<Person>Jane Doe</Person>
</People>

請注意,每個<Person>元素本身都是完整的。 它包括開啟和關閉標籤及其附帶的資料。 嵌入其他元素中的元素稱為子元素( child),或在某些情況下稱為subelements。 外部元素(在這種情況下為<People>)是父元素。 XML文件最高階別的父元素被認為是根元素。 所有XML文件必須有且只有一個根元素。 因此,上面示例中的<People>元素是兩個<Person>元素的父元素,它是XML文件的根元素。

SQL Server還允許將XML片段儲存在XML列或變數中。 片段是沒有根元素的XML程式碼塊,例如以下示例中顯示的兩個元素:

<Person>John Doe</Person>
<Place>Seattle, WA</Place>

元素必須仍然是格式良好的XML,也就是具有包含資料的匹配標籤,但它們不一定是XML文件。 像以後你將在XML進階系列中看到的那樣,你可以指定只允許在XML列或變數中使用XML文件,但現在只要知道SQL Server可以區分和儲存XML文件和片段即可。

在其他元素中嵌入元素時,必須確保子元素在完成父元素之前已經完成。 例如,在以下示例中,我在<Person>元素之前結束了<People>元素,這會導致SQL Server XML解析器生成錯誤:

<People><Person>John Doe</People></Person>

你必須確保無論多少級別包含嵌入元素,你的子元素都是完整的。 在下面的示例中,<FirstName><LastName>元素嵌入在每個<Person>元素中,<Person>元素嵌入在<People>元素中:

<People>
<Person>
<FirstName>John</FirstName>
<LastName>Doe</LastName>
</Person>
<Person>
<FirstName>Jane</FirstName>
<LastName>Doe</LastName>
</Person>
</People>

在這種情況下,<Person>元素充當子元素和父元素。 但是請注意,每個嵌入的元素,無論級別如何,必須落在父元素的開始和結束標記之內。 例如,<FirstName><LastName>元素的第一個例項完全落在<Person>元素的第一個例項中,並且<Person>元素的兩個例項完全落在<People>元素中,即 文件的根元素。

元素也可以具有與它們相關聯的屬性。 屬性是可以賦值的屬性。 該屬性被定義為元素的開始標籤的一部分。 在以下示例中,我已經將id屬性新增到<Person>元素的每個例項中:

<People>
<Person id="1234">
<FirstName>John</FirstName>
<LastName>Doe</LastName>
</Person>
<Person id="5678">
<FirstName>Jane</FirstName>
<LastName>Doe</LastName>
</Person>
</People>

如示例所示,屬性由屬性名稱(在這種情況下為id),後跟等號,屬性值由雙引號括起來。因此,<Person>元素的第一個例項的id屬性的值為1234,而<Person>元素的第二個例項的id屬性的值為5678。

許多XML文件中包含的另一個元件是宣告,它至少指定了文件符合的XML標準的版本。到目前為止,只有兩個版本:1.0和1.1。如果使用XML 1.0,宣告是不必要的;但是XML 1.1需要宣告。因此,你應該知道如何在XML文件中包含宣告。

如果你包含宣告,則必須將其放在文件的開頭,用<?開啟標籤,並使用?>關閉標籤結束。另外,你必須包含xml關鍵字(小寫)和版本屬性(也是小寫)。通常包括的另一個屬性(儘管是可選的)是編碼,它指定用於XML文件的字元編碼。在下面的示例中,我包括一個宣告,它指定版本1.0和UTF-8的編碼,這意味著資料被儲存為8位Unicode字元序列:

<?xml version="1.0" encoding="UTF-8"?>
<People>
<Person id="1234">
<FirstName>John</FirstName>
<LastName>Doe</LastName>
</Person>
<Person id="5678">
<FirstName>Jane</FirstName>
<LastName>Doe</LastName>
</Person>
</People>

您還可以向XML文件新增註釋。 要做到這一點,只需在<! - 標籤之前註釋,並將其結束於 - >標籤,就像我在下面的例子中所做的那樣:

<?xml version="1.0" encoding="UTF-8"?>
<!-- A list of current clients -->
<People>
<Person id="1234">
<FirstName>John</FirstName>
<LastName>Doe</LastName>
</Person>
<Person id="5678">
<FirstName>Jane</FirstName>
<LastName>Doe</LastName>
</Person>
</People>

正如你所看到的那樣,我已經新增了評論當前客戶端的列表,附加在註釋標籤中。 SQL Server XML解析器將忽略標籤內的任何內容,因此你不僅可以使用註釋功能提供有關XML文件及其資料的資訊,還可以掛起保留不想作為文件進行處理的XML程式碼部分。

另一個使用XML時要考慮的是,當元素值出現時,某些字元將無法解析。 例如,您不能在元素的值中包含&符(&),就像我在以下示例中的<FavoriteBook>子元素中所做的那樣:

<?xml version="1.0" encoding="UTF-8"?>
<!-- A list of current clients -->
<People>
<Person id="1234">
<FirstName>John</FirstName>
<LastName>Doe</LastName>
</Person>
<Person id="5678">
<FirstName>Jane</FirstName>
<LastName>Doe</LastName>
<FavoriteBook>Crime & Punishment</FavoriteBook>
</Person>
</People>

如果我嘗試將此XML文件分配給XML列或變數,則<FavoriteBook>子元素將導致解析器生成錯誤,因為“Crime & Punishment”值包含&符號。 你必須使用實體引用替換此型別的字元,該實體引用會告知解析器保留原始打算的字元。 實體引用以&符號開頭,以分號結尾,其間包含代表原始值的多字元程式碼。 對於&符號,實體參考應該是&amp;,我將在下面的例子中使用:

<?xml version="1.0" encoding="UTF-8"?>
<!-- A list of current clients -->
<People>
<Person id="1234">
<FirstName>John</FirstName>
<LastName>Doe</LastName>
</Person>
<Person id="5678">
<FirstName>Jane</FirstName>
<LastName>Doe</LastName>
<FavoriteBook>Crime &amp; Punishment</FavoriteBook>
</Person>
</People>

請注意,我已經用了&amp;實體參考。現在XML解析器處理<FavoriteBook>元素將沒有任何問題。但是請注意,&符號不是唯一會產生錯誤的字元。 XML標準標識了應該用實體引用替換的五個字元,就像我在上面的例子中所做的那樣:

  • 小於(<):替換為&lt;
  • 大於(>):替換為&gt;
  • 和號(&):替換&amp;
  • 撇號(`):替換為`
  • 引號(“):替換為”

該示例提出的另一個問題是子元素不一定要從一個父例項到另一個父例項相同。你可以看到,<Person>元素的第一個例項只包含<FirstName><LastName>子元素,但<Person>元素的第二個例項包含<FirstName><LastName>子元素以及<FavoriteBook>元素。只要你的子元素形成良好,就可以包括描述和定義資料所需的任何元素。

總結

在這個級別,我們已經看到組成一個XML文件需要的主要元件。 元素作為所有XML文件的基本構建塊,每個元素由開始標籤和結束標籤劃定,資料值本身被包含在這兩個標籤之間。 元素可以彼此嵌入,但是一個元素 – 根 – 必須作為XML文件中所有其他元素的父元素。 元素還可以包括被定義為元素的開始標籤的一部分的屬性。

我們已經儘可能方便地瞭解瞭如何組織一個XML文件,但本課程的目的並不是培養如何建立這些型別的文件,而是提供XML的介紹以便大家能更有效地在SQL Server中使用XML工作 。 在下一節,我們將介紹如何在SQL Server中實現XML資料型別,以及如何將其分配給列和變數以便儲存XML文件和XML片段。


原文

Stairway to XML: Level 1 – Introduction to XML

By Rob Sheldon, 2014/03/26 (first published: 2012/09/20)

The Series

This article is part of the Stairway Series: Stairway to >XML

XML has been part of the SQL Standard since 2003, and it is also essential for any DBA because so many of the dynamic management views return XML data. Now that the industry is more used to data defined by document markup, it is becoming more important than ever for Database Developers and DBAs to understand the technology and to know where it makes sense to use XML. In this series of articles, Robert Sheldon flexes his talent to make the complicated seem simple. Note: This series of articles is now available as an eBook.


Support for the eXtensible Markup Language (XML) was first introduced in SQL Server with the release of SQL Server 2000. However, XML-related features were limited to data management capabilities that focused on mapping relational and XML data. For example, SQL Server 2000 added the FOR XML clause, which lets you return relational query results as XML.

However, it wasn’t until the release of SQL Server 2005—when the XML data type was added—that support for XML got interesting. The XML data type lets you natively store XML documents in columns and variables configured with that type. The data type also supports a set of methods you can use to retrieve and modify specific components within the XML document.

To take full advantage of the XML-related features supported in SQL Server, you might find it useful to have a fundamental understanding of XML itself. To that end, this first Level of the Stairway to XML series explains what XML is and describes the various components that make up an XML document.

An Overview of XML

Similar to the HyperText Markup Language (HTML), XML is a markup language that uses tags to delineate and describe the nature of the data associated with those tags. What makes XML extensible is its self-describing nature, that is, you create tags that are specific to the data values contained in the XML document. In HTML, those tags are pre-defined. (XML’s extensible nature will become clearer as we work through the XML components.)

Despite its extensibility, XML is still a standardized language that must conform to a specific set of formatting rules, as defined by the World Wide Web Consortium (W3C). Because of this standardization, the language has been widely adopted in order to transport and store data, unlike HTML, which is used to display data. XML makes it possible to easily share data among heterogeneous systems, regardless of hardware, operating system, or application type, and XML’s universal adoption means that data can be processed with little human intervention. At the same time, you can control how the data is described, while also controlling how the data is ordered and displayed.

XML Components

The primary components that make up an XML document—and the rules that govern the use of >those components—are generally very straightforward, but you must adhere strictly to these rules in order for an XML document to be properly processed by the SQL Server XML parser.

There are primarily two types of information included in an XML document: the data to be stored and the tags that describe the data. A tag is made up of a set of angle brackets (< >) that enclose a descriptive word or compound word (no spaces) that describes the data associated with the tag. It’s because of the self-describing nature of these tags that XML is often considered a meta-language.

Each discrete piece of stored data is enclosed in an opening tag and a closing tag, as shown in the following example:

<Person>John Doe</Person>

In this case, the opening tag is <Person>, and the closing tag is </Person>. Notice that a forward slash precedes the tag description in the end tag. A forward slash must precede all end tags, but the language of the tag must be the same as the opening tag, which in the example above is Person. I could have chosen a name other than Person, including a name that has nothing to do with people, but a good practice is to always provide tag names that best describe the data enclosed in the opening and closing tags. In this case, the tags are describing the name of a person, John Doe, thus the tag name <Person>.

Together, the tags and enclosed data represent a single element. However, an element does not always have to contain data. An empty element can be rendered in one of two ways. The first is by specifying the opening and closing tags, but including no data, as I do in the following example:

<Person></Person> 

Another way to represent an empty element is use only one set of brackets, but still include the forward slash:

<Person />

Again, this method can be used only when an element contains no value. As you’ll see later in the Stairway to XML series, a schema might require an element for which there is no value. In that case, you can use the shortened format to represent the both tags of the element.

Whether or not an element contains a value, whenever both tags are used, the opening and closing tags must match exactly, down to the capitalization (except for the forward slash in the closing tag). For instance, the following element generates an error in the SQL Server XML parser because the case is different between the two tags:

<person>John Doe</Person>  

The descriptive word in the opening tag is all lowercase; however, the descriptive word in the closing tag begins with a capital letter. The opening and closing tags must match to be considered proper, or well formed, XML.

You can, however, embed elements within each other. In the following example, I embed two instances of the <Person> element within the <People> element:

<People>
<Person>John Doe</Person>
<Person>Jane Doe</Person>
</People>  

Notice that each <Person> element is complete in itself. It includes the opening and closing tags and the data they enclose. Elements embedded in other elements are referred to as child elements or, in some cases, subelements. The outer element, in this case, <People>, is the parent element. The parent element at the highest level of an XML document is considered the root element. All XML documents must have one, and only one, root element. So the <People> element in the example above is the parent element to the two <Person> elements, and it is the root element for the XML document.

SQL Server also permits you to store XML fragments in an XML column or variable. A fragment is a chunk of XML code without a root element, such as the two elements shown in the following example:

<Person>John Doe</Person>
<Place>Seattle, WA</Place>  

The elements must still be well formed XML, that is, have matching tags that enclose the data, but they don’t have to be an XML document. As you’ll see later in the Stairway to XML series, you can specify that only XML documents be permitted in an XML column or variable, but for now just know that SQL Server distinguishes between XML documents and fragments and can store both.

When you embed elements within other elements, you must ensure that the child elements are complete before you end the parent element. For instance, in the following example, I end the <People> element before the <Person> element, which causes the SQL Server XML parser to generate an error:

<People><Person>John Doe</People></Person>  

You must ensure that your child elements are complete no matter how many levels contain embedded elements. In this following example, the <FirstName> and <LastName> elements are embedded in each <Person> element, and the <Person> elements are embedded in the <People> element:

<People>
<Person>
<FirstName>John</FirstName>
<LastName>Doe</LastName>
</Person>
<Person>
<FirstName>Jane</FirstName>
<LastName>Doe</LastName>
</Person>
</People>  

In this case, the <Person> elements act as both child and parent elements. Notice, however, that each embedded element, regardless of the level, falls completely within the opening and closing tags of the parent element. For example, the first instances of the <FirstName> and <LastName> elements fall completely within the first instance of the <Person> element, and the two instances of the <Person> elements fall completely within the <People> element, which is the document’s root element.

Elements can also have attributes associated with them. An attribute is a property to which you can assign a value. The attribute is defined as part of the element’s opening tag. In the following example, I’ve added the id attribute to each instance of the <Person> element:

<People>
<Person id="1234">
<FirstName>John</FirstName>
<LastName>Doe</LastName>
</Person>
<Person id="5678">
<FirstName>Jane</FirstName>
<LastName>Doe</LastName>
</Person>
</People> 

As the example demonstrates, an attribute consists of the attribute name (in this case, id), followed by an equal sign and the attribute value, enclosed in double quotes. So the id attribute for the first instance of the <Person> element has a value of 1234, and the id attribute for the second instance of the <Person> element has a value of 5678.

Another component contained in many XML documents is the declaration, which at a minimum specifies the version of the XML standard that the document conforms to. To date, there are only two versions: 1.0 and 1.1. If using XML 1.0, the declaration is not necessary; however, XML 1.1 requires one. For that reason, you should be aware of how to include a declaration in your XML document.

If you include a declaration, you must place it at the beginning of the document, start the declaration with the <? opening tag, and end it with the ?> closing tag. In addition, you must include the xml keyword (lowercase) and the version attribute (also lowercase). Another attribute commonly included, although optional, is encoding, which specifies the character encoding used for the XML document. In the following example, I include a declaration that specifies version
1.0 and an encoding of UTF-8, which means the data is stored as a sequence of 8-bit Unicode characters:

<?xml version="1.0" encoding="UTF-8"?>
<People>
<Person id="1234">
<FirstName>John</FirstName>
<LastName>Doe</LastName>
</Person>
<Person id="5678">
<FirstName>Jane</FirstName>
<LastName>Doe</LastName>
</Person>
</People>  

You can also add comments to your XML documents. To do so, simply precede the comment with the <!— tag and end it with the —> tag, as I’ve done in the following example:

<?xml version="1.0" encoding="UTF-8"?>
<!-- A list of current clients -->
<People>
<Person id="1234">
<FirstName>John</FirstName>
<LastName>Doe</LastName>
</Person>
<Person id="5678">
<FirstName>Jane</FirstName>
<LastName>Doe</LastName>
</Person>
</People>  

As you can see, I’ve added the comment A list of current clients, which I’ve enclosed in the comment tags. The SQL Server XML parser will ignore anything within the tags, so you can use the commenting feature not only to provide information about the XML document and its data, but also to preserve parts of the XML code that you want to hang on to but you don’t want to have processed as part of the document.

Another consideration when working with XML is that certain characters cannot be parsed when they appear in element values. For example, you cannot include an ampersand (&) in an element’s value, as I’ve done in the <FavoriteBook> child element in the following example:

<?xml version="1.0" encoding="UTF-8"?>
<!-- A list of current clients -->
<People>
<Person id="1234">
<FirstName>John</FirstName>
<LastName>Doe</LastName>
</Person>
<Person id="5678">
<FirstName>Jane</FirstName>
<LastName>Doe</LastName>
<FavoriteBook>Crime & Punishment</FavoriteBook>
</Person>
</People>  

If I try to assign this XML document to an XML column or variable, the <FavoriteBook> child element will cause the parser to generate an error because the value Crime & Punishment includes the ampersand. You must replace this type of character with an entity reference that tells the parser to preserve the character as it is originally intended. An entity reference begins with an ampersand and ends with a semi-colon and in between includes a multi-character code that represents the original value. For an ampersand, the entity reference should be &amp;, which I use in the following example:

<?xml version="1.0" encoding="UTF-8"?>
<!-- A list of current clients -->
<People>
<Person id="1234">
<FirstName>John</FirstName>
<LastName>Doe</LastName>
</Person>
<Person id="5678">
<FirstName>Jane</FirstName>
<LastName>Doe</LastName>
<FavoriteBook>Crime &amp; Punishment</FavoriteBook>
</Person>
</People>

Notice that I’ve replaced the ampersand with the &amp; entity reference. Now the XML parser will handle the <FavoriteBook> element with no problem. But note that the ampersand is not the only character that will generate an error. The XML standard identifies five characters that should be replaced with entity references, as I’ve done in the above example:

  • Less than (<): replace with <
  • Greater than (>): replace with >
  • Ampersand (&): replace with &
  • Apostrophe (`): replace with `
  • Quotation mark (“): replace with “

Another issue that the example raises is the fact that the child elements do not have to be the same from one parent instance to the next. As you can see, the first instance of the <Person> element contains only the <FirstName> and <LastName> child elements, but the second instance of the <Person> element contains the <FirstName> and <LastName> child elements, as well as the <FavoriteBook> element. As long as your child elements are well formed, you can include whatever elements necessary to delineate and define your data.

Summary

In this Level, we’ve looked at the primary components that make up an XML document. Elements serve as the basic building blocks for all XML documents, with each element being delineated by an opening tag and a closing tag and the data value itself being enclosed between those two tags. Elements can be embedded within each other, but one element—the root—must act as the parent to all other elements in an XML document. An element can also include attributes, which are defined as part of an element’s opening tag.

As handy as it might be to know how to put together an XML document, the purpose of this Level has not been to train you in how to create these types of documents, but rather to provide an introduction to XML so you can more effectively work with XML in SQL Server. In the next Level, we’ll look at how the XML data type is implemented in SQL Server and how it can be assigned to columns and variables in order to store both XML documents and XML fragments.


相關文章