[Translation] The most detailed guide to CSS character escaping

Posted by 西樓聽雨 on 2018-11-05

Original: CSS character escape sequences
Author: Mathias, published 12th July 2010
Translator: 西樓聽雨, published 2018/11/5 (please credit the source when reposting)


When writing CSS for [markup with weird `class` or `id` attribute values](https://mathiasbynens.be/notes/html5-id-class), you need to [consider](https://www.w3.org/TR/CSS21/syndata.html#characters) some [rules](https://www.w3.org/International/questions/qa-escapes#cssescapes). For example, you can’t just use `## { color: #f00; }` to target the element with `id="#"`. Instead, you’ll have to escape the weird characters (in this case, the second `#`). Doing so cancels the meaning of special CSS characters in identifiers and allows you to refer to characters you cannot easily type out, like crazy Unicode symbols.

There are some other cases where you might want or need to escape a character in CSS. You could be writing a selector for a funky id, class, attribute or attribute value, for example; or maybe you want to insert some weird characters using the content property without changing your CSS file’s character encoding.

Identifiers and strings in CSS

[The spec](http://dev.w3.org/csswg/css-syntax/#ident-token-diagram) defines *identifiers* using a token diagram. They may contain the symbols from `a` to `z`, from `A` to `Z`, from `0` to `9`, underscores (`_`), hyphens (`-`), non-ASCII symbols or escape sequences for any symbol. They cannot start with a digit, or a hyphen (`-`) followed by a digit. Identifiers require at least one symbol (i.e. the empty string is not a valid identifier).

The grammar for identifiers is used for various things throughout the specification, including element names, class names, and IDs in selectors.

The spec definition for strings says that strings can either be written with double quotes or with single quotes. Double quotes cannot occur inside double quotes, unless escaped (e.g., as '\"' or as '\22'). The same goes for single quotes (e.g., "\'" or "\27"). A string cannot directly contain a newline. To include a newline in a string, use an escape sequence representing the line feed character (U+000A), such as "\A" or "\00000a". Newlines can also be represented as "\D \A " (CRLF), "\D " (i.e. \r in other languages), or "\C " (i.e. \f in other languages). It’s possible to break strings over several lines, for aesthetic or other reasons, but in such a case the newline itself has to be escaped with a backslash (\).

As you can see, character escapes are allowed in both identifiers and strings. So, let’s find out how these escape sequences work!
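The identifier rules above can be approximated in a few lines of JavaScript. This is only a minimal sketch, not the spec's tokenizer: the function name `isPlainIdentifier` is made up for illustration, and it deliberately ignores escape sequences, so it answers the question "can this value be used as an identifier without escaping anything?".

```javascript
// Hypothetical helper (not part of any spec or library): checks whether
// `value` can be written as a CSS identifier with no escaping at all,
// following the rules listed in this section.
// Assumption: escape sequences are out of scope for this sketch.
function isPlainIdentifier(value) {
  // (?!-?\d)             must not start with a digit, or a hyphen plus a digit
  // [-A-Za-z0-9_]        letters, digits, underscores and hyphens
  // [^\u0000-\u007F]     any non-ASCII symbol
  // +                    at least one symbol (the empty string is rejected)
  return /^(?!-?\d)(?:[-A-Za-z0-9_]|[^\u0000-\u007F])+$/u.test(value);
}

console.log(isPlainIdentifier('-a-b-c-')); // true
console.log(isPlainIdentifier('1a2b3c'));  // false
console.log(isPlainIdentifier('©'));       // true
```

Values that fail this check are the ones the escaping rules in the rest of the article are meant for.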

How to escape characters in CSS

Here’s a ~~simple~~ list of rules you should keep in mind when escaping a character in CSS. Note that if you’re writing a selector for a given classname or ID, the strict syntax for identifiers applies. If you’re using a (quoted) string in CSS, you’ll only ever need to escape quotes or newline characters.

Leading digits

If the first character of an identifier is numeric, you’ll need to escape it based on its Unicode code point. For example, the code point for the character 1 is U+0031, so you would escape it as \000031 or \31 (with a trailing space).

Basically, to escape any numeric character, just prefix it with \3 and append a space character ( ). Yay Unicode!
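The "prefix with \3, append a space" rule is mechanical enough to sketch in code. The helper name `escapeLeadingDigit` below is hypothetical and only handles the leading-digit case; the remaining characters of the identifier are passed through untouched.

```javascript
// Hypothetical helper: escapes a leading digit as "\3" + digit + space.
// The digits 0-9 are U+0030 to U+0039, so the hexadecimal code point of
// a digit is always "3" followed by the digit itself.
function escapeLeadingDigit(identifier) {
  return identifier.replace(/^\d/, (digit) => '\\3' + digit + ' ');
}

console.log(escapeLeadingDigit('1a2b3c')); // \31 a2b3c
console.log(escapeLeadingDigit('abc'));    // abc (no leading digit, unchanged)
```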

Special characters

Any character that is not a hexadecimal digit (that is, 0-9 and a-f; translator's note), line feed, carriage return, or form feed can be escaped with a backslash to remove its special meaning.

The following characters have a special meaning in CSS: !, ", #, $, %, &, ', (, ), *, +, ,, -, ., /, :, ;, <, =, >, ?, @, [, \, ], ^, `, {, |, }, and ~.

There are two options if you want to use them. The first is to use the Unicode code point: for example, the plus sign (+) is U+002B, so if you wanted to use it in a CSS selector, you would escape it into \2b  (note the space character at the end) or \00002b (using exactly six hexadecimal digits).

The second option is far more elegant though: just escape the character using a backslash (\), e.g. + would escape into \+.

Theoretically, the : character can be escaped as \:, but IE < 8 doesn’t recognize that escape sequence correctly. A workaround is to use \3A  (followed by a space) instead.
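As a rough illustration of the backslash option, the sketch below escapes the special characters listed above. The name `escapeSpecialCharacters` is invented for this example. Two choices here are assumptions rather than rules from the article: the hyphen is left alone because identifiers may contain it, and the colon always uses the `\3A ` form mentioned as the workaround for old IE.

```javascript
// Characters with a special meaning in CSS, as listed above.
// Assumption: the hyphen is omitted on purpose, since identifiers may
// contain hyphens without escaping.
const SPECIAL_CHARACTERS = /[!"#$%&'()*+,.\/:;<=>?@[\\\]^`{|}~]/g;

// Hypothetical helper: prefixes each special character with a backslash.
// The colon becomes "\3A " (hex escape plus trailing space), which is the
// workaround for IE < 8 described above.
function escapeSpecialCharacters(value) {
  return value.replace(SPECIAL_CHARACTERS, (ch) =>
    ch === ':' ? '\\3A ' : '\\' + ch
  );
}

console.log(escapeSpecialCharacters(':`('));     // \3A \`\(
console.log(escapeSpecialCharacters('#fake-id')); // \#fake-id
```

A leading digit or whitespace would still need the separate rules described in the neighbouring sections.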

Whitespace characters

Whitespace, even some characters that are technically invalid in HTML attribute values, can be escaped as well.

Any characters matching [\t\n\v\f\r] need to be escaped based on their Unicode code points. The space character ( ) can simply be backslashed (\ ). Other whitespace characters don’t need to be escaped.
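The whitespace rule can be sketched as a single pass over the value. `escapeWhitespace` is a hypothetical name, and the uppercase hex digits are just a stylistic choice for this example.

```javascript
// Hypothetical helper: escapes whitespace following the rule above.
// - \t, \n, \v, \f, \r become a hex escape plus a terminating space
// - a plain space becomes backslash + space
// - any other whitespace is left untouched
function escapeWhitespace(value) {
  return value.replace(/[\t\n\v\f\r ]/g, (ch) => {
    if (ch === ' ') {
      return '\\ ';
    }
    return '\\' + ch.charCodeAt(0).toString(16).toUpperCase() + ' ';
  });
}

console.log(escapeWhitespace('a b'));  // a\ b
console.log(escapeWhitespace('a\tb')); // a\9 b
```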

Underscores

CSS doesn’t require you to escape underscores (_), but if one appears at the start of an identifier, I’d recommend doing it anyway to prevent IE6 from ignoring the rule altogether.

Other Unicode characters

Other than that, characters that can’t possibly convey any meaning in CSS (e.g. `♥`) can and **should** just be used unescaped.

In theory (as per the spec), any character can be escaped based on its Unicode code point as explained above (e.g. for 𝌆, the U+1D306 “tetragram for centre” symbol: \1d306  followed by a space, or \01d306), but older WebKit browsers don’t support this syntax for characters outside the BMP (fixed in April 2012). (Translator's note: the BMP, or Basic Multilingual Plane, is one of the planes into which Unicode divides its code space and holds the most commonly used characters. Each plane contains 65,536, that is 2 to the 16th power, code points.)

Because of browser bugs, there is another (non-standard) way to escape these characters, namely by breaking them up in UTF-16 code units (e.g. \d834\df06), but this syntax (rightfully) isn’t supported in Gecko and Opera 12.

Since there is currently no way to escape non-BMP symbols in a cross-browser fashion without breaking backwards compatibility with older browsers, it’s best to just use these characters unescaped.
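To see where the two escape forms for U+1D306 come from, the sketch below derives both from the symbol itself. The function names are hypothetical; `codePointAt` and `charCodeAt` are standard string methods. It is meant to illustrate the difference between a code point and its UTF-16 code units, not to recommend the non-standard form.

```javascript
// Hypothetical helper: standard escape based on the full code point,
// terminated by a space.
function codePointEscape(symbol) {
  return '\\' + symbol.codePointAt(0).toString(16) + ' ';
}

// Hypothetical helper: non-standard escape that emits one escape per
// UTF-16 code unit (a surrogate pair for symbols outside the BMP).
function utf16UnitEscape(symbol) {
  let result = '';
  for (let i = 0; i < symbol.length; i++) {
    result += '\\' + symbol.charCodeAt(i).toString(16);
  }
  return result;
}

const tetragram = '\u{1D306}'; // the "tetragram for centre" symbol

console.log(codePointEscape(tetragram));  // \1d306 (plus a trailing space)
console.log(utf16UnitEscape(tetragram));  // \d834\df06
```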

Trailing whitespace after hexadecimal escape sequences

Any U+0020 space characters immediately following a hexadecimal escape sequence are automatically [consumed by the escape sequence](http://dev.w3.org/csswg/css-syntax/#consume-escaped-code-point). For example, to escape the text `foo © bar`, you would have to use `foo \A9  bar`, with two space characters following `\A9`. The first space character gets swallowed; only the second one is preserved.

The space character following a hexadecimal escape sequence can only be omitted if the next character is not another space character and not a hexadecimal digit. For example, foo©bar becomes foo\A9 bar, but foo©qux could be written as foo\A9qux.
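The three `©` examples above can be reproduced with a small function that adds the terminating space only when the rule requires it. This is a sketch under stated assumptions: `escapeNonAscii` is a made-up name, it hex-escapes every non-ASCII symbol purely for demonstration (the article recommends leaving such symbols unescaped in practice), and it only looks at spaces and hex digits, as the text does.

```javascript
// Hypothetical helper: hex-escapes every non-ASCII symbol in `text` and
// appends a terminating space only when the next character is a space
// or a hexadecimal digit, as described above.
function escapeNonAscii(text) {
  const symbols = Array.from(text); // iterate by code point, not code unit
  let result = '';
  symbols.forEach((symbol, index) => {
    const codePoint = symbol.codePointAt(0);
    if (codePoint < 0x80) {
      result += symbol; // ASCII is copied through unchanged
      return;
    }
    const next = symbols[index + 1];
    const needsSpace = next !== undefined && /[0-9a-fA-F ]/.test(next);
    result += '\\' + codePoint.toString(16).toUpperCase() + (needsSpace ? ' ' : '');
  });
  return result;
}

console.log(escapeNonAscii('foo © bar')); // foo \A9  bar  (two spaces after \A9)
console.log(escapeNonAscii('foo©bar'));   // foo\A9 bar
console.log(escapeNonAscii('foo©qux'));   // foo\A9qux
```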

Examples

Here are some random examples:

.\3A \`\( { } /* matches elements with class=":`(" */
.\31 a2b3c { } /* matches elements with class="1a2b3c" */
#\#fake-id {} /* matches the element with id="#fake-id" */
#-a-b-c- {} /* matches the element with id="-a-b-c-" */
#© { } /* matches the element with id="©" */

For more examples, see the demo page written for the post on @id and @class in HTML5.

… What about JavaScript?

In JavaScript, it depends.

document.getElementById() and similar functions like document.getElementsByClassName() can just use the unescaped attribute value, the way it’s used in the HTML. Of course, you would have to escape any quotes so that you still end up with a valid JavaScript string.

On the other hand, if you were to use these selectors with the Selectors API (i.e. document.querySelector() and document.querySelectorAll()) or libraries that rely on the same syntax (e.g. jQuery/Sizzle), you would have to take the escaped CSS selectors and escape them again. All you really have to do is double every backslash in the CSS selector (and of course escape the quotes, where necessary):

<!-- HTML -->
<p class=":`("></p>
/* CSS */
.\3A \`\( { }
/* JavaScript */
document.getElementsByClassName(':`(');
document.querySelectorAll('.\\3A \\`\\(');
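The "double every backslash" step can itself be expressed as a tiny function that turns the text of a CSS selector into a single-quoted JavaScript string literal. The name `toJsStringLiteral` is hypothetical, and the sketch only handles backslashes and single quotes, which are the two cases the paragraph above mentions.

```javascript
// Hypothetical helper: converts the text of an (already escaped) CSS
// selector into the source code of a single-quoted JavaScript string.
// Assumption: only backslashes and single quotes need attention here.
function toJsStringLiteral(cssSelector) {
  const body = cssSelector
    .replace(/\\/g, '\\\\') // double every backslash
    .replace(/'/g, "\\'");  // escape the quote character being used
  return "'" + body + "'";
}

// The selector text .\3A \`\( written as a JavaScript string:
const selector = '.\\3A \\`\\(';

console.log(toJsStringLiteral(selector)); // '.\\3A \\`\\('
```

The printed literal is what would be passed to `document.querySelectorAll()` in the example above.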

CSS escaper tool

Remembering all these rules sure sounds like fun, but to make life a little easier I created a simple CSS escaper tool that does all the hard work for you.


Just enter a value (in the highlighted field; translator's note) and it will tell you how to escape it in CSS and JavaScript, based on the rules above. It uses an `id` attribute in its example, but of course you could use the same escaped string for `class` attribute values or the `content` property. Enjoy!

Need to escape text for use in CSS strings or identifiers? I’ve packaged the code that powers this tool as an open-source JavaScript library named cssesc. Check it out!

Update: The CSS Object Model spec now defines a `CSS.escape()` method that can be used to perform this escaping. I’ve written a polyfill for it.
