RFC 1738 specifies the set of characters that may be used in a URL as follows:
“
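Only alphanumerics (0-9, a-z, A-Z), the special characters $-_.+!*'(), and reserved characters used for their reserved purposes may be used unencoded within a URL.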
”
HTML 4 widened this so that characters from the whole Unicode character set can be used in URLs.
So which characters actually need to be encoded?
1. ASCII control characters
Reason: they are not printable.
Character range: ISO-8859-1 code points 00-1F and 7F (hex)
2. Non-ASCII characters
Reason: these characters are outside the ASCII set, so they are not considered legal in a URL.
Character range: ISO-Latin code points 80-FF (hex)
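3. Reserved characters
Reason: URLs use certain reserved characters to define their syntax. When one of these characters appears in a URL without its special role, it must be encoded.
Characters: $, &, +, comma (,), /, :, ;, =, ?, @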
Character | Code Points (Hex) | Code Points (Dec) |
---|---|---|
Dollar ("$") | 24 | 36 |
Ampersand ("&") | 26 | 38 |
Plus ("+") | 2B | 43 |
Comma (",") | 2C | 44 |
Forward slash/Virgule ("/") | 2F | 47 |
Colon (":") | 3A | 58 |
Semi-colon (";") | 3B | 59 |
Equals ("=") | 3D | 61 |
Question mark ("?") | 3F | 63 |
'At' symbol ("@") | 40 | 64 |
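
As a quick check of the table above, the following sketch runs the ten reserved characters through JavaScript's built-in `encodeURIComponent` and `encodeURI`. The variable names and the `example.com` URL are placeholders chosen for illustration, not part of the original notes.

```javascript
// The ten reserved characters from the table, in the same order.
const reservedChars = "$&+,/:;=?@";

// encodeURIComponent escapes every one of them, matching the hex column.
const encodedReserved = encodeURIComponent(reservedChars);
console.log(encodedReserved); // %24%26%2B%2C%2F%3A%3B%3D%3F%40

// encodeURI is meant for whole URLs, so it leaves reserved characters alone.
const untouchedReserved = encodeURI(reservedChars);
console.log(untouchedReserved); // $&+,/:;=?@

// Hypothetical query value that contains reserved characters as plain data.
const rawValue = "rock&roll=1/2?";
// Placeholder base URL, for illustration only.
const searchUrl = "http://example.com/search?q=" + encodeURIComponent(rawValue);
console.log(searchUrl); // http://example.com/search?q=rock%26roll%3D1%2F2%3F
```

The split matters in practice: `encodeURIComponent` suits a single value placed inside a URL, while `encodeURI` keeps the URL's own delimiters intact.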
4. Unsafe characters
Reason: some characters can cause ambiguity when they appear in a URL. These characters must also be encoded:
Character | Code Points (Hex) | Code Points (Dec) | Why encode? |
---|---|---|---|
Space | 20 | 32 | Significant sequences of spaces may be lost in some uses (especially multiple spaces) |
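Quotation mark (") | 22 | 34 | These characters are often used to delimit URLs in plain text. |
'Less Than' symbol ("<") | 3C | 60 | These characters are often used to delimit URLs in plain text. |
'Greater Than' symbol (">") | 3E | 62 | These characters are often used to delimit URLs in plain text. |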
'Pound' character ("#") | 23 | 35 | This is used in URLs to indicate where a fragment identifier (bookmarks/anchors in HTML) begins. |
Percent character ("%") | 25 | 37 | This is used to URL encode/escape other characters, so it should itself also be encoded. |
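Left Curly Brace ("{") | 7B | 123 | Some systems can possibly modify these characters. |
Right Curly Brace ("}") | 7D | 125 | Some systems can possibly modify these characters. |
Vertical Bar/Pipe ("\|") | 7C | 124 | Some systems can possibly modify these characters. |
Backslash ("\") | 5C | 92 | Some systems can possibly modify these characters. |
Caret ("^") | 5E | 94 | Some systems can possibly modify these characters. |
Tilde ("~") | 7E | 126 | Some systems can possibly modify these characters. |
Left Square Bracket ("[") | 5B | 91 | Some systems can possibly modify these characters. |
Right Square Bracket ("]") | 5D | 93 | Some systems can possibly modify these characters. |
Grave Accent ("`") | 60 | 96 | Some systems can possibly modify these characters. |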
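
To see how the unsafe characters listed above behave in practice, here is a minimal sketch that passes each one through the built-in `encodeURIComponent`. The array and variable names are illustrative. One caveat: `encodeURIComponent` leaves the tilde as it is, so it does not follow this table exactly.

```javascript
// The unsafe characters from the table above, one string per character.
const unsafeChars = [" ", "\"", "<", ">", "#", "%", "{", "}", "|", "\\", "^", "~", "[", "]", "`"];

// Print each character next to its percent-encoded form.
for (const ch of unsafeChars) {
  console.log(JSON.stringify(ch), "->", encodeURIComponent(ch));
}

// Collect the characters that encodeURIComponent returns unchanged.
const leftUnchanged = unsafeChars.filter((ch) => encodeURIComponent(ch) === ch);
console.log(leftUnchanged); // [ '~' ]
```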
How do you URL-encode a character?
The URL encoding of a character consists of a % sign followed by the character's two-digit hexadecimal ISO-Latin code.
For example:
space = %20
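
The rule above can be sketched as a small function. `percentEncodeLatin1` is a hypothetical helper written for this note, not a built-in, and it assumes the input is a single character whose code point fits in one ISO-Latin byte (00-FF).

```javascript
// Hypothetical helper: percent-encode one character using its ISO-Latin code.
function percentEncodeLatin1(ch) {
  // Assumption: exactly one character, with a code point in the range 00-FF.
  if (ch.length !== 1 || ch.charCodeAt(0) > 0xff) {
    throw new RangeError("expected a single ISO-Latin character");
  }
  const hex = ch.charCodeAt(0).toString(16).toUpperCase();
  // Pad to two digits so that, for example, a line feed becomes %0A rather than %A.
  return "%" + hex.padStart(2, "0");
}

console.log(percentEncodeLatin1(" "));  // %20
console.log(percentEncodeLatin1("\n")); // %0A
console.log(percentEncodeLatin1("#"));  // %23
```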
In JavaScript, use the encodeURIComponent function to do this.
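
A short usage sketch follows; the sample string is made up for illustration. Note that `encodeURIComponent` encodes non-ASCII characters as their UTF-8 bytes rather than as a single ISO-Latin byte, so "é" becomes `%C3%A9`, not `%E9`.

```javascript
// Sample text with spaces, a reserved character and two non-ASCII letters
// (written with \u escapes: "café & crème").
const title = "caf\u00e9 & cr\u00e8me";

// Each non-ASCII letter turns into two percent-encoded UTF-8 bytes.
const encodedTitle = encodeURIComponent(title);
console.log(encodedTitle); // caf%C3%A9%20%26%20cr%C3%A8me

// decodeURIComponent reverses the encoding.
const decodedTitle = decodeURIComponent(encodedTitle);
console.log(decodedTitle === title); // true
```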