Apache URL rewrite and regular expression notes

Posted by 世有因果知因求果 on 2015-03-31

  

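What is mod_rewrite?

mod_rewrite is an Apache module that lets the server rewrite the requested URL. The incoming URL is compared against a series of rules. Each rule contains a regular expression that detects a particular pattern. If the pattern is found in the URL and the appropriate preset conditions are met, the pattern is replaced with a preset string or action.

This process continues until no unprocessed rules remain or the process is explicitly stopped (for example, by a rule carrying the [L] flag).

This can be summarized in three points:

  • A set of rules is processed in order
  • If a rule matches, the conditions attached to that rule are also checked
  • If everything is a go, a substitution or other action is performed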
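The three points above can be modeled in a few lines of JavaScript. This is only a toy sketch, not how Apache is implemented. The rule shape (`pattern`, `substitution`, `conditions`, `last`) and the function `applyRules` are hypothetical. The `last` field loosely imitates the [L] flag.

```javascript
// Toy model of ordered rule processing, loosely inspired by RewriteRule/RewriteCond.
// The rule object shape used here is hypothetical, not an Apache API.
function applyRules(url, rules, env) {
  let current = url;
  for (const rule of rules) {
    // 1. Rules are tried in order; skip a rule whose pattern does not match.
    if (!rule.pattern.test(current)) continue;
    // 2. A matching rule only fires if all of its conditions hold.
    const conditions = rule.conditions || [];
    if (!conditions.every((cond) => cond(env))) continue;
    // 3. Everything is a go: perform the substitution.
    current = current.replace(rule.pattern, rule.substitution);
    // `last` acts as an explicit stop, similar in spirit to Apache's [L] flag.
    if (rule.last) break;
  }
  return current;
}

const rules = [
  { pattern: /^\/article\/(\d+)$/, substitution: "/article.php?id=$1", last: true },
  { pattern: /^\/admin(\/.*)?$/, substitution: "/login", conditions: [(env) => !env.loggedIn] },
];

console.log(applyRules("/article/42", rules, {}));                   // /article.php?id=42
console.log(applyRules("/admin/panel", rules, { loggedIn: false })); // /login
console.log(applyRules("/admin/panel", rules, { loggedIn: true }));  // /admin/panel
```

The patterns are deliberately non-global. With a `g` flag, `test()` would keep state in `lastIndex` between calls.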

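Benefits of mod_rewrite

Some of the benefits are obvious, while others are less so:

mod_rewrite is very commonly used to turn ugly, hard-to-read URLs into so-called "friendly" or "clean" URLs.

In addition, these rewritten URLs are search-engine friendly.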

Regular expression tokens:

\s{2,}  two or more whitespace characters

\1    backreference: matches the same text the first capture group matched

\\    matches a '\'

\b    word boundary position, e.g. between a word character and whitespace, or at the start or end of the string

\B    Not a word boundary position
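Here is a short JavaScript sketch of the tokens above: `\s{2,}`, the `\1` backreference, and `\b` versus `\B`. The sample strings are made up for illustration.

```javascript
// \s{2,}: collapse each run of two or more whitespace characters into one space
const collapsed = "a   b\t\tc".replace(/\s{2,}/g, " ");
console.log(collapsed); // "a b c"

// \1: backreference to group 1, used here to remove a doubled word
const deduped = "this is is a test".replace(/\b(\w+)\s+\1\b/g, "$1");
console.log(deduped); // "this is a test"

// \b vs \B: "cat" as a whole word vs "cat" inside another word
const sentence = "cat concat cat";
const wholeWords = sentence.match(/\bcat\b/g);
const insideWords = sentence.match(/\Bcat/g); // only the "cat" inside "concat"
console.log(wholeWords.length, insideWords.length); // 2 1
```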

(?=ABC)  positive lookahead. Matches a group after your main expression without including it in the result

(?!ABC)  Negative lookahead. Specifies a group that cannot match after your main expression (i.e. if it matches, the result is discarded)

(?<=ABC)   Positive lookbehind. Matches a group before your main expression without including it in the result.

(?<!ABC)  Negative lookbehind. Specifies a group that cannot match before your main expression (i.e. if it matches, the result is discarded)
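Each of the four lookaround forms can be shown in JavaScript on a hypothetical input. Lookbehind needs a reasonably modern engine (ES2018 or later, e.g. current Node.js or browsers).

```javascript
// Positive lookahead: digits followed by "em" ("em" itself is not in the result)
const ahead = "100px 200em".match(/\d+(?=em)/g);
console.log(ahead); // ["200"]

// Negative lookahead: "foo" that is NOT followed by "bar"
const notAhead = "foobar foobaz".replace(/foo(?!bar)/g, "X");
console.log(notAhead); // "foobar Xbaz"

// Positive lookbehind: digits preceded by "$"
const behind = "$30 €40".match(/(?<=\$)\d+/g);
console.log(behind); // ["30"]

// Negative lookbehind: a digit that is NOT preceded by "a"
const notBehind = "a1 b2".match(/(?<!a)\d/g);
console.log(notBehind); // ["2"]
```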

*?    Matches zero or more of the preceding token. This is a lazy match and will match as few characters as possible before satisfying the next token

+?    Matches one or more of the preceding token. This is a lazy match and will match as few characters as possible before satisfying the next token

{5}    Matches exactly 5 of the preceding token

{2,5}  Matches 2 to 5 of the preceding token. Greedy match

{2,5}?   Matches 2 to 5 of the preceding token. Lazy match
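This JavaScript sketch runs each quantifier above against a string of six a's, so the greedy and lazy variants can be compared directly.

```javascript
const sixAs = "aaaaaa";
const exact = sixAs.match(/a{5}/)[0];       // exactly 5
const greedy = sixAs.match(/a{2,5}/)[0];    // greedy: as many as allowed
const lazy = sixAs.match(/a{2,5}?/)[0];     // lazy: as few as allowed
const lazyPlus = sixAs.match(/a+?/)[0];     // lazy one-or-more
const lazyStar = sixAs.match(/a*?/)[0];     // lazy zero-or-more can match nothing
console.log(exact, greedy, lazy, lazyPlus, JSON.stringify(lazyStar));
// aaaaa aaaaa aa a ""
```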

(ABC)  Groups multiple tokens together and creates a capture group. This allows you to apply quantifiers to the full group.

(?:ABC)  Groups multiple tokens without creating a capture group

$$     inserts a literal $ symbol

$`     inserts the portion of the string that precedes the match

$&     inserts the matched substring

$'     inserts the portion of the string that follows the match

$1     inserts the result of the first capture group
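These replacement tokens work inside the replacement string passed to JavaScript's `String.prototype.replace`. The inputs below are illustrative.

```javascript
// $` = text before the match, $& = the match itself, $' = text after the match
const around = "abc-def".replace(/-/, "[$`|$&|$']");
console.log(around); // "abc[abc|-|def]def"

// $$ inserts a literal dollar sign; here it is followed by $& (the matched digit)
const dollar = "5".replace(/\d/, "$$$&");
console.log(dollar); // "$5"

// $1, $2 insert capture groups, here swapping two words
const swapped = "John Smith".replace(/(\w+) (\w+)/, "$2, $1");
console.log(swapped); // "Smith, John"
```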

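m      multiline: ^ and $ also match at the start and end of each line

i      ignore case

s      dotAll: . also matches line breaks (without it, . matches any character except line breaks)

g      search globally (find all matches, not just the first)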

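var str = 'The price of tomato is 5, the price of apple is 10';
str.replace(/(\d+)/g, '$1.00'); // the g flag makes every number get replaced, not just the first

// => 'The price of tomato is 5.00, the price of apple is 10.00'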
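This sketch shows the effect of the m, i and s flags. It uses a hypothetical two-line string.

```javascript
const text = "Line one\nline two";

// Without m, ^ only matches at the very start of the string
const noMultiline = text.match(/^line/gi);
// With m, ^ also matches right after each line break
const multiline = text.match(/^line/gim);
console.log(noMultiline, multiline); // ["Line"] ["Line", "line"]

// By default . does not match \n; the s (dotAll) flag changes that
const withoutS = /one.line/.test(text);
const withS = /one.line/s.test(text);
console.log(withoutS, withS); // false true
```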

 

?     zero or one

\    escape

\.  \\   \+  \*  \?  \^  \$  \[  \]  \(  \)  \{  \}  \/  \'  \#    escaped characters, matched literally
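When user input has to be matched literally, every metacharacter needs escaping. The sketch below uses a widely used idiom that relies on `$&`. The name `escapeRegExp` is our own helper, not a built-in.

```javascript
// Hypothetical helper: prefix each regex metacharacter with a backslash.
// "\\$&" in the replacement means "a backslash followed by the matched character".
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const escaped = escapeRegExp("1+1=2?");
console.log(escaped); // 1\+1=2\?

const literal = new RegExp(escaped);
console.log(literal.test("1+1=2?"), literal.test("11=2")); // true false

// Unescaped, + and ? act as quantifiers, so "11=2" matches
console.log(new RegExp("1+1=2?").test("11=2")); // true
```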

[ABC]  Any single character in ABC set

/th(a|i)nk/ matches the same strings as /th[ai]nk/ (the first form also captures the vowel)

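() : capturing group. /(.+)@(163|126|188)\.com$/ checks the format of a NetEase (163/126/188) email address

(?:) : non-capturing group. /(.+)@(?:163|126|188)\.com$/

In JavaScript, str.match(regexp) returns the captured substrings so you can use them: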
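To see the difference between the two patterns, run both on the same made-up address. The capturing version returns the domain digits as an extra array element.

```javascript
const email = "someone@163.com"; // hypothetical address
const capturing = email.match(/(.+)@(163|126|188)\.com$/);
const nonCapturing = email.match(/(.+)@(?:163|126|188)\.com$/);
console.log(capturing.slice(1));    // ["someone", "163"]
console.log(nonCapturing.slice(1)); // ["someone"]
```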

var url = 'http://blog.163.com/album?id=1#comment';
var reg = /(https?:)\/\/([^\/]+)(\/[^\?]*)?(\?[^#]*)?(#.*)?/;
var arr = url.match(reg);
var protocol = arr[1]; // http:
var host = arr[2]; // blog.163.com
var pathname = arr[3]; // /album
var search = arr[4]; // ?id=1
var hash = arr[5]; // #comment
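The regex can be cross-checked against the built-in WHATWG `URL` class, which is available globally in browsers and Node.js. This parser is a different technique, not what the regex does internally. For this URL it reports the same five parts.

```javascript
const parsed = new URL("http://blog.163.com/album?id=1#comment");
console.log(parsed.protocol); // "http:"
console.log(parsed.host);     // "blog.163.com"
console.log(parsed.pathname); // "/album"
console.log(parsed.search);   // "?id=1"
console.log(parsed.hash);     // "#comment"
```

For real code, `URL` is usually the safer choice. It handles ports, credentials and encoding that a short regex does not.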

 

+    one or more

*    zero or more

|     or: matches the entire expression before or after the |, e.g. (https?|ftp)://

^    matches the beginning of the string    

$    matches the end of the string

$1    refers to the text captured by the first group

$2    refers to the text captured by the second group

?:    inside parentheses, groups without capturing, e.g. (^.+(?:jpg|png|gif)$)
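This sketch combines the anchors, alternation, $1/$2 references and `?:` from the lines above. The inputs are illustrative.

```javascript
// ^ anchors at the start; (https?|ftp) is an alternation group
const scheme = /^(https?|ftp):\/\//;
const ftpOk = scheme.test("ftp://example.com");
const notAtStart = scheme.test("see http://example.com"); // ^ requires the scheme first
console.log(ftpOk, notAtStart); // true false

// $1, $2, $3 in the replacement refer to the captured groups
const reordered = "2015-03-31".replace(/^(\d+)-(\d+)-(\d+)$/, "$3/$2/$1");
console.log(reordered); // "31/03/2015"

// (?:...) groups without capturing, so only the full match is returned
const image = "photo.png".match(/^.+(?:jpg|png|gif)$/);
console.log(image.length); // 1
```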

[^ABC]   Any single character not in the set

[a-z]     any single character in the a-z range

[^b-e]    any single character that is not in range b-e

[0-9]     any single digit

[\w'-]  any word character, single quote, or hyphen

\t  tab;  \r\n  carriage return followed by line feed (a Windows line break)

\xFF    specifies a character by its two-digit hexadecimal code

\xA9 => copyright symbol
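The character classes and escapes above can be tried on a sample sentence. The sentence is made up for illustration.

```javascript
const sentence = "It's a well-known fact, 42 times over.";
const words = sentence.match(/[\w'-]+/g); // word chars, apostrophes and hyphens
const digits = sentence.match(/[0-9]+/g);
console.log(words);  // ["It's", "a", "well-known", "fact", "42", "times", "over"]
console.log(digits); // ["42"]

// [b-e] is a range; removing it leaves only characters outside b-e
const outsideRange = "abcdef".replace(/[b-e]/g, "");
console.log(outsideRange); // "af"

// \t is a tab; \xA9 is the copyright sign
const cols = "a\tb".split(/\t/);
console.log(cols, "\xA9" === "©"); // ["a", "b"] true
```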

How do you match strings that do not contain a given substring?

^(?!.*ab).*$  : matches only strings in which "ab" never appears
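In JavaScript, the pattern can filter a hypothetical word list. The lookahead `(?!.*ab)` is checked once at the start of the string and fails if "ab" appears anywhere later.

```javascript
const noAb = /^(?!.*ab).*$/;
const candidates = ["abc", "acb", "xaby", "ba"];
const kept = candidates.filter((w) => noAb.test(w));
console.log(kept); // ["acb", "ba"]
```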

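How do you use lazy mode to match as few characters as possible?

alert( "123 456".match(/\d+ \d+?/g) ); // 123 4

Note that the ? after the second \d+ makes that quantifier lazy, so it matches as few digits as possible (just one).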

http://javascript.info/regexp-greedy-and-lazy
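A classic way to see greedy versus lazy matching is a tag-like string (illustrative only, not a way to parse HTML).

```javascript
const html = "<b>bold</b>";
const greedy = html.match(/<.+>/)[0];  // greedy: runs to the last ">"
const lazy = html.match(/<.+?>/)[0];   // lazy: stops at the first ">"
const allTags = html.match(/<.+?>/g);
console.log(greedy, lazy, allTags); // "<b>bold</b>" "<b>" ["<b>", "</b>"]
```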

 

https://24ways.org/2013/url-rewriting-for-the-fearful/ is billed as the most human-readable article on URL rewriting

 
