Notes on Apache URL rewriting and regular expressions

世有因果知因求果, posted on 2015-03-31

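What is mod_rewrite?

mod_rewrite is an Apache module that lets the server modify the requested URL. Each incoming URL is compared against a series of rules. Every rule contains a regular expression that detects one particular pattern. If that pattern is found in the URL and the rule's preset conditions are met, then the matched portion is replaced by a preset string or triggers a preset action.

This process continues until there are no unprocessed rules left or the process is explicitly stopped.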

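This can be summed up in three points:

There is a set of rules that are processed in order.
If a rule matches, the conditions attached to that rule are then checked.
If everything is a go, a substitution or some other action is carried out.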
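
The three points above can be sketched as a small JavaScript simulation. This is not how Apache implements mod_rewrite: the `rules` array, the `condition` and `last` fields, and the `rewrite` function are hypothetical names chosen only for illustration.

```javascript
// Hypothetical rule set, processed in order (not real Apache configuration).
const rules = [
  // Rule 1: move legacy paths under /blog/, then keep processing.
  { pattern: /^\/old-blog\/(.*)$/, substitution: '/blog/$1' },
  // Rule 2: map a clean URL to a script, only for GET requests, then stop.
  {
    pattern: /^\/blog\/(\d+)$/,
    condition: (request) => request.method === 'GET',
    substitution: '/blog.php?id=$1',
    last: true,
  },
];

function rewrite(url, request, ruleSet) {
  let current = url;
  for (const rule of ruleSet) {
    // 1. The rule's pattern has to match the current URL.
    if (!rule.pattern.test(current)) continue;
    // 2. Any condition attached to the rule has to hold as well.
    if (rule.condition && !rule.condition(request)) continue;
    // 3. Everything is a go: perform the substitution.
    current = current.replace(rule.pattern, rule.substitution);
    // Stop explicitly when the rule asks for it.
    if (rule.last) break;
  }
  return current;
}

console.log(rewrite('/old-blog/7', { method: 'GET' }, rules)); // /blog.php?id=7
```

In mod_rewrite terms, the pattern and substitution roughly correspond to a RewriteRule, the condition to a RewriteCond, and the `last` field to the [L] flag.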

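Benefits of mod_rewrite

Some of the benefits are fairly obvious, while others are less so:

mod_rewrite is very commonly used to turn ugly, hard-to-read URLs into so-called "friendly" or "clean" URLs.

In addition, the rewritten URLs are search-engine friendly.

Regular expression tokens: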

\s{2,}　　two or more whitespace characters

\1　　　　backreference: matches the text captured by group 1

\\　　　　matches a '\'

\b　　　　word boundary position, e.g. next to whitespace or at the start or end of the string

\B　　　　Not a word boundary position
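
A few of the tokens above in action, as a minimal sketch. The sample strings and variable names are made up for illustration.

```javascript
// \s{2,}: collapse runs of two or more whitespace characters into one space.
const collapsed = 'a   b \t c'.replace(/\s{2,}/g, ' ');
console.log(collapsed); // a b c

// \1: backreference to capture group 1, used here to find a doubled word.
const doubled = 'this is is a test'.match(/\b(\w+)\s+\1\b/);
console.log(doubled[0]); // is is

// \b vs \B: "cat" as a whole word vs "cat" inside a longer word.
const wholeWord = /\bcat\b/.test('concatenate');
const insideWord = /\Bcat\B/.test('concatenate');
console.log(wholeWord, insideWord); // false true
```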

(?=ABC)　　Positive lookahead. Matches a group after your main expression without including it in the result.

(?!ABC)　　Negative lookahead. Specifies a group that cannot match after your main expression (i.e. if it matches, the result is discarded).

(?<=ABC)　　Positive lookbehind. Matches a group before your main expression without including it in the result.

(?<!ABC)　　Negative lookbehind. Specifies a group that cannot match before your main expression (i.e. if it matches, the result is discarded).
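
The four lookaround forms can be compared side by side. This is a sketch with made-up sample strings; lookbehind needs a reasonably modern JavaScript engine (ES2018 or later).

```javascript
const sizes = '10px 20em 30px';

// Positive lookahead: digits that are followed by "px" ("px" is not part of the match).
const followedByPx = sizes.match(/\d+(?=px)/g);
console.log(followedByPx); // [ '10', '30' ]

// Negative lookahead: digits NOT followed by "px". The extra \d alternative
// stops the engine from backtracking to a shorter number such as "1".
const notFollowedByPx = sizes.match(/\d+(?!px|\d)/g);
console.log(notFollowedByPx); // [ '20' ]

const prices = 'Prices: $30 and EUR 40';

// Positive lookbehind: digits that are preceded by "$".
const afterDollar = prices.match(/(?<=\$)\d+/g);
console.log(afterDollar); // [ '30' ]

// Negative lookbehind: digits NOT preceded by "$" (or by another digit).
const notAfterDollar = prices.match(/(?<![$\d])\d+/g);
console.log(notAfterDollar); // [ '40' ]
```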

*?　　　　matches zero or more of the preceding token. This is a lazy match: it matches as few characters as possible before satisfying the next token.

+?　　　　matches one or more of the preceding token. This is a lazy match: it matches as few characters as possible before satisfying the next token.

{5}　　　　matches exactly 5 of the preceding token.

{2,5}　　matches 2 to 5 of the preceding token. Greedy match.

{2,5}?　　matches 2 to 5 of the preceding token. Lazy match.
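
The difference between greedy and lazy quantifiers is easiest to see on the same input. A minimal sketch; the sample strings are made up.

```javascript
const html = '<b>one</b><b>two</b>';

// Greedy .* runs to the LAST </b> it can reach.
const greedy = html.match(/<b>.*<\/b>/)[0];
console.log(greedy); // <b>one</b><b>two</b>

// Lazy .*? stops at the FIRST </b>.
const lazy = html.match(/<b>.*?<\/b>/)[0];
console.log(lazy); // <b>one</b>

// With the g flag, the lazy form finds each element separately.
const each = html.match(/<b>.+?<\/b>/g);
console.log(each); // [ '<b>one</b>', '<b>two</b>' ]

// Counted quantifiers follow the same rule.
const run = 'aaaaaaa';
const greedyCount = run.match(/a{2,5}/)[0];
const lazyCount = run.match(/a{2,5}?/)[0];
const exactCount = run.match(/a{5}/)[0];
console.log(greedyCount, lazyCount, exactCount); // aaaaa aa aaaaa
```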

(ABC)　　groups multiple tokens together, which lets you apply quantifiers to the full group, and creates a capture group.

(?:ABC)　　groups multiple tokens without creating a capture group.

$$　　　　escaped $ symbol

$`　　　　inserts the portion of the string that precedes the match

$&　　　　inserts the matched substring

$'　　　　inserts the portion of the string that follows the match

$1　　　　inserts the result of the first capture group
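
These substitution tokens are what `String.prototype.replace` understands in its replacement string. A minimal sketch with made-up inputs:

```javascript
// $` (text before the match), $& (the match itself), $' (text after the match).
const around = 'abc'.replace(/b/, "[$`|$&|$']");
console.log(around); // a[a|b|c]c

// $$ produces a literal dollar sign; $1 inserts capture group 1.
const priced = 'price: 5'.replace(/(\d+)/, '$$$1');
console.log(priced); // price: $5

// $1 and $2 can be reordered freely in the replacement.
const swapped = 'John Smith'.replace(/(\w+) (\w+)/, '$2, $1');
console.log(swapped); // Smith, John
```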

m　　　　multiline: ^ and $ match at the start and end of each line

i　　　　ignore case

s　　　　dotall: when set, "." also matches line breaks; without it, "." matches any character except line breaks

g　　　　global search

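var str = 'The price of tomato is 5, the price of apple is 10';
// $1 inserts the first capture group, so ".00" is appended to every number
var result = str.replace(/(\d+)/g, '$1.00');
// result: 'The price of tomato is 5.00, the price of apple is 10.00'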
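
The m, i, s, and g flags can each be checked by running the same pattern with and without the flag. A minimal sketch; the sample strings are made up, and the s flag assumes an ES2018-capable engine.

```javascript
const lines = 'first\nsecond';

// m: without it, ^ and $ only match at the very start and end of the string.
const withoutM = /^second$/.test(lines);
const withM = /^second$/m.test(lines);
console.log(withoutM, withM); // false true

// i: ignore case.
const caseInsensitive = /hello/i.test('HELLO');
console.log(caseInsensitive); // true

// s: lets "." match the line break between the two words.
const withoutS = /first.second/.test(lines);
const withS = /first.second/s.test(lines);
console.log(withoutS, withS); // false true

// g: return every match instead of only the first one.
const firstOnly = 'banana'.match(/an/);
const allMatches = 'banana'.match(/an/g);
console.log(firstOnly.length, allMatches.length); // 1 2
```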

? 　　　　zero or one

\　　　　escape

\.　　\\ 　　\+　　\*　　\?　　\^　　\$　　\[　　\]　　　　\{　　\}　　\/　　\'　　\#

[ABC]　　Any single character in ABC set

/th(a|i)nk/ matches the same strings as /th[ai]nk/

()　　　　capturing group: /(.+)@(163|126|188)\.com$/ checks the format of a NetEase email address

(?:)　　　non-capturing group: /(.+)@(?:163|126|188)\.com$/
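
The two expressions accept exactly the same addresses; they differ only in what ends up in the match array. A minimal sketch, with `someone@163.com` as a made-up sample address:

```javascript
const address = 'someone@163.com';

// Capturing: both the user name and the domain prefix are stored.
const capturing = address.match(/(.+)@(163|126|188)\.com$/);
console.log(capturing[1], capturing[2]); // someone 163

// Non-capturing: the alternation still groups, but only the user name is stored.
const nonCapturing = address.match(/(.+)@(?:163|126|188)\.com$/);
console.log(nonCapturing.length); // 2 (full match + one group)

// An address on any other domain does not match at all.
const other = 'someone@example.com'.match(/(.+)@(?:163|126|188)\.com$/);
console.log(other); // null
```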

In JavaScript, str.match(regexp) returns the captured strings so that they can be used:

var url = 'http://blog.163.com/album?id=1#comment';
var reg = /(https?:)\/\/([^\/]+)(\/[^\?]*)?(\?[^#]*)?(#.*)?/;
var arr = url.match(reg);   // arr[0] is the full match; groups start at index 1
var protocol = arr[1];      // 'http:'
var host = arr[2];          // 'blog.163.com'
var pathname = arr[3];      // '/album'
var search = arr[4];        // '?id=1'
var hash = arr[5];          // '#comment'
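
The same expression can be wrapped in a small helper that also copes with input that does not match and with URLs that lack the optional parts. `parseUrl` is a hypothetical name, and this is only a sketch: the expression is kept exactly as in the notes and is not a complete URL parser.

```javascript
// Hypothetical helper built on the same regular expression as above.
function parseUrl(url) {
  const reg = /(https?:)\/\/([^\/]+)(\/[^\?]*)?(\?[^#]*)?(#.*)?/;
  const arr = url.match(reg);
  if (arr === null) return null; // no http:// or https:// found
  return {
    protocol: arr[1],
    host: arr[2],
    // Optional groups are undefined when absent, so default them to ''.
    pathname: arr[3] || '',
    search: arr[4] || '',
    hash: arr[5] || '',
  };
}

console.log(parseUrl('http://blog.163.com/album?id=1#comment'));
console.log(parseUrl('https://example.com/a/b'));
console.log(parseUrl('not a url')); // null
```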

+　　　　one or more

*　　　　zero or more

|　　　　alternation: matches the full expression before or after the '|', e.g. (https?|ftp)://

^　　　　matches the beginning of the string　　　　

$　　　　matches the end of the string
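
Anchors and alternation are often combined, for example to check how a string starts or ends. A minimal sketch with made-up sample strings:

```javascript
// ^ plus alternation: the string must START with http://, https:// or ftp://.
const scheme = /^(https?|ftp):\/\//;
const results = [
  scheme.test('https://example.com'),    // true
  scheme.test('ftp://example.com'),      // true
  scheme.test('mailto:a@example.com'),   // false: different scheme
  scheme.test('see http://example.com'), // false: not at the start
];
console.log(results); // [ true, true, false, false ]

// $: the string must END with ".com".
const endsWithCom = /\.com$/.test('example.com');
const hasPathAfter = /\.com$/.test('example.com/path');
console.log(endsWithCom, hasPathAfter); // true false
```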

$1　　　　refers to the first captured group

$2　　　　refers to the second captured group

?:　　　　placed inside parentheses to group without capturing, e.g. (^.+(?:jpg|png|gif)$)

[^ABC]　　　Any single character not in the set

[a-z] 　　　　any single character in the a-z range

[^b-e]　　　　any single character that is not in range b-e

[0-9]　　　　any single digit

[\w'-]　　any word character, single quote, or hyphen

\t　　tab　　　　\r\n　　carriage return followed by line feed

\xFF　　　　specifies a character by its hexadecimal code

\xA9 => copyright symbol
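
A few character classes and escapes from the list above, tried on made-up sample strings:

```javascript
// [\w'-]: word characters, single quotes, and hyphens stay together.
const words = "don't stop-motion!".match(/[\w'-]+/g);
console.log(words); // [ "don't", 'stop-motion' ]

// [^b-e]: remove every character that is NOT in the range b-e.
const kept = 'abcdef'.replace(/[^b-e]/g, '');
console.log(kept); // bcde

// \xA9: the character with hexadecimal code A9 (decimal 169), the copyright symbol.
const copyrightCode = '\xA9'.charCodeAt(0);
const foundSymbol = /\xA9/.test(String.fromCharCode(169) + ' 2015');
console.log(copyrightCode, foundSymbol); // 169 true

// \t: split on tab characters.
const columns = 'a\tb\tc'.split(/\t/);
console.log(columns); // [ 'a', 'b', 'c' ]
```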

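How do you match a string that does not contain a given consecutive substring?

^(?!.*ab).*$　　matches only strings in which "ab" never appears as a consecutive pair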
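
A quick way to see this pattern at work is to filter a list with it. A minimal sketch, assuming single-line input strings; the sample words are made up.

```javascript
// The lookahead at the start rejects the string if "ab" occurs anywhere later.
const noAb = /^(?!.*ab).*$/;

const candidates = ['acb', 'cab', 'ba', 'abba', ''];
const accepted = candidates.filter((s) => noAb.test(s));
console.log(accepted); // [ 'acb', 'ba', '' ]
```

"acb" and "ba" pass because the letters a and b never appear in that exact order next to each other; "cab" and "abba" are rejected.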

How do you use lazy mode to match as few characters as possible?

alert( "123 456".match(/\d+ \d+?/g) ); // 123 4

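Note that the ? after the second \d+ in the code above is what switches that quantifier to lazy mode, so it matches as few digits as possible (a single "4").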

http://javascript.info/regexp-greedy-and-lazy

https://24ways.org/2013/url-rewriting-for-the-fearful/　　billed as the most human-readable article on URL rewriting
