Regular Expressions in Grep Command with 10 Examples --reference

weixin_34219944 posted on 2014-06-14

Regular expressions are used to search and manipulate text based on patterns. Many Linux commands and most programming languages support regular expressions.

The grep command searches a file for a specific string. Please refer to our earlier article for 15 practical grep command examples.

You can also use regular expressions with grep when you want to search for text that matches a particular pattern. grep applies the regular expression to each line of the file and prints the lines that match, which makes pattern-based searches much simpler.

This article is part of a two-article series.

This first part covers grep examples for simple regular expressions; the second part will cover advanced regular expression examples in grep.

Most of the examples below use the /var/log/messages file and its rotated copies (such as messages.1 and messages.4).

Example 1. Beginning of line ( ^ )

In grep, the caret symbol ^ anchors the expression to the start of a line. The following example displays all lines that start with “Nov 10”, i.e. all the messages logged on November 10.

$ grep "^Nov 10" messages.1
Nov 10 01:12:55 gs123 ntpd[2241]: time reset +0.177479 s
Nov 10 01:17:17 gs123 ntpd[2241]: synchronized to LOCAL(0), stratum 10
Nov 10 01:18:49 gs123 ntpd[2241]: synchronized to 15.1.13.13, stratum 3
Nov 10 13:21:26 gs123 ntpd[2241]: time reset +0.146664 s
Nov 10 13:25:46 gs123 ntpd[2241]: synchronized to LOCAL(0), stratum 10
Nov 10 13:26:27 gs123 ntpd[2241]: synchronized to 15.1.13.13, stratum 3

The ^ acts as a start-of-line anchor only when it is the first character of the regular expression. For example, ^N matches lines beginning with N.
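A minimal sketch for trying the anchor without a real /var/log/messages, assuming a POSIX shell and grep: it writes a few invented log lines into a temporary directory. The last command checks the point just made, that a ^ which is not the first character of a basic regular expression is treated as an ordinary character.

```shell
#!/bin/sh
# Work in a throwaway directory so no real files are touched.
cd "$(mktemp -d)" || exit 1

# Invented sample lines, loosely modeled on the syslog output above.
printf '%s\n' \
  'Nov 10 01:12:55 gs123 ntpd[2241]: time reset +0.177479 s' \
  'Nov 11 02:00:00 gs123 ntpd[2241]: synchronized to LOCAL(0)' \
  'copied from Nov 10 backup' > sample.log

# Anchored: only the first line starts with "Nov 10".
grep '^Nov 10' sample.log

# Here ^ is not the first character, so it is a literal "^".
printf 'a^b\n' | grep 'a^b'
```

Without the ^, the pattern “Nov 10” would also match the third line, where the date appears mid-line.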

Example 2. End of line ( $ )

The $ character anchors the expression to the end of a line. The following command finds all lines that end with the word “terminating” followed by one more character (here, the closing period).

$ grep "terminating.$" messages
Jul 12 17:01:09 cloneme kernel: Kernel log daemon terminating.
Oct 28 06:29:54 cloneme kernel: Kernel log daemon terminating.

From the above output you can see when the kernel log daemon was terminated. Just as ^ is an anchor only when it is the first character, $ anchors to the end of the line only when it is the last character of the regular expression.
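Because the dot in “terminating.$” is unescaped, it matches any final character, not only a period. This sketch uses invented log lines in a temporary directory to show the difference between that pattern and one with an escaped dot.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Invented sample lines: the first two end in "terminating" plus one character.
printf '%s\n' \
  'Jul 12 17:01:09 cloneme kernel: Kernel log daemon terminating.' \
  'Jul 12 17:01:10 cloneme kernel: Kernel log daemon terminating!' \
  'Oct 28 06:29:54 cloneme kernel: terminating. restart pending' > kern.log

# "." matches any single character, so the first two lines match.
grep 'terminating.$' kern.log

# "\." matches only a literal period, so only the first line matches.
grep 'terminating\.$' kern.log
```

The third line never matches because “terminating.” is not at the end of it.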

Example 3. Count of empty lines ( ^$ )

Combining ^ and $ lets you find the empty lines in a file: “^$” matches a line with nothing between its start and its end.

$ grep -c  "^$" messages anaconda.log
messages:0
anaconda.log:3

The above command displays the number of empty lines in each of the messages and anaconda.log files.
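A small sketch with two invented files (a.txt and b.txt are hypothetical names) shows the per-file “file:count” output of -c, and how -v turns it into a count of non-empty lines. Keep in mind that a line containing only spaces or tabs is not empty as far as “^$” is concerned.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# a.txt has two empty lines; b.txt has none.
printf 'x\n\ny\n\n' > a.txt
printf 'z\n' > b.txt

# With several files, -c prints one "file:count" line per file.
grep -c '^$' a.txt b.txt

# -v inverts the match, so this counts the non-empty lines instead.
grep -v -c '^$' a.txt
```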

Example 4. Single Character (.)

The special meta-character “.” (dot) matches any single character except the newline. Let us take an input file with the following content.

$ cat input
1. first line
2. hi hello
3. hi zello how are you
4. cello
5. aello
6. eello
7. last line

Now let us search for any single character followed by “ello”, i.e. hello, cello, etc.

$ grep ".ello" input
2. hi hello
3. hi zello how are you
4. cello
5. aello
6. eello

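If you want to search for a word of exactly four characters, you can use grep -w "....", where each dot represents any single character. Because the dot also matches spaces and punctuation, this actually finds any four-character sequence bounded by non-word characters or the ends of the line.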
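The following sketch recreates the input file above in a temporary directory and repeats the search. The last command shows that the dot has to match an actual character: a line consisting only of “ello” is not matched.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Recreate the input file shown above.
printf '%s\n' '1. first line' '2. hi hello' '3. hi zello how are you' \
  '4. cello' '5. aello' '6. eello' '7. last line' > input

# Any one character followed by "ello": matches lines 2 to 6.
grep '.ello' input

# Nothing precedes "ello" here, so the count is 0.
printf 'ello\n' | grep -c '.ello'
```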

Example 5. Zero or more occurrences (*)

The special character “*” matches zero or more occurrences of the previous character. For example, the pattern ‘1*’ matches zero or more ‘1’ characters.

The following example searches for the pattern “kernel: *”, i.e. “kernel:” followed by zero or more space characters.

$ grep "kernel: *." *
messages.4:Jul 12 17:01:02 cloneme kernel: ACPI: PCI interrupt for device 0000:00:11.0 disabled
messages.4:Oct 28 06:29:49 cloneme kernel: ACPI: PM-Timer IO Port: 0x1008
messages.4:Oct 28 06:31:06 btovm871 kernel:  sda: sda1 sda2 sda3
messages.4:Oct 28 06:31:06 btovm871 kernel: sd 0:0:0:0: Attached scsi disk sda
.
.

In the above example, the pattern matches “kernel:” followed by any number of spaces (including none), and the final “.” matches any single character.
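A sketch with invented kernel log lines shows how the trailing dot changes the result. It also shows why a bare “1*” is rarely useful on its own: since it can match zero characters, it matches every line.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Invented kernel lines; the last one has nothing after "kernel:".
printf '%s\n' \
  'Oct 28 06:29:49 cloneme kernel: ACPI: PM-Timer IO Port: 0x1008' \
  'Oct 28 06:31:06 btovm871 kernel:  sda: sda1 sda2 sda3' \
  'Oct 28 06:31:07 btovm871 kernel:' > kernel.log

# " *" accepts any number of spaces, but the final "." still needs one
# more character, so the third line is skipped.
grep 'kernel: *.' kernel.log

# "1*" can match the empty string, so both lines are counted.
printf 'abc\n111\n' | grep -c '1*'
```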

Example 6. One or more occurrences (\+)

The special character “\+” matches one or more occurrences of the previous character, so “ \+” matches one or more space characters.

If there is no space, it will not match. The “+” character belongs to the extended regular expression syntax, so in grep's default (basic) mode you have to escape it as “\+”, which GNU grep supports as an extension. Alternatively, use grep -E and write a plain “+”.

$ cat input
hi hello
hi    hello how are you
hihello

$ grep "hi \+hello" input
hi hello
hi    hello how are you

In the above example, grep matches the pattern “hi”, followed by one or more space characters, followed by “hello”.

If there is no space between hi and hello, the line does not match. The * character, by contrast, matches zero or more occurrences, so “hihello” is matched by * as shown below.

$ grep "hi *hello" input
hi hello
hi    hello how are you
hihello
$
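The sketch below recreates the three-line input file and compares the basic-mode “\+” (assuming GNU grep), its extended-mode (-E) spelling with a plain “+”, and “*”.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

printf '%s\n' 'hi hello' 'hi    hello how are you' 'hihello' > input

# Basic regex: \+ (a GNU extension) means one or more spaces.
grep 'hi \+hello' input

# Extended regex (-E): the same pattern with a plain +.
grep -E 'hi +hello' input

# * also accepts zero spaces, so "hihello" matches too.
grep 'hi *hello' input
```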

Example 7. Zero or one occurrence (\?)

The special character “?” matches zero or one occurrence of the previous character; in grep's default (basic) mode it is written “\?”. For example, “0?” (written “0\?” in basic mode) matches a single zero or nothing.

$ grep "hi \?hello" input
hi hello
hihello

“hi \?hello” matches hi and hello separated by a single space (hi hello) or by no space (hihello).

The line with more than one space between hi and hello is not matched by the above command.
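Likewise for “?”, a sketch assuming GNU grep: the basic-mode “\?” and the extended-mode “?” select the same two lines from the recreated input file.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

printf '%s\n' 'hi hello' 'hi    hello how are you' 'hihello' > input

# At most one space between "hi" and "hello".
grep 'hi \?hello' input     # basic syntax (GNU extension)
grep -E 'hi ?hello' input   # extended syntax
```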

Example 8. Escaping the special character (\)

To search for a literal special character (for example * or a dot), you have to escape it with a backslash in the regular expression.

$ grep "127\.0\.0\.1"  /var/log/messages.4
Oct 28 06:31:10 btovm871 ntpd[2241]: Listening on interface lo, 127.0.0.1#123 Enabled
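Without the backslashes, each dot would match any character. This sketch, with an invented second log line, compares the unescaped pattern, the escaped one, and grep -F, which treats the whole pattern as a fixed string so no escaping is needed.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

printf '%s\n' 'Listening on interface lo, 127.0.0.1#123 Enabled' \
  'build id 127x0y0z1' > net.log

# Unescaped dots match any character, so both lines match.
grep -c '127.0.0.1' net.log

# Escaped dots match only literal periods.
grep -c '127\.0\.0\.1' net.log

# -F: fixed-string search, dots are literal without escaping.
grep -F -c '127.0.0.1' net.log
```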

Example 9. Character Class ([0-9])

A character class is a list of characters enclosed in square brackets; it matches exactly one of the listed characters.

$ grep -B 1 "[0123456789]\+ times" /var/log/messages.4
Oct 28 06:38:35 btovm871 init: open(/dev/pts/0): No such file or directory
Oct 28 06:38:35 btovm871 last message repeated 2 times
Oct 28 06:38:38 btovm871 pcscd: winscard.c:304:SCardConnect() Reader E-Gate 0 0 Not Found
Oct 28 06:38:38 btovm871 last message repeated 3 times

Repeated messages are logged in the messages log file as “last message repeated n times”. The above example searches for lines containing one or more digits (0 to 9), followed by a space and the word “times”. The -B 1 option also displays the line before each matching line.

Within the square brackets, a hyphen specifies a range of characters, so [0123456789] can be written as [0-9]. Letter ranges such as [a-z] and [A-Z] work the same way. The above command can therefore also be written as

$ grep -B 1 "[0-9]\+ times" /var/log/messages.4
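A sketch with invented log lines confirms that the explicit list and the range behave the same way, and shows the extra context line that -B 1 prints. It assumes GNU grep for the “\+”.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

printf '%s\n' \
  'init: open(/dev/pts/0): No such file or directory' \
  'last message repeated 2 times' \
  'pcscd: Reader E-Gate 0 0 Not Found' \
  'last message repeated 13 times' \
  'repeated many times' > repeat.log

# The explicit list and the range are equivalent: both count 2 lines.
grep -c '[0123456789]\+ times' repeat.log
grep -c '[0-9]\+ times' repeat.log

# -B 1 also prints the line before each match.
grep -B 1 '[0-9]\+ times' repeat.log
```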

Example 10. Exception in the character class

To match any character except those listed in the square brackets, use the ^ (caret) symbol as the first character after the opening bracket. The following example searches the Linux dictionary word file for lines that do not start with a vowel; -i makes the match case-insensitive.

$ grep -i  "^[^aeiou]" /usr/share/dict/linux.words
1080
10-point
10th
11-point
12-point
16-point
18-point
1st
2

The first caret in the regular expression represents the beginning of the line, whereas the caret inside the square brackets negates the class: it matches any character except those listed in the brackets.
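A sketch with a made-up word list in place of /usr/share/dict/linux.words (which may not exist on every system) shows the effect of -i: without it, a capitalized word such as “Orange” starts with a character outside [aeiou] and is therefore included.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

printf '%s\n' apple Orange banana 10th kiwi > words

# Case-insensitive: apple and Orange are both excluded.
grep -i '^[^aeiou]' words

# Case-sensitive: "O" is not in [aeiou], so Orange is counted too.
grep -c '^[^aeiou]' words
```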

http://www.thegeekstuff.com/2011/01/regular-expressions-in-grep-command/

Character / Description

\

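Marks the next character as a special character, a literal, a backreference, or an octal escape. For example, “n” matches the character “n”, “\n” matches a newline character, the sequence “\\” matches “\”, and “\(” matches “(”.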

^

Matches the position at the beginning of the input string. If the RegExp object's Multiline property is set, ^ also matches the position following “\n” or “\r”.

$

Matches the position at the end of the input string. If the RegExp object's Multiline property is set, $ also matches the position preceding “\n” or “\r”.

*

Matches the preceding character or subexpression zero or more times. For example, “zo*” matches “z” and “zoo”. * is equivalent to {0,}.

+

Matches the preceding character or subexpression one or more times. For example, “zo+” matches “zo” and “zoo”, but not “z”. + is equivalent to {1,}.

?

Matches the preceding character or subexpression zero or one time. For example, “do(es)?” matches “do” in “do” and “does” in “does”. ? is equivalent to {0,1}.

{n}

n is a nonnegative integer. Matches exactly n times. For example, “o{2}” does not match the “o” in “Bob”, but matches the two o's in “food”.

{n,}

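n is a nonnegative integer. Matches at least n times. For example, “o{2,}” does not match the “o” in “Bob”, but matches all the o's in “foooood”. “o{1,}” is equivalent to “o+”, and “o{0,}” is equivalent to “o*”.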

{n,m}

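m and n are nonnegative integers, where n <= m. Matches at least n and at most m times. For example, “o{1,3}” matches the first three o's in “fooooood”. 'o{0,1}' is equivalent to 'o?'. Note: you cannot put a space between the comma and the numbers.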

?

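When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the matching pattern is “non-greedy”. A non-greedy pattern matches as short a string as possible, whereas the default greedy pattern matches as long a string as possible. For example, in the string “oooo”, “o+?” matches only a single “o”, while “o+” matches all the o's.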

.

Matches any single character except “\n”. To match any character including “\n”, use a pattern such as “[\s\S]”.

(pattern)

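Matches pattern and captures the match. The captured match can be retrieved from the resulting Matches collection using the $0…$9 properties. To match the parenthesis characters ( ), use “\(” or “\)”.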

(?:pattern)

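Matches pattern but does not capture the match; that is, it is a non-capturing match that is not stored for later use. This is useful for combining parts of a pattern with the “or” character (|). For example, 'industr(?:y|ies)' is a more economical expression than 'industry|industries'.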

(?=pattern)

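Positive lookahead: a subexpression that matches the search string at any point where a string matching pattern begins. It is a non-capturing match, so the match cannot be captured for later use. For example, 'Windows (?=95|98|NT|2000)' matches “Windows” in “Windows 2000” but not “Windows” in “Windows 3.1”. Lookaheads do not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that make up the lookahead.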

(?!pattern)

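Negative lookahead: a subexpression that matches the search string at any point where a string matching pattern does not begin. It is a non-capturing match, so the match cannot be captured for later use. For example, 'Windows (?!95|98|NT|2000)' matches “Windows” in “Windows 3.1” but not “Windows” in “Windows 2000”. Lookaheads do not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that make up the lookahead.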

x|y

Matches either x or y. For example, 'z|food' matches “z” or “food”, and '(z|f)ood' matches “zood” or “food”.

[xyz]

A character set. Matches any one of the enclosed characters. For example, “[abc]” matches the “a” in “plain”.

[^xyz]

A negated character set. Matches any character that is not enclosed. For example, “[^abc]” matches the “p” in “plain”.

[a-z]

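A range of characters. Matches any character in the specified range. For example, “[a-z]” matches any lowercase letter in the range “a” through “z”.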

[^a-z]

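A negated range of characters. Matches any character that is not in the specified range. For example, “[^a-z]” matches any character that is not in the range “a” through “z”.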

\b

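Matches a word boundary, that is, the position between a word character and a non-word character such as a space (or the start or end of the string). For example, “er\b” matches the “er” in “never” but not the “er” in “verb”.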

\B

Matches a non-word boundary. “er\B” matches the “er” in “verb” but not the “er” in “never”.

\cx

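Matches the control character indicated by x. For example, \cM matches Control-M or a carriage return character. The value of x must be in the range A-Z or a-z; otherwise, c is assumed to be a literal “c” character.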

\d

Matches a digit character. Equivalent to [0-9].

\D

Matches a non-digit character. Equivalent to [^0-9].

\f

Matches a form-feed character. Equivalent to \x0c and \cL.

\n

Matches a newline character. Equivalent to \x0a and \cJ.

\r

Matches a carriage return character. Equivalent to \x0d and \cM.

\s

Matches any whitespace character, including space, tab, form feed, and so on. Equivalent to [ \f\n\r\t\v].

\S

Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].

\t

Matches a tab character. Equivalent to \x09 and \cI.

\v

Matches a vertical tab character. Equivalent to \x0b and \cK.

\w

Matches any word character, including the underscore. Equivalent to “[A-Za-z0-9_]”.

\W

Matches any non-word character. Equivalent to “[^A-Za-z0-9_]”.

\xn

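Matches n, where n is a hexadecimal escape value. Hexadecimal escape values must be exactly two digits long. For example, “\x41” matches “A”, and “\x041” is equivalent to “\x04” followed by “1”. This allows ASCII codes to be used in regular expressions.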

\num

Matches num, where num is a positive integer: a backreference to a captured match. For example, “(.)\1” matches two consecutive identical characters.

\n

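Identifies either an octal escape value or a backreference. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, if n is an octal digit (0-7), n is an octal escape value.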

\nm

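Identifies either an octal escape value or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captures, n is a backreference followed by the literal m. If neither of the preceding conditions holds, \nm matches the octal escape value nm, where n and m are octal digits (0-7).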

\nml

When n is an octal digit (0-3) and m and l are octal digits (0-7), matches the octal escape value nml.

\un

Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©).

http://msdn.microsoft.com/zh-cn/library/ae5bf541(v=vs.80).aspx
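The table above describes the JScript RegExp dialect (it refers to the RegExp object's Multiline property), so not every entry carries over to grep. As a rough guide, assuming GNU grep: quantifiers, intervals, alternation, character sets, \b and \B, \w and \s, and backreferences all have grep equivalents, while non-greedy quantifiers, lookaheads, \d and the \x and \u escapes are not part of grep's basic or extended (-E) syntax; GNU grep's -P option (Perl-compatible regular expressions, where available) covers those. The sketch below replays several of the table's own examples; -o prints only the matched text, and each trailing comment notes what grep prints.

```shell
#!/bin/sh
# ?  : optional group (greedy, so "does" is matched whole).
printf 'do\ndoes\n' | grep -E -o 'do(es)?'              # do, does

# {n} and {n,m}: interval quantifiers.
printf 'Bob\nfood\n' | grep -E 'o{2}'                   # food
printf 'fooooood\n' | grep -E -o 'o{1,3}' | head -n 1   # ooo

# x|y : alternation.
printf 'zood\nfood\nwood\n' | grep -E -o 'z|food'       # z, food

# [xyz] and [^xyz]: character sets.
printf 'plain\n' | grep -o '[abc]'                      # a
printf 'plain\n' | grep -o '[^abc]' | head -n 1         # p

# \b and \B: word boundaries (GNU extensions).
printf 'never\nverb\n' | grep 'er\b'                    # never
printf 'never\nverb\n' | grep 'er\B'                    # verb

# \w and \s (GNU extensions); \w includes the underscore.
printf 'foo_bar-1\n' | grep -o '\w\+'                   # foo_bar, 1
printf 'key\tvalue\n' | grep -c 'key\svalue'            # 1

# (.)\1 : backreference to the first group (GNU extension in -E mode).
printf 'book\nbok\n' | grep -E '(.)\1'                  # book
```

In grep's default basic mode, the interval and group syntax needs backslashes, e.g. 'o\{2\}' and '\(.\)\1'.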
