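Python Notes: Text Processing (Day 3)

Disclaimer: these are my own notes, written up each day while studying Lai Mingxing's book 《python linux系統管理與自動化運維》 (Python Linux System Administration and Automated Operations), for study and sharing. If you need the full material, please buy a legitimate copy of the book. Thank you.

1. String literals

1.1 Defining strings

Python does not distinguish between characters and strings, so a string can be defined with either single quotes or double quotes: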

				
In [5]: greet = "Hello world"
In [6]: greet = 'Hello world'

As a rule, when the string itself contains a single quote we define it with double quotes, and when it contains a double quote we define it with single quotes:

				
In [7]: intro = "He's a teacher"
In [8]: statment = 'john said to me:"can you do me a favor tonight"'

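Python also supports escape sequences; the escape character is the backslash (\).

\': single quote

\n: newline

\a: bell

\b: backspace, moves the current position back one column

\f: form feed, moves the current position to the start of the next page

\r: carriage return, moves the current position to the start of the current line

\t: horizontal tab, jumps to the next tab stop

\v: vertical tab

\\: a single backslash character '\'

Programming languages use \ to introduce escape sequences; URLs use %.

The earlier examples can also be written with escape characters: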

				
In [11]: intro = "He\'s a teacher"
In [12]: statment = "john said to me:\"can you do me a favor tonight\""

As described above, Python treats \ as the start of an escape sequence. Sometimes that is not what we intended:

				
In [13]: import os
In [14]: path = "c:\next"
In [15]: print(path)
c:
ext

This happens because Python interpreted \n as a newline. To fix it, either escape the backslash as \\ or prefix the string with r to make it a raw string:

				
In [16]: path = "c:\\next"
In [17]: print(path)
c:\next

Or:

				
In [18]: path = r"c:\next"
In [19]: print(path)
c:\next

Python strings can also be defined with triple quotes, which may contain both single and double quotes:

				
In [20]: message =''' Type "copyright","credits" or "license" for more infomation ,Details about 'object' for extra'''
In [21]: print(message)
Type "copyright","credits" or "license" for more infomation ,Details about 'object' for extra

Another easily overlooked feature: two adjacent string literals are automatically joined into one string:

				
In [23]: s = "Hello" "World"
In [24]: s
Out[24]: 'HelloWorld'

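1.2 Strings are immutable ordered collections

Python strings have two key properties:

1) Strings are immutable: a string cannot be modified in place (every Python engineer should keep this in mind).

2) A string is an ordered collection of characters.
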
In [25]: s = "hello"
In [26]: s[0]
Out[26]: 'h'
In [27]: s[0] = 'H'
---------------------------------------------------------------------------
TypeError                                Traceback (most recent call last)
<ipython-input-27-812ef2514689> in <module>()
----> 1 s[0] = 'H'
TypeError: 'str' object does not support item assignment
As shown, a Python string cannot be modified in place.
In [30]: s
Out[30]: 'hello'
In [31]: s[1:]
Out[31]: 'ello'
In [32]: s = 'H' + s[1:]
In [33]: s
Out[33]: 'Hello'
In [34]: s + 'World'
Out[34]: 'HelloWorld'
In [35]: s
Out[35]: 'Hello'
In [36]: s*3
Out[36]: 'HelloHelloHello'

				

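Every operation on a Python string produces a new string that occupies its own block of memory, so avoid creating too many intermediate results when manipulating strings.

To concatenate the items of a list or tuple, use the string method join: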

				
In [16]: fruits = ['orange','apple','banana','pear']
In [17]: ",".join(fruits)
Out[17]: 'orange,apple,banana,pear'
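
The advice about intermediate results can be illustrated with a minimal sketch. The two helper names below are hypothetical and not from the book; both build the same comma-separated string, but the first creates a new string on every += while the second lets join assemble the result in one call.

```python
def build_csv_slow(items):
    """Concatenate with +=; each step may create a new intermediate string."""
    result = ""
    for i, item in enumerate(items):
        if i:
            result += ","
        result += item
    return result


def build_csv_fast(items):
    """Let str.join assemble the whole result in one call."""
    return ",".join(items)


fruits = ["orange", "apple", "banana", "pear"]
print(build_csv_slow(fruits) == build_csv_fast(fruits))  # True
print(build_csv_fast(fruits))  # orange,apple,banana,pear

# join only accepts strings, so convert other types first.
print(",".join(str(n) for n in [1, 2, 3]))  # 1,2,3
```

For a handful of items the difference is negligible; it starts to matter when the loop runs many thousands of times.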

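Python strings can be accessed by index and by slice.

a) An index accesses exactly one element;

b) A slice accesses a range. Its start, stop, and step can each be given or omitted.

Example: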

				
In [40]: s = "Hello,world"
In [41]: s[:5]
Out[41]: 'Hello'
In [42]: s[0:5]
Out[42]: 'Hello'
In [43]: s[0:4]
Out[43]: 'Hell'
In [44]: s[5]
Out[44]: ','
In [49]: s[6:]
Out[49]: 'world'

				
			

								

								
In [51]: s
Out[51]: 'Hello,world'
In [52]: s[::-1]  # a slice over the whole string (start and stop omitted) with step -1, so it is walked backwards
Out[52]: 'dlrow,olleH'
In [67]: s[::-2]
Out[67]: 'drwolH'
In [63]: s
Out[63]: 'Hello,world'
In [64]: s[0:5]
Out[64]: 'Hello'
In [65]: s[0:5:2]   # a slice with start 0, stop 5, and step 2
Out[65]: 'Hlo'

				

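Indexing and slicing also work on any ordered collection, including strings, tuples, and lists.

Above we reversed the string with the slice below, but this form is not very readable:

In [52]: s[::-1]  # a slice over the whole string with step -1

Out[52]: 'dlrow,olleH'

Using the reversed function makes the intent clearer: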

				
In [68]: s
Out[68]: 'Hello,world'
In [69]: reversed(s)
Out[69]: <reversed at 0x2d5c2d0>
In [72]: ''.join(reversed(s))
Out[72]: 'dlrow,olleH'

2. String functions

2.1 General operations

				
In [73]: s = "Hello,World"
In [74]: len(s)
Out[74]: 11
In [75]: "Hello" in s
Out[75]: True
In [76]: "Hello" not in s
Out[76]: False

The same operations apply to lists:

				
In [11]: i = [1,2,3,4,5]
In [13]: len(i)
Out[13]: 5
In [15]: 1 in i
Out[15]: True

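In Python's design, strings, lists, and tuples share common traits: they are all ordered collections of elements, and the operations common to them are provided as general-purpose operations. Indexing, slicing, taking the length, and membership testing are therefore supported through general functions and expressions rather than type-specific methods.

2.2 Case-related methods

The following string methods deal with letter case:

1) upper: converts the string to uppercase;

2) lower: converts the string to lowercase;

3) isupper: checks whether all letters in the string are uppercase;

4) islower: checks whether all letters in the string are lowercase;

5) swapcase: converts uppercase letters to lowercase and lowercase letters to uppercase;

6) capitalize: converts the first character to uppercase and the rest to lowercase;

7) istitle: checks whether the string is title-cased, that is, every word starts with an uppercase letter.

Example:
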
In [82]: s = "Hello,world"
In [83]: s.capitalize()
Out[83]: 'Hello,world'
In [88]: s.capitalize?   # show help
Type:      builtin_function_or_method
String Form:<built-in method capitalize of str object at 0x2d53c00>
Docstring:
S.capitalize() -> string
Return a copy of the string S with only its first character
capitalized.
In [89]: "lai ming xing".upper()
Out[89]: 'LAI MING XING'

				

				
			

								

								
In [91]: "LAI MING XING".lower()
Out[91]: 'lai ming xing'
In [92]: "LAI MING XING".isupper()
Out[92]: True
In [93]: "LAI MING XING".swapcase()
Out[93]: 'lai ming xing'
In [95]: "lai ming xing".capitalize()
Out[95]: 'Lai ming xing'
In [96]: "lai ming xing".istitle()
Out[96]: False
In [97]: "Lai Ming Xing".istitle()
Out[97]: True

				

				
[root@localhost ~]# cat test3.py 
#!/usr/bin/python
#coding:utf-8
yes_or_no=input('Please input yes or no:')
if yes_or_no.lower()  ==  "yes":
    print("continue do something")
else:
    print("exit.....")

[root@localhost ~]# python test3.py

Please input yes or no:"yes"

continue do something

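3. Predicate methods

1) isupper: checks whether all letters in the string are uppercase;

2) islower: checks whether all letters in the string are lowercase;

3) isalpha: returns True if the string is non-empty and contains only letters, otherwise False;

4) isalnum: returns True if the string is non-empty and contains only letters and digits, otherwise False;

5) isspace: returns True if the string is non-empty and contains only whitespace such as spaces, tabs, and newlines, otherwise False;

6) isdecimal: returns True if the string is non-empty and contains only decimal digit characters, otherwise False.

Example:
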
In [18]: "Python".isalpha()
Out[18]: True
In [19]: "Python 3.6".isalpha()
Out[19]: False
In [20]: "Python 3.6".isalnum()
Out[20]: False
In [21]: "\t\n".isspace()
Out[21]: True
In [23]: u"Python 3.6".isdecimal()
Out[23]: False
In [25]: u"36".isdecimal()
Out[25]: True
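
A typical use of these predicates is validating input before converting it. The sketch below is an illustration under stated assumptions: looks_like_port is a hypothetical helper, and the 1-65535 range is the usual TCP/UDP port range rather than anything taken from the book.

```python
def looks_like_port(text):
    """Return True if text is a decimal number between 1 and 65535."""
    text = text.strip()
    # isdecimal() is False for "", "80a", "-1" and "8.0", so int() is safe here.
    return text.isdecimal() and 1 <= int(text) <= 65535


for candidate in ["8080", " 22 ", "80a", "", "70000"]:
    print(repr(candidate), looks_like_port(candidate))
```

Checking with isdecimal first avoids wrapping int() in a try/except for the common bad inputs.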

				

u"Python 3.6".isdecimal()

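u/U: marks a unicode string.

The prefix is not specific to Chinese; it can be applied to any string and means the literal is a unicode string.

Plain English text decodes correctly under most encodings, so the u prefix is usually omitted for it. For Chinese text in Python 2, the encoding must be stated explicitly, otherwise the text turns into garbage as soon as the encoding is converted. In Python 3 every str is already unicode and the u prefix is optional.

UTF-8 is recommended as the encoding everywhere.

4. The string methods startswith and endswith

startswith and endswith are also predicates: they test whether the argument is a prefix or a suffix of the string.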

				
In [1]: s = 'lai ming xing'
In [2]: s.startswith('lai')
Out[2]: True
In [3]: s.startswith('lai m')
Out[3]: True
In [8]: s.endswith('xing')
Out[8]: True

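Here is a practical example.

Suppose the current directory contains text files, Python files, and an image file:

[root@localhost ~]# ls

a.txt b.txt c.txt d.txt e.py f.py g.py h.jpg

To find all the text files or all the Python files, Python's built-in string methods are very convenient: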

				
In [2]: import os
In [4]: [ item for item in os.listdir('.') if item.endswith('.py') ]
Out[4]: ['f.py', 'e.py', 'g.py']
In [5]: [ item for item in os.listdir('.') if item.endswith('.txt') ]
Out[5]: ['d.txt', 'b.txt', 'a.txt', 'c.txt']
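
Both kinds of file can also be selected in a single pass, because endswith (like startswith) accepts a tuple of candidates. A minimal sketch; filter_by_suffix is a hypothetical helper name, and the file names are passed in as a list so the example does not depend on the directory contents.

```python
def filter_by_suffix(names, suffixes):
    """Return the names that end with any of the given suffixes, sorted."""
    # endswith accepts a tuple and returns True if any element matches.
    return sorted(n for n in names if n.endswith(tuple(suffixes)))


names = ["a.txt", "b.txt", "e.py", "h.jpg"]
print(filter_by_suffix(names, [".txt", ".py"]))  # ['a.txt', 'b.txt', 'e.py']
```

With a real directory, names would simply be os.listdir('.').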

In day-to-day work, prefix matching is needed even more often:

				
[root@localhost log]# ls messages*
messages  messages-20171231

Use Python to compute the total size of these files:

				
In [6]: import os
In [7]: message_logs = [ item for item in os.listdir('/var/log/') if item.startswith('message')]
In [8]: message_logs
Out[8]: ['messages-20171231', 'messages']
    
In [9]: sum_size = sum(os.path.getsize(os.path.join('/var/log',item)) for item in message_logs)
In [10]: sum_size
Out[10]: 526031

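5. Search functions

The following functions all locate a substring within a string; they differ in search direction and in how they report failure:

1) find: returns the index of the first occurrence of the substring, or -1 if it is not found;

2) index: like find, but raises ValueError if the substring is not found;

3) rfind: like find, but searches from the end toward the beginning;

4) rindex: like index, but searches from the end toward the beginning.

Example: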

				
In [27]: a = 'Return the lowest index in S where substring sub is found'
In [33]: a.find('in')
Out[33]: 18
In [35]: a.find('indfd')
Out[35]: -1
In [38]: a.find('in',19) # start searching for 'in' at index 19
Out[38]: 24

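Note that a = 'Return the lowest index in S where substring sub is found' contains 'in' twice: the first occurrence is at index 18 and the second at index 24. Without a range, find returns the first one.

Note: the most common mistake with find is using it to test whether a substring occurs in a string.

The correct way to test whether one string is a substring of another is in or not in: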

				
In [40]: a
Out[40]: 'Return the lowest index in S where substring sub is found'
In [41]: 'in' in a
Out[41]: True
In [43]: 'indbbb' not in a
Out[43]: True
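
Why find is a poor membership test can be shown in a few lines. This is a small illustration with a made-up string: find returns 0 for a match at the very start (which is falsy) and -1 for no match (which is truthy), so using its result as a condition gives the wrong answer in both cases.

```python
text = "index of the first match"

# "index" is found at position 0, and 0 is falsy.
found_with_find = bool(text.find("index"))

# "zzz" is not found, find returns -1, and -1 is truthy.
missing_with_find = bool(text.find("zzz"))

print(found_with_find)    # False, although "index" is present
print(missing_with_find)  # True, although "zzz" is absent

# The in operator gives the expected answers.
print("index" in text)    # True
print("zzz" in text)      # False
```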

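6. String manipulation methods

As mentioned earlier, Python strings are immutable. To "modify" a string, build a modified copy and assign it back to the original variable.

6.1 The join function

The string method join concatenates a list of strings into a new, larger string.
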
In [45]: "".join(['a','b','c'])
Out[45]: 'abc'
In [46]: "-".join(['a','b','c'])
Out[46]: 'a-b-c'
In [48]: with open('/etc/passwd') as f:
   ....:     print('###'.join(f))
   ....:     
root:x:0:0:root:/root:/bin/bash
###bin:x:1:1:bin:/bin:/sbin/nologin
###daemon:x:2:2:daemon:/sbin:/sbin/nologin
###adm:x:3:4:adm:/var/adm:/sbin/nologin
###lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin

				

The print function can also join its arguments with a separator given by the sep parameter, so join is not needed in that case:

				
>>> print('root','root','/bin/bash',sep=':')
root:root:/bin/bash

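Note: sep is a parameter of the print function, which is built in to Python 3; in Python 2 it is only available after from __future__ import print_function.

6.2 The split function

split is the inverse of join: it splits a string into a list of strings: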

				
In [70]: "root:x:0:0:root:/root:/bin/bash".split(':')
Out[70]: ['root', 'x', '0', '0', 'root', '/root', '/bin/bash']

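6.3 The strip function

strip: trims both ends of the string;

lstrip: trims the left end of the string;

rstrip: trims the right end of the string.

Typical use: strip is most often used to remove whitespace from both ends of a string: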

				
In [78]: s = "\tHello,\tWorld \n"
In [79]: s
Out[79]: '\tHello,\tWorld \n'
In [80]: s.strip()
Out[80]: 'Hello,\tWorld'
In [81]: s.lstrip()
Out[81]: 'Hello,\tWorld \n'
In [82]: s.rstrip()
Out[82]: '\tHello,\tWorld'

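strip also accepts an argument: the set of characters to trim. Every character in the argument is trimmed from the ends of the string. Because the argument is treated as a set, the order of its characters does not matter and repeating a character has no effect: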

				
In [83]: s = "##Hello,world##"
In [84]: s
Out[84]: '##Hello,world##'
In [85]: s.strip('#')
Out[85]: 'Hello,world'
In [86]: s.strip('####')
Out[86]: 'Hello,world'
    
In [87]: s.strip('H#d')
Out[87]: 'ello,worl'
In [90]: s.strip('dH#')
Out[90]: 'ello,worl'
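
Because the argument is a set of characters rather than a substring, rstrip and lstrip are easy to misuse for removing a file extension or prefix. The sketch below contrasts the two behaviours; remove_suffix is a hypothetical helper written for this illustration, not a method from the book.

```python
def remove_suffix(text, suffix):
    """Remove suffix as a whole string, if present."""
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text


# rstrip treats ".log" as the character set {'.', 'l', 'o', 'g'},
# so it keeps trimming past the extension.
print("config.log".rstrip(".log"))          # confi
print(remove_suffix("config.log", ".log"))  # config

# Characters in the middle of the string are never touched by strip.
print("##Hello,#world##".strip("#"))        # Hello,#world
```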

6.4 The replace function

Purpose: replaces every occurrence of a substring with a new substring:

				
In [92]: "hello,word".replace('ll','oo')
Out[92]: 'heooo,word'

7. Analyzing Apache access logs with Python

In [93]: line = '193.252.243.232 - - [29/Mar/2009:06:05:34 +0200] "GET /index.php HTTP/1.1" 200 8741 "-" "Mozilla/5.0 (compatible; PJBot/3.0; +)" "-" '
    
In [95]: line.split()
Out[95]: 
['193.252.243.232',
 '-',
 '-',
 '[29/Mar/2009:06:05:34',
 '+0200]',
 '"GET',
 '/index.php',
 'HTTP/1.1"',
 '200',
 '8741',
 '"-"',
 '"Mozilla/5.0',
 '(compatible;',
 'PJBot/3.0;',
 '+)"',
 '"-"']
In [97]: line.split()[0]
Out[97]: '193.252.243.232'
    
In [98]: line.split()[6]
Out[98]: '/index.php'

				

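Requirement 1: compute PV and UV from the Apache access log (PV is the number of page views, i.e. requests; UV is the number of unique visitors).

The access log looks like this: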

				
[root@localhost ~]# cat /root/access.log 
24.243.232 - - [29/Mar/2009:06:05:34 +0200] "GET /index.php HTTP/1.1" 200 8741 "-" "Mozilla/5.0 (compatible; PJBot/3.0; +)" "-"
232.243.232 - - [29/Mar/2009:06:225:34 +0200] "GET /index.php HTTP/1.1" 200 8741 "-" "Mozilla/5.0 (compatible; PJBot/3.0; +)" "-"
252.433.232 - - [29/Mar/2009:04:05:34 +0200] "GET /index.php HTTP/1.1" 200 8741 "-" "Mozilla/5.0 (compatible; PJBot/3.0; +)" "-"
252.253.232 - - [29/Mar/2009:06:05:55 +0200] "GET /index.php HTTP/1.1" 200 8741 "-" "Mozilla/5.0 (compatible; PJBot/3.0; +)" "-"
252.243.232 - - [29/Mar/2009:07:05:34 +0200] "GET /index.php HTTP/1.1" 200 8741 "-" "Mozilla/5.0 (compatible; PJBot/3.0; +)" "-"
252.243.232 - - [29/Mar/2009:07:05:34 +0200] "GET /index.php HTTP/1.1" 200 8741 "-" "Mozilla/5.0 (compatible; PJBot/3.0; +)" "-"

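Note: access.log must not end with a blank line; otherwise line.split() returns an empty list for that line and indexing it raises IndexError (list index out of range).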

				
[root@localhost ~]# cat test.py 
#!/usr/bin/python
#coding:utf-8
from __future__ import print_function
ips = []
with open('/root/access.log') as f:
    for line in f:
        ips.append(line.split()[0])
print("PV is {0}".format(len(ips)))
#Convert the ips list to a set with the built-in set(); a set removes duplicates automatically.
print("UV is {0}".format(len(set(ips))))

				
[root@localhost ~]# python test.py 
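PV is 6  # number of IPs, duplicates included
UV is 5  # number of IPs after removing duplicates

Explanation: each parsed IP is appended to a list, so the length of the list is the number of requests (PV). For UV, we only need to de-duplicate that list and count what remains. In Python this is done by converting the list to a set, because a set holds each value only once.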
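
The same counting can be wrapped in a function that tolerates blank lines. This is a sketch under assumptions: count_pv_uv is a hypothetical name, and the sample lines use made-up IP addresses and are held in a list so the example runs without a log file.

```python
def count_pv_uv(lines):
    """Return (PV, UV) for an iterable of access-log lines."""
    # line.split() is [] for a blank line, so skip those before indexing.
    ips = [line.split()[0] for line in lines if line.strip()]
    return len(ips), len(set(ips))


# Hypothetical sample lines; only the first field (client IP) matters here.
sample = [
    '10.0.0.1 - - [29/Mar/2009:06:05:34 +0200] "GET /index.php HTTP/1.1" 200 8741\n',
    '10.0.0.2 - - [29/Mar/2009:06:05:35 +0200] "GET /index.php HTTP/1.1" 200 8741\n',
    '10.0.0.1 - - [29/Mar/2009:06:05:36 +0200] "GET /a.php HTTP/1.1" 200 8741\n',
    "\n",  # trailing blank line, ignored
]
pv, uv = count_pv_uv(sample)
print("PV is {0}".format(pv))  # PV is 3
print("UV is {0}".format(uv))  # UV is 2
```

An open file object can be passed directly in place of the list, since it also yields one line at a time.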

Tip: creating a set with set

				
>>> set1 = set([1,2,3,4,5,5])
>>> set1
{1, 2, 3, 4, 5}
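Here set() created a set, and the duplicate 5 was dropped.

Requirement 2: find the most popular resources on the site from the Apache access log (Counter).

This information is very useful to engineers: once they know which resources are hot, they can optimize access to them, for example with caching, a reverse proxy, or a CDN.

In Python, collections.Counter can hold the popularity of each resource.

Counter is available in Python 2.7 and later. It is a subclass of dict and is used much like a dictionary:
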
>>> from collections import Counter
>>> c = Counter('abcde')
>>> c
Counter({'a': 1, 'c': 1, 'b': 1, 'e': 1, 'd': 1})
>>> c['a']
1
>>> c['d']
1
>>> c['a'] += 1
>>> c['d'] += 1
>>> c['d']
2
>>> c['a']
2
>>> c
Counter({'a': 2, 'd': 2, 'c': 1, 'b': 1, 'e': 1})
>>> c.most_common(3)   # most_common returns the elements with the highest counts
[('a', 2), ('d', 2), ('c', 1)]

				

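As shown above, a Counter stores popularity in a dictionary: the key is the resource name and the value is the number of requests.

The code below uses Counter to find the two most popular resources on the site:

cat d:\\temp\\accesslog.txt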

				
24.243.232 - - [29/Mar/2009:06:05:34 +0200] "GET /index.php HTTP/1.1" 200 8741 "-" "Mozilla/5.0 (compatible; PJBot/3.0; +)" "-"
232.243.232 - - [29/Mar/2009:06:225:34 +0200] "GET /a.php HTTP/1.1" 200 8741 "-" "Mozilla/5.0 (compatible; PJBot/3.0; +)" "-"
252.433.232 - - [29/Mar/2009:04:05:34 +0200] "GET /b.php HTTP/1.1" 200 8741 "-" "Mozilla/5.0 (compatible; PJBot/3.0; +)" "-"
252.253.232 - - [29/Mar/2009:06:05:55 +0200] "GET /index.php HTTP/1.1" 200 8741 "-" "Mozilla/5.0 (compatible; PJBot/3.0; +)" "-"
252.243.232 - - [29/Mar/2009:07:05:34 +0200] "GET /index.php HTTP/1.1" 200 8741 "-" "Mozilla/5.0 (compatible; PJBot/3.0; +)" "-"
252.243.232 - - [29/Mar/2009:07:05:34 +0200] "GET /index.php HTTP/1.1" 200 8741 "-" "Mozilla/5.0 (compatible; PJBot/3.0; +)" "-"

The Python code:

cat test.py

				
#coding:utf-8
from __future__ import  print_function
from collections import Counter
c = Counter()
with open('d:\\temp\\accesslog.txt') as f:
    for line in f:
        c[line.split()[6]] += 1
       
print("Popular resources :{0}".format(c.most_common(2)))

>>>python test.py

Popular resources :[('/index.php', 4), ('/a.php', 1)]
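
Counter can also be built directly from an iterable, which removes the explicit += loop. A minimal sketch, assuming the requested path is the seventh whitespace-separated field as in the log above; top_resources, LINE, and the sample requests are hypothetical.

```python
from collections import Counter


def top_resources(lines, n=2):
    """Return the n most requested paths as (path, count) pairs."""
    paths = (line.split()[6] for line in lines if line.strip())
    return Counter(paths).most_common(n)


# Hypothetical log lines generated from a template; {0} is the request path.
LINE = '10.0.0.1 - - [29/Mar/2009:06:05:34 +0200] "GET {0} HTTP/1.1" 200 8741'
requests_seen = ["/index.php", "/a.php", "/index.php", "/b.php", "/a.php", "/index.php"]
sample = [LINE.format(path) for path in requests_seen]

print(top_resources(sample))  # [('/index.php', 3), ('/a.php', 2)]
```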

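Requirement 3: compute the site's error rate

Approach: a request with HTTP status 2xx or 3xx counts as successful; a request with status 4xx or 5xx counts as an error.

error rate = number of failed requests / total number of requests

Method: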

				
[root@localhost ~]# cat access.log 
24.243.232 - - [29/Mar/2009:06:05:34 +0200] "GET /index.php HTTP/1.1" 200 8741 "-" "Mozilla/5.0 (compatible; PJBot/3.0; +)" "-"
232.243.232 - - [29/Mar/2009:06:225:34 +0200] "GET /index.php HTTP/1.1" 300 8741 "-" "Mozilla/5.0 (compatible; PJBot/3.0; +)" "-"
252.433.232 - - [29/Mar/2009:04:05:34 +0200] "GET /index.php HTTP/1.1" 400 8741 "-" "Mozilla/5.0 (compatible; PJBot/3.0; +)" "-"
252.253.232 - - [29/Mar/2009:06:05:55 +0200] "GET /index.php HTTP/1.1" 402 8741 "-" "Mozilla/5.0 (compatible; PJBot/3.0; +)" "-"
252.243.232 - - [29/Mar/2009:07:05:34 +0200] "GET /index.php HTTP/1.1" 300 8741 "-" "Mozilla/5.0 (compatible; PJBot/3.0; +)" "-"
252.243.232 - - [29/Mar/2009:07:05:34 +0200] "GET /index.php HTTP/1.1" 200 8741 "-" "Mozilla/5.0 (compatible; PJBot/3.0; +)" "-"

Python code:

[root@localhost ~]# cat test.py 
#!/usr/bin/python
#coding:utf-8
from __future__ import print_function
d = {}
with open('/root/access.log') as f:
    for line in f:
        key = line.split()[8]
        d.setdefault(key,0)
        d[key] += 1
sum_requests = 0
error_requests = 0
#d.iteritems() iterates over all (key, value) pairs of d (Python 2 only; use d.items() in Python 3)
for key,val in d.iteritems():
    if int(key) >= 400:
        error_requests += val
    sum_requests += val
#.2 rounds to two decimal places; f prints the value in fixed-point notation, i.e. as a decimal number.
print('error rate: {0:.2f}%'.format(error_requests *100.0 / sum_requests))

				

				
[root@localhost ~]# python test.py 
error rate: 33.33%
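
The script above relies on dict.iteritems, which exists only in Python 2. A Python 3 flavoured sketch of the same calculation follows; error_rate and the sample lines are hypothetical, and the status code is assumed to be the ninth whitespace-separated field as in the log above.

```python
from collections import Counter


def error_rate(lines):
    """Return the percentage of requests whose status code is 4xx or 5xx."""
    codes = Counter(line.split()[8] for line in lines if line.strip())
    total = sum(codes.values())
    errors = sum(count for code, count in codes.items() if int(code) >= 400)
    return 100.0 * errors / total if total else 0.0


# Hypothetical log lines generated from a template; {0} is the status code.
LINE = '10.0.0.1 - - [29/Mar/2009:06:05:34 +0200] "GET /index.php HTTP/1.1" {0} 8741'
sample = [LINE.format(code) for code in (200, 300, 400, 402, 300, 200)]

print("error rate: {0:.2f}%".format(error_rate(sample)))  # error rate: 33.33%
```

Returning 0.0 for an empty log is a design choice made here to avoid a ZeroDivisionError; raising an exception would be equally defensible.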

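Tip:

The dict method items returns all items of the dictionary as a list (this describes Python 2; in Python 3, items() returns a view object). If the notion of a dictionary item is unclear, see an article on the basics of Python's mapping type, dict. Because Python 2 dictionaries are unordered, the items returned by items come in no particular order.
The dict method iteritems does much the same as items, except that it returns an iterator instead of a list.
Calling convention
items() and iteritems() are both methods and are called like any other method: variable.method()
Usage
Using dict items():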
>>> x = {'title':'python web site','url':''}
>>> x.items()
[('url', ''), ('title', 'python web site')]
As the result shows, items() turns each item of the dictionary into a tuple and collects the tuples in a new list. The result can be assigned to a new variable, which is then of type list.
>>> a=x.items()
>>> a
[('url', ''), ('title', 'python web site')]
>>> type(a)
<type 'list'>

Using dict iteritems():
>>> f = x.iteritems()
>>> f
<dictionary-itemiterator object at 0xb74d5e3c>
>>> type(f)
<type 'dictionary-itemiterator'> # an iterator over the dictionary's items
>>> list(f)
[('url', ''), ('title', 'python web site')]
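dict.iteritems() is the best fit when the result only needs to be iterated over, since it avoids building the whole list in memory.

8. String formatting

Python offers two ways to format strings: the % operator and the format method.

Although the % operator is still widely used, format is the future of string formatting, so this section covers only format.

The simplest use of format refers to arguments by position. As shown below, {0}, {1}, and so on are placeholders that refer to the arguments of format by index; with empty {} placeholders, Python fills in the arguments in order:

				
In [7]: "{0} is better than {1},{2} is better than {3}".format('Beautiful','ugly','Explicit','implicit')
Out[7]: 'Beautiful is better than ugly,Explicit is better than implicit'

With few arguments, referring to them by position is fine. With many arguments it becomes hard to follow, and the more descriptive keyword-argument form is a better choice: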

				
In [8]: d = dict(good1='Beautiful',bad1='ugly',good2='Explicit',bad2='implicit')
In [9]: d
Out[9]: {'bad1': 'ugly', 'bad2': 'implicit', 'good1': 'Beautiful', 'good2': 'Explicit'}
In [10]: "{good1} is better than {bad1},{good2} is better than {bad2}".format(**d)
Out[10]: 'Beautiful is better than ugly,Explicit is better than implicit'

format can also access the attributes of an object directly:

				
In [15]: from collections import namedtuple
In [16]: Person = namedtuple('Person','name age sex')
In [17]: xm = Person('xiaoming',20,'male')
In [18]: xm
Out[18]: Person(name='xiaoming', age=20, sex='male')
In [19]: "{p.name} {p.age} old this year".format(p=xm)
Out[19]: 'xiaoming 20 old this year'

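Note: namedtuple creates a subclass of tuple. Its instances behave like tuples, but their fields can also be accessed as named attributes.

A tuple is an immutable sequence.

The following examples exercise format's precision, sign, width, alignment, fill character, and thousands-separator options: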

				
In [27]: "{0:.2f}".format(3.1415926)
Out[27]: '3.14'
 
In [28]: "{0:+.2f}".format(3.1415926)
Out[28]: '+3.14'
In [29]: "{0:10.2f}".format(3.1415926)
Out[29]: '      3.14'
    
In [30]: "{0:^10.2f}".format(3.1415926)
Out[30]: '   3.14   '
In [31]: "{0:_^10.2f}".format(3.1415926)
Out[31]: '___3.14___'
In [32]: "{0:,}".format(1234567)  # the comma separator requires Python 2.7 or later; Python 2.6 does not support it
Out[32]: '1,234,567'
In [33]: "{0:_^+20,.2f}".format(1234.567)   # the comma separator requires Python 2.7 or later
Out[33]: '_____+1,234.57______'
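
These options combine naturally when printing aligned columns. A small sketch with made-up rows: the name is left-aligned in 8 columns, the number is right-aligned in 15 columns with thousands separators, and the last line uses the % presentation type.

```python
rows = [("bob", 1300000001), ("lily", 1300000002)]

# <8 left-aligns in 8 columns; >15, right-aligns in 15 columns with commas.
formatted = ["{0:<8}|{1:>15,}".format(name, phone) for name, phone in rows]
for line in formatted:
    print(line)

# The % presentation type multiplies by 100 and appends a percent sign.
percent = "{0:.1%}".format(0.3333)
print(percent)  # 33.3%
```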

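9. Regular expressions

The built-in string methods solve many text-processing problems, but some cases are too complex for them to handle, or to handle elegantly. The more expressive regular expressions are the right tool there. Take the string below, which needs to be split on both ":" and "." at once. This is awkward with the built-in string methods but easy with a regular expression: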

				
In [45]: data = 'Last login:Thu Mar 2 10:04:52 2017 from 11.113.197.131'
In [47]: import re;re.split(r'[:.]\s*',data)              # note: \s is regular-expression syntax that matches any whitespace character
Out[47]: ['Last login', 'Thu Mar 2 10', '04', '52 2017 from 11', '113', '197', '131']

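9.1 Regular expression syntax

1) To match all the words in a given text, the following regular expression can be used:

( ?[a-zA-Z]+ ?)

" ?" matches the optional space that may appear before or after a word, and [a-zA-Z]+ stands for one or more English letters.

2) To match an IP address, the following regular expression can be used:

[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}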
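
The IP pattern can be tried out directly with re. Note that the repetition count is written with braces, {1,3}. The sketch below also adds a range check, because the pattern alone accepts values such as 999; find_ipv4 and IPV4_CANDIDATE are hypothetical names, and the 0-255 check is an addition for illustration.

```python
import re

# Four groups of one to three digits separated by literal dots.
IPV4_CANDIDATE = re.compile(r"[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}")


def find_ipv4(text):
    """Return dotted quads in text whose four parts are all in 0-255."""
    return [
        candidate
        for candidate in IPV4_CANDIDATE.findall(text)
        if all(int(part) <= 255 for part in candidate.split("."))
    ]


print(find_ipv4("Last login:Thu Mar 2 10:04:52 2017 from 11.113.197.131"))
# ['11.113.197.131']
print(find_ipv4("bad 999.1.1.1, good 10.0.0.1"))
# ['10.0.0.1']
```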

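Table 1. Basic regular expression syntax

Pattern	Description	Example
^	Start-of-line anchor	^imp matches lines that start with imp
$	End-of-line anchor	import$ matches lines that end with import
.	Matches any single character	It matches exactly one character, but that character can be anything: linu. matches linux and linus
[]	Matches any one of the characters inside the brackets	coo[lk] matches cook or cool
[^]	Matches any one character not listed inside the brackets	9[^01] matches 92 and 93, but not 91 or 90
[-]	Matches any one character in the range given inside []	[1-5] matches any digit from 1 to 5; [a-z] matches any lowercase letter
?	Matches the preceding item zero or one time	hell?o matches hello or helo, but not helllo
+	Matches the preceding item one or more times	hel+ matches hel and hell, but not he
*	Matches the preceding item zero or more times	hel* matches he, hel, and hell
{n}	Matches the preceding item exactly n times	[0-9]{3} matches any three-digit number
{n,}	Matches the preceding item at least n times	[0-9]{3,} matches any number with three or more digits
{n,m}	Matches the preceding item at least n and at most m times	[0-9]{2,5} matches any number with two to five digits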

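Table 2. Regular expression special sequences

Source: https://www.ibm.com/developerworks/cn/opensource/os-cn-pythonre/

Special sequence	Meaning
\A	Matches only at the start of the string
\b	Matches the empty string at the beginning or end of a word
\B	Matches the empty string not at the beginning or end of a word
\d	Matches any decimal digit; equivalent to [0-9]
\D	Matches any non-digit character; equivalent to [^0-9]
\s	Matches any whitespace character; equivalent to [ \t\n\r\f\v]
\S	Matches any non-whitespace character; equivalent to [^ \t\n\r\f\v]
\w	Matches any letter, digit, or underscore; equivalent to [a-zA-Z0-9_]
\W	Matches any character that is not a letter, digit, or underscore; equivalent to [^a-zA-Z0-9_]
\Z	Matches only at the end of the string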

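9.2 Handling regular expressions with the re library

In Python, the re module of the standard library handles regular expressions. It works with both unicode and ordinary strings, and it contains the regular-expression functions, flags, and one exception.

The most commonly used function is re.findall, which returns all substrings that match the pattern: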

				
In [50]: import re
In [55]: data = "What is the difference between  python 2.7.13 and Python 3.6.0.7 "
In [56]: re.findall('python [0-9]\.[0-9]\.[0-9]',data)
Out[56]: ['python 2.7.1']
To make re ignore letter case when matching, pass flags=re.IGNORECASE:
In [60]: re.findall('python [0-9]\.[0-9]\.[0-9]',data,flags=re.IGNORECASE)
Out[60]: ['python 2.7.1', 'Python 3.6.0']

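There are two ways to use regular expressions in Python:

The first is to call the functions in the re module directly, as in the examples above.

The second is to create a regular-expression object compiled for a specific pattern and then call that object's methods.

A compiled regular expression is a simple object created by passing a pattern to re.compile.

Compiled and non-compiled regular expressions differ in performance: when a lot of data has to be processed, the compiled form is more efficient.

A compiled regular expression is used like this: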

				
In [61]: import re
In [62]: data = "What is the difference between  python 2.7.13 and Python 3.6.0.7 "
In [64]: re_obj = re.compile('[0-9]\.[0-9]\.[0-9]',flags=re.IGNORECASE)   # create a compiled regular-expression object
In [66]: re_obj.findall(data)
Out[66]: ['2.7.1', '3.6.0']

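Which of the two styles to use is mostly a matter of taste, and in most cases either is fine. When a large amount of data has to be processed, however, the compiled form performs better. As an example, we use the Linux seq command to write 10 million integers to a file of about 76 MB, apply the pattern [0-9]+ to every line, and measure the running time with the Linux time tool.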

				
[root@localhost ~]# seq 1 10000000 > data.txt

Source of the non-compiled version:

				
[root@localhost ~]# cat re_nocompile.py 
#!/usr/bin/python
#coding:utf-8
import re   
def main():
    pattern = "[0-9]+"
    with open('/root/data.txt') as f:
        for line in f:
            re.findall(pattern,line)
if __name__ == '__main__':
    main()

				
[root@localhost ~]# time python re_nocompile.py 
real 0m16.103s
user 0m16.047s
sys 0m0.059s

Source of the compiled version:

[root@localhost ~]# cat re_compile.py 
#!/usr/bin/python
#coding:utf-8
import re   
def main():
    pattern = "[0-9]+"
    re_obj = re.compile(pattern)
    with open('/root/data.txt') as f:
        for line in f:
            re_obj.findall(line)
if __name__ == '__main__':
    main()

				

				
[root@localhost ~]# time python re_compile.py 
real 0m7.901s
user 0m7.865s
sys 0m0.039s

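As shown, the non-compiled version took about 16 s and the compiled version about 7.9 s.

9.3 Common re functions

9.3.1 Matching functions

findall, described above, belongs to this group.

The match function in re is similar to the string method startswith, but because it takes a regular expression it is more powerful and more expressive. match tries to match at the beginning of the string: on success it returns an SRE_Match object (called re.Match in Python 3), and on failure it returns None. For a plain prefix test it is therefore used just like startswith. For example, to check whether the string data starts with What or with not What:
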
In [73]: data = 'What is the difference between  python 2.7.13 and Python 3.6.0.7'
In [74]: data.startswith('What')
Out[74]: True
In [75]: data.startswith('not What')
Out[75]: False
In [76]: import re
In [77]: re.match('What',data)
Out[77]: <_sre.SRE_Match at 0x2ada5e0>
    
In [78]: if re.match('What',data):
   ....:     print(True)
   ....: else:
   ....:     print(False)
   ....:     
True

				

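Although match and startswith are similar in simple cases, match easily handles complex cases where startswith is of no help. Suppose we need to check whether a text starts with a number. We do not know which number, only that it must be one, so startswith cannot be used, while re.match handles it easily: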

				
In [79]: re.match('\d+','123 is one hundred and twenty-three')
Out[79]: <_sre.SRE_Match at 0x2ada578>

On success match returns an SRE_Match object, from which the matched text can be retrieved:

				
In [83]: r = re.match('[0-9]+','123 is one hundred and twenty-three')
In [84]: r.start()
Out[84]: 0
In [85]: r.end()
Out[85]: 3
In [87]: r.re
Out[87]: re.compile(r'[0-9]+')
In [88]: r.string
Out[88]: '123 is one hundred and twenty-three'
In [89]: r.group()
Out[89]: '123'

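When its pattern matches, re.search also returns an SRE_Match object. search is used almost exactly like match; the difference is that search looks for a match anywhere in the string, while match only matches at the beginning. Both return an SRE_Match object on success and None on failure.

search only finds the first match: even if the string contains several matches of the pattern, only the first is returned. To get all of them, the simplest way is findall; finditer can be used as well.

finditer returns an iterator; iterating over it yields one SRE_Match object per match:
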
In [91]: data
Out[91]: 'What is the difference between  python 2.7.13 and Python 3.6.0.7'
In [92]: r = re.finditer('[0-9]\.[0-9]\.[0-9]',data)
In [93]: re.finditer('[0-9]\.[0-9]\.[0-9]',data)
Out[93]: <callable-iterator at 0x2ad98d0>
In [94]: r
Out[94]: <callable-iterator at 0x2ad93d0>
In [95]: for it in r:
   ....:     print(it.group(0))
   ....:     
2.7.1
3.6.0
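
Since finditer yields match objects rather than plain strings, the position of each match is available as well. A minimal sketch using the same data string; the pattern here uses + after each digit class (a variation on the session above) so that 2.7.13 is matched in full.

```python
import re

data = "What is the difference between  python 2.7.13 and Python 3.6.0.7"
version = re.compile(r"[0-9]+\.[0-9]+\.[0-9]+")

matches = list(version.finditer(data))
for m in matches:
    # span() gives (start, end), so the match can be sliced back out of data.
    start, end = m.span()
    print(m.group(), data[start:end] == m.group())

# search returns only the first match.
first = version.search(data)
print(first.group())  # 2.7.13
```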

				

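9.3.2 Modifying functions

re.sub is similar to the string method replace, except that it supports regular expressions, so it applies to a much wider range of cases.

The regular expression below matches both 2.7.13 and 3.6.0 and replaces each with x.x.x: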

				
In [96]: data
Out[96]: 'What is the difference between  python 2.7.13 and Python 3.6.0.7'
In [97]: re.sub('[0-9]+\.[0-9]+\.[0-9]+','x.x.x',data)
Out[97]: 'What is the difference between  python x.x.x and Python x.x.x.7'

Use sub to reformat the dates below:

				
In [98]: text = 'Today is 3/2/2017,PyCon starts 5/25/2017'
In [100]: re.sub(r'(\d+)/(\d+)/(\d+)',r'\3-\1-\2',text)
Out[100]: 'Today is 2017-3-2,PyCon starts 2017-5-25'
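
The replacement passed to re.sub can also be a function that receives each match object, which helps when the replacement needs more than reordering groups. The sketch below zero-pads the month and day; to_iso is a hypothetical helper name.

```python
import re


def to_iso(match):
    """Turn a month/day/year match into year-month-day with zero padding."""
    month, day, year = match.groups()
    # {1:0>2} right-aligns the value in 2 columns, filling with '0'.
    return "{0}-{1:0>2}-{2:0>2}".format(year, month, day)


text = "Today is 3/2/2017,PyCon starts 5/25/2017"
converted = re.sub(r"(\d+)/(\d+)/(\d+)", to_iso, text)
print(converted)  # Today is 2017-03-02,PyCon starts 2017-05-25
```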

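re.split does the same job as the string method split, splitting a string into a list of substrings, except that it can use a regular expression. Take the text below, which contains colons, commas, single quotes, and spaces, and from which we want to extract every word. The built-in split cannot handle this, but re.split can, because it accepts several delimiters at once:
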
In [112]: text = "MySQL slave binlog position:master host '10.173.33.35',filename 'mysql-bin.00002',position '52499343'"
In [113]: re.split(r"[':,\s]+",text.strip("'"))
Out[113]: 
['MySQL',
 'slave',
 'binlog',
 'position',
 'master',
 'host',
 '10.173.33.35',
 'filename',
 'mysql-bin.00002',
 'position',
 '52499343']

				

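Note: \s is regular-expression syntax that matches any whitespace character.

9.3.3 Case-insensitive matching

Sometimes letter case needs to be ignored when searching or replacing in a string: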

				
In [114]: text = "UPPER PYTHON,lower python,Mixed python"
In [116]: re.findall('python',text,flags=re.IGNORECASE)
Out[116]: ['PYTHON', 'python', 'python']
In [117]: re.sub('python','snake',text,flags=re.IGNORECASE)   # the flags argument of re.sub was added in Python 2.7 and 3.1; earlier versions do not accept it
Out[117]: 'UPPER snake,lower snake,Mixed snake'

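9.3.4 Non-greedy matching

Greedy matching: always matches the longest possible string; this is the default.

Non-greedy matching: matches the shortest possible string.

Example:

In the example below we want to match a string that starts with Beautiful and ends with a period. Two substrings satisfy the condition. Greedy matching is used by default; to switch to non-greedy matching, append a ? to the quantifier: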

				
In [124]: text = "Beautiful is better than ugly.Explicit is better than implicit."
In [126]: re.findall('Beautiful.*\.',text)
Out[126]: ['Beautiful is better than ugly.Explicit is better than implicit.']
    
In [127]: re.findall('Beautiful.*?\.',text)     # non-greedy match
Out[127]: ['Beautiful is better than ugly.']
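
The difference becomes clearer when the pattern is not tied to a fixed prefix. In this small sketch on the same text, the greedy form returns the whole text as one match, while the non-greedy form returns each sentence separately.

```python
import re

text = "Beautiful is better than ugly.Explicit is better than implicit."

# Greedy: .* runs to the last period in the text.
greedy = re.findall(r".*\.", text)
print(greedy)
# ['Beautiful is better than ugly.Explicit is better than implicit.']

# Non-greedy: .*? stops at the first period it can.
sentences = re.findall(r".*?\.", text)
print(sentences)
# ['Beautiful is better than ugly.', 'Explicit is better than implicit.']
```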

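9.3.5 Example: extracting all hyperlinks from an HTML page

In this example we use the open-source requests library to fetch the content of Hacker News, then use a regular expression to extract all http and https links.

requests is an open-source project, not part of the standard library, so it has to be installed before use:
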
[root@localhost ~]# pip install requests
In [128]: import requests
In [129]: import re
In [152]: r = requests.get('')
In [153]: print(r.content)
 <td align="right" valign="top" class="title"><span class="rank">30.</span></td>      <td valign="top" class="votelinks"><center><a id='up_16043860' href='vote?id=16043860&amp;how=up&amp;goto=news'><div class='votearrow' title='upvote'></div></a></center></td><td class="title"><a href="" class="storylink">Hundreds of Pterosaur Eggs Found in Record-Breaking Fossil Haul</a><span class="sitebit comhead"> (<a href="from?site=nationalgeographic.com"><span class="sitestr">nationalgeographic.com</span></a>)</span></td></tr><tr><td colspan="2"></td><td class="subtext">
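In [154]: re.findall('"(https?://.*?)"',r.content)    # the double quotes in the pattern match literal double quotes
Out[154]: 
 '',
 '',
In [160]: re.findall('(https?://.*?)"',r.content)    # here only the closing double quote is matched literally
In [164]: re.findall('https?://.*?"',r.content)   # this also runs without the parentheses, but each match then includes the trailing double quote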
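
The effect of the parentheses can be checked without any network access. The sketch below runs the same patterns on a small hand-written HTML fragment; the fragment and its example.com and example.org URLs are made up for illustration. In Python 3, r.content is bytes, so a str pattern like these would be applied to r.text instead.

```python
import re

# Hypothetical HTML fragment standing in for the downloaded page.
html = (
    '<a href="https://example.com/a" class="storylink">A</a> '
    '<a href="from?site=example.com">site</a> '
    '<a href="http://example.org/b">B</a>'
)

# With a group, findall returns only the text captured inside the quotes.
links = re.findall(r'"(https?://.*?)"', html)
print(links)  # ['https://example.com/a', 'http://example.org/b']

# Without a group, findall returns the whole match, trailing quote included.
raw = re.findall(r'https?://.*?"', html)
print(raw)  # ['https://example.com/a"', 'http://example.org/b"']
```

For anything beyond quick extraction, an HTML parser is usually more robust than a regular expression.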

				

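10. Character encodings

There are many ways to represent unicode characters as binary data; the most common encoding is UTF-8. Note that unicode is the representation form that a program works with, while UTF-8 is a storage form. UTF-8 is the most widely used encoding, but it is only one way of storing unicode. When handling unicode in Python, use the encode method to convert unicode characters to binary data and the decode method to convert binary data back to unicode characters. Alternatively, the codecs module in Python 2 and the built-in open function in Python 3 let you specify the encoding.

In Python 3, strings are unicode by default. In Python 2, a unicode string requires an explicit "u" prefix: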

				
In [5]: name = u'陳志新'

In Python 2, string literals can also be made unicode by default with the following import:

				
In [6]: from __future__ import unicode_literals

Python strings have encode and decode methods. Here is an example in Python 2:

				
In [1]: name = '陳志新'
In [2]: name
Out[2]: '\xe9\x99\x88\xe5\xbf\x97\xe6\x96\xb0'
In [4]: new_name = name.decode('utf8')
In [5]: new_name
Out[5]: u'\u9648\u5fd7\u65b0'
In [6]: print new_name
陳志新
In [7]: new_name.encode('utf8')
Out[7]: '\xe9\x99\x88\xe5\xbf\x97\xe6\x96\xb0'

Writing Chinese text to a file raises an error if the encodings do not agree:

				
In [13]: name = u'陳志新'
In [14]: with open('/root/czx.txt','w') as f:
   ....:     f.write(name)
   ....:     
---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-14-0357fe0fce61> in <module>()
      1 with open('/root/czx.txt','w') as f:
----> 2     f.write(name)
      3 
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

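The reason for the error:

We defined a unicode string u'陳志新' and then tried to save it to a text file. Because no encoding was specified for the file, the default ASCII encoding was used, and Chinese characters obviously cannot be stored as ASCII, so Python raised UnicodeEncodeError.

The fix is to encode the string manually as UTF-8. This is also safe for plain ASCII content, because any ASCII text is valid UTF-8 text as well.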

				
In [15]: name = u'陳志新'
In [16]: with open('/root/czx.txt','w') as f:
   ....:     f.write(name.encode('utf-8'))
   ....:

Tip: unicode is the representation form; UTF-8 is a storage form.

				
In [17]: with open('/root/czx.txt','r') as f:
   ....:     data = f.read()
   ....:     
In [18]: print data
陳志新
In [19]: data.decode('utf8')
Out[19]: u'\u9648\u5fd7\u65b0'

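If many strings have to be written and each one must be encoded by hand, the program becomes tedious and inefficient. In Python 2 the codecs module solves this, and in Python 3 the built-in open function accepts an encoding. Once an encoding is specified, unicode is converted to that encoding automatically on write and decoded with the same encoding on read.

In Python 2, use the codecs module to encode (when writing a file) and decode (when reading a file):
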
In [20]: import codecs
In [21]: name = u'陳志新'
In [22]: name
Out[22]: u'\u9648\u5fd7\u65b0'
In [23]: with codecs.open('/root/czx.txt','w',encoding='utf-8') as f:
   ....:     f.write(name)
   ....:     
In [24]: with codecs.open('/root/czx.txt','r',encoding='utf-8') as f:
   ....:     data = f.read()
   ....:     
In [25]: data
Out[25]: u'\u9648\u5fd7\u65b0'

				

In Python 3, the built-in open function accepts an encoding:

				
In [27]: name = '陳志新'
In [28]: name
Out[28]: '陳志新'
In [30]: with open('/root/czx.txt','w',encoding='utf-8') as f:
   ....:     f.write(name)
   ....:
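
A complete Python 3 round trip, as a minimal sketch: it encodes a str to bytes, shows that ASCII cannot represent the text, and writes and reads a file with an explicit encoding. The temporary directory and the file name name.txt are assumptions made so the example can run anywhere.

```python
import os
import tempfile

name = "陳志新"

# str -> bytes: each of these characters takes three bytes in UTF-8.
data = name.encode("utf-8")
print(type(data).__name__, len(name), len(data))  # bytes 3 9

# bytes -> str
restored = data.decode("utf-8")

# ASCII cannot represent these characters.
try:
    name.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# With an explicit encoding, open() encodes on write and decodes on read.
path = os.path.join(tempfile.mkdtemp(), "name.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(name)
with open(path, encoding="utf-8") as f:
    from_file = f.read()

print(restored == name, ascii_ok, from_file == name)  # True False True
```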

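In Python programs, encoding and decoding should happen at the outermost boundary of the program, and the core should work with unicode throughout. The helper functions below support this: they accept either str or unicode and return the string type that is needed.

Character-set helper functions for Python 2:
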
def to_unicode(unicode_or_str):
    if isinstance(unicode_or_str,str):
        value = unicode_or_str.decode('utf-8')
    else:
        value = unicode_or_str
    return value
def to_str(unicode_or_str):
    if isinstance(unicode_or_str,unicode):
        value = unicode_or_str.encode('utf-8')
    else:
        value = unicode_or_str
    return value

				

Character-set helper functions for Python 3:

def to_str(bytes_or_str):
    if isinstance(bytes_or_str,bytes):
        value = bytes_or_str.decode('utf-8')
    else:
        value = bytes_or_str
    return value
def to_bytes(bytes_or_str):
    if isinstance(bytes_or_str,str):
        value = bytes_or_str.encode('utf-8')
    else:
        value = bytes_or_str
    return value

				

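11. Jinja2 templates

System administrators and developers often use templates to manage configuration files.

Rendering: templates separate business logic from presentation logic. A template is a file containing the response text, with placeholder variables for the dynamic parts whose concrete values are only known in the context of a request. Replacing the variables with real values and returning the resulting response string is called rendering.

In short, rendering means replacing variables with real values.

Python's standard library ships with a simple template facility, shown in the example below. It is very limited: templates cannot contain control statements or expressions, and inheritance and reuse are not supported. That is far from enough for web development, which is why third-party template systems appeared; the best known are Jinja2 and Mako.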

				
In [1]: from string import Template
In [2]: s = Template('$who is a $role')
In [3]: s.substitute(who='bob',role='teacher')
Out[3]: 'bob is a teacher'
In [4]: s.substitute(who='lily',role='student')
Out[4]: 'lily is a student'
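
Two more details of string.Template are worth knowing: substitute raises KeyError when a placeholder has no value, while safe_substitute leaves it in place, and $$ produces a literal dollar sign. A short sketch with made-up values:

```python
from string import Template

s = Template("$who is a $role")

# substitute insists on a value for every placeholder.
try:
    s.substitute(who="bob")
except KeyError as exc:
    print("missing placeholder:", exc)  # missing placeholder: 'role'

# safe_substitute leaves unknown placeholders untouched.
partial = s.safe_substitute(who="bob")
print(partial)  # bob is a $role

# $$ is a literal dollar sign; ${name} delimits a name followed by letters.
price = Template("cost: $$5 for ${item}s").substitute(item="apple")
print(price)  # cost: $5 for apples
```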

11.1 Getting started with Jinja2 template syntax

Jinja2 is a dependency of Flask, so installing Flask also installs Jinja2. It can also be installed on its own:

[root@localhost ~]# pip install jinja2

[root@localhost ~]# python -c "import jinja2"

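11.2 Syntax blocks

Jinja2 has three kinds of syntax:

a) control structures: {% %}

b) variable output: {{ }}

c) comments: {# #}

Here is an example in which a Jinja2 control structure is disabled by wrapping it in a comment:

{# note: disabled template because we no longer use this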

{% for user in users %}

....

{%endfor%}

#}

11.3 Variables

Jinja2 templates use the {{ }} syntax for a variable. Jinja2 recognizes all Python data types, including complex ones such as lists, dictionaries, and objects:

{{ myobj.somemethod() }}

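11.4 Filters in Jinja2

Filters can be thought of as Jinja2's built-in functions and string-processing functions.

Commonly used Jinja2 filters
Filter	Description
safe	Do not escape the value when rendering
capitalize	Convert the first letter of the value to uppercase and the rest to lowercase
lower	Convert the value to lowercase
upper	Convert the value to uppercase
title	Convert the first letter of every word in the value to uppercase
trim	Remove leading and trailing whitespace from the value
striptags	Remove all HTML tags from the value before rendering
join	Join several values into a string
replace	Replace part of a string value
round	Round a number (half up by default); the behaviour can be controlled with arguments
int	Convert the value to an integer

For the complete list of filters, see the official Jinja2 documentation.

In Jinja2, variables can be modified by filters. A filter is separated from the variable by a pipe (|). Several filters can be chained, with the output of one filter becoming the input of the next: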

-->Goodbye World

-->GOODBYE WORLD

---> 43.0

---> 43
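
The template expressions that produced the four results above did not survive in these notes. One set of expressions that yields these results when rendered from Python is sketched below; it is offered as a reconstruction, not as the book's original text. The import guard and the render helper are assumptions added so that the snippet still runs where the third-party jinja2 package is missing (render then returns None).

```python
try:
    from jinja2 import Template  # third-party: pip install jinja2
except ImportError:
    Template = None


def render(source, **context):
    """Render a template string, or return None when jinja2 is unavailable."""
    if Template is None:
        return None
    return Template(source).render(**context)


examples = [
    '{{ "Hello World" | replace("Hello", "Goodbye") }}',          # Goodbye World
    '{{ "Hello World" | replace("Hello", "Goodbye") | upper }}',  # GOODBYE WORLD
    "{{ 42.55 | round }}",                                        # 43.0
    "{{ 42.55 | round | int }}",                                  # 43
]
results = [render(source) for source in examples]
for source, result in zip(examples, results):
    print(source, "-->", result)
```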

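11.5 Control structures in Jinja2

The if statement in Jinja2 resembles Python's, but it must be closed with an endif statement. It can test whether a variable is defined, whether it is empty, and whether it is true. As in Python, elif and else can be used to build branches: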

{% if kenny.sick %}

kenny is sick.

{% elif kenny.dead %}

You killed Kenny! You bastard!

{% else %}

Kenny looks ok

{% endif %}

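11.6 The for loop in Jinja2

The for statement in Jinja2 iterates over Python data types, including lists, tuples, and dictionaries. Jinja2 has no while loop, in line with its design principle of providing only control structures and not letting templates accumulate business logic.

Iterating over a list in Jinja2: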

				
<h1>Member</h1>
<ul>
 {% for user in users %}
  <li>{{ user.username }}</li>
 {% endfor %}
</ul>

Dictionaries can be iterated over as well:

				
<dl>
 {% for key,value in d.items() %}
  <dt>{{ key }}</dt>
  <dd>{{ value }}</dd>
 {% endfor %}
</dl>

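Besides the basic for loop, Jinja2 provides some special variables that can be used without being defined:

Variable	Description
loop.index	The current iteration of the loop, counting from 1
loop.index0	The current iteration of the loop, counting from 0
loop.revindex	The number of iterations until the end of the loop, counting down to 1
loop.revindex0	The number of iterations until the end of the loop, counting down to 0
loop.first	True on the first iteration, otherwise False
loop.last	True on the last iteration, otherwise False
loop.length	The number of items in the sequence
loop.cycle	A helper function that cycles through a list of values

Requirement:

Suppose you have a dictionary of contacts whose keys are names and whose values are phone numbers, and you want to show the contacts as a table on an HTML page. Besides name and phone number, the first column should be a running number. In Python code this would be done as follows: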

				
In [2]: data = dict(bob=1300000001,lily=1300000002,robin=130000003)
In [3]: index = 0
In [7]: for key,value in data.items():
   ...:     index += 1
   ...:     print(index,key,value,sep=",")
   ...:     
1,bob,1300000001
2,lily,1300000002
3,robin,130000003

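To keep engineers from writing Python code for business logic in templates, so that templates only deal with display, Jinja2 provides these special variables. For the example above, the proper approach in Jinja2 is:

				
{% for key, value in data.items() %}
 <tr class="info">
  <td>{{ loop.index }}</td>
  <td>{{ key }}</td>
  <td>{{ value }}</td>
 </tr>
{% endfor %}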
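
As a rough sketch, the same numbering can be checked from Python by rendering a stripped-down version of the loop. The comma-separated row format and the name ROW_TEMPLATE are assumptions made to keep the output easy to compare with the pure-Python version above.

```python
from jinja2 import Template

# loop.index supplies the sequence number, so no counter variable is needed.
ROW_TEMPLATE = Template(
    "{% for name, phone in contacts.items() %}"
    "{{ loop.index }},{{ name }},{{ phone }}\n"
    "{% endfor %}"
)

contacts = {"bob": 1300000001, "lily": 1300000002, "robin": 130000003}
rows = ROW_TEMPLATE.render(contacts=contacts)
print(rows)
```

The template relies on dictionaries preserving insertion order, which holds for Python 3.7 and later.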

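11.7 Macros in Jinja2

A macro is similar to a function in a programming language: it abstracts behavior into a reusable block of code. Like a function, a macro has a definition and a call.

An example of a macro definition:

				
{% macro input(name, type='text', value='') %}
 <input type="{{ type }}" name="{{ name }}" value="{{ value }}">
{% endmacro %}

As the example shows, the macro keyword defines a macro, and input is the name of the macro. It takes three parameters, name, type, and value, of which type and value have default values. A macro definition looks very much like a function definition in Python. Like the for loop and the if statement in Jinja2, it does not need the colon of a Python compound statement, and endmacro ends the definition.

An example of calling the macro:

				
<p> {{ input('username', value='user') }} </p>
<p> {{ input('password', 'password') }} </p>
<p> {{ input('submit', 'submit', 'submit') }} </p>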
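
A minimal sketch that defines the input macro and calls it twice in one template, so the effect of the default arguments can be seen in the rendered text (FORM_TEMPLATE is a name chosen for this illustration):

```python
from jinja2 import Template

FORM_TEMPLATE = Template(
    "{% macro input(name, type='text', value='') %}"
    '<input type="{{ type }}" name="{{ name }}" value="{{ value }}">'
    "{% endmacro %}"
    # First call: type falls back to its default 'text'.
    "{{ input('username', value='user') }}\n"
    # Second call: value falls back to its default ''.
    "{{ input('password', 'password') }}"
)

form_html = FORM_TEMPLATE.render()
print(form_html)
```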

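11.8 Inheritance and the super function in Jinja2

If you only use Jinja2 to manage configuration files, you will rarely need template inheritance. If you use Jinja2 for web development, inheritance is its most attractive feature.

The most powerful part of Jinja2 is template inheritance. It lets you build a base "skeleton" template that contains the common elements of a site and defines blocks that child templates can override.

Suppose we have an HTML document named base.html with the following content: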

				
			

								

								
<html lang="en">
<head>
 {% block head %}
 <link rel="stylesheet" href="style.css" />
  <title> {% block title %}{% endblock %} </title>
 {% endblock %}
</head>
<body>
<div id="content">
 {% block content %}{% endblock %}
</div>
</body>
</html>

				

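In base.html, we use {% block name %} to define three blocks (head, title, and content). These blocks can be replaced or called in child templates.

Below is an HTML document named index.html with the following content: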

				
			

								

								
{% extends "base.html" %}
{% block title %}index{% endblock %}
<!-- The head block is overridden, and super() pulls in the content of head from base.html -->
{% block head %}
 {{ super() }}
 <style type="text/css">
 .important { color: #336699; }
 </style>
{% endblock %}
{% block content %}
 <h1>Index</h1>
 <p class="important"> Welcome to my awesome homepage </p>
{% endblock %}

				

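In index.html above, {% extends "base.html" %} inherits from base.html. After inheriting, everything in base.html shows up in index.html. In index.html we redefine the content of the title and content blocks and extend the head block.

super() renders the content of the same block in the parent template, so a child block can add to the parent's content instead of replacing it.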
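
The sketch below shows extends and super() working together without any files on disk. It uses jinja2.DictLoader to hold two in-memory templates; the shortened template bodies and the meta tag are assumptions made for this illustration, not the templates from the book.

```python
from jinja2 import DictLoader, Environment

templates = {
    "base.html": (
        "<head>{% block head %}<title>{% block title %}{% endblock %}</title>"
        "{% endblock %}</head>"
        "<body>{% block content %}{% endblock %}</body>"
    ),
    "index.html": (
        '{% extends "base.html" %}'
        "{% block title %}index{% endblock %}"
        # super() keeps the parent's head content, then the child appends to it.
        '{% block head %}{{ super() }}<meta name="child">{% endblock %}'
        "{% block content %}<h1>Index</h1>{% endblock %}"
    ),
}

env = Environment(loader=DictLoader(templates))
page = env.get_template("index.html").render()
print(page)
```

The rendered head contains both the title from base.html (filled in by the child's title block) and the extra tag added by the child.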

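11.9 Other operations in Jinja2

Variables can be defined in Jinja2, and to operate on them Jinja2 provides arithmetic, comparison, and logical operations. When using Jinja2 templates, do as much logic as possible in Python code and let Jinja2 handle display only. For that reason, Jinja2 variables and operations on them are rarely needed.

Some of the Jinja2 operators:

a) arithmetic: + - * / // % **

b) comparison: == != > >= < <=

c) logical: not and or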
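
A few of these operators evaluated inside a template, as a minimal sketch. The helper name evaluate and the context variables count, enabled, and hidden are hypothetical.

```python
from jinja2 import Template


def evaluate(expr, **context):
    """Render one Jinja2 expression with the given context variables."""
    return Template("{{ " + expr + " }}").render(**context)


arithmetic = evaluate("1 + 2 * 3")   # usual precedence: multiplication first
floor_div = evaluate("7 // 2")       # integer (floor) division
power = evaluate("2 ** 3")
comparison = evaluate("count >= 3", count=5)
logic = evaluate("enabled and not hidden", enabled=True, hidden=False)

print(arithmetic, floor_div, power, comparison, logic)
```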

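12 Jinja2 in practice

The jinja2 module has a class named Environment. Instances of this class store configuration and global objects, and load templates from the file system or other locations.

Most applications create one Environment object at initialization and use it to load templates. The simplest way to configure Jinja2 to load templates for an application looks roughly like this:

				
In [3]: from jinja2 import Environment,PackageLoader
In [4]: env = Environment(loader=PackageLoader('yourapplication','templates'))

The code above creates an Environment object with a package loader that looks for templates in the templates directory of the yourapplication Python package. After that, call Environment.get_template with the template name as its argument. The method returns a template, which is then rendered with its render method:

				
In [10]: template = env.get_template('mytemplate.html')
In [12]: print(template.render(the='variables',go='here'))

Besides the package loader, there is also a file system loader. It does not require templates to live inside a Python package and can access files on the system directly. To keep the demonstrations simple, the examples that follow use this helper function: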

				
[root@localhost ~]# cat a.py 
#!/bin/python
#coding:utf8
import os
import jinja2
def render(tpl_path,**kwargs):
    path,filename= os.path.split(tpl_path)
    return jinja2.Environment(
        loader = jinja2.FileSystemLoader(path or './')
    ).get_template(filename).render(**kwargs)

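Notes:

tpl_path: the path of the template

**kwargs: the variables to render into the template

Tip:

In a function definition, * before a parameter collects the extra positional arguments of a call into a tuple, and ** collects the keyword arguments of a call into a dictionary.

For example, define this function:

def func(*args):print(args)

When it is called as func(1,2,3), args is the tuple (1,2,3).

Now define this function:

def func(**args):print(args)

When it is called as func(a=1,b=2), args is the dictionary {'a':1,'b':2}.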
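
The two forms can be combined in one function, and the same * and ** symbols also unpack a sequence or dictionary at the call site. A small sketch, with collect as a hypothetical function name:

```python
def collect(*args, **kwargs):
    """Return the positional arguments as a tuple and the keyword arguments as a dict."""
    return args, kwargs


# Packing: extra positional and keyword arguments are gathered by the function.
positional, keyword = collect(1, 2, 3, a=1, b=2)

# Unpacking: a list and a dict are spread into separate arguments by the caller.
unpacked_positional, unpacked_keyword = collect(*[1, 2], **{"a": 1})

print(positional, keyword)
print(unpacked_positional, unpacked_keyword)
```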

12.1 Basic features

Let's look at an example of rendering a template. Suppose we have a text file named simple.html with the following content:

				
			

								

								
<!DOCTYPE html>
<html lang="en">
    <head>
        <!-- Use a filter to process the result of an expression -->
        <title>{{ title | trim }} </title>
    </head>
    <body>
        <!-- Comment -->
        {# This is a comment #}
        <ul id="navigation">
             <!-- for statement, closed by endfor -->
             {% for item in items %}
                 <!-- Access the variable's attributes -->
                 <li><a href="{{ item.href}}"> {{item['caption'] }}</a></li>
             {% endfor %}
        </ul>
            <p>{{ content }}</p>
    </body>
</html>

				

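In this HTML template, a for loop iterates over a list. Each item in the list is a dictionary that holds a caption and a link, and the data in each dictionary is rendered as an HTML hyperlink. In addition, the trim filter provided by Jinja2 strips the leading and trailing whitespace from title.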

				
			

								

								
[root@localhost ~]# cat  a.py 
#!/bin/python
#coding:utf8
import os
import jinja2
def render(tpl_path,**kwargs):
    path,filename= os.path.split(tpl_path)
    return jinja2.Environment(
                loader = jinja2.FileSystemLoader(path or './')
         ).get_template(filename).render(**kwargs)
def test_simple():
    title = "Title H     "
    items = [{'href':'a.com','caption':'ACaption'},{'href':'b.com','caption':'Bcaption'}]
    content = "This is content"
    result = render('simple.html',**locals())
    print(result)
if __name__=='__main__':
    test_simple()

				

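Tip

a) locals() returns a dictionary that maps the name of every local variable in the current scope to its value.

For example, if the current scope has two local variables, x=1 and y='something', locals() returns the dictionary

{'x':1,'y':'something'}

b) In a function call (render in this script), **locals() unpacks the dictionary returned by locals() into keyword arguments. With the dictionary from the example above, unpacking turns {'x':1,'y':'something'} into x=1,y='something'.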
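
A small sketch of the same unpacking outside of Jinja2. The functions greet and build_message are hypothetical; the point is only that the local variable names must match the parameter names of the function being called.

```python
def greet(name, greeting):
    """Build a greeting from two keyword arguments."""
    return "{}, {}!".format(greeting, name)


def build_message():
    name = "Bob"
    greeting = "Hello"
    # locals() here is {'name': 'Bob', 'greeting': 'Hello'};
    # ** unpacks it into greet(name='Bob', greeting='Hello').
    return greet(**locals())


message = build_message()
print(message)
```

Passing **locals() is convenient in short test functions, but it forwards every local variable, so spelling out the arguments is often clearer in larger functions.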

Running the code above renders the template as follows:

				
			

								

								
[root@localhost ~]# python a.py 
<!DOCTYPE html>
<html lang="en">
    <head>
        <!-- Use a filter to process the result of an expression -->
        <title>Title H </title>
    </head>
    <body>
        <!-- Comment -->
        
        <ul id="navigation">
             <!-- for statement, closed by endfor -->
             
                 <!-- Access the variable's attributes -->
                 <li><a href="a.com"> ACaption</a></li>
             
                 <!-- Access the variable's attributes -->
                 <li><a href="b.com"> Bcaption</a></li>
             
        </ul>
            <p>This is content</p>
    </body>
</html>

				

12.2 Inheritance

To demonstrate inheritance we need two HTML files, base.html and index.html.

The content of base.html:

				
			

								

								
<!DOCTYPE html>
<html lang="en">
    <head>
        <!-- Define a block that child templates can override -->
        {% block head %}
            <link rel="stylesheet" href="style.css" />
            <title> {% block title%}{% endblock %} --My Webpage </title>
        {% endblock %}
    </head>
     <body>
            <div id="content">
                 <!-- Define a block with no default content -->
                {% block content %}
                {% endblock %}
            </div>    
            <div id="footer">
                 <!-- Define a block with no default content -->
                {% block footer %}
                {% endblock %}
            </div>  
         </body>
 </html>

				

The content of index.html:

				
			

								

								
<!-- Placed at the top, for inheritance -->
{% extends "base.html"%}
<!-- The title block is overridden -->
{% block title%}index{% endblock %}
<!-- The head block is overridden, and super() pulls in the content of head from base.html -->
{% block head%}
    {{super() }}
<style type="text/css"> .important {color: #336699} </style>
{% endblock %}
<!-- The content block is overridden -->
{% block content %}
<h1> This is h1 content </h1>
<p class="important"> Welcome to my awesome homepage. </p>
{% endblock%}

				

We render the Jinja2 template with the following Python code:

				
			

								

								
[root@localhost ~]# cat a.py 
#!/bin/python
#coding:utf8
import os
import jinja2
def render(tpl_path,**kwargs):
    path,filename= os.path.split(tpl_path)
    return jinja2.Environment(
                loader = jinja2.FileSystemLoader(path or './')
         ).get_template(filename).render(**kwargs)
def test_extend():
    result = render('index.html')
    print(result)
if __name__== '__main__':
    test_extend()

				

The rendered result:

				
			

								

								
[root@localhost ~]# python a.py 
<!-- Placed at the top, for inheritance -->
<!DOCTYPE html>
<html lang="en">
    <head>
        <!-- Define a block that child templates can override -->
        
    
            <link rel="stylesheet" href="style.css" />
            <title> index --My Webpage </title>
        
<style type="text/css"> .important {color: #336699} </style>
    </head>
     <body>
            <div id="content">
                 <!-- Define a block with no default content -->
                
<h1> This is h1 content </h1>
<p class="important"> Welcome to my awesome homepage. </p>
            </div>    
            <div id="footer">
                 <!-- Define a block with no default content -->
                
                
            </div>  
         </body>
 </html>

				

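This example shows that:

1) We render index.html, not base.html, yet the generated document contains the complete HTML skeleton. This is the most common use of inheritance.

2) Although we redefine the head block in index.html, we use {{ super() }} to pull in the head block of base.html. The final result therefore contains both the head block of base.html and the head block of index.html. For example, the content of the title tag in the rendered result is "index --My Webpage", a string that comes partly from index.html and partly from base.html.

3) We redefine the content block in index.html, so the generated document shows the content block in the right place.

12.3 Case study: generating an HTML table and XML configuration files with Jinja2

1. Generating an HTML table with Jinja2

The template hzfc.html: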

				
			

								

								
[root@localhost ~]# cat hzfc.html 
<html>
    <body>
        <table>
        {% for item in items %}
        <tr>
            <td> {{loop.index}} </td>
            <td><a href = "{{ item['href'] }}"> {{ item['title'] }}</a> </td>
        </tr>
        {% endfor %}
        </table>
    </body>
</html>

				

The Python code that renders the template:

				
			

								

								
[root@localhost ~]# cat a.py 
#!/usr/bin/python
#-*-coding:utf8 -*-
from __future__ import  print_function
from __future__ import  unicode_literals
import jinja2
import os
def render(tpl_path,**kwargs):
    path,filename= os.path.split(tpl_path)
    return jinja2.Environment(
                loader = jinja2.FileSystemLoader(path or './')
                    ).get_template(filename).render(**kwargs)
links = [{'title':u'杭州地鐵三期正式獲批 3號線即將上馬','href':''},
         {'title':u'潮鳴專案定名"風起潮鳴"','href':''}
        ]
content = render('hzfc.html',items=links)
print(content)

				

Output:

				
[root@localhost ~]# python a.py 
<html>
    <body>
        <table>
        
        <tr>
            <td> 1 </td>
            <td><a href = ""> 杭州地鐵三期正式獲批 3號線即將上馬</a> </td>
        </tr>
        
        <tr>
            <td> 2 </td>
            <td><a href = ""> 潮鳴專案定名"風起潮鳴"</a> </td>
        </tr>
        
        </table>
    </body>
</html>

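2. Generating XML configuration files with Jinja2

Use case: deploying an application can be automated, but generating its configuration files is harder, because the values in them are determined dynamically. With a shell script the only option is to substitute values with sed, which is complicated, error-prone, and hard to read. The best approach here is a template: write the IPs and port numbers of the underlying services into one configuration file, then use Python code to read that file and render the configuration templates into new configuration files.

In this example, a configuration file named base.cfg stores the values of the configuration parameters: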

				
[DEFAULT]
issa_server_a_host = 10.166.226.151
issa_server_a_port = 8101
issa_server_b_host = 10.166.266.152
issa_server_b_port = 8102
issa_server_c_host = 10.166.266.153
issa_server_c_port = 8102

There is also a configuration template named pass_service1_template.xml with the following content:

				
<?xml version="1.0" encoding = 'utf-8' ?>
<pass_service1>
      <issa_server_a_host> {{ issa_server_a_host }} </issa_server_a_host> 
      <issa_server_a_port> {{ issa_server_a_port }} </issa_server_a_port> 
      <issa_server_c> {{ issa_server_c_host }}:{{issa_server_c_port}} </issa_server_c>
</pass_service1>

This example has two upper-layer services. The configuration template of the other service is named pass_service2_template.xml and has the following content:

				
<?xml version="1.0" encoding = 'utf-8' ?>
<pass_service2>
      <issa_server_b_host> {{ issa_server_b_host }} </issa_server_b_host> 
      <issa_server_b_port> {{ issa_server_b_port }} </issa_server_b_port> 
      <issa_server_c> {{ issa_server_c_host }}:{{issa_server_c_port}} </issa_server_c>
</pass_service2>

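The requirement is to read the configuration file base.cfg and then use Jinja2 template rendering to turn the two configuration templates, pass_service1_template.xml and pass_service2_template.xml, into configuration files. The Python code is as follows: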

				
			


								

								

								
#!/usr/bin/python
#-*-coding:utf8 -*-
from __future__ import print_function
import os
# ConfigParser is the standard-library module for reading configuration files.
# Note: in Python 3 the ConfigParser module was renamed to configparser.
try:
    import configparser
except ImportError:
    import ConfigParser as configparser

import jinja2
NAMES = ["issa_server_a_host","issa_server_a_port","issa_server_b_host","issa_server_b_port","issa_server_c_host","issa_server_c_port"]
def render(tpl_path,**kwargs):
    path,filename= os.path.split(tpl_path)
    return jinja2.Environment(
                loader = jinja2.FileSystemLoader(path or './')
                    ).get_template(filename).render(**kwargs)
# Read the values from the configuration file and define them as global variables
def parser_vars_into_globals(filename):
    parser = configparser.ConfigParser()
    # read() reads and parses the file directly
    parser.read(filename)
    for name in NAMES:
        # get() returns the value of an option in a section, as a string
        globals()[name] = parser.get('DEFAULT',name)

def main():
    parser_vars_into_globals('base.cfg')
    with open('pass_service1.xml','w') as f:
        f.write(render('pass_service1_template.xml',**globals()))
    with open('pass_service2.xml','w') as f:
        f.write(render('pass_service2_template.xml',**globals()))

if __name__ == '__main__':
    main()
    

				

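This example uses a small trick: it defines global variables by assigning into the globals() dictionary and then passes all global variables to the template. During rendering each template uses only the variables it needs. When rendering finishes, two configuration files are generated in the current directory: pass_service1.xml and pass_service2.xml.

The content of pass_service1.xml: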

				
[root@localhost ~]# cat pass_service1.xml
<?xml version="1.0" encoding = 'utf-8' ?>
<pass_service1>
      <issa_server_a_host> 10.166.226.151 </issa_server_a_host> 
      <issa_server_a_port> 8101 </issa_server_a_port> 
      <issa_server_c> 10.166.266.153:8102 </issa_server_c>
</pass_service1>

The content of pass_service2.xml:

				
[root@localhost ~]# cat pass_service2.xml
<?xml version="1.0" encoding = 'utf-8' ?>
<pass_service2>
      <issa_server_b_host> 10.166.266.152 </issa_server_b_host> 
      <issa_server_b_port> 8102 </issa_server_b_port> 
      <issa_server_c> 10.166.266.153:8102 </issa_server_c>
</pass_service2>
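
As an alternative to the globals() trick, the values can be collected into an ordinary dictionary and passed to render. The sketch below is not the book's method. It assumes Python 3, reads a shortened copy of base.cfg from a string instead of a file, and uses an inline template with hypothetical host and port tags so that it runs on its own.

```python
import configparser

from jinja2 import Template

# Shortened stand-in for base.cfg, kept in a string for this illustration.
BASE_CFG = """
[DEFAULT]
issa_server_a_host = 10.166.226.151
issa_server_a_port = 8101
"""

# Hypothetical inline template; the real templates live in XML files.
SERVICE_TEMPLATE = Template(
    "<host>{{ issa_server_a_host }}</host><port>{{ issa_server_a_port }}</port>"
)

parser = configparser.ConfigParser()
parser.read_string(BASE_CFG)

# defaults() returns the options of the DEFAULT section as a mapping of strings.
context = dict(parser.defaults())
rendered = SERVICE_TEMPLATE.render(**context)
print(rendered)
```

Keeping the values in a dictionary avoids adding names to the module namespace and makes it explicit which variables the template receives.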

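Tip:

The dictionary returned by locals() should be treated as read-only, while the one returned by globals() can be modified. The reason is:

Inside a function, locals() does not return the actual local namespace; it returns a copy. Modifying it changes the copy and has no effect on the values of the variables in the actual local namespace.

globals() returns the actual global namespace, not a copy: the exact opposite of locals().

So any change to the dictionary returned by globals() directly affects the value of the global variables.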

				
In [39]: def foo(arg):
    ...:     x = 1
    ...:     print (locals()) #
    ...:     print ('x=',x)
    ...:     locals()['x'] = 2 # modifies a copy of the local namespace; the real local variable is unaffected
    ...:     print (locals())
    ...:     print ('x=',x)
    ...:     
In [42]: foo(3)
{'x': 1, 'arg': 3}
x= 1
{'x': 1, 'arg': 3}
x= 1

The following Python 2 script compares the two functions:

				
			

								

								
#!/usr/bin/env python
#coding:utf-8
'''This is my first python program!'''
z = 7 # define a global variable
def foo(arg):
    x = 1
    print locals()
    print 'x=',x
    locals()['x'] = 2 # modifies a copy of the local namespace; the real local variable is unaffected
    print locals()
    print "x=",x

foo(3)
print globals()
print 'z=',z
globals()["z"] = 8 # globals() returns the real global namespace, so this changes the value of z
print globals()
print "z=",z

				

Output:
{'x': 1, 'arg': 3}
x= 1
{'x': 1, 'arg': 3}
x= 1
{'foo': <function foo at 0x02A17CF0>, '__builtins__': <module '__builtin__' (built-in)>, '__file__': 'E:\\workspace\\python day03\\main\\test.py', '__package__': None, '__name__': '__main__', 'z': 7, '__doc__': 'This is my first python program!'}
z= 7
{'foo': <function foo at 0x02A17CF0>, '__builtins__': <module '__builtin__' (built-in)>, '__file__': 'E:\\workspace\\python day03\\main\\test.py', '__package__': None, '__name__': '__main__', 'z': 8, '__doc__': 'This is my first python program!'}
z= 8
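
For readers on Python 3, the same contrast can be sketched with two small functions. The names local_demo, global_demo, and demo_value are hypothetical and used only for this illustration.

```python
def local_demo():
    x = 1
    # Writes into a dictionary, not into the local variable x.
    locals()["x"] = 2
    return x


def global_demo():
    # Creates (or rebinds) a real module-level name.
    globals()["demo_value"] = 8
    return globals()["demo_value"]


local_result = local_demo()
global_result = global_demo()
print("x =", local_result)
print("demo_value =", global_result)
```

The local variable keeps its original value, while the assignment through globals() is visible afterwards.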

(End)
