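This article summarizes "Modern Python Development: Syntax Basics and Engineering Practice"; for more Python material, see the "Python Learning and Practice Resource Index". It draws on Python Crash Course - Cheat Sheets, pysheeet, and other sources. It covers only the knowledge points and syntax that the author uses frequently in daily work and considers essential. To study Python further, or if you are interested in machine learning and data mining, see "The Programmer's Handbook of Data Science and Machine Learning in Practice".
Basic Syntax
Python is a high-level, dynamically typed, multi-paradigm programming language. A Python file often begins by declaring how it is run and how it is encoded: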
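#!/usr/bin/env python
# The shebang above selects the interpreter that runs the script; it must be the very first line
# Declare utf-8 as the source encoding (the declaration must be on the first or second line)
# -*- coding: utf-8 -*-
# Declare some other encoding
# -*- coding: <encoding-name> -*-
# Vim also recognizes the following form
# vim:fileencoding=<encoding-name>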
Life is short, use Python: its wealth of powerful syntactic sugar means Python code often reads almost like pseudocode. For example, a simple quicksort written in Python is far more compact than the Java equivalent:
def quicksort(arr):
    if len(arr) <= 1:
        return arr
    # Floor division keeps the index an integer under both Python 2 and Python 3
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quicksort(left) + middle + quicksort(right)
print(quicksort([3,6,8,10,1,2,1]))
# Prints "[1, 1, 2, 3, 6, 8, 10]"
Console Interaction
The __name__ variable tells whether a script is being run directly with the python command or imported by other code; Google's open-source fire is also a handy framework for quickly exposing a class as a command-line tool:
import fire
class Calculator(object):
    """A simple calculator class."""
    def double(self, number):
        return 2 * number
if __name__ == '__main__':
    fire.Fire(Calculator)
# python calculator.py double 10 # 20
# python calculator.py double --number=15 # 30
In Python 2 print is a statement, whereas in Python 3 print is a function; to use print as a function in Python 2, import it from __future__:
from __future__ import print_function
We can also use pprint to pretty-print console output:
import pprint
stuff = ['spam', 'eggs', 'lumberjack', 'knights', 'ni']
pprint.pprint(stuff)
# Custom parameters
pp = pprint.PrettyPrinter(depth=6)
tup = ('spam', ('eggs', ('lumberjack', ('knights', ('ni', ('dead',('parrot', ('fresh fruit',))))))))
pp.pprint(tup)
Modules
A module in Python is simply a Python source file, which can export classes, functions, and global variables; when names are imported from a module, the module name usually serves as the namespace. A package is a directory of modules, and an __init__.py file usually marks the directory as a package:
# Directory layout
someDir/
    main.py
    siblingModule.py
# siblingModule.py
def siblingModuleFun():
    print('Hello from siblingModuleFun')
def siblingModuleFunTwo():
    print('Hello from siblingModuleFunTwo')
# main.py
import siblingModule
import siblingModule as sibMod
sibMod.siblingModuleFun()
from siblingModule import siblingModuleFun
siblingModuleFun()
try:
    # Import 'someModuleA' that is only available in Windows
    import someModuleA
except ImportError:
    try:
        # Import 'someModuleB' that is only available in Linux
        import someModuleB
    except ImportError:
        # Neither platform-specific module is available
        pass
A package can provide a single entry point for all the files in a directory:
someDir/
    main.py
    subDir/
        __init__.py
        subA.py
        subSubDir/
            __init__.py
            subSubA.py
# subA.py
def subAFun():
    print('Hello from subAFun')
def subAFunTwo():
    print('Hello from subAFunTwo')
# subSubA.py
def subSubAFun():
    print('Hello from subSubAFun')
def subSubAFunTwo():
    print('Hello from subSubAFunTwo')
# __init__.py from subDir
# Adds 'subAFun()' and 'subAFunTwo()' to the 'subDir' namespace
from .subA import *
# The following two import statements do the same thing: they add 'subSubAFun()' and 'subSubAFunTwo()' to the 'subDir' namespace. The first one assumes '__init__.py' is empty in 'subSubDir', and the second one assumes '__init__.py' in 'subSubDir' contains 'from .subSubA import *'.
# Assumes '__init__.py' is empty in 'subSubDir'
# Adds 'subSubAFun()' and 'subSubAFunTwo()' to the 'subDir' namespace
from .subSubDir.subSubA import *
# Assumes '__init__.py' in 'subSubDir' has 'from .subSubA import *'
# Adds 'subSubAFun()' and 'subSubAFunTwo()' to the 'subDir' namespace
from .subSubDir import *
# __init__.py from subSubDir
# Adds 'subSubAFun()' and 'subSubAFunTwo()' to the 'subSubDir' namespace
from .subSubA import *
# main.py
import subDir
subDir.subAFun() # Hello from subAFun
subDir.subAFunTwo() # Hello from subAFunTwo
subDir.subSubAFun() # Hello from subSubAFun
subDir.subSubAFunTwo() # Hello from subSubAFunTwo
Expressions and Control Flow
Conditionals
Python uses if, elif, and else for basic conditional selection:
if x < 0:
    x = 0
    print('Negative changed to zero')
elif x == 0:
    print('Zero')
else:
    print('More')
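Python also supports a ternary conditional operator:
a if condition else b
Indexing a tuple gives a similar effect, although both values are evaluated before one is selected:
# test must evaluate to True or False
(falseValue, trueValue)[test]
# A safer approach is to compare explicitly
(falseValue, trueValue)[test == True]
# Or convert with the bool function
(falseValue, trueValue)[bool(<expression>)]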
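The two forms differ in when their operands are evaluated: the conditional expression evaluates only the branch it selects, while the tuple form builds both values first. The following is a minimal Python 3 sketch of that difference; safe_ratio_ternary and safe_ratio_tuple are hypothetical names introduced only for illustration.

```python
def safe_ratio_ternary(a, b):
    # Only the selected branch is evaluated, so a / b never runs when b == 0.
    return a / b if b != 0 else 0.0

def safe_ratio_tuple(a, b):
    # Both tuple elements are built first, so a / b runs even when b == 0.
    return (0.0, a / b)[b != 0]

print(safe_ratio_ternary(1, 0))  # 0.0

try:
    safe_ratio_tuple(1, 0)
    tuple_failed = False
except ZeroDivisionError:
    tuple_failed = True
print(tuple_failed)  # True
```

For this reason the conditional expression is usually the safer choice whenever one of the branches may raise or is expensive to compute.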
Loops
for-in iterates over lists and dictionaries:
words = ['cat', 'window', 'defenestrate']
for w in words:
    print(w, len(w))
# Slicing the whole list quickly produces a copy, so the loop can safely modify the original
for w in words[:]:
    if len(w) > 6:
        words.insert(0, w)
# words -> ['defenestrate', 'cat', 'window', 'defenestrate']
To iterate over a sequence of numbers, use the built-in range function:
a = ['Mary', 'had', 'a', 'little', 'lamb']
for i in range(len(a)):
    print(i, a[i])
Basic Data Types
Built-in functions perform explicit type conversion (casting):
int("42")     # 42
float("3.5")  # 3.5
str(42)       # '42'
str(2.5)      # '2.5'
Number: Numeric Types
x = 3
print type(x) # Prints "<type 'int'>"
print x # Prints "3"
print x + 1 # Addition; prints "4"
print x - 1 # Subtraction; prints "2"
print x * 2 # Multiplication; prints "6"
print x ** 2 # Exponentiation; prints "9"
x += 1
print x # Prints "4"
x *= 2
print x # Prints "8"
y = 2.5
print type(y) # Prints "<type 'float'>"
print y, y + 1, y * 2, y ** 2 # Prints "2.5 3.5 5.0 6.25"
Booleans
Python provides the usual logical operators, but note that it does not use symbols such as && and ||; it uses English words instead.
t = True
f = False
print type(t) # Prints "<type 'bool'>"
print t and f # Logical AND; prints "False"
print t or f # Logical OR; prints "True"
print not t # Logical NOT; prints "False"
print t != f # Logical XOR; prints "True"
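String: Strings
Python 2 has a byte-oriented str type and a separate unicode type, with no distinct bytes type (bytes is merely an alias of str); in Python 3, str holds Unicode text, and there are two byte types, bytes and bytearray:
type("Guido") # string type is str in python2
# <type 'str'>
# Import unicode_literals from __future__ so that plain string literals become unicode in Python 2
from __future__ import unicode_literals
type("Guido") # string type become unicode
# <type 'unicode'>
Python strings support slicing, format strings, and other common operations: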
var1 = 'Hello World!'
var2 = "Python Programming"
print "var1[0]: ", var1[0]
print "var2[1:5]: ", var2[1:5]
# var1[0]: H
# var2[1:5]: ytho
print "My name is %s and weight is %d kg!" % ('Zara', 21)
# My name is Zara and weight is 21 kg!
# In the calls below, s is any string and items is a list of strings
s[0:4]
len(s)
s.replace("-", " ")
",".join(items)
"hi {0}".format('j')
s.find(",")   # index of the first match, or -1 if absent
s.index(",")  # same, but raises ValueError if absent
s.count(",")
s.split(",")
s.lower()
s.upper()
s.title()
s.lstrip()
s.rstrip()
s.strip()
s.islower()
# Remove every character that is not an ASCII letter or digit
import re
re.sub('[^A-Za-z0-9]+', '', mystring)
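To test whether a string contains a substring, or to find the index of a substring:
# The in operator tests for a substring
somestring = "spam and eggs"
if "blah" not in somestring:
    print "No 'blah' here!"
# find returns the index of the first match, or -1 if there is none
s = "This be a string"
if s.find("is") == -1:
    print "No 'is' here!"
else:
    print "Found 'is' in the string."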
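Regex: Regular Expressions
import re
# Test whether the beginning of the string matches (s is any string)
re.match(r'^[aeiou]', s)
# Replace each match with the string given as the second argument
re.sub(r'^[aeiou]', '?', s)
re.sub(r'(xyz)', r'\1', s)
# Compile a reusable regular expression object
expr = re.compile(r'^...$')
expr.match(...)
expr.sub(...)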
Some common use cases:
# Detect an HTML opening tag
re.search(r'<[^/>][^>]*>', '<a href="#label">')
# Typical username or password
re.match(r'^[a-zA-Z0-9_-]{3,16}$', 'Foo') is not None
re.match(r'^[\w-]{3,16}$', 'Foo') is not None
# Email
re.match(r'^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$', 'hello.world@example.com')
# URL
exp = re.compile(r'''^(https?:\/\/)? # match http or https
                ([\da-z\.-]+)        # match domain
                \.([a-z\.]{2,6})     # match top-level domain
                ([\/\w \.-]*)\/?$    # match api or file
                ''', re.X)
exp.match('www.google.com')
# IP address
exp = re.compile(r'''^(?:(?:25[0-5]
                     |2[0-4][0-9]
                     |[1]?[0-9][0-9]?)\.){3}
                     (?:25[0-5]
                     |2[0-4][0-9]
                     |[1]?[0-9][0-9]?)$''', re.X)
exp.match('192.168.1.1')
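Collection Types
List: Lists
Operation: Creating, Adding, and Removing
list is the basic sequence type:
l = []
l = list()
# The split method of a string turns the string into a list
"a.b.c".split(".")  # ['a', 'b', 'c']
# To assemble a list into a string, use join
list1 = ['1', '2', '3']
str1 = ''.join(list1)
# A list of numbers must be converted to strings first
list1 = [1, 2, 3]
str1 = ''.join(str(e) for e in list1)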
append adds a single element to a list, and extend concatenates another list onto it:
x = [1, 2, 3]
x.append([4, 5]) # x: [1, 2, 3, [4, 5]]
x = [1, 2, 3]
x.extend([4, 5]) # x: [1, 2, 3, 4, 5]; note that extend itself returns None
pop, slices, del, and remove all delete elements from a list:
myList = [10,20,30,40,50]
# Pop the second element
myList.pop(1) # 20
# myList: [10, 30, 40, 50]
# With no argument, pop removes and returns the last element
myList.pop()
# Use slices to delete an element
a = [ 1, 2, 3, 4, 5, 6 ]
index = 3 # Only Positive index
a = a[:index] + a[index+1 :]
# Delete an element by index
myList = [10,20,30,40,50]
rmovIndxNo = 3
del myList[rmovIndxNo] # myList: [10, 20, 30, 50]
# Use remove to delete by value
letters = ["a", "b", "c", "d", "e"]
letters.remove(letters[1])
print(*letters) # a c d e (the * unpacks the list into separate arguments; it is optional)
Iteration: Indexing and Traversal
A basic for loop iterates over the elements of a list, like this:
animals = ['cat', 'dog', 'monkey']
for animal in animals:
    print animal
# Prints "cat", "dog", "monkey", each on its own line.
To get the index of the current element while looping, use the enumerate function:
animals = ['cat', 'dog', 'monkey']
for idx, animal in enumerate(animals):
    print '#%d: %s' % (idx + 1, animal)
# Prints "#1: cat", "#2: dog", "#3: monkey", each on its own line
Python also supports slices:
nums = list(range(5)) # range yields the integers 0 to 4; list() gives a real list under both Python 2 and 3
print nums # Prints "[0, 1, 2, 3, 4]"
print nums[2:4] # Get a slice from index 2 to 4 (exclusive); prints "[2, 3]"
print nums[2:] # Get a slice from index 2 to the end; prints "[2, 3, 4]"
print nums[:2] # Get a slice from the start to index 2 (exclusive); prints "[0, 1]"
print nums[:] # Get a slice of the whole list; prints "[0, 1, 2, 3, 4]"
print nums[:-1] # Slice indices can be negative; prints "[0, 1, 2, 3]"
nums[2:4] = [8, 9] # Assign a new sublist to a slice
print nums # Prints "[0, 1, 8, 9, 4]"
Comprehensions: Transformations
Python also provides map, reduce, and filter; map transforms a list:
# Use map to square every element of a list
items = [1, 2, 3, 4, 5]
squared = list(map(lambda x: x**2, items))
# map can also apply each function in a list to the same value
def multiply(x):
    return (x*x)
def add(x):
    return (x+x)
funcs = [multiply, add]
for i in range(5):
    value = list(map(lambda x: x(i), funcs))
    print(value)
reduce folds a list into a single value:
# reduce combines the values of a list cumulatively
from functools import reduce
product = reduce((lambda x, y: x * y), [1, 2, 3, 4])
# Output: 24
filter selects the elements of a list that satisfy a predicate:
number_list = range(-5, 5)
less_than_zero = list(filter(lambda x: x < 0, number_list))
print(less_than_zero)
# Output: [-5, -4, -3, -2, -1]
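Since this section is titled Comprehensions, it may help to see the same transformations written as list comprehensions, which are often considered the more idiomatic spelling of map and filter. The sketch below reuses the items list and the range from the examples above; even_squares is a name added here for illustration.

```python
items = [1, 2, 3, 4, 5]

# Equivalent to list(map(lambda x: x**2, items))
squared = [x ** 2 for x in items]

# Equivalent to list(filter(lambda x: x < 0, range(-5, 5)))
less_than_zero = [x for x in range(-5, 5) if x < 0]

# Transformation and filtering combined: squares of the even items only
even_squares = [x ** 2 for x in items if x % 2 == 0]

print(squared)         # [1, 4, 9, 16, 25]
print(less_than_zero)  # [-5, -4, -3, -2, -1]
print(even_squares)    # [4, 16]
```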
Dictionaries
Creating, Adding, and Removing
d = {'cat': 'cute', 'dog': 'furry'} # Create a new dictionary
print d['cat'] # Dictionaries do not support the dot operator for lookup; prints "cute"
To merge two or more dictionaries:
# Python 3.5+
z = {**x, **y}
# Python 2.7
def merge_dicts(*dict_args):
    """
    Given any number of dicts, shallow copy and merge into a new dict,
    precedence goes to key value pairs in latter dicts.
    """
    result = {}
    for dictionary in dict_args:
        result.update(dictionary)
    return result
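To make the precedence rule concrete, here is a small Python 3 sketch of both merge styles. The contents of x and y are made-up sample values; the point is that keys from the later dictionary win and the inputs are left untouched.

```python
x = {'a': 1, 'b': 2}
y = {'b': 3, 'c': 4}

# Unpacking builds a new dict; keys from y override keys from x
z = {**x, **y}

# The update-based form gives the same result and leaves x untouched
merged = dict(x)
merged.update(y)

print(z)       # {'a': 1, 'b': 3, 'c': 4}
print(merged)  # {'a': 1, 'b': 3, 'c': 4}
print(x)       # {'a': 1, 'b': 2}
```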
Indexing and Traversal
Elements can be accessed directly by key:
# Accessing a missing key raises KeyError, so check first or use get
print 'cat' in d # Check if a dictionary has a given key; prints "True"
# When reading with [], make sure the key exists, otherwise an exception is raised
print d['monkey'] # KeyError: 'monkey' not a key of d
# get accepts a default value
print d.get('monkey', 'N/A') # Get an element with a default; prints "N/A"
d['fish'] = 'wet' # Set an entry in the dictionary
print d.get('fish', 'N/A') # Get an element with a default; prints "wet"
d.keys() # keys returns all the keys
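for-in can iterate over a dictionary:
# Iterate over the keys
for key in d: ...
# In Python 2 this builds a list of the keys first, so it is slower than the form above
for k in d.keys(): ...
# Iterate over the values directly (Python 2; use d.values() in Python 3)
for value in d.itervalues(): ...
# Iterate over key-value pairs in Python 2.x
for key, value in d.iteritems(): ...
# Iterate over key-value pairs in Python 3.x
for key, value in d.items(): ...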
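The iteration forms above can be exercised in a short, runnable Python 3 sketch. It reuses the d dictionary from the earlier examples; the variable names keys, values, and pairs are introduced here only to capture the results.

```python
d = {'cat': 'cute', 'dog': 'furry'}

keys = [key for key in d]                 # iterating a dict yields its keys
values = [value for value in d.values()]  # values only
pairs = [(key, value) for key, value in d.items()]  # key-value tuples

# sorted() gives a predictable order regardless of insertion order
for key, value in sorted(d.items()):
    print('%s -> %s' % (key, value))
```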
Other Collection Types
Sets
# Same as {"a", "b","c"}
normal_set = set(["a", "b","c"])
# Adding an element to normal set is fine
normal_set.add("d")
print("Normal Set")
print(normal_set)
# A frozen set
frozen_set = frozenset(["e", "f", "g"])
print("Frozen Set")
print(frozen_set)
# Uncommenting below line would cause error as
# we are trying to add element to a frozen set
# frozen_set.add("h")
Functions
Function Definition
Functions in Python are defined with the def keyword, for example:
def sign(x):
    if x > 0:
        return 'positive'
    elif x < 0:
        return 'negative'
    else:
        return 'zero'
for x in [-1, 0, 1]:
    print sign(x)
# Prints "negative", "zero", "positive"
Python supports creating anonymous functions at run time, the so-called lambda functions:
def f(x): return x**2
# is equivalent to
g = lambda x: x**2
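In practice a lambda is most often passed inline as an argument rather than bound to a name. The sketch below shows one common case, the key argument of the built-in sorted function; the pairs list is sample data made up for this illustration.

```python
pairs = [(1, 'one'), (3, 'three'), (2, 'two')]

# Sort by the second element of each tuple (alphabetically)
by_name = sorted(pairs, key=lambda pair: pair[1])

# Sort by the first element, largest first
by_number_desc = sorted(pairs, key=lambda pair: pair[0], reverse=True)

print(by_name)         # [(1, 'one'), (3, 'three'), (2, 'two')]
print(by_number_desc)  # [(3, 'three'), (2, 'two'), (1, 'one')]
```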
Parameters
Optional Arguments: Variable Arguments
def example(a, b=None, *args, **kwargs):
    print a, b
    print args
    print kwargs
example(1, "var", 2, 3, word="hello")
# 1 var
# (2, 3)
# {'word': 'hello'}
a_tuple = (1, 2, 3, 4, 5)
a_dict = {"1":1, "2":2, "3":3}
example(1, "var", *a_tuple, **a_dict)
# 1 var
# (1, 2, 3, 4, 5)
# {'1': 1, '2': 2, '3': 3}
Generators
def simple_generator_function():
    yield 1
    yield 2
    yield 3
for value in simple_generator_function():
    print(value)
# Output
# 1
# 2
# 3
our_generator = simple_generator_function()
next(our_generator)
# 1
next(our_generator)
# 2
next(our_generator)
# 3
# A typical use of generators is iterating over an infinite sequence
# (is_prime is assumed to be defined elsewhere)
def get_primes(number):
    while True:
        if is_prime(number):
            yield number
        number += 1
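The get_primes generator above relies on an is_prime helper that the listing does not define. A self-contained sketch might look like the following, where is_prime is an assumed trial-division implementation (not taken from the original) and itertools.islice takes a finite prefix of the infinite stream.

```python
from itertools import islice

def is_prime(n):
    # Assumed helper: simple trial division, adequate for small n
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def get_primes(number):
    # Infinite generator: yields every prime >= number
    while True:
        if is_prime(number):
            yield number
        number += 1

# islice stops after five values, so the infinite loop is never exhausted
first_five = list(islice(get_primes(10), 5))
print(first_five)  # [11, 13, 17, 19, 23]
```

Because values are produced lazily, only as many candidates are tested as the consumer actually requests.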
Decorators
Decorators are a very useful design pattern:
# A simple decorator
from functools import wraps
def decorator(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        print('wrap function')
        return func(*args, **kwargs)
    return wrapper
@decorator
def example(*a, **kw):
    pass
example.__name__ # attr of function preserve
# 'example'
# Decorator
# A decorator that takes an argument
from functools import wraps
def decorator_with_argument(val):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            print "Val is {0}".format(val)
            return func(*args, **kwargs)
        return wrapper
    return decorator
@decorator_with_argument(10)
def example():
    print "This is example function."
example()
# Val is 10
# This is example function.
# is equivalent to
def example():
    print "This is example function."
example = decorator_with_argument(10)(example)
example()
# Val is 10
# This is example function.
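As one more illustration of the pattern, the sketch below uses a decorator to attach state to the wrapped function. count_calls and add are hypothetical names chosen for this example; the use of functools.wraps follows the listings above and keeps the original name and docstring intact.

```python
from functools import wraps

def count_calls(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        # The counter lives on the wrapper function object itself
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

@count_calls
def add(a, b):
    """Return a + b."""
    return a + b

add(1, 2)
add(3, 4)
print(add.calls)     # 2
print(add.__name__)  # add
print(add.__doc__)   # Return a + b.
```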
Classes and Objects
Class Definition
Defining a class in Python is also straightforward:
class Greeter(object):
    # Constructor
    def __init__(self, name):
        self.name = name # Create an instance variable
    # Instance method
    def greet(self, loud=False):
        if loud:
            print 'HELLO, %s!' % self.name.upper()
        else:
            print 'Hello, %s' % self.name
g = Greeter('Fred') # Construct an instance of the Greeter class
g.greet() # Call an instance method; prints "Hello, Fred"
g.greet(loud=True) # Call an instance method; prints "HELLO, FRED!"
# isinstance checks whether an object is an instance of a class (or of one of its subclasses)
ex = 10
isinstance(ex,int)
Managed Attributes
# property, setter, and deleter customize what attribute access through the dot operator does
class Example(object):
    def __init__(self, value):
        self._val = value
    @property
    def val(self):
        return self._val
    @val.setter
    def val(self, value):
        if not isinstance(value, int):
            raise TypeError("Expected int")
        self._val = value
    @val.deleter
    def val(self):
        del self._val
    @property
    def square3(self):
        return 2**3
ex = Example(123)
ex.val = "str"
# Traceback (most recent call last):
# File "", line 1, in
# File "test.py", line 12, in val
# raise TypeError("Expected int")
# TypeError: Expected int
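A slightly fuller sketch of managed attributes follows, written for Python 3. The Temperature class and its attribute names are hypothetical and exist only to show a validated read-write property next to a computed read-only one.

```python
class Temperature(object):
    def __init__(self, celsius):
        self.celsius = celsius  # goes through the setter below

    @property
    def celsius(self):
        return self._celsius

    @celsius.setter
    def celsius(self, value):
        if not isinstance(value, (int, float)):
            raise TypeError("Expected a number")
        self._celsius = value

    @property
    def fahrenheit(self):
        # Read-only: computed on every access, no setter defined
        return self._celsius * 9 / 5 + 32

t = Temperature(100)
print(t.fahrenheit)  # 212.0

try:
    t.celsius = "hot"
except TypeError as exc:
    print(exc)       # Expected a number
```

Routing the assignment in __init__ through the property means the validation also applies at construction time.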
Class Methods and Static Methods
class example(object):
    @classmethod
    def clsmethod(cls):
        print "I am classmethod"
    @staticmethod
    def stmethod():
        print "I am staticmethod"
    def instmethod(self):
        print "I am instancemethod"
ex = example()
ex.clsmethod()
# I am classmethod
ex.stmethod()
# I am staticmethod
ex.instmethod()
# I am instancemethod
example.clsmethod()
# I am classmethod
example.stmethod()
# I am staticmethod
example.instmethod()
# Traceback (most recent call last):
# File "", line 1, in
# TypeError: unbound method instmethod() ...
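Objects
Instantiation
Attribute Operations
Unlike dictionary keys, object attributes are read with the dot operator; testing for them with in against the instance's __dict__ is unreliable, because attributes defined on the class, such as properties, do not appear there:
class A(object):
    @property
    def prop(self):
        return 3
a = A()
print "'prop' in a.__dict__ =", 'prop' in a.__dict__
print "hasattr(a, 'prop') =", hasattr(a, 'prop')
print "a.prop =", a.prop
# 'prop' in a.__dict__ = False
# hasattr(a, 'prop') = True
# a.prop = 3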
It is better to work with object attributes through hasattr, getattr, and setattr:
class Example(object):
    def __init__(self):
        self.name = "ex"
    def printex(self):
        print "This is an example"
# Check object has attributes
# hasattr(obj, 'attr')
ex = Example()
hasattr(ex,"name")
# True
hasattr(ex,"printex")
# True
hasattr(ex,"print")
# False
# Get object attribute
# getattr(obj, 'attr')
getattr(ex,'name')
# 'ex'
# Set object attribute
# setattr(obj, 'attr', value)
setattr(ex,'name','example')
ex.name
# 'example'
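Exceptions and Testing
Exception Handling
Context Manager - with
with is commonly used to acquire a resource and release it automatically when the block exits:
# Socket is assumed to be a user-defined context manager class that wraps a listening socket
host = 'localhost'
port = 5566
with Socket(host, port) as s:
    while True:
        conn, addr = s.accept()
        msg = conn.recv(1024)
        print msg
        conn.send(msg)
        conn.close()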
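The listing above depends on a Socket class that is not shown. To illustrate the protocol behind with without any networking, here is a minimal sketch with two hypothetical context managers: Resource implements __enter__ and __exit__ directly, and resource does the same through contextlib.contextmanager. Both record their actions in a list so the ordering is visible.

```python
from contextlib import contextmanager

class Resource(object):
    def __init__(self, name, log):
        self.name = name
        self.log = log

    def __enter__(self):
        self.log.append('open ' + self.name)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs even when the body raises; returning False re-raises the exception
        self.log.append('close ' + self.name)
        return False

@contextmanager
def resource(name, log):
    log.append('open ' + name)
    try:
        yield name
    finally:
        # The finally clause plays the role of __exit__
        log.append('close ' + name)

events = []
try:
    with Resource('a', events):
        raise ValueError('boom')
except ValueError:
    events.append('handled')

with resource('b', events):
    events.append('using b')

print(events)
# ['open a', 'close a', 'handled', 'open b', 'using b', 'close b']
```

Note that 'close a' is recorded before 'handled': cleanup runs before the exception reaches the surrounding try block.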
Unit Testing
from __future__ import print_function
import unittest
def fib(n):
    return 1 if n<=2 else fib(n-1)+fib(n-2)
def setUpModule():
    print("setup module")
def tearDownModule():
    print("teardown module")
class TestFib(unittest.TestCase):
    def setUp(self):
        print("setUp")
        self.n = 10
    def tearDown(self):
        print("tearDown")
        del self.n
    @classmethod
    def setUpClass(cls):
        print("setUpClass")
    @classmethod
    def tearDownClass(cls):
        print("tearDownClass")
    def test_fib_assert_equal(self):
        self.assertEqual(fib(self.n), 55)
    def test_fib_assert_true(self):
        self.assertTrue(fib(self.n) == 55)
if __name__ == "__main__":
    unittest.main()
Storage
File I/O
Path Handling
Python's built-in __file__ variable holds the path of the current file, which may be relative; from it we can build an absolute path or locate other files:
import os
# Directory of the current file (possibly relative)
dir = os.path.dirname(__file__) # src\app
## once you're at the directory level you want, with the desired directory as the final path node:
dirname1 = os.path.basename(dir)
dirname2 = os.path.split(dir)[1] ## if you look at the documentation, this is exactly what os.path.basename does.
# Absolute path of the current file's directory; abspath completes a relative path against the current working directory
os.path.abspath(os.path.dirname(__file__)) # D:\WorkSpace\OWS\tool\ui-tool-svn\python\src\app
# Real path of the current file's directory, with symbolic links resolved
os.path.dirname(os.path.realpath(__file__)) # D:\WorkSpace\OWS\tool\ui-tool-svn\python\src\app
# Current working directory
os.getcwd()
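listdir, walk, and the glob module can enumerate and search files:
# List only the files (mypath is the directory to inspect)
from os import listdir
from os.path import isfile, join
onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]
# walk traverses the tree recursively; the break below stops after the top-level directory, so remove it to descend into subdirectories
from os import walk
f = []
for (dirpath, dirnames, filenames) in walk(mypath):
    f.extend(filenames)
    break
# Use glob for wildcard pattern matching
import glob
print(glob.glob("/home/adam/*.txt"))
# ['/home/adam/file1.txt', '/home/adam/file2.txt', .... ]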
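Simple File Reading and Writing
import os
writepath = "file.dat"
# Choose the write mode depending on whether the file exists
mode = 'a' if os.path.exists(writepath) else 'w'
# with closes the file automatically, even if an exception is raised
with open(writepath, mode) as f:
    f.write(...)
    ...
# Without with, remember to close the file once you are done
f = open(writepath)
# Read the file contents
message = f.read()
f.close()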
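A complete write, append, and read round trip can be sketched as follows. The example works inside a temporary directory created with tempfile so that no real file is touched; the file name and the written lines are arbitrary sample values.

```python
import os
import tempfile

# Work in a temporary directory so no real file is touched
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'file.dat')

# 'w' creates the file (or truncates an existing one)
with open(path, 'w') as f:
    f.write('first line\n')

# 'a' appends to the existing content (and would create the file if missing)
with open(path, 'a') as f:
    f.write('second line\n')

with open(path) as f:
    message = f.read()

print(message.splitlines())  # ['first line', 'second line']

# Clean up the temporary file and directory
os.remove(path)
os.rmdir(tmpdir)
```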
Structured File Formats
JSON
import json
# Writing JSON data (data is any JSON-serializable object)
with open('data.json', 'w') as f:
    json.dump(data, f)
# Reading data back
with open('data.json', 'r') as f:
    data = json.load(f)
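The same round trip also works in memory with json.dumps and json.loads, which is convenient for quick checks. In this sketch the contents of data are made-up sample values.

```python
import json

data = {'name': 'ACME', 'shares': 100, 'tags': ['a', 'b'], 'price': 542.23}

text = json.dumps(data, sort_keys=True)  # dict -> JSON string
restored = json.loads(text)              # JSON string -> dict

print(text)
print(restored == data)  # True
```

Keep in mind that JSON has no tuple type, so a tuple would come back as a list after the round trip.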
XML
lxml can parse and process XML files; this part introduces its most common operations. lxml can create Element objects from a string or from a file:
from lxml import etree
# Build from a string
xml = '<a xmlns="test"><b xmlns="test"/></a>'
root = etree.fromstring(xml)
etree.tostring(root)
# b'<a xmlns="test"><b xmlns="test"/></a>'
# Or build from a file
tree = etree.parse("doc/test.xml")
# Or specify a base URL
root = etree.fromstring(xml, base_url="http://where.it/is/from.xml")
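It provides iterators for traversing all elements:
# Traverse all nodes
for tag in tree.iter():
    if not len(tag):
        print tag.keys() # names of all attributes on the element
        print (tag.tag, tag.text) # text is the element's text content
# Get the XPath of each element
for e in root.iter():
    print tree.getpath(e)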
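lxml supports finding elements with XPath. Note that an XPath query returns a list, and when the document uses namespaces, they must be passed in explicitly through the namespaces argument:
# Element names in the path must carry the prefix mapped here (e.g. '//p:page' with namespaces={'p': url})
root.xpath('//page/text/text()', namespaces={prefix: url})
# getparent returns the parent element; call it repeatedly to walk up the tree
el.getparent()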
lxml provides methods such as insert and append for manipulating elements:
# append adds the element at the end by default
st = etree.Element("state", name="New Mexico")
co = etree.Element("county", name="Socorro")
st.append(co)
# insert places the element at a given position
node.insert(0, newKid)
Excel
xlrd reads Excel files, and xlsxwriter writes and manipulates them.
# Read the raw value of a cell (sh is an xlrd sheet; rx and col are the row and column indices)
sh.cell(rx, col).value
# Create a new file
import xlsxwriter
workbook = xlsxwriter.Workbook(outputFile)
worksheet = workbook.add_worksheet()
# Start writing at row 0
row = 0
# Iterate over a two-dimensional list and write it to the worksheet
for rowData in array:
    for col, data in enumerate(rowData):
        worksheet.write(row, col, data)
    row = row + 1
workbook.close()
File System
For high-level file operations, use Python's built-in shutil module:
import shutil
# Recursively delete the appName directory and everything beneath it
shutil.rmtree(appName)
Networking
Requests
Requests is an elegant and easy-to-use HTTP library for Python:
import requests
r = requests.get('https://api.github.com/events')
r = requests.get('https://api.github.com/user', auth=('user', 'pass'))
r.status_code
# 200
r.headers['content-type']
# 'application/json; charset=utf8'
r.encoding
# 'utf-8'
r.text
# u'{"type":"User"...'
r.json()
# {u'private_gists': 419, u'total_private_repos': 77, ...}
r = requests.put('http://httpbin.org/put', data = {'key':'value'})
r = requests.delete('http://httpbin.org/delete')
r = requests.head('http://httpbin.org/get')
r = requests.options('http://httpbin.org/get')
Data Storage
MySQL
import pymysql.cursors
# Connect to the database
connection = pymysql.connect(host='localhost',
                             user='user',
                             password='passwd',
                             db='db',
                             charset='utf8mb4',
                             cursorclass=pymysql.cursors.DictCursor)
try:
    with connection.cursor() as cursor:
        # Create a new record
        sql = "INSERT INTO `users` (`email`, `password`) VALUES (%s, %s)"
        cursor.execute(sql, ('webmaster@python.org', 'very-secret'))
    # connection is not autocommit by default. So you must commit to save
    # your changes.
    connection.commit()
    with connection.cursor() as cursor:
        # Read a single record
        sql = "SELECT `id`, `password` FROM `users` WHERE `email`=%s"
        cursor.execute(sql, ('webmaster@python.org',))
        result = cursor.fetchone()
        print(result)
finally:
    connection.close()