Python Crawler Development and Project Practice 1: Reviewing Python Programming

Posted by CopperDong on 2018-01-04

https://github.com/qiyeboy/SpiderBook

Chapter 1: Reviewing Python Programming

      This book uses Python 2.7.

       sudo apt-get install python-pip python-dev

      Setting up Eclipse + PyDev: once the PyDev plug-in is installed, Eclipse can be used to write Python programs.

               Start Eclipse and click Help -> Install New Software ...

               Add: name: PyDev, location: http://pydev.org/updates

               PyDev interpreter configuration: Window -> PyDev -> Interpreters -> Python Interpreter, then add the path to python
     Reading files

f = None
try:
    f = open(r'qiye.txt', 'r')
    print f.read()
finally:
    # f stays None if open() failed, so close() is only called on a real file
    if f:
        f.close()

        The code above is somewhat long. A simpler way is to use the with statement in place of try ... finally and close():

with open(r'qiye.txt', 'r') as fileReader:
    print fileReader.read()
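
To make the effect of the with statement visible, here is a minimal sketch. It is written so that it should run on Python 3 as well as the 2.7 used in the book. The helper read_with_context and the file name qiye_demo.txt are hypothetical names chosen for this illustration; the file is created in a temporary directory so the example does not depend on an existing qiye.txt.

```python
import os
import tempfile


def read_with_context(path):
    """Read a whole file; the with statement closes it when the block ends."""
    with open(path, 'r') as file_reader:
        content = file_reader.read()
    # At this point the file object has already been closed by the with block.
    return content, file_reader.closed


# Hypothetical demo file, created in a temporary directory for this sketch.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, 'qiye_demo.txt')
with open(demo_path, 'w') as file_writer:
    file_writer.write('hello qiye')

content, was_closed = read_with_context(demo_path)
print(content)
print(was_closed)
```

The file is closed even if read() raises an exception, which is the same guarantee the try ... finally version gives.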

      Serialization: a dict object is serialized with the cPickle module (written in C, so it is fast) or with the pickle module

# Prefer cPickle and fall back to pickle if it is unavailable
try:
    import cPickle as pickle
except ImportError:
    import pickle
d = dict(url='index.html', title='Home page', content='Content')
pickle.dumps(d)    # dumps serializes an object into a str
f = open(r'dump.txt', 'wb')
pickle.dump(d, f)  # dump writes the serialized object directly to a file
f.close()
Deserialization: use the loads method or the load method

f = open(r'dump.txt', 'rb')
d = pickle.load(f)
f.close()
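
The text mentions loads but only shows load. As a minimal sketch of the in-memory round trip, the following pairs dumps with loads; the variable names page, data and restored are chosen for this illustration. The import falls back to pickle so the snippet should also run on Python 3, where cPickle does not exist.

```python
try:
    import cPickle as pickle  # Python 2: C implementation
except ImportError:
    import pickle             # Python 3: the C accelerator is used automatically

page = dict(url='index.html', title='Home page', content='Content')

# dumps returns the serialized form (str on Python 2, bytes on Python 3).
data = pickle.dumps(page)

# loads rebuilds an equal, but distinct, object from the serialized form.
restored = pickle.loads(data)

print(restored == page)
print(restored is page)
```

Only unpickle data from a source you trust: loading a pickle can execute arbitrary code.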
       Processes and threads:

taskManager.py

# coding: utf-8
import Queue
from multiprocessing.managers import BaseManager

# 1. Create the task queue and the result queue
task_queue = Queue.Queue()
result_queue = Queue.Queue()

class QueueManager(BaseManager):
    pass

# 2. Register both queues on the network; callable returns the queue to expose
QueueManager.register('get_task_queue', callable=lambda: task_queue)
QueueManager.register('get_result_queue', callable=lambda: result_queue)

# 3. Bind port 8001 and set the authentication key "qiye"
manager = QueueManager(address=('', 8001), authkey='qiye')

# 4. Start the manager so that it begins listening
manager.start()

# 5. Get the queue objects through the manager
task = manager.get_task_queue()
result = manager.get_result_queue()

# 6. Put the tasks on the task queue
for url in ['ImageUrl_' + str(i) for i in range(10)]:
    print 'put task %s ...' % url
    task.put(url)

# 7. Read the results back from the result queue
print 'try get result...'
for i in range(10):
    print 'result is %s' % result.get(timeout=10)

# 8. Shut down the manager
manager.shutdown()

taskWorker.py

# coding: utf-8
import time
from multiprocessing.managers import BaseManager

# 0. Define a manager class matching the one in taskManager.py
class QueueManager(BaseManager):
    pass

# 1. Register only the names; the queues themselves live in the manager process
QueueManager.register('get_task_queue')
QueueManager.register('get_result_queue')
# 2. Connect to the server; port and authkey must match taskManager.py
server_addr = '127.0.0.1'
print('Connect to server %s...' % server_addr)
m = QueueManager(address=(server_addr, 8001), authkey='qiye')
m.connect()
# 3. Get the Queue objects
task = m.get_task_queue()
result = m.get_result_queue()
# 4. Take tasks from the task queue and write the outcome to the result queue
while not task.empty():
    image_url = task.get(True, timeout=5)
    print('run task download %s...' % image_url)
    time.sleep(1)
    result.put('%s--->success' % image_url)
print('worker exit.')
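
The two scripts above need two separate processes to try out. As a rough local analogue, the same task-queue and result-queue pattern can be sketched with threads in a single process. This is not the distributed BaseManager setup itself: it swaps the network manager for plain in-process queues, and the worker function and the thread count of 3 are assumptions made for this illustration. It should run on both Python 2.7 and Python 3.

```python
import threading

try:
    import Queue as queue  # Python 2 module name
except ImportError:
    import queue           # Python 3 module name

task_queue = queue.Queue()
result_queue = queue.Queue()


def worker():
    """Take URLs until the task queue is empty, reporting each one as done."""
    while True:
        try:
            # get_nowait avoids the race between empty() and get()
            image_url = task_queue.get_nowait()
        except queue.Empty:
            break
        result_queue.put('%s--->success' % image_url)


# Fill the task queue before the workers start.
for i in range(10):
    task_queue.put('ImageUrl_%d' % i)

threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Completion order depends on thread scheduling, so sort before inspecting.
results = sorted(result_queue.get() for _ in range(10))
print(len(results))
print(results[0])
```

Using get_nowait and catching queue.Empty, rather than checking empty() and then calling get(), prevents a worker from blocking when another worker takes the last task in between the two calls.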

       Network programming

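            Python provides two basic socket modules:

                   socket, which provides the standard BSD Sockets API

                   SocketServer, which provides server-centric classes that simplify the development of network servers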
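
As a minimal sketch of the BSD-style API in the socket module, the following runs a one-shot TCP echo server in a background thread and connects to it from the same script. The function name echo_once and the message b'hello' are chosen for this illustration, and binding to port 0 (letting the operating system pick a free port) is an assumption made so the example does not collide with a port already in use. It should run on both Python 2.7 and Python 3.

```python
import socket
import threading


def echo_once(server_sock):
    """Accept one connection and send back whatever it receives."""
    conn, _addr = server_sock.accept()
    try:
        data = conn.recv(1024)
        conn.sendall(data)
    finally:
        conn.close()


# Server side: create, bind, listen.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('127.0.0.1', 0))   # port 0: the OS assigns a free port
server.listen(1)
port = server.getsockname()[1]

server_thread = threading.Thread(target=echo_once, args=(server,))
server_thread.start()

# Client side: create, connect, send, receive.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(('127.0.0.1', port))
client.sendall(b'hello')
reply = client.recv(1024)
client.close()

server_thread.join()
server.close()
print(reply)
```

AF_INET with SOCK_STREAM selects TCP over IPv4; a UDP socket would use SOCK_DGRAM instead and would skip listen, accept and connect.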

 Socket types

         
      


