Python Study Notes - aiohttp

Posted by MADAO是不會開花的 on 2019-02-18

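The previous note summarized some key points of asyncio, so this time let's put them to use. When writing a crawler with coroutines, the network requests have to move from the requests library to aiohttp, because requests blocks while it waits on the network.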
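To see why a blocking library has to be swapped out, here is a minimal sketch that uses only the standard library. `time.sleep` stands in for a blocking call such as `requests.get`, and `asyncio.sleep` stands in for a non-blocking call such as an aiohttp request. The function names and the 0.1s delay are illustrative assumptions, not part of the original crawler.

```python
import asyncio
import time


async def blocking_job(delay):
    # time.sleep blocks the whole event loop, so no other coroutine can run
    time.sleep(delay)


async def cooperative_job(delay):
    # asyncio.sleep yields control to the event loop while waiting
    await asyncio.sleep(delay)


async def run_jobs(job, count=5, delay=0.1):
    """Run `count` copies of `job` concurrently and return the elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(job(delay) for _ in range(count)))
    return time.perf_counter() - start


def compare():
    # Returns (seconds with blocking waits, seconds with cooperative waits)
    blocking = asyncio.run(run_jobs(blocking_job))
    cooperative = asyncio.run(run_jobs(cooperative_job))
    return blocking, cooperative
```

Calling `compare()` should show the blocking version taking roughly `count * delay` seconds, while the cooperative version takes about one `delay`.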

I. Efficiency comparison

  1. Using the crawler written last time, first collect some wallpaper links

    urls = [
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-729560.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-724055.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-716644.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-716643.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-716645.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-686220.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-686212.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-652608.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-639894.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-639893.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-639892.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-639890.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-639888.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-468197.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-467016.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-467012.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-467009.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-467007.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-467005.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-466997.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-466998.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-466993.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-466994.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-466995.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-729560.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-724055.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-716644.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-716643.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-716645.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-686220.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-686212.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-652608.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-639894.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-639893.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-639892.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-639890.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-639888.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-468197.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-467016.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-467012.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-467009.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-467007.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-467005.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-466997.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-466998.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-466993.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-466994.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-466995.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-466992.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-466989.jpg"
    ]
  2. Download these images with coroutines

    • Install the aiohttp library (for example with pip install aiohttp)

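    • According to the documentation, aiohttp issues requests through aiohttp.ClientSession(). The documentation advises against creating a session for every request, so only one is created here: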

      async def main():
          async with aiohttp.ClientSession() as session:
              pass
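As a rough sketch of what sharing one session looks like in practice, the helper below takes the session as a parameter so that every request goes through the same `ClientSession`. The names `fetch_status` and `fetch_all` are hypothetical and not part of aiohttp. `aiohttp` is imported inside `fetch_all` only so that `fetch_status` can be tried with any object that offers a compatible `get()`.

```python
import asyncio


async def fetch_status(session, url):
    # The session is passed in, so every call reuses the same connection pool
    async with session.get(url) as response:
        return response.status


async def fetch_all(urls):
    import aiohttp  # third-party: pip install aiohttp

    # One session for the whole batch instead of one per request
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(fetch_status(session, url) for url in urls))
```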
    • Define the coroutine function that downloads an image and the function that creates the storage path:

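      # Download one image.
      # NOTE: `headers` is not defined in these notes; it is assumed to be a
      # module-level dict of request headers (e.g. a user-agent).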
      async def download_img(session, url):
          image_name = url.split('/')[-1]
          async with session.get(url, headers=headers) as response:
              with open('%s/%s' % (get_store_path('city'), image_name), 'wb') as fd:
                  while True:
                      chunk = await response.content.read(200)
                      if not chunk:
                          break
                      fd.write(chunk)
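The read loop in `download_img` can be tried without any network access. The sketch below assumes `asyncio.StreamReader` from the standard library as a stand-in for `response.content`, since both expose an awaitable `read(n)`. `copy_in_chunks` and `demo` are hypothetical names.

```python
import asyncio
import io


async def copy_in_chunks(reader, sink, chunk_size=200):
    """Copy everything from `reader` to `sink`, at most `chunk_size` bytes at a time."""
    total = 0
    while True:
        chunk = await reader.read(chunk_size)
        if not chunk:
            # An empty bytes object signals end of stream
            break
        sink.write(chunk)
        total += len(chunk)
    return total


async def demo():
    # Build an in-memory stream holding 450 bytes, then mark it finished
    reader = asyncio.StreamReader()
    reader.feed_data(b"x" * 450)
    reader.feed_eof()
    sink = io.BytesIO()
    copied = await copy_in_chunks(reader, sink)
    return copied, sink.getvalue()
```

aiohttp also documents `response.content.iter_chunked(n)`, which expresses the same loop as an `async for`.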
      # Get the image storage path, creating it if it does not exist
      def get_store_path(dir_name):
          current_path = os.path.abspath('.')
          target_path = os.path.join(current_path, 'wallpaper/%s' % dir_name)
          folder = os.path.exists(target_path)
          if not folder:
              os.makedirs(target_path)
      
          return target_path
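A slightly shorter variant of the same idea, sketched under the assumption that passing the base directory in is acceptable. `ensure_dir` and its `base` parameter are hypothetical and not from the original notes. `os.makedirs(..., exist_ok=True)` removes the need for a separate existence check.

```python
import os


def ensure_dir(base, dir_name):
    """Return <base>/wallpaper/<dir_name>, creating it if it does not exist."""
    target_path = os.path.join(base, 'wallpaper', dir_name)
    # exist_ok=True makes the call a no-op when the directory already exists
    os.makedirs(target_path, exist_ok=True)
    return target_path
```

Calling such a helper once before the downloads start, rather than inside every `download_img` call, would also avoid repeating the filesystem check for each image.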
    • Complete the main function:

      async def main(loop):
          async with aiohttp.ClientSession() as session:
              tasks = [loop.create_task(download_img(session, url)) for url in urls]
              await asyncio.wait(tasks)
              
      loop = asyncio.get_event_loop()
      loop.run_until_complete(main(loop))
      loop.close()
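On Python 3.7 and later, the same scheduling can be written with `asyncio.run` and `asyncio.gather`. The sketch below swaps the real download for a hypothetical `fake_download` coroutine so that it runs offline. Unlike `asyncio.wait`, `gather` returns results in input order and re-raises the first exception raised by a task.

```python
import asyncio


async def fake_download(url):
    # Stand-in for download_img: pretend to wait on the network
    await asyncio.sleep(0.01)
    return url.split('/')[-1]


async def main(urls):
    # gather schedules every coroutine as a task and waits for all of them
    return await asyncio.gather(*(fake_download(url) for url in urls))


def run(urls):
    # asyncio.run creates the event loop, runs main, and closes the loop
    return asyncio.run(main(urls))
```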
    • Measure the download time:

      if __name__ == '__main__':
          t1 = time.time()
          loop = asyncio.get_event_loop()
          loop.run_until_complete(main(loop))
          loop.close()
          print('Elapsed: %fs' % (time.time() - t1))
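For interval timing, `time.perf_counter()` is generally a better fit than `time.time()` because it is monotonic and uses the highest-resolution clock available. A minimal sketch, where `timed` and `nap` are hypothetical helpers:

```python
import asyncio
import time


def timed(coro_func, *args):
    """Run an async function to completion and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_func(*args))
    return result, time.perf_counter() - start


async def nap(delay):
    # Placeholder workload; a real run would pass the crawler's main instead
    await asyncio.sleep(delay)
    return 'done'
```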
    • Result:

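      26 images took 1.476328s (the list has 50 entries, but the duplicated URLs map to the same file names, leaving 26 distinct files).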

  3. Download using multiple processes

     import requests
     import multiprocessing
     import os
     import time
    
    
     urls = [
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-729560.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-724055.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-716644.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-716643.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-716645.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-686220.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-686212.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-652608.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-639894.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-639893.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-639892.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-639890.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-639888.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-468197.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-467016.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-467012.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-467009.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-467007.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-467005.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-466997.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-466998.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-466993.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-466994.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-466995.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-729560.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-724055.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-716644.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-716643.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-716645.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-686220.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-686212.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-652608.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-639894.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-639893.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-639892.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-639890.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-639888.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-468197.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-467016.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-467012.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-467009.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-467007.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-467005.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-466997.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-466998.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-466993.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-466994.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-466995.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-466992.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-466989.jpg"
     ]
    
     req_session = requests.Session()
     req_session.headers['user-agent'] = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'
    
    
     def get_store_path(dir_name):
         current_path = os.path.abspath('.')
         target_path = os.path.join(current_path, 'wallpaper/%s' % dir_name)
         folder = os.path.exists(target_path)
         if not folder:
             os.makedirs(target_path)
    
         return target_path
    
    
     def download_img(url):
         img = req_session.get(url, stream=True)
         image_name = url.split('/')[-1]
         with open('%s/%s' % (get_store_path('city'), image_name), 'wb') as fd:
             for chunk in img.iter_content(chunk_size=128):
                 fd.write(chunk)
    
    
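     def main():
         # Pool() defaults to one worker process per CPU core
         p = multiprocessing.Pool()
         results = [p.apply_async(download_img, args=(url,)) for url in urls]
         p.close()
         p.join()
         # apply_async only re-raises a worker's exception when get() is called
         for result in results:
             result.get()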
    
     if __name__ == '__main__':
         t1 = time.time()
         main()
         print('Elapsed: %fs' % (time.time() - t1))
    

    Result:

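I ran both versions several times, and both mostly finished in 1.4 to 2.7s. Coroutines are clearly powerful: with just a single thread they achieve roughly what multiple processes achieve here.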
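The single-thread claim can be checked directly. In this sketch (hypothetical names, with `asyncio.sleep` standing in for network waits), every coroutine records the thread it runs on:

```python
import asyncio
import threading


async def worker(seen):
    await asyncio.sleep(0.01)
    # Record the identifier of the thread this coroutine resumed on
    seen.add(threading.get_ident())


async def main(count=20):
    seen = set()
    await asyncio.gather(*(worker(seen) for _ in range(count)))
    return seen


def thread_count():
    # Number of distinct threads that executed the workers
    return len(asyncio.run(main()))
```

`thread_count()` should report a single thread no matter how many workers are started, since the event loop interleaves them rather than running them in parallel.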
