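Installing Scrapy on Ubuntu
Scrapy is a fast, high-level screen scraping and web crawling framework written in Python. It is used to crawl websites and extract structured data from their pages, and it has a wide range of uses, including data mining, monitoring, and automated testing. Official website: http://www.scrapy.org/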
1. Install the following packages
sudo apt-get install build-essential
sudo apt-get install python-dev
sudo apt-get install libxml2-dev
sudo apt-get install libxslt1-dev
sudo apt-get install python-setuptools
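Before moving on, it can help to confirm that the build tools are actually on the PATH, since lxml is compiled from source in the next step. The following is a minimal sketch, not part of the original procedure. The package-to-command mapping (`gcc`, `xml2-config`, `xslt-config`) is an assumption about what these packages usually provide, `missing_packages` is a hypothetical helper name, and the script needs Python 3.3 or later for `shutil.which`.

```python
import shutil

# Assumed mapping from an apt package to a command it normally provides.
# This is illustrative only; package contents can differ between releases.
EXPECTED_TOOLS = {
    "build-essential": "gcc",
    "libxml2-dev": "xml2-config",
    "libxslt1-dev": "xslt-config",
}


def missing_packages(tools=EXPECTED_TOOLS, which=shutil.which):
    """Return the packages whose expected command is not found on PATH."""
    return sorted(pkg for pkg, cmd in tools.items() if which(cmd) is None)


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Possibly missing:", ", ".join(missing))
    else:
        print("All checked build tools were found.")
```

A package reported as possibly missing is only a hint: the command may simply live under a different name on your release.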
2. Install Scrapy
sudo easy_install Scrapy
wang@ubuntu:/usr/local/lib/python2.7/dist-packages$ sudo easy_install Scrapy
Searching for Scrapy
Best match: Scrapy 0.16.1
Processing Scrapy-0.16.1-py2.7.egg
Scrapy 0.16.1 is already the active version in easy-install.pth
Installing scrapy script to /usr/local/bin
Using /usr/local/lib/python2.7/dist-packages/Scrapy-0.16.1-py2.7.egg
Processing dependencies for Scrapy
Searching for lxml
Reading http://pypi.python.org/simple/lxml/
Reading http://codespeak.net/lxml
Best match: lxml 3.0.1
Downloading http://pypi.python.org/packages/source/l/lxml/lxml-3.0.1.tar.gz#md5=0f2b1a063ab3b6b0944cbc4a9a85dcfa
Processing lxml-3.0.1.tar.gz
Running lxml-3.0.1/setup.py -q bdist_egg --dist-dir /tmp/easy_install-qibAzL/lxml-3.0.1/egg-dist-tmp-mSvUVN
Building lxml version 3.0.1.
Building without Cython.
Using build configuration of libxslt 1.1.26
Building against libxml2/libxslt in the following directory: /usr/lib/x86_64-linux-gnu
warning: no files found matching '*.txt' under directory 'src/lxml/tests'
src/lxml/lxml.etree.c: In function ‘__pyx_f_4lxml_5etree__getFilenameForFile’:
src/lxml/lxml.etree.c:26310:7: warning: variable ‘__pyx_clineno’ set but not used [-Wunused-but-set-variable]
src/lxml/lxml.etree.c:26309:15: warning: variable ‘__pyx_filename’ set but not used [-Wunused-but-set-variable]
src/lxml/lxml.etree.c:26308:7: warning: variable ‘__pyx_lineno’ set but not used [-Wunused-but-set-variable]
src/lxml/lxml.etree.c: In function ‘__pyx_pf_4lxml_5etree_4XSLT_18__call__’:
src/lxml/lxml.etree.c:132608:81: warning: passing argument 1 of ‘__pyx_f_4lxml_5etree_12_XSLTContext__copy’ from incompatible pointer type [enabled by default]
src/lxml/lxml.etree.c:130569:52: note: expected ‘struct __pyx_obj_4lxml_5etree__XSLTContext *’ but argument is of type ‘struct __pyx_obj_4lxml_5etree__BaseContext *’
src/lxml/lxml.etree.c: In function ‘__pyx_f_4lxml_5etree__copyXSLT’:
src/lxml/lxml.etree.c:133997:79: warning: passing argument 1 of ‘__pyx_f_4lxml_5etree_12_XSLTContext__copy’ from incompatible pointer type [enabled by default]
src/lxml/lxml.etree.c:130569:52: note: expected ‘struct __pyx_obj_4lxml_5etree__XSLTContext *’ but argument is of type ‘struct __pyx_obj_4lxml_5etree__BaseContext *’
src/lxml/lxml.etree.c: At top level:
src/lxml/lxml.etree.c:12128:13: warning: ‘__pyx_f_4lxml_5etree_displayNode’ defined but not used [-Wunused-function]
src/lxml/lxml.etree.c: In function ‘__pyx_f_4lxml_5etree_11_BaseParser__parseDocFromFile’:
src/lxml/lxml.etree.c:86715:3: warning: ‘__pyx_r’ may be used uninitialized in this function [-Wuninitialized]
src/lxml/lxml.etree.c: In function ‘__pyx_f_4lxml_5etree_11_BaseParser__parseDoc’:
src/lxml/lxml.etree.c:86403:3: warning: ‘__pyx_r’ may be used uninitialized in this function [-Wuninitialized]
src/lxml/lxml.etree.c: In function ‘__pyx_f_4lxml_5etree_11_BaseParser__parseUnicodeDoc’:
src/lxml/lxml.etree.c:86093:3: warning: ‘__pyx_r’ may be used uninitialized in this function [-Wuninitialized]
src/lxml/lxml.etree.c: In function ‘__pyx_f_4lxml_5etree_11_BaseParser__parseDocFromFilelike’:
src/lxml/lxml.etree.c:86925:3: warning: ‘__pyx_r’ may be used uninitialized in this function [-Wuninitialized]
Adding lxml 3.0.1 to easy-install.pth file
Installed /usr/local/lib/python2.7/dist-packages/lxml-3.0.1-py2.7-linux-x86_64.egg
Searching for w3lib>=1.2
Reading http://pypi.python.org/simple/w3lib/
Reading http://github.com/scrapy/w3lib
Best match: w3lib 1.2
Downloading http://pypi.python.org/packages/source/w/w3lib/w3lib-1.2.tar.gz#md5=f929d5973a9fda59587b09a72f185a9e
Processing w3lib-1.2.tar.gz
Running w3lib-1.2/setup.py -q bdist_egg --dist-dir /tmp/easy_install-ZAXTgy/w3lib-1.2/egg-dist-tmp-aU3vpc
zip_safe flag not set; analyzing archive contents...
Adding w3lib 1.2 to easy-install.pth file
Installed /usr/local/lib/python2.7/dist-packages/w3lib-1.2-py2.7.egg
Searching for Twisted>=8.0
Reading http://pypi.python.org/simple/Twisted/
Reading http://www.twistedmatrix.com
Reading http://twistedmatrix.com/products/download
Reading http://twistedmatrix.com/
Reading http://tmrc.mit.edu/mirror/twisted/Twisted/9.0/
Reading http://tmrc.mit.edu/mirror/twisted/Twisted/10.0/
Reading http://twistedmatrix.com/projects/core/
Reading http://tmrc.mit.edu/mirror/twisted/Twisted/8.2/
Reading http://tmrc.mit.edu/mirror/twisted/Twisted/8.1/
Best match: Twisted 12.2.0
Downloading http://pypi.python.org/packages/source/T/Twisted/Twisted-12.2.0.tar.bz2#md5=9a321b904d01efd695079f8484b37861
Processing Twisted-12.2.0.tar.bz2
Running Twisted-12.2.0/setup.py -q bdist_egg --dist-dir /tmp/easy_install-kw897y/Twisted-12.2.0/egg-dist-tmp-sZWFYb
In file included from /usr/include/python2.7/Python.h:8:0,
                 from twisted/internet/_sigchld.c:9:
/usr/include/python2.7/pyconfig.h:1161:0: warning: "_POSIX_C_SOURCE" redefined [enabled by default]
/usr/include/features.h:215:0: note: this is the location of the previous definition
twisted/internet/_sigchld.c: In function ‘got_signal’:
twisted/internet/_sigchld.c:15:13: warning: variable ‘ignored_result’ set but not used [-Wunused-but-set-variable]
Adding Twisted 12.2.0 to easy-install.pth file
Installing mailmail script to /usr/local/bin
Installing conch script to /usr/local/bin
Installing pyhtmlizer script to /usr/local/bin
Installing twistd script to /usr/local/bin
Installing lore script to /usr/local/bin
Installing tkconch script to /usr/local/bin
Installing tapconvert script to /usr/local/bin
Installing ckeygen script to /usr/local/bin
Installing tap2rpm script to /usr/local/bin
Installing manhole script to /usr/local/bin
Installing trial script to /usr/local/bin
Installing cftp script to /usr/local/bin
Installing tap2deb script to /usr/local/bin
Installed /usr/local/lib/python2.7/dist-packages/Twisted-12.2.0-py2.7-linux-x86_64.egg
Finished processing dependencies for Scrapy
This output indicates that the installation succeeded.
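Besides reading the installer log, you can check from Python that Scrapy and the dependencies named in the log (lxml, w3lib, Twisted) can be imported. This is a rough sketch rather than an official verification step. `check_modules` is a hypothetical helper name, and the `__version__` attribute is an assumption that may not hold for every release, so the code falls back to "unknown". It is written for Python 3, whereas the log shown here comes from a Python 2.7 setup.

```python
import importlib


def check_modules(names=("scrapy", "lxml", "w3lib", "twisted")):
    """Map each module name to a version string, 'unknown', or None.

    None means the module could not be imported at all.
    """
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None
            continue
        # Not every package exposes __version__; treat that as 'unknown'.
        report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for name, version in check_modules().items():
        status = "NOT importable" if version is None else "version " + str(version)
        print(name + ": " + status)
```

Any module reported as not importable suggests that the corresponding part of the installation did not complete for the interpreter you ran the script with.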
3. Test
scrapy shell http://ziki.cn
Inside the shell, get all a tags:
hxs.select('//a').extract()
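To see what the `//a` query selects without opening the Scrapy shell, the idea can be reproduced with Python's standard library alone. This sketch is an analogy, not Scrapy's API: it uses `xml.etree.ElementTree`, which needs well-formed markup, while Scrapy's selectors tolerate messy real-world HTML. It also returns (href, text) pairs instead of the serialized tags that `extract()` gives. `SAMPLE` and `extract_links` are made-up names for illustration.

```python
import xml.etree.ElementTree as ET

# A small, well-formed document invented for this example.
SAMPLE = """<html><body>
<p>See <a href="http://www.scrapy.org/">Scrapy</a> and
<a href="http://doc.scrapy.org/en/latest/intro/tutorial.html">the tutorial</a>.</p>
</body></html>"""


def extract_links(markup):
    """Return (href, text) for every <a> element at any depth, like //a."""
    root = ET.fromstring(markup)
    # iter("a") walks the whole tree, which matches the XPath //a.
    return [(a.get("href"), "".join(a.itertext())) for a in root.iter("a")]


if __name__ == "__main__":
    for href, text in extract_links(SAMPLE):
        print(text, "->", href)
```

As far as I know, newer Scrapy releases no longer provide the `hxs` shortcut in the shell and expose the same query as `response.xpath('//a')`, so check the documentation for the version you installed.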
References
http://doc.scrapy.org/en/latest/intro/install.html
http://doc.scrapy.org/en/latest/intro/tutorial.html
Original article. If you repost it, please credit the source: reposted from 海波無痕