Scrapy: Installation: Difference between revisions
imported>Soleverlee
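Revision as of 13:03, 9 January 2017 (Monday)
Installation
These steps install Scrapy on Windows with 32-bit (x86) Python 2.7. First install pywin32 and VC9 (Microsoft Visual C++ 9.0), then run: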
easy_install pip
pip install Scrapy
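At this point the install fails with an error raised while building lxml, one of Scrapy's dependencies: Could not find function xmlCheckVersion in library libxml2. Is libxml2 installed?
Installing a prebuilt lxml wheel first avoids the build, after which Scrapy installs normally: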
pip install wheel
wget http://www.lfd.uci.edu/~gohlke/pythonlibs/f9r7rmd8/lxml-3.7.2-cp27-cp27m-win32.whl
pip install lxml-3.7.2-cp27-cp27m-win32.whl
pip install Scrapy
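The wheel filename above encodes which interpreter it targets, which is why the `cp27` / `win32` build suits 32-bit Python 2.7. As a rough illustration, the sketch below splits a wheel filename into its tags following the standard wheel naming scheme (name, version, optional build tag, Python tag, ABI tag, platform tag). The helper `parse_wheel_filename` is hypothetical and is not part of pip or wheel.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its tags (hypothetical helper, not a pip API)."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: %r" % filename)
    parts = filename[:-len(".whl")].split("-")
    # 5 fields without a build tag, 6 fields with one
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel filename: %r" % filename)
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python": python_tag,      # cp27 = CPython 2.7
        "abi": abi_tag,            # cp27m = the matching CPython 2.7 ABI
        "platform": platform_tag,  # win32 = 32-bit Windows
    }


info = parse_wheel_filename("lxml-3.7.2-cp27-cp27m-win32.whl")
print(info["python"], info["abi"], info["platform"])
```

A 64-bit interpreter would need a wheel whose platform tag is `win_amd64` instead.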
Usage
A Hello World spider, saved as hello.py:
import scrapy


class QuotesSpider(scrapy.Spider):
    # Name that identifies this spider
    name = "quotes"
    start_urls = [
        'http://quotes.toscrape.com/tag/humor/',
    ]

    def parse(self, response):
        # Each quote on the page sits in a div with class "quote"
        for quote in response.css('div.quote'):
            yield {
                'text': quote.css('span.text::text').extract_first(),
                'author': quote.xpath('span/small/text()').extract_first(),
            }

        # Follow the "next" link, if present, and parse that page the same way
        next_page = response.css('li.next a::attr("href")').extract_first()
        if next_page is not None:
            next_page = response.urljoin(next_page)
            yield scrapy.Request(next_page, callback=self.parse)
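The pagination step in the spider turns the relative `href` of the "next" link into an absolute URL before requesting it. The sketch below imitates that step with the standard library's `urljoin` rather than calling Scrapy itself, so it only approximates what `response.urljoin` does. The `href` value and the helper name `absolute_next_page` are assumptions made for illustration.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin      # Python 2.7


def absolute_next_page(page_url, href):
    """Resolve a possibly relative link against the page it was found on."""
    if href is None:
        # No "next" link: the last page has been reached
        return None
    return urljoin(page_url, href)


page_url = "http://quotes.toscrape.com/tag/humor/"
href = "/tag/humor/page/2/"  # assumed value of the "next" link's href
print(absolute_next_page(page_url, href))
```

Returning `None` when there is no link mirrors the `if next_page is not None` check in the spider, which is what ends the crawl.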
Run:
scrapy runspider hello.py -o hello.json
Result:
[
{"text": "\u201cThe person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.\u201d", "author": "Jane Austen"},
{"text": "\u201cA day without sunshine is like, you know, night.\u201d", "author": "Steve Martin"},
{"text": "\u201cAnyone who thinks sitting in church can make you a Christian must also think that sitting in a garage can make you a car.\u201d", "author": "Garrison Keillor"},
{"text": "\u201cBeauty is in the eye of the beholder and it may be necessary from time to time to give a stupid or misinformed beholder a black eye.\u201d", "author": "Jim Henson"},
{"text": "\u201cAll you need is love. But a little chocolate now and then doesn't hurt.\u201d", "author": "Charles M. Schulz"},
{"text": "\u201cRemember, we're madly in love, so it's all right to kiss me anytime you feel like it.\u201d", "author": "Suzanne Collins"},
{"text": "\u201cSome people never go crazy. What truly horrible lives they must lead.\u201d", "author": "Charles Bukowski"},
{"text": "\u201cThe trouble with having an open mind, of course, is that people will insist on coming along and trying to put things in it.\u201d", "author": "Terry Pratchett"},
{"text": "\u201cThink left and think right and think low and think high. Oh, the thinks you can think up if only you try!\u201d", "author": "Dr. Seuss"},
{"text": "\u201cThe reason I talk to myself is because I\u2019m the only one whose answers I accept.\u201d", "author": "George Carlin"},
{"text": "\u201cI am free of all prejudice. I hate everyone equally. \u201d", "author": "W.C. Fields"},
{"text": "\u201cA lady's imagination is very rapid; it jumps from admiration to love, from love to matrimony in a moment.\u201d", "author": "Jane Austen"}
]
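The `\u201c` and `\u201d` sequences in the output are JSON escapes for curly quotation marks, not damaged text, and any JSON parser decodes them. A minimal sketch follows, assuming two records copied from the result above stand in for the full hello.json; the helper names `load_quotes` and `authors` are hypothetical.

```python
import json

# Two records copied from the output above; a real run would read hello.json instead.
SAMPLE = r'''[
{"text": "\u201cA day without sunshine is like, you know, night.\u201d", "author": "Steve Martin"},
{"text": "\u201cI am free of all prejudice. I hate everyone equally. \u201d", "author": "W.C. Fields"}
]'''


def load_quotes(raw):
    """Parse the JSON text written by `-o hello.json` into a list of dicts."""
    return json.loads(raw)


def authors(quotes):
    """Return the distinct authors, sorted alphabetically."""
    return sorted({q["author"] for q in quotes})


quotes = load_quotes(SAMPLE)
# After parsing, the escape has become a real left curly quote character
print(quotes[0]["text"][0] == u"\u201c")
print(authors(quotes))
```

To process an actual run, the same `load_quotes` function can be given the contents of hello.json read from disk.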