Hello,
You probably want to use Splash for the requests that CrawlSpider generates
from its rules.
See the `process_request` argument when defining CrawlSpider rules:
http://doc.scrapy.org/en/latest/topics/spiders.html#crawling-rules
Something like this:
    rules = [
        Rule(SgmlLinkExtractor(allow=(r'https://detail.ju.taobao.com/.*')),
             follow=False,
             process_request="use_splash"),
        Rule(SgmlLinkExtractor(allow=(r'https://detail.tmall.com/item.htm.*')),
             callback="parse_link",
             process_request="use_splash"),
    ]

    def use_splash(self, request):
        request.meta['splash'] = {
            'endpoint': 'render.html',
            'args': {
                'wait': 0.5,
            }
        }
        return request
...
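If it helps to see what `use_splash` actually does, here is a minimal,
Scrapy-free sketch of the meta dict it attaches to each request.
`FakeRequest` is a hypothetical stand-in for `scrapy.Request`, used only
for illustration:

```python
# Sketch only: FakeRequest is a hypothetical stand-in for scrapy.Request,
# just to show the meta structure that use_splash sets.
class FakeRequest:
    def __init__(self, url):
        self.url = url
        self.meta = {}

def use_splash(request):
    # Same shape as the spider method above: route the request through
    # Splash's render.html endpoint and wait 0.5s for JS to finish.
    request.meta['splash'] = {
        'endpoint': 'render.html',
        'args': {'wait': 0.5},
    }
    return request

req = use_splash(FakeRequest('https://detail.tmall.com/item.htm?id=1'))
print(req.meta['splash']['endpoint'])  # render.html
```

The scrapyjs downloader middleware picks this `meta['splash']` dict up and
reroutes the request through the Splash HTTP API.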
See https://github.com/scrapy/scrapy/blob/master/scrapy/spiders/crawl.py#L64
for the implementation details
Also note that SgmlLinkExtractor is not the current recommended link
extractor:
http://doc.scrapy.org/en/latest/topics/link-extractors.html#module-scrapy.linkextractors
Hope this helps.
Paul.
On Monday, November 2, 2015 at 12:00:13 PM UTC+1, Raymond Guo wrote:
>
> Hi:
> sorry, I'm not really familiar with Scrapy, but I have to use scrapyjs
> to get rendered content.
> I noticed that you have a scrapy.Spider example, but I want to use
> CrawlSpider. So I wrote this:
>
>
> class JhsSpider(CrawlSpider):
>     name = "jhsspy"
>     allowed_domains = ["taobao.com"]
>     start_urls = ["https://ju.taobao.com/"]
>     rules = [
>         Rule(SgmlLinkExtractor(allow=(r'https://detail.ju.taobao.com/.*')),
>              follow=False),
>         Rule(SgmlLinkExtractor(allow=(r'https://detail.tmall.com/item.htm.*')),
>              callback="parse_link"),
>     ]
>
>     def parse_link(self, response):
>         le = SgmlLinkExtractor()
>         for link in le.extract_links(response):
>             yield scrapy.Request(link.url, self.parse_item, meta={
>                 'splash': {
>                     'endpoint': 'render.html',
>                     'args': {
>                         'wait': 0.5,
>                     }
>                 }
>             })
>
>     def parse_item(self, response):
>         ...  # get items from response
>
>
>
> but I ran into some problems and I'm not sure what caused them. So I want
> to know: is it the right way to yield requests like I did above?
>
--
You received this message because you are subscribed to the Google Groups
"scrapy-users" group.