Hello. I'm writing a spider that gathers some hyperlinks from a website,
then visits each one, checks whether something exists on the page, and
writes the results to a text file.
I have a for loop that yields requests; each request's callback, parse2,
checks the link and updates the text file.

    evenselectorlist = response.css('table[id="result_table"] tr.even')
    for evenselector in evenselectorlist:
        relative = evenselector.css('a[title="Link"]::attr(href)').extract_first()
        yield scrapy.Request(response.urljoin(relative),
                             callback=self.parse2, meta={'item': item},
                             dont_filter=True)

def parse2(self, response):
    # txt file stuff

Is there a way to make the first parse function pause when a request is
yielded? I would like to do some more work AFTER all the new requests
have finished.
For example, I'd like a counter of how many links have the information I
want, and that count is only known after all the links have been
visited.
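
To make the counter idea concrete, here is a minimal sketch in plain Python. It only models the flow and is not Scrapy code: LinkTally and simulate are hypothetical names, and the list of True/False values stands in for what each parse2 call would find.

```python
class LinkTally:
    """Hypothetical helper: tracks pending link checks and counts hits."""

    def __init__(self, expected):
        self.pending = expected  # requests yielded but not yet answered
        self.hits = 0            # links that had the wanted information

    def record(self, found):
        """Call once per response; returns True when it was the last one."""
        self.pending -= 1
        if found:
            self.hits += 1
        return self.pending == 0


def simulate(results):
    """Stand-in for the crawl: feeds one outcome per link into the tally."""
    tally = LinkTally(expected=len(results))
    summary = None
    for found in results:
        if tally.record(found):
            # The last response has arrived, so the "after everything"
            # step can run here.
            summary = "%d of %d links matched" % (tally.hits, len(results))
    return summary


if __name__ == "__main__":
    print(simulate([True, False, True]))
```

In the spider this would presumably mean creating the tally in the first parse function with len(evenselectorlist), passing it along in meta next to 'item', and calling record() at the end of parse2. One caveat of this design: if a request fails and parse2 never runs, pending never reaches zero, so an errback on the request would probably need to call record(False) as well. Scrapy also calls a spider's closed(reason) method when the crawl ends, which may be a simpler place to write out a final count.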
I hope you understand what I'm trying to say. Thank you!
--
You received this message because you are subscribed to the Google Groups
"scrapy-users" group.