This is my third or fourth post in the last 24 hours. I freely admit that I
don't know what I am doing, and that for the last several hours on this
particular issue I have been guessing, because I didn't know what Scrapy
wanted from me and I couldn't find an answer.
Here are just a few lines from my log today; the full log runs over 100
pages when pasted into my word processor. I was just trying to get my
pipeline working. It started with this error:
> SavePipeline(item)
> TypeError: object() takes no parameters
and never got better.
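For what it's worth, here is a minimal reproduction of that first error as I
understand it (this is just my guess at what is going on, not my real code):

```python
# My guess at the cause: a class with no __init__ of its own falls back
# to object's __init__, which accepts no arguments at all.
class SavePipeline(object):
    pass

try:
    SavePipeline('some item')  # passing an argument anyway
except TypeError as e:
    print(e)  # on Python 3.5 this is: object() takes no parameters
```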
I read on SO that this was because my pipeline class did not have its own
__init__ method, so Python was searching the parent class (object) for one.
That made sense to me, so I put an __init__ in there, and hell ensued.
It was the usual 'how many arguments' problem, but when I tried giving it
only self and leaving the body empty or just 'pass', I got indentation
errors.
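For the record, this is the shape I thought I was typing; pasted on its own
like this it seems fine, so the indentation errors must have come from how my
file was actually laid out:

```python
class SavePipeline(object):
    def __init__(self):
        # an empty body needs `pass` (or a docstring), otherwise Python
        # raises an IndentationError for the missing block
        pass

SavePipeline()  # instantiates fine: no TypeError, no IndentationError
```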
So I tried putting something innocuous in the body, like self.name = name,
and we were back to the how-many-arguments error. I tried giving the class
process_item as an attribute, and after many go-rounds and variations that
worked, but then it wouldn't accept my call to the process_item method: back
to the number-of-arguments error again. I imported my spider, and that
helped, but still the errors kept coming. It's been about six hours. I have
Googled all over the place. I give up. I don't get it. I need help.
Here is one full traceback, typical of most but hardly the only one,
followed by abbreviated versions of some others, including the last:
> Traceback (most recent call last):
>   File "/home/malikarumi/Projects/sukayna/lib/python3.5/site-packages/twisted/internet/defer.py", line 1301, in _inlineCallbacks
>     result = g.send(result)
>   File "/home/malikarumi/Projects/sukayna/lib/python3.5/site-packages/scrapy/crawler.py", line 72, in crawl
>     self.engine = self._create_engine()
>   File "/home/malikarumi/Projects/sukayna/lib/python3.5/site-packages/scrapy/crawler.py", line 97, in _create_engine
>     return ExecutionEngine(self, lambda _: self.stop())
>   File "/home/malikarumi/Projects/sukayna/lib/python3.5/site-packages/scrapy/core/engine.py", line 70, in __init__
>     self.scraper = Scraper(crawler)
>   File "/home/malikarumi/Projects/sukayna/lib/python3.5/site-packages/scrapy/core/scraper.py", line 71, in __init__
>     self.itemproc = itemproc_cls.from_crawler(crawler)
>   File "/home/malikarumi/Projects/sukayna/lib/python3.5/site-packages/scrapy/middleware.py", line 58, in from_crawler
>     return cls.from_settings(crawler.settings, crawler)
>   File "/home/malikarumi/Projects/sukayna/lib/python3.5/site-packages/scrapy/middleware.py", line 34, in from_settings
>     mwcls = load_object(clspath)
>   File "/home/malikarumi/Projects/sukayna/lib/python3.5/site-packages/scrapy/utils/misc.py", line 44, in load_object
>     mod = import_module(module)
>   File "/usr/lib/python3.5/importlib/__init__.py", line 126, in import_module
>     return _bootstrap._gcd_import(name[level:], package, level)
>   File "<frozen importlib._bootstrap>", line 986, in _gcd_import
>   File "<frozen importlib._bootstrap>", line 969, in _find_and_load
>   File "<frozen importlib._bootstrap>", line 958, in _find_and_load_unlocked
>   File "<frozen importlib._bootstrap>", line 673, in _load_unlocked
>   File "<frozen importlib._bootstrap_external>", line 665, in exec_module
>   File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
>   File "/home/malikarumi/Projects/sukayna/acquire2/acquire2/pipeline.py", line 87, in <module>
>     class SavePipeline(object):
>   File "/home/malikarumi/Projects/sukayna/acquire2/acquire2/pipeline.py", line 96, in SavePipeline
>     SavePipeline(process_item)
> NameError: name 'SavePipeline' is not defined
> 2017-05-28 02:43:30,386:_legacy.py:154:publishToNewObserver:CRITICAL:
> Traceback (most recent call last):
>   File "/home/malikarumi/Projects/sukayna/acquire2/acquire2/pipeline.py", line 87, in <module>
>     class SavePipeline(object):
>   File "/home/malikarumi/Projects/sukayna/acquire2/acquire2/pipeline.py", line 96, in SavePipeline
>     SavePipeline(process_item)
> NameError: name 'SavePipeline' is not defined
> 2017-05-28 02:44:46,861:_legacy.py:154:publishToNewObserver:CRITICAL:Unhandled error in Deferred:
> 2017-05-28 02:44:46,861:_legacy.py:154:publishToNewObserver:CRITICAL:
> Traceback (most recent call last):
>   File "/home/malikarumi/Projects/sukayna/acquire2/acquire2/pipeline.py", line 96, in <module>
>     SavePipeline(process_item)
> NameError: name 'process_item' is not defined
> 2017-05-28 03:10:29,174:_legacy.py:154:publishToNewObserver:CRITICAL:
> Traceback (most recent call last):
>   File "/home/malikarumi/Projects/sukayna/acquire2/acquire2/pipeline.py", line 100
>     return cls(name = =crawler.settings.get('ITEM_PIPELINES'),)
>                       ^
> SyntaxError: invalid syntax
> 2017-05-28 03:10:51,021:middleware.py:53:from_settings:INFO:Enabled downloader middlewares:
> 2017-05-28 03:10:51,024:_legacy.py:154:publishToNewObserver:CRITICAL:Unhandled error in Deferred:
> 2017-05-28 03:10:51,025:_legacy.py:154:publishToNewObserver:CRITICAL:
> Traceback (most recent call last):
>   File "/home/malikarumi/Projects/sukayna/acquire2/acquire2/pipeline.py", line 100, in from_crawler
>     return cls(name = crawler.settings.get('ITEM_PIPELINES'),)
> NameError: name 'crawler' is not defined
PIPELINE.PY (indentation restored as best I can from my file; the mail
client flattened it)
> from items import Acquire2Item
> item = Acquire2Item()
> from acquire2.spiders import testerapp2
>
> class SavePipeline(object):
>     def __init__(self, name):
>         self.name = name
>
>     def process_item(self, item, testerapp2):
>         item.save()
>         return
>         process_item(self, item, testerapp2)
>
>     @classmethod
>     def from_crawler(cls, testerapp2):
>         return cls(name = crawler.settings.get('ITEM_PIPELINES'),)
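From re-reading the docs, I think the pipeline is supposed to look something
like the sketch below, with Scrapy itself calling process_item and passing
the crawler into from_crawler, but I'm not certain I have it right. BOT_NAME
here is just a stand-in setting I picked for illustration, and item.save()
assumes my item actually has a save() method:

```python
class SavePipeline(object):
    def __init__(self, name):
        self.name = name

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy passes the crawler in as an argument; settings are read
        # from it, not from a bare name called 'crawler'
        return cls(name=crawler.settings.get('BOT_NAME'))

    def process_item(self, item, spider):
        # Scrapy calls this once per scraped item; I never call it myself
        item.save()      # assumes the item has a save() method
        return item      # the docs say to return the item (or raise DropItem)
```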
I notice there is something in there about crawler settings. I read
http://mengyangyang.org/scrapy/topics/item-pipeline.html#from_crawler, among
many other things. Obviously I don't get it. Perhaps this is related to my
other question about settings from earlier today?
I just noticed that URL; it must be a Chinese mirror of the docs. I don't
think that makes a difference here.
Any help at all will be appreciated.
--
You received this message because you are subscribed to the Google Groups
"scrapy-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/scrapy-users.
For more options, visit https://groups.google.com/d/optout.