Hi
I'm relatively new to Scrapy. How can I use Scrapy to parse XML files from a
local file system?
I have a relatively modest alteration of the base scaffold. Here is my items.py:
import scrapy

class ScrapexmlItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    meeting = scrapy.Field()
    number = scrapy.Field()
    name = scrapy.Field()
and example.py:
# -*- coding: utf-8 -*-
import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class ScrapexmlItem(CrawlSpider):
    name = 'ScrapexmlItem'

    def __init__(self, filename=None):
        if filename:
            with open(filename, 'r') as f:
                self.start_urls = f.readlines()

    def parse(self, response):
        pass
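One thing I suspect is part of my problem: Scrapy fetches start_urls through its downloader, so (if I understand correctly) a bare local path will not work, but a file:// URL should. What I think I need is something like this conversion (helper name is my own invention):

```python
import os

def to_file_url(path):
    # Scrapy's downloader expects a full URL, not a bare filesystem path,
    # so prefix the absolute path with the file:// scheme.
    return 'file://' + os.path.abspath(path)
```

so that the spider could do `self.start_urls = [to_file_url(filename)]` instead of readlines().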
Then from the root directory I try to run the spider with the command below,
which fails with a KeyError:

scrapy crawl MySpider -a filename=2015219RHIL0.xml
I have based my example.py on this SO
post http://stackoverflow.com/a/17307762/461887 but I am not sure I am
really approaching it in the correct way. I am hoping just to open the file and
then use Scrapy's XPath selectors to put the data I want into a pipeline.
Is there a more idiomatic way to approach this in Scrapy?
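To make concrete what I'm after, here is the kind of extraction I want to end up with. The sample XML is made up to mirror my item fields, and I'm using stdlib ElementTree just to illustrate; the real version would use response.xpath():

```python
import xml.etree.ElementTree as ET

# Made-up sample mirroring the item fields (meeting, number, name)
sample = '''<meetings>
  <meeting id="1">
    <number>5</number>
    <name>RHIL</name>
  </meeting>
</meetings>'''

root = ET.fromstring(sample)
for m in root.iter('meeting'):
    item = {
        'meeting': m.get('id'),
        'number': m.findtext('number'),
        'name': m.findtext('name'),
    }
    print(item)
```

Each item dict here stands in for a populated ScrapexmlItem that would go to the pipeline.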
Cheers
Sayth
--
You received this message because you are subscribed to the Google Groups
"scrapy-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/scrapy-users.
For more options, visit https://groups.google.com/d/optout.