On Fri, Sep 11, 2009 at 6:48 AM, venn hardy <venn.ha...@hotmail.com> wrote:
>
> Hi Fergus,
>
> When I debugged in the development console 
> http://localhost:9080/solr/admin/dataimport.jsp?handler=/dataimport
>
> I had no problems. Each category/item seems to be only indexed once, and no 
> parent fields are available (except the category name).
>
> I am not entirely sure how the forEach statement works, but my interpretation 
> of forEach="/document/category/item | /document/category" is something like 
> this:
>
> 1. Whenever DIH encounters a document/category it will extract the
> /document/category/name field as a common field.
> 2. Whenever DIH encounters a document/category/item it will extract all of 
> the item fields.
> 3. When all fields have been encountered, save the document in solr and go to 
> the next category/item

/document/category/item | /document/category

means there are two paths that trigger a new doc (it is possible to
have more). Whenever it encounters the closing tag of one of those
xpaths, it emits all the fields it collected since the opening of the
same tag. After that it clears all the fields it collected since the
opening of that tag.

If there are fields it collected before the opening of the same tag, it
retains them.
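Putting the pieces of this thread together, here is a minimal sketch of the entity with both paths in forEach and the category name marked as a common field. It reuses the URL, column names and schema fields from Venn's original config quoted below. The unused DateFormatTransformer and the dataSource="dataSource" reference are left out for brevity; add them back if your setup needs them. It has not been checked against a particular Solr 1.4 nightly, so treat it as a starting point rather than a verified config.

```xml
<dataConfig>
  <dataSource type="URLDataSource" />
  <document>
    <!-- A record is emitted each time the closing tag of either forEach
         path (</item> or </category>) is reached. -->
    <entity name="archive" pk="id"
            url="http://localhost:9080/data/20090817070752.xml"
            processor="XPathEntityProcessor"
            forEach="/document/category/item | /document/category"
            stream="true">
      <!-- Collected inside <category> before any <item> opens, so it is
           retained and carried into each following item record. -->
      <field column="category" xpath="/document/category/name" commonField="true" />
      <!-- Collected inside each <item> and cleared once </item> closes. -->
      <field column="id" xpath="/document/category/item/id" />
      <field column="author" xpath="/document/category/item/author" />
    </entity>
  </document>
</dataConfig>
```

Going by the description above, a record also appears to be produced when </category> closes, holding only the category name and no id. Since id is required in the schema quoted below, it may be worth checking the import status or the debug console to confirm how such records are handled.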



>
>
>> Date: Thu, 10 Sep 2009 14:19:31 +0100
>> To: solr-user@lucene.apache.org
>> From: fer...@twig.me.uk
>> Subject: RE: Extract info from parent node during data import
>>
>> >Hi Paul,
>> >The forEach="/document/category/item | /document/category/name" didn't work 
>> >(no categoryname was stored or indexed).
>> >However forEach="/document/category/item | /document/category" seems to 
>> >work well. I am not sure why category on its own works, but not 
>> >category/name...
> >> >But thanks for the tip. It wasn't as painful as I thought it would be.
>> >Venn
>>
> >> Hmmm, I had trouble with this. Although each occurrence of
> >> /document/category/item causes a new Solr document to be indexed, that
> >> document contained all the fields from the parent element as well.
>>
>> Did you see this?
>>
>> >
>> >> From: noble.p...@corp.aol.com
>> >> Date: Thu, 10 Sep 2009 09:58:21 +0530
>> >> Subject: Re: Extract info from parent node during data import
>> >> To: solr-user@lucene.apache.org
>> >>
>> >> try this
>> >>
>> >> add two xpaths in your forEach
>> >>
>> >> forEach="/document/category/item | /document/category/name"
>> >>
>> >> and add a field as follows
>> >>
> >> >> <field column="categoryname" xpath="/document/category/name" commonField="true"/>
>> >>
>> >> Please try it out and let me know.
>> >>
> >> >> On Thu, Sep 10, 2009 at 7:30 AM, venn hardy <venn.ha...@hotmail.com> wrote:
>> >> >
>> >> > Hello,
>> >> >
>> >> >
>> >> >
> >> >> > I am using SOLR 1.4 (from a nightly build) and its URLDataSource in
> >> >> > conjunction with the XPathEntityProcessor. I have successfully imported
> >> >> > XML content, but I think I may have found a limitation when it comes to
> >> >> > the commonField attribute in the DataImportHandler.
>> >> >
>> >> >
>> >> >
>> >> > Before writing my own parser to read in a whole XML document, I thought 
>> >> > I'd post the question here (since I got some great advice last time).
>> >> >
>> >> >
>> >> >
> >> >> > The bulk of my content is contained within each <item> tag. However,
> >> >> > each item has a parent called <category>, and each category has a name
> >> >> > which I would like to import. In my forEach loop I specify
> >> >> > /document/category/item as the collection of items I am interested in.
> >> >> > Is there any way to extract an element from underneath a parent node? To
> >> >> > be more specific (see the example XML below), I would like to index the
> >> >> > following:
>> >> >
>> >> > - category: Category 1; id: 1; author: Author 1
>> >> >
>> >> > - category: Category 1; id: 2; author: Author 2
>> >> >
>> >> > - category: Category 2; id: 3; author: Author 3
>> >> >
>> >> > - category: Category 2; id: 4; author: Author 4
>> >> >
>> >> >
>> >> >
> >> >> > Any ideas on how I can get to a parent node from within a child during
> >> >> > data import? If it can't be done, what do you suggest would be the best
> >> >> > way to keep using the DataImportHandler... would XSLT be a good
> >> >> > idea to 'flatten out' the structure a bit?
>> >> >
>> >> >
>> >> >
>> >> > Thanks
>> >> >
>> >> >
>> >> >
>> >> > This is what my XML document looks like:
>> >> >
> >> >> > <document>
> >> >> >   <category>
> >> >> >     <name>Category 1</name>
> >> >> >     <item>
> >> >> >       <id>1</id>
> >> >> >       <author>Author 1</author>
> >> >> >     </item>
> >> >> >     <item>
> >> >> >       <id>2</id>
> >> >> >       <author>Author 2</author>
> >> >> >     </item>
> >> >> >   </category>
> >> >> >   <category>
> >> >> >     <name>Category 2</name>
> >> >> >     <item>
> >> >> >       <id>3</id>
> >> >> >       <author>Author 3</author>
> >> >> >     </item>
> >> >> >     <item>
> >> >> >       <id>4</id>
> >> >> >       <author>Author 4</author>
> >> >> >     </item>
> >> >> >   </category>
> >> >> > </document>
>> >> >
>> >> >
>> >> >
>> >> > And this is what my dataConfig looks like:
> >> >> > <dataConfig>
> >> >> >   <dataSource type="URLDataSource" />
> >> >> >   <document>
> >> >> >     <entity name="archive" pk="id"
> >> >> >             url="http://localhost:9080/data/20090817070752.xml"
> >> >> >             processor="XPathEntityProcessor" forEach="/document/category/item"
> >> >> >             transformer="DateFormatTransformer" stream="true"
> >> >> >             dataSource="dataSource">
> >> >> >       <field column="category" xpath="/document/category/name" commonField="true" />
> >> >> >       <field column="id" xpath="/document/category/item/id" />
> >> >> >       <field column="author" xpath="/document/category/item/author" />
> >> >> >     </entity>
> >> >> >   </document>
> >> >> > </dataConfig>
>> >> >
>> >> >
>> >> >
> >> >> > This is how I have specified my schema:
> >> >> > <fields>
> >> >> >   <field name="id" type="string" indexed="true" stored="true" required="true" />
> >> >> >   <field name="author" type="string" indexed="true" stored="true"/>
> >> >> >   <field name="category" type="string" indexed="true" stored="true"/>
> >> >> > </fields>
>> >> >
>> >> > <uniqueKey>id</uniqueKey>
>> >> > <defaultSearchField>id</defaultSearchField>
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> -----------------------------------------------------
>> >> Noble Paul | Principal Engineer| AOL | http://aol.com
>> >
>>
>> --
>>
>> ===============================================================
>> Fergus McMenemie Email:fer...@twig.me.uk
>> Techmore Ltd Phone:(UK) 07721 376021
>>
>> Unix/Mac/Intranets Analyst Programmer
>> ===============================================================
>



-- 
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com
