Hello,

I am using Solr 1.4 (from a nightly build) and its URLDataSource in conjunction
with the XPathEntityProcessor. I have successfully imported XML content, but I
think I may have found a limitation when it comes to the commonField attribute
in the DataImportHandler.

 

Before writing my own parser to read in a whole XML document, I thought I'd 
post the question here (since I got some great advice last time).

 

The bulk of my content is contained within each <item> tag. However, each item
has a parent called <category>, and each category has a name which I would like
to import. In my forEach attribute I specify /document/category/item as the
collection of items I am interested in. Is there any way to extract an element
from underneath a parent node? To be more specific (see the example XML below),
I would like to index the following:

- category: Category 1; id: 1; author: Author 1
- category: Category 1; id: 2; author: Author 2
- category: Category 2; id: 3; author: Author 3
- category: Category 2; id: 4; author: Author 4

 

Any ideas on how I can get to a parent node from within a child during data
import? If it can't be done, what would you suggest as the best way for me to
keep using the DataImportHandler? Would XSLT be a good idea to 'flatten out'
the structure a bit?
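
To make the XSLT idea concrete, here is a rough, untested sketch of the kind of
stylesheet I have in mind. It assumes the <document>/<category>/<item> layout
shown further down in this post; the <items> wrapper element and the flat
output layout are my own invention for illustration. I believe
XPathEntityProcessor accepts an xsl attribute for applying a stylesheet before
parsing, but I have not tried it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Emit one flat <item> for every /document/category/item.
       The <items> root element is a hypothetical name. -->
  <xsl:template match="/document">
    <items>
      <xsl:for-each select="category/item">
        <item>
          <!-- Copy the parent category's name down into the item -->
          <category><xsl:value-of select="../name"/></category>
          <id><xsl:value-of select="id"/></id>
          <author><xsl:value-of select="author"/></author>
        </item>
      </xsl:for-each>
    </items>
  </xsl:template>
</xsl:stylesheet>
```

With output like this, forEach would become /items/item and every field,
including category, could be read with an xpath relative to the item itself.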

 

Thanks

 

This is what my XML document looks like:

<document>
 <category>
  <name>Category 1</name>
  <item>
   <id>1</id>
   <author>Author 1</author>
  </item>
  <item>
   <id>2</id>
   <author>Author 2</author>
  </item>
 </category>
 <category>
  <name>Category 2</name>
  <item>
   <id>3</id>
   <author>Author 3</author>
  </item>
  <item>
   <id>4</id>
   <author>Author 4</author>
  </item>
 </category>
</document>

 

And this is what my dataConfig looks like:
<dataConfig>
  <dataSource type="URLDataSource" />
  <document>
   <entity name="archive" pk="id"
           url="http://localhost:9080/data/20090817070752.xml"
           processor="XPathEntityProcessor" forEach="/document/category/item"
           transformer="DateFormatTransformer" stream="true" dataSource="dataSource">
    <field column="category" xpath="/document/category/name" commonField="true" />
    <field column="id" xpath="/document/category/item/id" />
    <field column="author" xpath="/document/category/item/author" />
   </entity>
  </document>
</dataConfig>

 

This is how I have specified my schema:
<fields>
   <field name="id" type="string" indexed="true" stored="true" required="true"/>
   <field name="author" type="string" indexed="true" stored="true"/>
   <field name="category" type="string" indexed="true" stored="true"/>
</fields>

<uniqueKey>id</uniqueKey>
<defaultSearchField>id</defaultSearchField>

 


 
