Re: Problems importing HTML content contained within XML document

Noble Paul നോബിള്‍ नोब्ळ् Wed, 19 Aug 2009 02:37:27 -0700

Sorry, I misspelled the attribute name. It should be flatten:
<field column="textContent" xpath="/document/category/BODY" flatten="true"/>
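
For what it's worth, here is a rough sketch, outside Solr, of why the plain xpath can come back empty for this document. It is ordinary Python using xml.etree.ElementTree, not DIH code. The helper names direct_text and flattened_text are made up for illustration, and the sample is a shortened version of the document quoted below. My understanding is that without flatten only the text sitting directly inside the matched tag is picked up, and in this document all of the BODY text lives inside the nested <P> tags.

```python
import xml.etree.ElementTree as ET

# Shortened version of the sample document from the original post.
SAMPLE = """<document>
 <category>
  <reference>123456</reference>
  <author>Authori name</author>
  <BODY>
   <P>First paragraph.</P>
   <P>Second paragraph.</P>
  </BODY>
 </category>
</document>"""


def direct_text(elem):
    """Text that sits directly inside elem, before its first child element."""
    return (elem.text or "").strip()


def flattened_text(elem):
    """All text under elem at any depth, with whitespace runs collapsed."""
    return " ".join(" ".join(elem.itertext()).split())


# The root element is <document>, so the path is relative to it.
body = ET.fromstring(SAMPLE).find("./category/BODY")

print(repr(direct_text(body)))     # only whitespace sits directly under BODY
print(repr(flattened_text(body)))  # text of both <P> children, joined
```

The first value is empty because BODY itself holds nothing but whitespace between its child tags, which would match the field being silently skipped. The second is roughly what I would expect to land in textContent once flatten is in effect.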


2009/8/19 Noble Paul നോബിള്‍  नोब्ळ् <noble.p...@corp.aol.com>:
> try this
> <field column="textContent" xpath="/document/category/BODY" faltten="true"/>
>
> This should slurp all the text from the tags under BODY.
>
> On Wed, Aug 19, 2009 at 1:44 PM, venn hardy<venn.ha...@hotmail.com> wrote:
>>
>> Hello,
>>
>> I have just started trying out SOLR to index some XML documents that I receive.
>> I am using SOLR 1.3 and its HttpDataSource in conjunction with the
>> XPathEntityProcessor.
>>
>>
>>
>> I am finding the data import really useful so far, but I am having a few problems
>> when I try to import HTML contained within one of the XML tags, <BODY>. The data
>> import seems to silently ignore textContent, but it imports everything else.
>>
>>
>>
>> When I do a query through the SOLR admin interface, only the id and author 
>> fields are displayed.
>>
>> Any ideas what I am doing wrong?
>>
>>
>>
>> Thanks
>>
>>
>>
>> This is what my dataConfig looks like:
>> <dataConfig>
>>  <dataSource type="HttpDataSource" />
>>  <document>
>>  <entity name="archive" pk="id"
>>          url="http://localhost:9080/data/20090817070752.xml"
>>          processor="XPathEntityProcessor" forEach="/document/category"
>>          transformer="DateFormatTransformer" stream="true" dataSource="dataSource">
>>         <field column="id" xpath="/document/category/reference" />
>>  <field column="textContent" xpath="/document/category/BODY" />
>>  <field column="author" xpath="/document/category/author" />
>>  </entity>
>>  </document>
>> </dataConfig>
>>
>>
>>
>> This is how I have specified my schema
>> <fields>
>>   <field name="id" type="string" indexed="true" stored="true" 
>> required="true" />
>>   <field name="author" type="string" indexed="true" stored="true"/>
>>   <field name="textContent" type="text" indexed="true" stored="true" />
>> </fields>
>>
>>  <uniqueKey>id</uniqueKey>
>>  <defaultSearchField>id</defaultSearchField>
>>
>>
>>
>> And this is what my XML document looks like:
>>
>> <document>
>>  <category>
>>  <reference>123456</reference>
>>  <author>Authori name</author>
>>  <BODY>
>>  <P>Lorem ipsum dolor sit amet, consectetur adipiscing elit.
>>  Morbi lorem elit, lacinia ac blandit ac, tristique et ante. Phasellus 
>> varius varius felis ut vestibulum</P>
>>  <P>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Morbi lorem 
>> elit,
>>  lacinia ac blandit ac, tristique et ante. Phasellus varius varius felis ut 
>> vestibulum</P>
>>  <P>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Morbi lorem 
>> elit,
>>  lacinia ac blandit ac, tristique et ante. Phasellus varius varius felis ut 
>> vestibulum</P>
>>  </BODY>
>>  </category>
>> </document>
>>
>
>
>
> --
> -----------------------------------------------------
> Noble Paul | Principal Engineer| AOL | http://aol.com
>



-- 
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com
