RE: Problems importing HTML content contained within XML document

2009-08-20 Thread venn hardy
Thanks Paul, I upgraded to Solr 1.4 and used the flatten attribute as you suggested. It works well.

> From: noble.p...@corp.aol.com
> Date: Wed, 19 Aug 2009 15:05:48 +0530
> Subject: Re: Problems importing HTML content contained within XML document
> To: solr-user@lucene.apache.

Re: Problems importing HTML content contained within XML document

2009-08-19 Thread Noble Paul നോബിള്‍ नोब्ळ्
Sorry

2009/8/19 Noble Paul നോബിള്‍ नोब्ळ् :
> try this
>
> this should slurp all the tags under body
>
> On Wed, Aug 19, 2009 at 1:44 PM, venn hardy wrote:
>> Hello,
>>
>> I have just started trying out SOLR to index some XML documents that I
>> receive. I am
>> using the SOLR 1.3 and its

Re: Problems importing HTML content contained within XML document

2009-08-19 Thread Noble Paul നോബിള്‍ नोब्ळ्
try this

this should slurp all the tags under body

On Wed, Aug 19, 2009 at 1:44 PM, venn hardy wrote:
> Hello,
>
> I have just started trying out SOLR to index some XML documents that I
> receive. I am
> using the SOLR 1.3 and its HttpDataSource in conjunction with the
> XPathEntityProcessor.

Re: Problems importing HTML content contained within XML document

2009-08-19 Thread Martijn v Groningen
Hi Venn, I think what is happening when the BODY element is processed by the XPath expression (/document/category/BODY) is that it does not retrieve the text content of the P elements inside the BODY element. The expression will only retrieve text content that is directly a child of the BODY e
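The distinction Martijn describes (text directly inside BODY versus text nested inside its P children) can be illustrated outside Solr with Python's standard `xml.etree.ElementTree`. This is only an analogy, not the XPathEntityProcessor's actual code: `direct_text` collects an element's own text plus the tails between its children, while `flattened_text` walks every descendant, which is roughly the effect the flatten attribute mentioned in this thread is meant to have. The sample document is hypothetical, shaped after the `/document/category/BODY` path; the real documents were not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like /document/category/BODY; not from the thread.
SAMPLE = """<document>
  <category>
    <BODY>Intro text<P>First paragraph.</P><P>Second paragraph.</P></BODY>
  </category>
</document>"""


def direct_text(elem):
    """Return only text that is a direct child of elem (nested tags skipped)."""
    parts = [elem.text or ""]
    # A child's .tail is text that follows the child but still sits directly in elem.
    parts.extend(child.tail or "" for child in elem)
    return "".join(parts).strip()


def flattened_text(elem):
    """Return all text under elem, including text nested inside child tags."""
    # itertext() yields elem.text, then each descendant's text and tail, in order.
    return " ".join(t.strip() for t in elem.itertext() if t.strip())


root = ET.fromstring(SAMPLE)
body = root.find("./category/BODY")
print(direct_text(body))     # Intro text
print(flattened_text(body))  # Intro text First paragraph. Second paragraph.
```

The flattened version joins fragments with a space so that adjacent paragraphs do not run together into one word; whether Solr inserts a separator when flattening is not stated in the thread.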

Problems importing HTML content contained within XML document

2009-08-19 Thread venn hardy
Hello, I have just started trying out Solr to index some XML documents that I receive. I am using Solr 1.3 and its HttpDataSource in conjunction with the XPathEntityProcessor. I am finding the data import really useful so far, but I am having a few problems when I try to import HTML c