It sounds like you haven't yet looked at the way Solr handles fields. I
assume that Solr-Cell (which I haven't looked at yet but hope to soon)
indexes everything into a single field. When using Solr on its own, the
first thing you do is create a schema that specifies the fields you want
in your index; you then massage your XML into the form Solr expects. In
your example you would end up with input documents something like:
<doc>
        <field name="inner_node1">XYZ</field>
        <field name="inner_node2">ABC</field>
        <field name="sometag">PPPP</field>
</doc>

(That applies to updating the index by posting XML to Solr; there are
many other mechanisms for populating the index now, but the basic idea
of specifying fields remains the same.)
The wiki page on Solr schemas (http://wiki.apache.org/solr/SchemaXml)
and the sample schema linked there will make it clear how to specify
your fields. 

You will then be able to specify fields in your queries like
"sometag:PPPP".  

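For example, a scoped query for ABC within inner_node2 is just a
field-qualified query string. A small Python sketch of building the request
URL follows; the helper name is made up, and the base URL is an assumption
(the stock example server's address), so adjust it to your installation.

```python
from urllib.parse import urlencode


def scoped_query_url(field, value,
                     base_url="http://localhost:8983/solr/select"):
    """Build a select URL for a field-scoped query such as inner_node2:ABC.

    Hypothetical helper; base_url is an assumed default, not something
    taken from your setup. Values containing spaces or query-syntax
    characters would need quoting or escaping first.
    """
    return base_url + "?" + urlencode({"q": f"{field}:{value}"})


if __name__ == "__main__":
    print(scoped_query_url("inner_node2", "ABC"))
```
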
Now you'll need to figure out how this underlying Solr functionality is
exposed by Solr-Cell, but I hope this is a start.

Peter


> -----Original Message-----
> From: Jana, Kumar Raja [mailto:kj...@ptc.com] 
> Sent: Monday, December 22, 2008 6:30 AM
> To: solr-user@lucene.apache.org
> Subject: RE: Scoped searches in XML documents
> 
> Hi Shalin,
> 
> Thanks for the quick response. I've found my mistake. It was 
> actually a silly setting in my application before sending the 
> documents to Solr-Cell which was stripping off the xml tags. 
> I was able to index the document with the xml tags. Sorry for 
> being so hasty.
> 
> So the only question left is, will I be able to perform 
> scoped searches using Solr? Is this already implemented in 
> Solr or is there a workaround?
> 
> Thanks
> Kumar
> 
> 
> -----Original Message-----
> From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com]
> Sent: Monday, December 22, 2008 6:27 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Scoped searches in XML documents
> 
> If your XML documents are of a fixed schema, you may want to 
> look at DataImportHandler with XPathEntityProcessor
> 
> http://wiki.apache.org/solr/DataImportHandler
> 
> On Mon, Dec 22, 2008 at 5:49 PM, Jana, Kumar Raja 
> <kj...@ptc.com> wrote:
> 
> > Hi,
> >
> >
> >
> > I want to perform scoped searches in XML documents using Solr. I am
> > using Solr-Cell to index my document files. I've noticed that when I
> > index an xml file to Solr (via Solr-Cell) the field tags get stripped
> > off and only the values are sent to Solr.
> >
> > i.e. Say I have an XML document which contains the following data:
> >
> > <test>
> >    <node1>
> >        <inner_node1>XYZ</inner_node1>
> >        <inner_node2>ABC</inner_node2>
> >        <sometag>PPPP</sometag>
> >    </node1>
> >    <node1>
> >        ....
> >    </node1>
> > </test>
> >
> >
> >
> > When I index this xml file, only the field values (XYZ, ABC and PPPP)
> > seem to go to Solr and the tag elements are stripped off!!! (Although
> > probing a bit more into the cause seems to point out that this is what
> > Apache Tika does).
> >
> >
> >
> > Is there any setting or feature which would enable me to preserve the
> > field/tag information and hence allow me to perform scoped searches
> > using Solr?
> >
> >
> >
> > Just to clear any confusion by the term "scoped search":
> >
> > What I mean by scoped search is when I index the above xml document,
> > Scoped search would allow me to find all occurrences of ABC within the
> > <inner_node2> XML tag.
> >
> >
> >
> >
> >
> > -Kumar
> >
> >
> 
> 
> --
> Regards,
> Shalin Shekhar Mangar.
> 
> 
