If your XML documents follow a fixed schema, you may want to look at the DataImportHandler with the XPathEntityProcessor.
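As a rough, untested sketch, here is what a DIH data-config.xml for the layout in your example might look like. It turns each <node1> element into one Solr document, with one field per child tag. The file path and the field names are assumptions taken from the sample XML in the question, and each of those fields would need a matching declaration in your schema.xml:

```xml
<dataConfig>
  <!-- Reads XML files from the local filesystem -->
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <!-- forEach: every /test/node1 element becomes one Solr document.
         url: hypothetical path, replace with your own file -->
    <entity name="node"
            processor="XPathEntityProcessor"
            url="/path/to/test.xml"
            forEach="/test/node1">
      <!-- Each child tag is kept as its own field, so the tag information survives -->
      <field column="inner_node1" xpath="/test/node1/inner_node1" />
      <field column="inner_node2" xpath="/test/node1/inner_node2" />
      <field column="sometag"     xpath="/test/node1/sometag" />
    </entity>
  </document>
</dataConfig>
```

Suppose the DataImportHandler is registered in solrconfig.xml (commonly at /dataimport) and you run a full-import (/dataimport?command=full-import). The scoped search you describe then becomes a plain field query such as q=inner_node2:ABC. If your schema has a required uniqueKey, you would also need to populate it. This sketch does not do that.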
http://wiki.apache.org/solr/DataImportHandler

On Mon, Dec 22, 2008 at 5:49 PM, Jana, Kumar Raja <kj...@ptc.com> wrote:

> Hi,
>
> I want to perform scoped searches in XML documents using Solr. I am
> using Solr-Cell to index my document files. I've noticed that when I
> index an xml file to Solr (via Solr-Cell) the field tags get stripped
> off and only the values are sent to Solr.
>
> i.e. Say I have an XML document which contains the following data:
>
> <test>
>   <node1>
>     <inner_node1>XYZ</inner_node1>
>     <inner_node2>ABC</inner_node2>
>     <sometag>PPPP</sometag>
>   </node1>
>   <node1>
>     ....
>   </node1>
> </test>
>
> When I index this xml file, only the field values(XYZ, ABC and PPPP)
> seem to go to Solr and the tag elements are stripped off!!! (Although
> probing a bit more into the cause seems to point out that this is what
> Apache Tika does).
>
> Is there any setting or feature which would enable me to preserve the
> field/tag information and hence allow me to perform scoped searches
> using Solr?
>
> Just to clear any confusion by the term "scoped search":
>
> What I mean by scoped search is when I index the above xml document,
> Scoped search would allow me to find all occurrences of ABC within the
> <inner_node2> XML tag.
>
> -Kumar

--
Regards,
Shalin Shekhar Mangar.