Solr does not index arbitrary XML documents (but see Martin's comments about DIH). It will, however, index XML documents that have a specific format. The general form is:

<add>
  <doc>
    <field name="xxxx">value to index</field>
    <field name="yyyy">value for this field</field>
  </doc>
  <doc>
    ................
  </doc>
</add>
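If you go the parse-it-yourself route, the conversion into this form could look roughly like the Python sketch below. It is only an illustration under some assumptions: the helper names (flatten, entries_to_solr_add) are made up, nested elements are flattened into underscore-joined field names (name_fullname and so on), which is just one possible convention, and element attributes are ignored. The resulting field names would still have to be defined in your schema, and the sample address is a placeholder.

```python
import xml.etree.ElementTree as ET


def flatten(elem, prefix=""):
    """Yield (field_name, text) pairs for every leaf element under elem."""
    for child in elem:
        # Hypothetical convention: join nested tag names with "_".
        name = f"{prefix}_{child.tag}" if prefix else child.tag
        if len(child):
            # The element has children of its own, so recurse into it.
            yield from flatten(child, name)
        elif child.text and child.text.strip():
            # Leaf element with text; attributes are ignored in this sketch.
            yield name, child.text.strip()


def entries_to_solr_add(raw_xml, doc_tag="entry"):
    """Build an <add> message with one <doc> per doc_tag element."""
    root = ET.fromstring(raw_xml)
    add = ET.Element("add")
    for entry in root.iter(doc_tag):
        doc = ET.SubElement(add, "doc")
        for name, value in flatten(entry):
            field = ET.SubElement(doc, "field", name=name)
            field.text = value
    return ET.tostring(add, encoding="unicode")


# Cut-down sample in the shape of the XML from the question.
raw = """<root>
  <source>manual</source>
  <entry>
    <name><fullname>John Smith</fullname></name>
    <email>jsmith@example.com</email>
  </entry>
</root>"""

print(entries_to_solr_add(raw))
```

Repeated elements (two <email> tags in one entry, say) come out as repeated fields with the same name, so a field like that would need to be multi-valued in the schema.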
So you can either try DIH or parse the raw XML yourself and put it in the above form for indexing...

Best
Erick

On Thu, Mar 24, 2011 at 4:54 PM, Marcelo Iturbe <marc...@santiago.cl> wrote:
> Hello,
> I've been reading up on how to index XML content but have a few questions.
>
> How is data in element attributes handled or defined? How are nested
> elements handled?
>
> In the following XML structure, I want to index the content of what is
> between the <entry> tags.
> In one XML document, there can be up to 100 <entry> tags.
> So the <entry> tag would be equivalent to the <doc> tag...
>
> Can I somehow index this XML "as is" or will I have to parse it, creating
> the <doc> tag and placing all the elements on the same level?
>
> Thanks for your help.
>
> <?xml version="1.0" encoding="utf-8"?>
> <root>
>   <source>manual</source>
>   <author>
>     <name>MC Anon User</name>
>     <email>mca...@mcdomain.com</email>
>   </author>
>
>   <entry>
>     <name>
>       <fullname>John Smith</fullname>
>     </name>
>     <email>jsmit...@gmail.com</email>
>   </entry>
>
>   <entry>
>     <name>
>       <fullname>First Last</fullname>
>       <firstname>First</firstname>
>       <lastname>Last</lastname>
>     </name>
>     <organization>
>       <name>MC S.A.</name>
>       <tittle>CIO</tittle>
>     </organization>
>     <email type="work" primary="true">fi...@mcdomain.com</email>
>     <email>flas...@yahoo.com</email>
>     <phoneNumber type="work" primary="true">+5629460600</phoneNumber>
>     <im carrier="gtalk" primary="true">fi...@mcdomain.com</im>
>     <im carrier="skype">First.Last</im>
>     <postalAddress>111 Bude St, Toronto</postalAddress>
>     <custom name="blog">http://blog.mcdomain.com/</custom>
>   </entry>
> </root>
>
> regards
> Marcelo