Re: Newbie wants to index XML content.

Erick Erickson Fri, 25 Mar 2011 05:53:13 -0700

Solr does not index arbitrary XML documents (but see Martin's comments
about DIH, the DataImportHandler). It will, however, index XML documents
that follow a specific format. The general form is:
<add>
<doc>
  <field name="xxxx">value to index</field>
  <field name="yyyy">value for this field</field>
</doc>
<doc>
    ................
</doc>
</add>


So you can either try DIH or parse the raw XML yourself and put it in the above
form for indexing...
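
As a rough illustration of the "parse it yourself" route, here is a minimal
Python sketch that turns each <entry> into a <doc> and flattens nested
elements into field names joined with underscores. The naming scheme
(name_fullname, email_type, ...) and the helper names flatten_entry and
to_solr_add are assumptions made up for this example, not anything Solr
prescribes. Whatever field names you end up with would have to be defined in
your schema, and repeated ones such as email would need to be multiValued.

```python
import xml.etree.ElementTree as ET


def flatten_entry(entry):
    """Yield (field_name, value) pairs for every leaf element under entry.

    Hypothetical naming convention: nested tags are joined with "_", and
    each attribute becomes its own field named <element path>_<attribute>.
    """
    def walk(elem, prefix):
        name = f"{prefix}_{elem.tag}" if prefix else elem.tag
        children = list(elem)
        if children:
            # Not a leaf: recurse so that all values end up on one level.
            for child in children:
                yield from walk(child, name)
        else:
            text = (elem.text or "").strip()
            if text:
                yield name, text
            for attr, value in elem.attrib.items():
                yield f"{name}_{attr}", value

    for child in entry:
        yield from walk(child, "")


def to_solr_add(raw_xml):
    """Convert raw XML containing <entry> elements into an <add> message."""
    root = ET.fromstring(raw_xml)
    add = ET.Element("add")
    for entry in root.iter("entry"):
        doc = ET.SubElement(add, "doc")
        for name, value in flatten_entry(entry):
            field = ET.SubElement(doc, "field", name=name)
            field.text = value
    return ET.tostring(add, encoding="unicode")
```

Calling to_solr_add() on the XML quoted below should give one <doc> per
<entry>, which you can then post to Solr's update handler. Elements outside
<entry>, such as <source> and <author>, are ignored by this sketch.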

Best
Erick

On Thu, Mar 24, 2011 at 4:54 PM, Marcelo Iturbe <marc...@santiago.cl> wrote:
> Hello,
> I've been reading up on how to index XML content but have a few questions.
>
> How is data in element attributes handled or defined? How are nested
> elements handled?
>
> In the following XML structure, I want to index the content of what is
> between the <entry> tags.
> In one XML document, there can be up to 100 <entry> tags.
> So the <entry> tag would be equivalent to the <doc> tag...
>
> Can I somehow index this XML "as is" or will I have to parse it, creating
> the <doc> tag and placing all the elements on the same level?
>
> Thanks for your help.
>
> <?xml version="1.0" encoding="utf-8"?>
> <root>
>    <source>manual</source>
>    <author>
>        <name>MC Anon User</name>
>        <email>mca...@mcdomain.com</email>
>    </author>
>
>    <entry>
>        <name>
>            <fullname>John Smith</fullname>
>        </name>
>        <email>jsmit...@gmail.com</email>
>    </entry>
>
>    <entry>
>        <name>
>            <fullname>First Last</fullname>
>            <firstname>First</firstname>
>            <lastname>Last</lastname>
>        </name>
>        <organization>
>            <name>MC S.A.</name>
>            <tittle>CIO</tittle>
>        </organization>
>        <email type="work" primary="true">fi...@mcdomain.com</email>
>        <email>flas...@yahoo.com</email>
>        <phoneNumber type="work" primary="true">+5629460600</phoneNumber>
>        <im carrier="gtalk" primary="true">fi...@mcdomain.com</im>
>        <im carrier="skype">First.Last</im>
>        <postalAddress>111 Bude St, Toronto</postalAddress>
>        <custom name="blog">http://blog.mcdomain.com/</custom>
>    </entry>
> </root>
>
> regards
> Marcelo
