You can use the DIH (DataImportHandler) to split up that XML and index each <entry> as a separate Solr document.
 http://wiki.apache.org/solr/DataImportHandler
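A rough, untested sketch of a DIH data-config.xml for the sample document could use XPathEntityProcessor. forEach="/root/entry" turns every <entry> into its own Solr document. Nested elements are addressed by their full path, and attribute values by @attr or a [@attr='value'] predicate. The file path and all column names (fullname, primary_email, im_carrier, ...) are hypothetical and must match fields declared in your schema.xml. Repeated elements such as <email> and <im> need multiValued="true" fields there.

```xml
<dataConfig>
  <!-- Reads the XML straight from disk -->
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <!-- forEach: every /root/entry becomes one Solr document; the url is a hypothetical path -->
    <entity name="entry"
            processor="XPathEntityProcessor"
            url="/data/contacts.xml"
            forEach="/root/entry">
      <!-- document-level value, copied into every entry by commonField -->
      <field column="source"        xpath="/root/source" commonField="true"/>
      <!-- nested elements: use the full path -->
      <field column="fullname"      xpath="/root/entry/name/fullname"/>
      <field column="firstname"     xpath="/root/entry/name/firstname"/>
      <field column="lastname"      xpath="/root/entry/name/lastname"/>
      <field column="organization"  xpath="/root/entry/organization/name"/>
      <!-- repeated elements: map these to multiValued fields in schema.xml -->
      <field column="email"         xpath="/root/entry/email"/>
      <field column="im"            xpath="/root/entry/im"/>
      <!-- attributes: @carrier selects the attribute value -->
      <field column="im_carrier"    xpath="/root/entry/im/@carrier"/>
      <!-- predicate: only the <email> whose primary attribute is 'true' -->
      <field column="primary_email" xpath="/root/entry/email[@primary='true']"/>
      <field column="phone"         xpath="/root/entry/phoneNumber"/>
      <field column="address"       xpath="/root/entry/postalAddress"/>
      <field column="blog"          xpath="/root/entry/custom[@name='blog']"/>
    </entity>
  </document>
</dataConfig>
```

To run it, register a requestHandler of class org.apache.solr.handler.dataimport.DataImportHandler in solrconfig.xml, with its "config" default pointing at this file. Then call /dataimport?command=full-import. XPathEntityProcessor only understands a limited XPath subset. Values from a repeated element and its attributes (e.g. im and im_carrier) end up as separate lists, so their pairing is not preserved.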


Kind regards
M.Sc. Dipl.-Inf. (FH) Martin Rödig
 
SHI Elektronische Medien GmbH
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
LATEST - NEW - AVAILABLE NOW
Solr/Lucene training, 19-21 April in Berlin
 
As the first certified training partner of Lucid Imagination in Germany, Austria and
Switzerland, SHI now offers German-language Solr training courses.
More information: www.shi-gmbh.com/services/solr-training
Note: places are limited!
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Postal address: Watzmannstr. 23, 86316 Friedberg
Visiting address: Curt-Frenzel-Str. 12, 86167 Augsburg
Tel.: 0821 7482633 18
Tel.: 0821 7482633 0 (switchboard)
Fax: 0821 7482633 29

Internet: http://www.shi-gmbh.com
Registry court: Augsburg, HRB 17382
Managing director: Peter Spiske
Tax number: 103/137/30412

-----Original Message-----
From: Marcelo Iturbe [mailto:marc...@santiago.cl] 
Sent: Thursday, 24 March 2011 21:55
To: solr-user@lucene.apache.org
Subject: Newbie wants to index XML content.

Hello,
I've been reading up on how to index XML content but have a few questions.

How is data in element attributes handled or defined? How are nested elements handled?

In the following XML structure, I want to index the content between the <entry> tags.
One XML document can contain up to 100 <entry> elements,
so each <entry> would be equivalent to a Solr <doc>.

Can I somehow index this XML "as is", or will I have to parse it myself, creating a
<doc> for each entry and placing all of its elements on the same level?

Thanks for your help.

<?xml version="1.0" encoding="utf-8"?>
<root>
    <source>manual</source>
    <author>
        <name>MC Anon User</name>
        <email>mca...@mcdomain.com</email>
    </author>

    <entry>
        <name>
            <fullname>John Smith</fullname>
        </name>
        <email>jsmit...@gmail.com</email>
    </entry>

    <entry>
        <name>
            <fullname>First Last</fullname>
            <firstname>First</firstname>
            <lastname>Last</lastname>
        </name>
        <organization>
            <name>MC S.A.</name>
            <title>CIO</title>
        </organization>
        <email type="work" primary="true">fi...@mcdomain.com</email>
        <email>flas...@yahoo.com</email>
        <phoneNumber type="work" primary="true">+5629460600</phoneNumber>
        <im carrier="gtalk" primary="true">fi...@mcdomain.com</im>
        <im carrier="skype">First.Last</im>
        <postalAddress>111 Bude St, Toronto</postalAddress>
        <custom name="blog">http://blog.mcdomain.com/</custom>
    </entry>
</root>
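If you would rather transform the file yourself, each <entry> has to become one <doc> in Solr's XML update format, with nested elements and attribute values flattened into named fields. Here is a sketch for the second <entry> above. The field names are hypothetical, would have to exist in schema.xml, and repeated ones need multiValued="true".

```xml
<add>
  <doc>
    <!-- document-level <source>, repeated in every doc -->
    <field name="source">manual</field>
    <!-- children of <name> and <organization> flattened to top-level fields -->
    <field name="fullname">First Last</field>
    <field name="firstname">First</field>
    <field name="lastname">Last</field>
    <field name="organization">MC S.A.</field>
    <!-- repeated elements become repeated fields -->
    <field name="email">fi...@mcdomain.com</field>
    <field name="email">flas...@yahoo.com</field>
    <!-- attribute information gets its own fields -->
    <field name="primary_email">fi...@mcdomain.com</field>
    <field name="phone">+5629460600</field>
    <field name="im">fi...@mcdomain.com</field>
    <field name="im">First.Last</field>
    <field name="im_carrier">gtalk</field>
    <field name="im_carrier">skype</field>
    <field name="address">111 Bude St, Toronto</field>
    <field name="blog">http://blog.mcdomain.com/</field>
  </doc>
</add>
```

A file like this can be posted to the update handler with Content-type text/xml. On a default single-core install, that handler is assumed to be at http://localhost:8983/solr/update. The documents become searchable after a <commit/>.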

regards
Marcelo