Many thanks Florent
Hey All
My docs are parsed and the indexes are updated (using the UpdateRichDocuments
patch). But tell me one thing: what will happen if I don't commit? If commit
is false, where are the docs stored?
Regards
Dwarak R
----- Original Message -----
From: "SDIS M. Beauchamp" <[EMAIL PROTECTED]>
To: <solr-user@lucene.apache.org>
Sent: Wednesday, November 14, 2007 1:13 PM
Subject: RE: solr - other document formats
You should take a look at
http://wiki.apache.org/solr/UpdateRichDocuments?highlight=%28richdocument%29
It gives you a starting point to make the extractor you need
Regards
Florent
-----Original Message-----
From: Dwarak R [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 14, 2007 05:17
To: solr-user@lucene.apache.org
Subject: solr - other document formats
Hey All
I read an article on http://www.xml.com/lpt/a/1668
It states that:
"As we've seen, the XML format used by Solr for indexing is quite simple.
Extracting the relevant metadata to create these XML documents from the many
formats floating around, however, is another story. Fortunately, Lucene
users have the same problem and have been working on it for quite a while;
the Lucene FAQ lists a number of references to parsers and filters which can
be used to extract content and metadata from many common document formats.
Solr won't index spreadsheets or other formats out of the box, but that is
not its role: you should see Solr as the "search engine" component of a
broader "search system," where extraction of content and metadata is handled
by other components. This will help to keep your search system maintainable
and testable, and it helps the Solr team focus on doing one thing well."
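To make the "quite simple" XML format concrete, here is a minimal sketch of
what a separate extraction component might hand to Solr. The field names
(id, title, text) and the sample values are assumptions for illustration
only; the real names depend on your schema.xml, and the extraction step
itself (PDF, Word, Excel) is not shown.

```python
import xml.etree.ElementTree as ET


def build_add_message(fields):
    """Build a Solr <add> update message from already-extracted fields."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        # One <field name="..."> element per extracted value.
        field = ET.SubElement(doc, "field", name=name)
        field.text = value  # ElementTree escapes &, < and > on output
    return ET.tostring(add, encoding="unicode")


# Hypothetical output of an external extractor (not part of Solr).
extracted = {
    "id": "report-2007-11.pdf",
    "title": "Quarterly report",
    "text": "Body text pulled out of the PDF by the extractor.",
}

message = build_add_message(extracted)
print(message)
```

The resulting string is what would be POSTed to Solr's update handler;
keeping the extractor separate means it can be tested without a running
Solr instance.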
Will parsing documents such as PDF, MS Word, and Excel files into XML be done
by another component?
Could somebody advise?
Regards
Dwarak R
This message is for the designated recipient only and may contain
privileged, proprietary, or otherwise private information. If you have
received it in error, please notify the sender&[EMAIL PROTECTED]
immediately and delete the original. Any other use of the email by you is
prohibited.