solr - other document formats

Dwarak R Tue, 13 Nov 2007 20:22:20 -0800

Hey All

I read an article on http://www.xml.com/lpt/a/1668


It states that:

"As we've seen, the XML format used by Solr for indexing is quite simple. 
Extracting the relevant metadata to create these XML documents from the many 
formats floating around, however, is another story. Fortunately, Lucene users 
have the same problem and have been working on it for quite a while; the Lucene 
FAQ lists a number of references to parsers and filters which can be used to 
extract content and metadata from many common document formats. 
Solr won't index spreadsheets or other formats out of the box, but that is not 
its role: you should see Solr as the "search engine" component of a broader 
"search system," where extraction of content and metadata is handled by other 
components. This will help to keep your search system maintainable and 
testable, and it helps the Solr team focus on doing one thing well."

So parsing documents such as PDF, MS Word, and Excel files into XML has to be 
done by some other component, outside Solr?
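
To make the question concrete, here is how I picture the split the article 
describes: some separate extractor pulls text and metadata out of the file, and 
only the result is turned into Solr's XML <add> message. This is a minimal 
sketch, not a definitive implementation. The function name build_solr_add_xml, 
the sample values, and the field names (id, title, text) are all assumptions 
for illustration; real field names would have to match the Solr schema, and 
the extraction step itself is not shown.

```python
import xml.etree.ElementTree as ET


def build_solr_add_xml(docs):
    """Build a Solr <add> message from already-extracted documents.

    Each item in docs is a dict mapping a field name to a string,
    or to a list of strings for a multi-valued field.
    """
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            values = value if isinstance(value, list) else [value]
            for v in values:
                field = ET.SubElement(doc, "field", name=name)
                field.text = str(v)  # ElementTree escapes &, <, > on output
    return ET.tostring(add, encoding="unicode")


# Hypothetical result of a separate extraction step (e.g. a PDF parser);
# the field names are assumed and must exist in the Solr schema.
extracted = {
    "id": "report-2007.pdf",
    "title": "Quarterly Report",
    "text": "Revenue & costs for Q3 ...",
}

print(build_solr_add_xml([extracted]))
```

The resulting XML would then be sent to Solr's update handler by whatever 
component drives the indexing, so Solr itself never sees the original PDF or 
Word file.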

Could somebody please advise?

Regards

Dwarak R

