Data Import Handler Rich Format Documents

Tod Fri, 18 Jun 2010 05:54:05 -0700

I have a database containing Metadata from a content management system. Part of that data includes a URL pointing to the actual published document, which can be an HTML file or a PDF, MS Word/Excel/PowerPoint, etc.

I'm already indexing the Metadata and that provides a lot of value. The customer, however, would like the content pointed to by the URL to be indexed as well, for more discrete searching.


This article at Lucid:

http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Searching-rich-format-documents-stored-DBMS

describes the process of coding a custom transformer. A separate article I've read implies Nutch could be used to provide this functionality too.

What would be the best and most efficient way to accomplish what I'm trying to do? I have a feeling the Lucid article might be dated and there might be ways to do this now without any coding, and maybe without even needing to use Nutch. I'm using the current release version of Solr.


Thanks in advance.


- Tod
