DataImportHandler, BlobTransformer, FieldReaderDataSource and TikaEntityExtractor

Raymond Wiker Tue, 30 Jul 2013 00:01:14 -0700

I have a case where I want to index documents and metadata content from a
database. The metadata is not a problem, but it does not appear that I
can handle the document content (held as BLOBs in the database) with
out-of-the-box Solr 4.4 functionality.


I was hoping to be able to solve this by doing something like the
following:

*DataImportHandler* to extract all the columns (fields), including the
document (BLOB)

*BlobTransformer* to extract the BLOB content

*FieldReaderDataSource* as a bridge between the extracted BLOB and Tika

*TikaEntityExtractor* to extract the text and embedded metadata from the
BLOB.
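
As a rough, untested sketch, the chain above might be expressed in a DIH
data-config.xml along the following lines. Note the assumptions: it uses
FieldStreamDataSource and TikaEntityProcessor, which are class names from
Solr's DataImportHandler and its extras jar rather than the names used in
this post, and the JDBC driver, URL, table, column and Solr field names
are all hypothetical placeholders.

```xml
<dataConfig>
  <!-- Hypothetical JDBC connection; driver, url, user and password are placeholders -->
  <dataSource name="db" type="JdbcDataSource"
              driver="org.example.Driver"
              url="jdbc:example://dbhost/docs"
              user="solr" password="secret"/>

  <!-- Exposes a BLOB (or byte[]) column of a parent entity as an InputStream -->
  <dataSource name="blobStream" type="FieldStreamDataSource"/>

  <document>
    <!-- Outer entity: one row per document, metadata columns plus the BLOB.
         Table and column names (DOCUMENTS, ID, TITLE, CONTENT) are assumed. -->
    <entity name="doc" dataSource="db"
            query="SELECT ID, TITLE, CONTENT FROM DOCUMENTS">
      <field column="ID" name="id"/>
      <field column="TITLE" name="title"/>

      <!-- Inner entity: hands the BLOB from doc.CONTENT to Tika.
           The "text" column carries the extracted body text. -->
      <entity name="tika" dataSource="blobStream"
              processor="TikaEntityProcessor"
              dataField="doc.CONTENT"
              format="text">
        <field column="text" name="content"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

This assumes the DIH extras jar and the Tika libraries are on the Solr
classpath; whether the BLOB column arrives in a form the stream data source
accepts may depend on the JDBC driver.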

The first problem is that "BlobTransformer" does not appear to exist. It
could be that I need to load some additional jar files, or it could be that
the "BlobTransformer" functionality is simply not part of the Solr
distribution.

Is there a way of handling this type of content using DataImportHandler, or
do I need to write an external connector for it?
