Hi!
I'm importing BLOBs from an Oracle DB, and want to retrieve the textual
body/plaintext content for analyzing/indexing purposes. I'm using
TikaEntityProcessor to do the parsing of the documents, which works fine for
most of the documents. But in some cases, e.g. when a document is password
pro
Hi!
Not sure if this is a problem or if I just don't understand the debug
response, but it seems somewhat odd to me.
The "main" entity can have multiple BLOB documents. I'm using TikaEntityProcessor
to retrieve the body (plaintext) from these documents and put the
result in a multivalued field,
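As a rough illustration of the data flow described above, here is a minimal Python sketch under stated assumptions. A parent row owns several BLOBs, each BLOB is parsed to plaintext, and the non-empty results become the values of one multivalued field. This is not TikaEntityProcessor or any Solr API. `extract_text`, `ExtractionError` and `fake_extract` are hypothetical stand-ins for a real parser (such as Tika) and for the error it might raise on a password-protected document. Skipping a failed document, rather than failing the whole row, is shown here as one possible policy.

```python
from typing import Callable, Iterable, List


class ExtractionError(Exception):
    """Hypothetical error for a document that cannot be parsed (e.g. encrypted)."""


def collect_bodies(blobs: Iterable[bytes],
                   extract_text: Callable[[bytes], str]) -> List[str]:
    """Parse every BLOB of one parent row and gather the plaintext bodies
    into a list, i.e. the values of a multivalued field."""
    bodies: List[str] = []
    for blob in blobs:
        try:
            text = extract_text(blob)
        except ExtractionError:
            # Skip only this document; the other BLOBs of the row are still used.
            continue
        text = text.strip()
        if text:  # drop documents that yielded no text at all
            bodies.append(text)
    return bodies


def fake_extract(blob: bytes) -> str:
    """Hypothetical parser: treats BLOBs starting with b"ENC" as password protected."""
    if blob.startswith(b"ENC"):
        raise ExtractionError("password protected")
    return blob.decode("utf-8")


print(collect_bodies([b"first doc", b"ENC...", b"  second doc "], fake_extract))
# ['first doc', 'second doc']
```

Keeping the per-document `try`/`except` inside the loop means that one unreadable attachment does not discard the bodies of its siblings.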
I had to overcome this issue, as I needed to analyze multivalued fields. The
fact that UIMA doesn't analyze multivalued fields is a known bug in UIMA. With
the help of Maryam, I solved the issue. The JIRA issue, along with a working
patch, can be found here: https://issues.apache.org/jira/browse/SOLR