from:"5ton3"

ignoreTikaException flag not working

2014-11-10 Thread 5ton3

Hi! I'm importing BLOBs from an Oracle DB and want to retrieve the textual body (plaintext content) for analysis and indexing. I'm using TikaEntityProcessor to parse the documents, which works fine for most of them. But in some cases, e.g., when a document is password pro

The exact same query gets executed n times for the nth row when retrieving body (plaintext) from BLOB column with Tika Entity Processor

2014-10-31 Thread 5ton3

Hi! Not sure if this is a problem or if I just don't understand the debug response, but it seems somewhat odd to me. The "main" entity can have multiple BLOB documents. I'm using Tika Entity Processor to retrieve the body (plaintext) from these documents and put the result in a multivalued field,

Re: Issue with multivalued fields in UIMA

2014-10-30 Thread 5ton3

I had to work around this issue, as I needed to analyze multivalued fields. UIMA not analyzing multivalued fields is a known bug in UIMA. With Maryam's help, I solved the issue. The JIRA issue, along with a working patch, can be found here: https://issues.apache.org/jira/browse/SOLR