Re: Tika analyzers

2014-07-30 Thread Alexandre Rafalovitch
Solr effectively supports only one binary document that gets indexed. This is because you are not actually indexing the document. You are extracting metadata (e.g. Author) and content fields out of it and mapping them to the "Solr document". So, it makes no sense to have two fields that are binary becaus

Re: Tika analyzers

2014-07-30 Thread Erick Erickson
Hmmm, might a custom update processor do that? In an update processor, you'd get the binary and be able to do anything at all you wanted to with that. I'm not quite clear on how the binary gets through the Tika bits and gets passed in in the first place, but Best, Erick On Wed, Jul 30, 2014

Tika analyzers

2014-07-30 Thread Tommaso Teofili
Hi all, while SolrCell works nicely when in need of indexing binary documents, I am wondering about the possibility of having Lucene / Solr documents that have binaries in specific Lucene fields, e.g. title="a nice doc", name="blabla.doc", binary="0x1234...". In that case the "binary" field should