Re: Bypassing ExtractingRequestHandler

2016-06-13 Thread Justin Lee
Thanks everyone for the help and advice. The SolrJ example makes sense to me. The import of SOLR-8166 was kind of mind boggling to me, but maybe I'll revisit after some time. Tim: for context, I'm ultimately trying to create an external highlighter. See https://issues.apache.org/jira/browse/SOLR

RE: Bypassing ExtractingRequestHandler

2016-06-13 Thread Allison, Timothy B.
> Two things: Here's a sample bit of SolrJ code, pulling out the DB stuff should be straightforward: http://searchhub.org/2012/02/14/indexing-with-solrj/
+1
> We tend to prefer running Tika externally as it's entirely possible that Tika will crash or hang with certain files - and that will
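The point above about running Tika externally can be illustrated with a minimal sketch: extraction runs in a child process with a timeout, so a crash or hang on one file does not take down the indexing process. This is not code from the thread. The helper names `extract_in_child` and `tika_command` are hypothetical, and the jar name `tika-app.jar` and its `--text` option are assumptions about a local Tika app install.

```python
import subprocess
import sys
from typing import List, Optional


def extract_in_child(cmd: List[str], timeout_s: float) -> Optional[str]:
    """Run an extractor command in a child process.

    Returns the child's stdout as text, or None if the child
    exits with a non-zero status or exceeds the timeout.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # The child hung; subprocess.run kills it before raising.
        return None
    if result.returncode != 0:
        # The child crashed or reported an error.
        return None
    return result.stdout.decode("utf-8", errors="replace")


def tika_command(path: str, jar: str = "tika-app.jar") -> List[str]:
    # Assumption: a Tika app jar is available locally and accepts --text.
    return ["java", "-jar", jar, "--text", path]


if __name__ == "__main__":
    # Stand-in child process so the sketch runs without Tika installed.
    text = extract_in_child([sys.executable, "-c", "print('extracted text')"], 10)
    print(text)
```

A caller would skip or log any file for which `extract_in_child(tika_command(path), timeout_s)` returns None, and send the text of the remaining files to Solr via SolrJ or another client.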

Re: Bypassing ExtractingRequestHandler

2016-06-12 Thread Erick Erickson
Two things: Here's a sample bit of SolrJ code, pulling out the DB stuff should be straightforward: http://searchhub.org/2012/02/14/indexing-with-solrj/ It's a little out of date, but not very much so. CloudSolrServer mentioned in one of the comments has been deprecated in favor of CloudSolrClient,

Re: Bypassing ExtractingRequestHandler

2016-06-10 Thread Charlie Hull
On 10/06/2016 02:20, Justin Lee wrote: Has anybody had any experience bypassing ExtractingRequestHandler and simply managing Tika manually? I want to make a small modification to Tika to get and save additional data from my PDFs, but I have been procrastinating in no small part due to the unplea