Re: ExtractingRequestHandler - extracted files caching?

Erick Erickson Mon, 30 Jun 2014 21:17:27 -0700

Here's an example of what Alexandre is
talking about:
http://searchhub.org/2012/02/14/indexing-with-solrj/


It mixes database fetching in with the
Tika processing, but that should be pretty easy
to pull out.
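
For what it's worth, here is a rough sketch of the client-side caching
idea. It is Python rather than SolrJ to keep it short, and it is not
taken from the linked article. The class name, the "content" field and
the extract hook are assumptions made up for illustration; the hook is
where the real Tika call would go. Extracted text is cached on disk
under the SHA-256 of the file bytes, so a metadata-only change rebuilds
the document without running extraction again.

```python
import hashlib
from pathlib import Path
from typing import Callable


class ExtractionCache:
    """Caches extracted text on disk, keyed by the SHA-256 of the file bytes."""

    def __init__(self, cache_dir: str, extract: Callable[[bytes], str]):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        # Hypothetical hook: in a real setup this would wrap the Tika call.
        self.extract = extract
        self.misses = 0  # number of times extraction actually ran

    def text_for(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        cached = self.cache_dir / (key + ".txt")
        if cached.exists():
            # Same bytes seen before: reuse the stored text.
            return cached.read_text(encoding="utf-8")
        self.misses += 1
        text = self.extract(data)
        cached.write_text(text, encoding="utf-8")
        return text


def build_doc(doc_id: str, data: bytes, metadata: dict,
              cache: ExtractionCache) -> dict:
    """Builds the full document to send to Solr: cached text plus app metadata.

    The field names "id" and "content" are assumptions; use whatever the
    schema defines.
    """
    doc = {"id": doc_id, "content": cache.text_for(data)}
    doc.update(metadata)
    return doc
```

When downloadCount changes, call build_doc again with the same file
bytes and the new metadata, then send the resulting document to Solr as
a normal update. Only a changed file produces a new hash and triggers
extraction.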

Best,
Erick

On Mon, Jun 30, 2014 at 8:21 PM, Alexandre Rafalovitch
<arafa...@gmail.com> wrote:
> Under the covers, Tika is used. You can use Tika yourself on the
> client side and cache its output in a database or a text file. Then,
> send that to Solr instead. Puts less load on Solr as well.
>
> Or you can use atomic update, but then all the primary (not copyField)
> fields must be stored="true".
>
> Regards,
>    Alex.
> Personal website: http://www.outerthoughts.com/
> Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency
>
>
> On Tue, Jul 1, 2014 at 5:55 AM, Gili Nachum <gilinac...@gmail.com> wrote:
>> Hello,
>>
>> I plan to use ExtractingRequestHandler to index binary files' text plus app
>> metadata (like literal.downloadCount and others) into a single document.
>> I expect the app metadata to change much more often than the binary file
>> itself. I would hate to have to extract text from the binary file whenever
>> I need to re-index the doc because of a metadata change.
>> Is there some extraction caching solution for file content, or some
>> other workaround?
>>
>> Thanks!
