Here's an example of what Alexandre is talking about: http://searchhub.org/2012/02/14/indexing-with-solrj/
It mixes database fetching in with the Tika processing, but that should be pretty easy to pull out.

Best,
Erick

On Mon, Jun 30, 2014 at 8:21 PM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
> Under the covers, Tika is used. You can use Tika yourself on the
> client side and cache its output in the database or a text file. Then,
> send that to Solr instead. Puts less load on Solr as well.
>
> Or you can use atomic update, but then all the primary (not copyField)
> fields must be stored="true".
>
> Regards,
> Alex.
> Personal website: http://www.outerthoughts.com/
> Current project: http://www.solr-start.com/ - Accelerating your Solr
> proficiency
>
>
> On Tue, Jul 1, 2014 at 5:55 AM, Gili Nachum <gilinac...@gmail.com> wrote:
>> Hello,
>>
>> I plan to use ExtractingRequestHandler to index a binary file's text plus app
>> metadata (like literal.downloadCount and others) into a single document.
>> I expect the app metadata to change much more often than the binary file
>> itself. I would hate to have to extract text from the binary file whenever
>> I need to re-index the doc because of a metadata change.
>> Is there some extraction caching solution for file content? Or some
>> other workaround?
>>
>> Thanks!
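
For what it's worth, the two suggestions in this thread could be sketched roughly as follows. This is only an illustration under some assumptions: `ExtractionCache`, `build_doc`, and `atomic_set` are hypothetical helper names, the `content` field name is made up, and the `extractor` argument is a stand-in for whatever client-side Tika call you actually use. Nothing here talks to Solr; it only builds the documents you would send.

```python
import hashlib
from pathlib import Path
from typing import Any, Callable, Dict


class ExtractionCache:
    """Cache extracted text on disk, keyed by the SHA-256 of the file bytes."""

    def __init__(self, cache_dir: str, extractor: Callable[[bytes], str]):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        # Hypothetical stand-in for a client-side Tika extraction call.
        self.extractor = extractor
        self.misses = 0

    def text_for(self, data: bytes) -> str:
        """Return cached text for these bytes, extracting only on a miss."""
        key = hashlib.sha256(data).hexdigest()
        cached = self.cache_dir / (key + ".txt")
        if cached.exists():
            return cached.read_text(encoding="utf-8")
        self.misses += 1
        text = self.extractor(data)
        cached.write_text(text, encoding="utf-8")
        return text


def build_doc(doc_id: str, data: bytes, metadata: Dict[str, Any],
              cache: ExtractionCache) -> Dict[str, Any]:
    """Full document: cached text plus current app metadata.

    'content' is an assumed field name; use whatever your schema defines.
    """
    doc: Dict[str, Any] = {"id": doc_id, "content": cache.text_for(data)}
    doc.update(metadata)
    return doc


def atomic_set(doc_id: str, fields: Dict[str, Any]) -> Dict[str, Any]:
    """Atomic-update document that sets only the given fields.

    Requires the primary (non-copyField) fields to be stored="true".
    """
    update: Dict[str, Any] = {"id": doc_id}
    for name, value in fields.items():
        update[name] = {"set": value}
    return update
```

With the cache, a metadata-only change means calling `build_doc` again with the same bytes, which reuses the stored text and skips extraction. With the atomic route, `atomic_set("doc1", {"downloadCount": 3})` yields a document you could post to the regular update handler without resending the text at all.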