No direct help, but a bunch of related random thoughts:
1) How are you running Tika? As a jar that loads from scratch for every
file? Tika can also run in server mode, where it listens on a network
socket: you send the file and it sends the extracted text back. That
avoids paying the JVM startup cost per file, so it might be faster.
2) Deleting old stuff. You can inde
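
To make point 1 concrete, here is a rough, untested-against-your-setup
sketch of the client side. It assumes Tika was started once in its socket
server mode (something along the lines of
"java -jar tika-app.jar --server --text --port 12345"; the jar name, the
flags for your Tika version, and the port are all assumptions, so check
the --help output). The function name extract_via_socket is made up for
illustration, and treating the reply as UTF-8 is also an assumption.

```python
import socket


def extract_via_socket(path, host="localhost", port=12345, timeout=60.0):
    """Send one file to a listening extractor and return the text it replies with.

    Host, port and the UTF-8 decoding are assumptions; adjust to your setup.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # Stream the raw file bytes to the server in chunks.
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                sock.sendall(chunk)
        # Close our sending side so the server knows the file is complete.
        sock.shutdown(socket.SHUT_WR)

        # Read the reply until the server closes the connection.
        chunks = []
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

The idea is that the JVM stays up between files, so the indexing script
only pays for a socket round trip per document instead of a fresh Java
launch.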
On 10/3/2013 11:29 PM, Sadler, Anthony wrote:
> Time:
> -
> On some servers we're dealing with something in the region of a million or
> more files. Indexing that many times takes upwards of 48 hours or more. While
> the script is now fairly stable and fault tolerant, that is still a pretty