Re: Indexing file system contents

2013-10-04 Thread Alexandre Rafalovitch
No direct help, but a bunch of related random thoughts:

1) How are you running Tika? As a jar loading from scratch every time? Tika can also run in a server mode where it listens to a network socket. You send the file, it sends the extract back. Might be faster.

2) Deleting old stuff. You can inde
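To make the server-mode idea in point 1 concrete, here is a minimal sketch of the client side. It assumes a Tika process is already running in its plain socket server mode and writing extracted text back on the same connection. The function name, host, and port are hypothetical placeholders, not values from this thread, and the exact way to start Tika in that mode should be checked against the Tika version in use.

```python
import socket


def extract_via_socket(path, host="localhost", port=9998, timeout=60.0):
    """Send one file to a listening extraction server and return its reply.

    Assumption: the server reads the raw file bytes until the client
    half-closes the connection, then writes the extracted text back.
    Host and port are illustrative defaults only.
    """
    with open(path, "rb") as f:
        payload = f.read()

    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(payload)
        # Half-close the socket so the server sees end-of-input
        # while we can still read its response.
        conn.shutdown(socket.SHUT_WR)

        chunks = []
        while True:
            chunk = conn.recv(65536)
            if not chunk:
                break
            chunks.append(chunk)

    return b"".join(chunks).decode("utf-8", errors="replace")
```

The possible gain is that the JVM and the parsers are loaded once rather than once per file. Whether that actually helps at the scale of a million files would need to be measured.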

Re: Indexing file system contents

2013-10-04 Thread Shawn Heisey
On 10/3/2013 11:29 PM, Sadler, Anthony wrote:
> Time:
> -
> On some servers we're dealing with something in the region of a million or
> more files. Indexing that many times takes upwards of 48 hours or more. While
> the script is now fairly stable and fault tolerant, that is still a pretty