Re: Indexing file system contents

2013-10-04 Thread Alexandre Rafalovitch
No direct help, but a bunch of related random thoughts:
1) How are you running Tika? As a jar loading from scratch every time? Tika can also run in a server mode where it listens on a network socket. You send the file, it sends the extracted text back. Might be faster.
2) Deleting old stuff. You can inde
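Here is a minimal Python sketch of the server-mode idea in point 1, using only the standard library. It assumes a tika-server instance is already running locally on its default port (9998) and that it accepts a PUT of the raw file bytes at `/tika`, returning plain text when the request carries `Accept: text/plain`. The names `build_tika_request` and `extract_text` are hypothetical helpers for illustration, not part of Tika.

```python
import urllib.request

# Assumption: tika-server is already running locally on its default port 9998.
TIKA_URL = "http://localhost:9998/tika"


def build_tika_request(body: bytes, url: str = TIKA_URL) -> urllib.request.Request:
    """Hypothetical helper: wrap raw file bytes in a PUT to Tika's /tika endpoint."""
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Accept": "text/plain"},  # ask the server for plain extracted text
    )


def extract_text(path: str, url: str = TIKA_URL, timeout: float = 60.0) -> str:
    """Hypothetical helper: send one file to the running server and return its text."""
    with open(path, "rb") as fh:
        req = build_tika_request(fh.read(), url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    import sys

    # Print each file name with the length of its extracted text.
    for name in sys.argv[1:]:
        print(name, len(extract_text(name)))
```

Because the JVM stays up between requests, each file should cost one HTTP round trip rather than a fresh JVM start, which is presumably where the suggested speedup would come from.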

Re: Indexing file system contents

2013-10-04 Thread Shawn Heisey
On 10/3/2013 11:29 PM, Sadler, Anthony wrote:
> Time:
> On some servers we're dealing with something in the region of a million or
> more files. Indexing that many times takes upwards of 48 hours or more. While
> the script is now fairly stable and fault tolerant, that is still a pretty
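To put the quoted figures in perspective, here is a rough throughput estimate. It assumes exactly one million files and exactly 48 hours; since both numbers are stated as lower bounds ("a million or more", "upwards of 48 hours"), the real rate could differ in either direction.

```latex
\frac{10^{6}\ \text{files}}{48\ \text{h} \times 3600\ \text{s/h}}
  = \frac{10^{6}\ \text{files}}{172\,800\ \text{s}}
  \approx 5.8\ \text{files/s}
  \quad\Longleftrightarrow\quad
  \approx 0.17\ \text{s per file}
```

At roughly a sixth of a second per file, fixed per-file overhead (such as starting a JVM for each Tika run) can plausibly dominate the total time.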

Indexing file system contents

2013-10-03 Thread Sadler, Anthony
Hi all: I've had a quick look through the archives but am struggling to find a decent search query (a bad start to my Solr career), so apologies if this has been asked multiple times before, as I'm sure it has. We've got several Windows file servers across several locations and we'd like to in