Schema for indexing PDF/Doc/XLS files

2009-04-05 Thread Veselin K
ve to source all values myself, except "file contents" and pass them to solr when indexing? Thank you much. Regards, Veselin K

How could I limit a specific field size?

2009-04-06 Thread Veselin K
ch. Is there a way to specify such size limits per field or something similar? Thank you much. Regards, Veselin K

Re: How could I avoid reindexing same files?

2009-04-06 Thread Veselin K
Hello Paul, I'm indexing with "curl http://localhost... -F myfi...@file.pdf" Regards, Veselin K On Mon, Apr 06, 2009 at 02:56:20PM +0530, Noble Paul wrote: > how are you indexing? > > On Mon, Apr 6, 2009 at 2:54 PM,

Re: How could I avoid reindexing same files?

2009-04-07 Thread Veselin K
Thank you much Fergus, I was considering implementing a database which would hold a path name and an MD5 sum of each file. Then as a part of Solr indexing, one could check against the DB if a file path exists, if Yes, then compare MD5 and only index if different. Regards, Veselin K On Tue
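The path-plus-MD5 check described in this message could be sketched roughly as below. This is only an illustration under stated assumptions: SQLite stands in for "a database", and the table name indexed_files and the helpers file_md5, open_db, needs_indexing and mark_indexed are hypothetical names, not part of Solr. The actual submission of the file to Solr is deliberately left out.

```python
import hashlib
import sqlite3


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def open_db(db_path=":memory:"):
    """Open (or create) the tracking database. Table name is an assumption."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS indexed_files ("
        "path TEXT PRIMARY KEY, md5 TEXT NOT NULL)"
    )
    return conn


def needs_indexing(conn, path):
    """True if the path is unknown or its MD5 differs from the stored one."""
    row = conn.execute(
        "SELECT md5 FROM indexed_files WHERE path = ?", (str(path),)
    ).fetchone()
    return row is None or row[0] != file_md5(path)


def mark_indexed(conn, path):
    """Record the current MD5 for a path after it has been indexed."""
    conn.execute(
        "INSERT OR REPLACE INTO indexed_files (path, md5) VALUES (?, ?)",
        (str(path), file_md5(path)),
    )
    conn.commit()
```

A caller would test needs_indexing(conn, path) before sending a file to Solr and call mark_indexed(conn, path) only after the indexing request succeeds, so that a failed upload is retried on the next run.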

Re: How could I avoid reindexing same files?

2009-04-08 Thread Veselin K
Useful tip Erik, this will save a lot of hassle. Thank you much. Regards, Veselin K On Tue, Apr 07, 2009 at 11:29:38AM -0400, Erik Hatcher wrote: > Note that Solr (trunk, soon to be 1.4) has a duplicate detection feature > that may work for your need. See > http://wiki.apache