Thank you very much, Fergus.

I was considering implementing a database which would hold the path name and an MD5 sum of each file.
Then, as part of Solr indexing, one could check the DB for the file path; if it exists, compare the MD5 sums and index only if they differ.

Regards,
Veselin K

On Tue, Apr 07, 2009 at 09:01:31AM +0100, Fergus McMenemie wrote:
> Veselin,
>
> Well, as far as Solr is concerned, there are two issues here:
>
> 1) To stop the same document ending up in the index twice, use the document
>    pathname as the unique ID. Then if you do index it twice, the previous
>    index information will be discarded. Not very efficient, but it may be
>    tolerable. IMHO using pathname as the unique ID is often best practice.
>
> 2) To stop a document even being submitted to Solr, you need to implement
>    some middleware that either performs a search/lookup using a document's
>    pathname to see if it is already indexed or, after examining timestamps,
>    only submits documents which have changed since the last folder scan.
>
> Fergus.
>
> >Hello Paul,
> >I'm indexing with "curl http://localhost... -F myfi...@file.pdf"
> >
> >Regards,
> >Veselin K
> >
> >On Mon, Apr 06, 2009 at 02:56:20PM +0530, Noble Paul wrote:
> >> how are you indexing?
> >>
> >> On Mon, Apr 6, 2009 at 2:54 PM, Veselin Kantsev
> >> <vese...@campbell-lange.net> wrote:
> >> > Hello,
> >> > apologies for the basic question.
> >> >
> >> > How can I avoid double indexing files?
> >> >
> >> > In case all my files are in one folder which is scanned frequently, is
> >> > there a Solr feature of checking and skipping a file if it has already
> >> > been indexed and not changed since?
> >> >
> >> > Thank you.
> >> >
> >> > Regards,
> >> > Veselin K
> >>
> >> --
> >> --Noble Paul
>
> --
> ===============================================================
> Fergus McMenemie               Email:fer...@twig.me.uk
> Techmore Ltd                   Phone:(UK) 07721 376021
>
> Unix/Mac/Intranets             Analyst Programmer
> ===============================================================
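
P.S. For reference, the path-plus-MD5 check described at the top of this message could be sketched roughly as follows. This is only a minimal sketch under assumptions, not a Solr feature: the use of SQLite, the table name indexed_files, and the helper names file_md5, open_db, needs_indexing and record_indexed are all made up for illustration. The actual submission to Solr is left out and would be whatever command is already in use.

```python
import hashlib
import sqlite3


def file_md5(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def open_db(db_path):
    """Open (or create) the tracking database. Table name is hypothetical."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS indexed_files ("
        "path TEXT PRIMARY KEY, md5 TEXT NOT NULL)"
    )
    return conn


def needs_indexing(conn, path):
    """True if `path` is unknown to the DB or its stored MD5 differs."""
    row = conn.execute(
        "SELECT md5 FROM indexed_files WHERE path = ?", (path,)
    ).fetchone()
    return row is None or row[0] != file_md5(path)


def record_indexed(conn, path):
    """Store the current MD5 for `path`; call only after indexing succeeded."""
    conn.execute(
        "INSERT OR REPLACE INTO indexed_files (path, md5) VALUES (?, ?)",
        (path, file_md5(path)),
    )
    conn.commit()
```

A folder scan would then call needs_indexing() for each file, submit the file to Solr only when it returns True, and call record_indexed() afterwards. Recording the checksum only after a successful submission means a failed upload gets retried on the next scan.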