Thank you very much, Fergus,

I was considering implementing a database which would hold a path name
and an MD5 sum of each file.

Then, as part of Solr indexing, one could check whether the file path
exists in the DB. If it does, compare the MD5 sums and re-index only if
they differ; if it does not, index the file as new.
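
A rough sketch of that path and MD5 bookkeeping, in case it helps the
discussion. It assumes Python with an SQLite file as the database; the
table name indexed_files and the helper names (md5_of_file, open_db,
needs_indexing, mark_indexed) are made up for illustration and nothing
here is part of Solr itself:

```python
import hashlib
import sqlite3


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def open_db(db_path):
    """Open (or create) the hypothetical tracking database."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS indexed_files ("
        "path TEXT PRIMARY KEY, md5 TEXT NOT NULL)"
    )
    return conn


def needs_indexing(conn, path):
    """True if the path is unknown or its MD5 differs from the stored one."""
    row = conn.execute(
        "SELECT md5 FROM indexed_files WHERE path = ?", (path,)
    ).fetchone()
    return row is None or row[0] != md5_of_file(path)


def mark_indexed(conn, path):
    """Store the current MD5 for the path after a successful submission."""
    conn.execute(
        "INSERT OR REPLACE INTO indexed_files (path, md5) VALUES (?, ?)",
        (path, md5_of_file(path)),
    )
    conn.commit()
```

The folder scan would call needs_indexing() before posting a file to
Solr and mark_indexed() only after the post succeeds, so a failed
submission is retried on the next scan.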


Regards,
Veselin K

On Tue, Apr 07, 2009 at 09:01:31AM +0100, Fergus McMenemie wrote:
> Veselin,
> 
> Well, as far as Solr is concerned, there are two issues here:
> 
> 1) To stop the same document ending up in the indexes twice, use the document
>    pathname as the unique ID. Then if you do index it twice, the previous
>    index information will be discarded. Not very efficient, but it may be
>    tolerable.
>    IMHO using pathname as the unique ID is often best practice.
> 
> 2) To stop a document even being submitted to Solr, you need to implement
>    some middleware that either performs a search/lookup using a document's
>    pathname to see if it is already indexed or, after examining timestamps,
>    only submits documents which have changed since the last folder scan.
> 
> Fergus.
> >Hello Paul,
> >I'm indexing with "curl http://localhost... -F myfi...@file.pdf" 
> >
> >Regards,
> >Veselin K
> >
> >
> >On Mon, Apr 06, 2009 at 02:56:20PM +0530, Noble Paul wrote:
> >> how are you indexing?
> >> 
> >> On Mon, Apr 6, 2009 at 2:54 PM, Veselin Kantsev
> >> <vese...@campbell-lange.net> wrote:
> >> > Hello,
> >> > apologies for the basic question.
> >> >
> >> > How can I avoid double indexing files?
> >> >
> >> > In case all my files are in one folder which is scanned frequently, is
> >> > there a Solr feature of checking and skipping a file if it has already
> >> > been indexed and not changed since?
> >> >
> >> >
> >> > Thank you.
> >> >
> >> > Regards,
> >> > Veselin K
> >> >
> >> >
> >> 
> >> 
> >> 
> >> -- 
> >> --Noble Paul
> 
> -- 
> 
> ===============================================================
> Fergus McMenemie               Email:fer...@twig.me.uk
> Techmore Ltd                   Phone:(UK) 07721 376021
> 
> Unix/Mac/Intranets             Analyst Programmer
> ===============================================================
