Re: Adding bias to Distributed search feature?

Andrzej Bialecki Mon, 15 Sep 2008 23:38:11 -0700

Lance Norskog wrote:

Thanks!  We made variants of this and a couple of other files.


As to why we have the same document in different shards with different
contents: once you hit a certain index size and ingest rate, it is easiest
to create a series of indexes and leave the older ones alone. In the future,
please consider this as a legitimate use case instead of simply a mistake.


You may be interested in implementing something like this:

"Compact Features for Detection of Near-Duplicates in Distributed Retrieval", Yaniv Bernstein, Milad Shokouhi, and Justin Zobel

It sounds straightforward, and relieves you of the need to de-duplicate your collection.
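
As a rough illustration of the general idea (not the algorithm from the paper, whose details are not reproduced here), a merging node could drop near-duplicates at query time if every shard returned a small signature with each hit. The sketch below assumes a min-hash signature over word shingles; the names signature, similarity and merge_shard_results, the (score, doc_id, signature) hit layout, and the 0.5 threshold are all hypothetical choices made for this example.

```python
import hashlib


def shingles(text, width=3):
    """Return the set of overlapping word n-grams in the text."""
    words = text.lower().split()
    if len(words) < width:
        return {" ".join(words)}
    return {" ".join(words[i:i + width]) for i in range(len(words) - width + 1)}


def signature(text, size=16):
    """Compact min-hash signature: the smallest hash value per salted hash."""
    grams = shingles(text)
    sig = []
    for salt in range(size):
        sig.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{salt}:{g}".encode(), digest_size=8).digest(),
                "big",
            )
            for g in grams
        ))
    return tuple(sig)


def similarity(sig_a, sig_b):
    """Fraction of matching positions, an estimate of shingle-set overlap."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def merge_shard_results(shard_results, threshold=0.5):
    """Merge per-shard hit lists, keeping the best-scored copy of each document.

    shard_results is a list of lists of (score, doc_id, signature) tuples.
    """
    merged = sorted(
        (hit for hits in shard_results for hit in hits),
        key=lambda hit: hit[0],
        reverse=True,
    )
    kept = []
    for hit in merged:
        # Keep a hit only if it does not resemble anything already accepted.
        if all(similarity(hit[2], other[2]) < threshold for other in kept):
            kept.append(hit)
    return kept
```

With signatures computed at index time, the older shards can stay untouched: the duplicate copies are simply suppressed when the shard results are merged, and the threshold controls how different two versions must be before both are shown.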


--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
