On 2010-09-06 16:41, Yonik Seeley wrote:
On Mon, Sep 6, 2010 at 10:18 AM, MitchK<mitc...@web.de> wrote:
[...consistent hashing...]
But it doesn't solve the problem at all. Correct me if I am wrong, but: if
you add a new server, let's call it IP3-1, and IP3-1 is nearer to the
current resource X, then doc x will be indexed at IP3-1, even if IP2-1
holds the older version.
Am I right?
Right. You still need code to handle migration.
Consistent hashing is a way for everyone to agree on the mapping, and
for the mapping to change incrementally, i.e. when you add a node, only
a limited percentage of the docid->node mappings change, rather than
potentially all of them, as would happen with a simple MOD.
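To make the difference concrete, here is a rough sketch (not anything from Solr) that compares how many documents change nodes when IP3-1 joins, under a simple MOD and under a hash ring. The names stable_hash, HashRing and moved_fraction, the md5-based hash, and the choice of 100 virtual points per node are all assumptions made for illustration.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    # md5 is used only as a stable, well-mixed hash (an assumption);
    # Python's built-in hash() is salted per process and would not be stable.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


def mod_node(doc_id: str, nodes: list) -> str:
    # Simple MOD mapping: depends directly on the number of nodes.
    return nodes[stable_hash(doc_id) % len(nodes)]


class HashRing:
    """Hypothetical consistent-hash ring with virtual points per node."""

    def __init__(self, nodes, vnodes=100):
        # Each node owns `vnodes` points on the ring.
        self._points = sorted(
            (stable_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._keys = [point for point, _ in self._points]

    def node_for(self, doc_id: str) -> str:
        # Pick the first point clockwise from the document's hash, wrapping around.
        idx = bisect.bisect_right(self._keys, stable_hash(doc_id)) % len(self._keys)
        return self._points[idx][1]


def moved_fraction(before: dict, after: dict) -> float:
    # Fraction of documents whose node changed between the two mappings.
    return sum(1 for k in before if before[k] != after[k]) / len(before)


docs = [f"doc{i}" for i in range(10000)]
old_nodes = ["IP1-1", "IP2-1"]
new_nodes = old_nodes + ["IP3-1"]

mod_before = {d: mod_node(d, old_nodes) for d in docs}
mod_after = {d: mod_node(d, new_nodes) for d in docs}

old_ring, new_ring = HashRing(old_nodes), HashRing(new_nodes)
ring_before = {d: old_ring.node_for(d) for d in docs}
ring_after = {d: new_ring.node_for(d) for d in docs}

mod_moved = moved_fraction(mod_before, mod_after)
ring_moved = moved_fraction(ring_before, ring_after)
print(f"MOD moved {mod_moved:.0%} of docs, ring moved {ring_moved:.0%}")
```

With the ring, every document that moves lands on the new node, which is exactly the set that migration code would have to handle. The sketch says nothing about how the older copy on IP2-1 gets removed.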
Another strategy to avoid excessive reindexing is to keep splitting the
largest shards, and then your mapping becomes a regular MOD plus a list
of these additional splits. Really, there's an infinite number of ways
you could implement this...
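One possible reading of "a regular MOD plus a list of splits" is sketched below. It is only one of the many ways alluded to above. The class SplitModMapping, the shard naming scheme, and the use of the remaining hash bits to pick a side of each split are assumptions, not an existing API.

```python
import hashlib
from collections import Counter


def doc_hash(doc_id: str) -> int:
    # Stable 64-bit hash taken from md5 (an assumption made for the sketch).
    return int.from_bytes(hashlib.md5(doc_id.encode("utf-8")).digest()[:8], "big")


class SplitModMapping:
    """Hypothetical mapping: hash MOD base_shards, refined by a list of splits."""

    def __init__(self, base_shards: int):
        self.base = base_shards
        self.splits = {}  # shard name -> (left child, right child)

    def split(self, shard: str):
        # Record that `shard` has been divided into two child shards.
        children = (shard + ".0", shard + ".1")
        self.splits[shard] = children
        return children

    def shard_for(self, doc_id: str) -> str:
        h = doc_hash(doc_id)
        shard = f"shard{h % self.base}"
        bits = h // self.base  # remaining hash bits pick a side at each split
        while shard in self.splits:
            shard = self.splits[shard][bits & 1]
            bits >>= 1
        return shard


docs = [f"doc{i}" for i in range(2000)]
mapping = SplitModMapping(base_shards=4)
before = {d: mapping.shard_for(d) for d in docs}

# Split whichever shard currently holds the most documents.
largest = Counter(before.values()).most_common(1)[0][0]
children = mapping.split(largest)
after = {d: mapping.shard_for(d) for d in docs}
```

Only the documents that lived in the split shard get a new home. Everything else keeps its original MOD assignment, so reindexing stays limited to one shard per split.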
For SolrCloud, I don't think we'll end up using consistent hashing -
we don't need it (although some of the concepts may still be useful).
I imagine there could be situations where a simple MOD won't do ;) so I
think it would be good to hide this strategy behind an
interface/abstract class. It costs nothing, and gives you flexibility in
how you implement this mapping.
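As a minimal sketch of what such an abstraction might look like (in Python for brevity; DocRouter, ModRouter and index_document are hypothetical names, not SolrCloud classes):

```python
import hashlib
from abc import ABC, abstractmethod


class DocRouter(ABC):
    """Hypothetical interface: callers only ever ask which node owns a doc id."""

    @abstractmethod
    def node_for(self, doc_id: str) -> str:
        """Return the node that should index `doc_id`."""


class ModRouter(DocRouter):
    """Simple MOD strategy; other strategies can replace it behind DocRouter."""

    def __init__(self, nodes):
        self._nodes = list(nodes)

    def node_for(self, doc_id: str) -> str:
        # md5 is an assumed stable hash; any deterministic hash would do.
        digest = hashlib.md5(doc_id.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        return self._nodes[h % len(self._nodes)]


def index_document(router: DocRouter, doc_id: str) -> str:
    # Indexing code depends on the interface, not on the mapping strategy.
    return router.node_for(doc_id)


router = ModRouter(["IP1-1", "IP2-1"])
target = index_document(router, "doc-x")
```

A ring-based or split-based router could later be passed to index_document without touching the calling code.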
--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com