On 6/29/2015 2:48 PM, Reitzel, Charles wrote:
> I take your point about shards and segments being different things.  I
> understand that the hash ranges per segment are not kept in ZK.  I guess I
> wish they were.
>
> In this regard, I liked how MongoDB uses a two-level sharding scheme.  Each
> shard manages a list of "chunks", each with its own hash range that is kept
> in the cluster state.  If data needs to be balanced across nodes, it works
> at the chunk level, so no record/doc-level I/O is necessary.  Much more
> targeted: only the data that needs to move is touched.  Solr does most
> things better than Mongo, imo, but this is one area where Mongo got it right.

Segment detail would not only lead to a data explosion in the
clusterstate, it would also cross abstraction boundaries, and it would
potentially require updating the clusterstate just because a single
document was inserted into the index.  That one tiny update could (and
probably would) create a new segment on one shard.  Due to the way
SolrCloud replicates data during normal operation, every replica for a
given shard might have a different set of segments, which means segments
would need to be tracked at the replica level, not the shard level.
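
For reference on what the clusterstate does track: SolrCloud keeps one hash
range per shard (slice), and SolrJ can read it back.  A minimal sketch,
untested, assuming SolrJ 5.x, a ZooKeeper at localhost:2181, and a collection
named "collection1" (the address and collection name are placeholders):

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.cloud.ClusterState;
    import org.apache.solr.common.cloud.DocCollection;
    import org.apache.solr.common.cloud.DocRouter;
    import org.apache.solr.common.cloud.Slice;
    import org.apache.solr.common.cloud.ZkStateReader;

    public class ShardRanges {
      public static void main(String[] args) throws Exception {
        // ZK address and collection name are placeholders for this sketch.
        try (CloudSolrClient client = new CloudSolrClient("localhost:2181")) {
          client.connect();
          ZkStateReader reader = client.getZkStateReader();
          ClusterState state = reader.getClusterState();
          DocCollection coll = state.getCollection("collection1");
          // The clusterstate keeps one hash range per shard (slice),
          // nothing at the segment level.
          for (Slice slice : coll.getSlices()) {
            DocRouter.Range range = slice.getRange();
            System.out.println(slice.getName() + " -> " + range + " ("
                + slice.getReplicas().size() + " replicas)");
          }
        }
      }
    }

The range lives on the slice; the replicas underneath it are just copies of
the same data, and nothing below the slice level carries routing metadata.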

Also, Solr cannot control which hash ranges end up in each segment.
Solr only knows about the index as a whole ... implementation details
like segments are left entirely up to Lucene, and although I admit to
not knowing Lucene internals very well, I don't think Lucene offers any
way to control that either.  You mention that MongoDB dictates which
hash ranges end up in each chunk, which implies that MongoDB can control
the contents of each chunk.  If we move the analogy to Solr, it breaks
down because Solr cannot control segments.  Although Solr does have
several configuration knobs that affect how segments are created, those
settings are simply passed through to Lucene; Solr itself does not use
that information.
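
To illustrate where that boundary sits: segments can be listed straight from a
core's index directory with Lucene's own API, and none of the per-segment
metadata has anything to do with hash ranges.  A minimal sketch, untested,
assuming Lucene 5.x and a made-up index path:

    import java.nio.file.Paths;
    import org.apache.lucene.index.SegmentCommitInfo;
    import org.apache.lucene.index.SegmentInfos;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class ListSegments {
      public static void main(String[] args) throws Exception {
        // The index path below is made up; point it at a core's data/index dir.
        try (Directory dir = FSDirectory.open(
            Paths.get("/var/solr/data/collection1_shard1_replica1/data/index"))) {
          SegmentInfos infos = SegmentInfos.readLatestCommit(dir);
          System.out.println(infos.size() + " segments in the latest commit");
          for (SegmentCommitInfo si : infos) {
            // Per-segment metadata covers things like the segment name and
            // deletion count -- there is no notion of a hash range here.
            System.out.println(si.info.name + ": " + si.getDelCount()
                + " deleted docs");
          }
        }
      }
    }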

Thanks,
Shawn
