Stephan and all,

I am evaluating this as well. You may want to read up on consistent hashing: http://www.tomkleinpeter.com/2008/03/17/programmers-toolbox-part-3-consistent-hashing/. I would appreciate it if others could shed some light on this, too.
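To make the idea concrete, here is a minimal sketch of consistent hashing in plain Python. It is not Solr code and does not reflect how the cloud branch assigns documents; `HashRing`, `stable_hash`, `modulo_shard`, the shard names, and the document ids are all made up for illustration. It compares how many documents change shards when an 11th shard is added, once with a hash ring and once with the simple "hash modulo number of shards" rule.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    """Hash a string to an integer that is identical on every run and machine."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, shards, vnodes=100):
        self.vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> shard name
        for shard in shards:
            self.add_shard(shard)

    def add_shard(self, shard):
        # Each shard is placed on the ring at several pseudo-random positions.
        for i in range(self.vnodes):
            point = stable_hash(f"{shard}#{i}")
            self._owner[point] = shard
            bisect.insort(self._points, point)

    def shard_for(self, doc_id):
        # Owner is the first ring position after the document's hash,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect_right(self._points, stable_hash(doc_id))
        return self._owner[self._points[idx % len(self._points)]]


def modulo_shard(doc_id, n_shards):
    """The simple scheme: hash of the document id modulo the shard count."""
    return stable_hash(doc_id) % n_shards


doc_ids = [f"doc-{i}" for i in range(10000)]

ring = HashRing([f"shard{i}" for i in range(10)])
before = {d: ring.shard_for(d) for d in doc_ids}
ring.add_shard("shard10")
after = {d: ring.shard_for(d) for d in doc_ids}

moved_ring = sum(before[d] != after[d] for d in doc_ids)
moved_mod = sum(modulo_shard(d, 10) != modulo_shard(d, 11) for d in doc_ids)

print(f"ring:   {moved_ring} of {len(doc_ids)} documents moved")
print(f"modulo: {moved_mod} of {len(doc_ids)} documents moved")
```

With the ring, only documents that now belong to the new shard move, which on average is about 1/11 of them here; with the modulo rule, about 10/11 are expected to land on a different shard. Because the shard is recomputed from the document id alone, a client could also find the shard to update or delete a document on without keeping a document:shard mapping elsewhere, provided every client uses the same ring.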
Bests,
James

On Fri, Sep 10, 2010 at 6:07 AM, Stephan Raemy <stephan.ra...@gmail.com> wrote:

> Hi solr-cloud users,
>
> I'm currently setting up a solr-cloud/zookeeper instance and so far,
> everything works out fine. I downloaded the source from the cloud branch
> yesterday and built it from source.
>
> I've got 10 shards distributed across 4 servers and a zookeeper instance.
> Searching documents with the flag "distrib=true" works out and it returns
> the expected result.
>
> But here comes the tricky question. I will add new documents every day and
> therefore, I'd like to balance my shards to keep the system speedy. The
> Wiki says that one can calculate the hash of a document id and then
> determine the corresponding shard. But IMHO, this does not take into
> account that the cloud may become bigger or shrink over time by adding or
> removing shards. Obviously adding has a higher priority since one wants to
> reduce the shard size to improve the response time of distributed searches.
>
> When reading through the Wikis and existing documentation, it is still
> unclear to me how to do the following operations:
> - Modify/Delete a document stored in the cloud without having to store the
>   document:shard mapping information outside of the cloud. I would expect
>   something like a shard attribute on each doc in the SOLR query result
>   (activated/deactivated by a flag), so that I can query the SOLR cloud for
>   a doc and then delete it on the specific shard.
> - Balance a cloud when adding/removing shards, or just balance the shards
>   after many deletions.
>
> Of course there are solutions to this, but in the end, I'd love to have a
> true cloud where I do not have to worry about shard performance
> optimization.
> Hints are greatly appreciated.
>
> Cheers,
> Stephan