Hi solr-cloud users,

I'm currently setting up a solr-cloud/ZooKeeper instance and so far everything works fine. I downloaded the source from the cloud branch yesterday and built it myself.
I've got 10 shards distributed across 4 servers, plus a ZooKeeper instance. Searching documents with the flag "distrib=true" works and returns the expected results.

But here comes the tricky question. I will add new documents every day, so I'd like to balance my shards to keep the system fast. The Wiki says that one can calculate the hash of a document id and use it to determine the corresponding shard. But IMHO, this does not take into account that the cloud may grow or shrink over time as shards are added or removed. Adding shards obviously has the higher priority, since one wants to reduce the shard size to improve the response time of distributed searches.

After reading through the Wikis and existing documentation, it is still unclear to me how to do the following operations:

- Modify/delete a document stored in the cloud without having to store the document:shard mapping outside of the cloud. I would expect something like a shard attribute on each doc in the Solr query result (activated/deactivated by a flag), so that I can query the Solr cloud for a doc and then delete it on the specific shard.
- Balance a cloud when adding/removing shards, or just rebalance the shards after many deletions.

Of course there are solutions to this, but in the end I'd love to have a true cloud where I do not have to worry about shard performance optimization.

Hints are greatly appreciated.

Cheers,
Stephan
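
P.S. To make the resharding concern concrete, here is a minimal sketch of what I understand by hash-based placement. It assumes a plain CRC32-modulo scheme, which is only my illustration and not necessarily the hash Solr uses. The names shard_for and moved_fraction are hypothetical helpers, not part of any Solr API.

```python
import zlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document id to a shard index (assumed scheme: CRC32 modulo shard count)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def moved_fraction(doc_ids, old_shards: int, new_shards: int) -> float:
    """Fraction of documents whose shard changes when the shard count changes."""
    moved = sum(
        1 for d in doc_ids
        if shard_for(d, old_shards) != shard_for(d, new_shards)
    )
    return moved / len(doc_ids)


if __name__ == "__main__":
    ids = [f"doc-{i}" for i in range(10000)]
    # Shard that a single document would live on with 10 shards.
    print(shard_for("doc-42", 10))
    # Share of documents that would have to move when going from 10 to 11 shards.
    print(moved_fraction(ids, 10, 11))
```

With a plain modulo scheme like this, the shard for a given id can be recomputed at delete time, so no external document:shard mapping is needed as long as the shard count stays fixed. Once the count changes, most ids map to a different shard, which is exactly the rebalancing problem described above.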