Hi solr-cloud users,

I'm currently setting up a solr-cloud/zookeeper instance and so far
everything works fine. I checked out the cloud branch yesterday and built
it from source.

I've got 10 shards distributed across 4 servers and a zookeeper instance.
Searching documents with the flag "distrib=true" works and returns the
expected results.

But here comes the tricky question. I will add new documents every day and
therefore, I'd like to balance my shards to keep the system speedy. The
Wiki says that one can calculate the hash of a document id and then
determine the corresponding shard. But IMHO, this does not take into account
that the cloud may become bigger or shrink over time by adding or removing
shards. Obviously adding has a higher priority since one wants to reduce
the shard size to improve the response time of distributed searches.
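
To illustrate my concern, here is a minimal sketch in Python. It assumes
the simplest possible scheme, hash(id) modulo the number of shards, which
is only my reading of the Wiki and not necessarily what solr-cloud does
internally. The names shard_for and moved_fraction and the "doc-N" ids are
made up for this example.

```python
import hashlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document id to a shard index (assumed scheme: stable hash mod N)."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_shards


def moved_fraction(doc_ids, old_count: int, new_count: int) -> float:
    """Fraction of documents whose shard changes when the shard count changes."""
    moved = sum(
        1 for d in doc_ids
        if shard_for(d, old_count) != shard_for(d, new_count)
    )
    return moved / len(doc_ids)


if __name__ == "__main__":
    ids = [f"doc-{i}" for i in range(10000)]  # hypothetical document ids
    # Going from 10 to 11 shards with plain modulo hashing.
    print(moved_fraction(ids, 10, 11))
```

With plain modulo hashing, going from 10 to 11 shards should reassign
roughly 10 out of every 11 documents, so nearly the whole index would have
to be moved or reindexed.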

When reading through the Wikis and existing documentation, it is still
unclear to me how to do the following operations:
- Modify/delete a document stored in the cloud without having to store the
  document:shard mapping information outside of the cloud. I would expect
  something like a shard attribute on each doc in the Solr query result
  (activated/deactivated by a flag), so that I can query the Solr cloud for
  a doc and then delete it on the specific shard.
- Rebalance a cloud when adding/removing shards, or just rebalance it after
  many deletions.
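
For the first point, the only workaround I can think of is to recompute
the shard from the document id on the client and send the delete there.
Here is a rough sketch, again assuming hash(id) modulo the shard count.
The shard URLs and function names are hypothetical placeholders; only the
<delete><id>...</id></delete> message posted to the update handler is
standard Solr.

```python
import hashlib
from xml.sax.saxutils import escape

# Hypothetical base URLs for 10 shards spread over 4 servers.
SHARD_URLS = [
    f"http://solr{i % 4 + 1}.example.com:8983/solr/shard{i}" for i in range(10)
]


def shard_for(doc_id: str, num_shards: int) -> int:
    """Assumed routing scheme: stable hash of the id modulo the shard count."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def delete_request(doc_id: str):
    """Return (url, xml_body) for deleting doc_id on the shard that owns it."""
    base = SHARD_URLS[shard_for(doc_id, len(SHARD_URLS))]
    body = f"<delete><id>{escape(doc_id)}</id></delete>"
    return base + "/update", body


if __name__ == "__main__":
    url, body = delete_request("doc-42")
    print(url)
    print(body)
```

This only works as long as the number of shards never changes, which is
exactly the limitation I would like to get rid of.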

Of course there are workarounds for this, but in the end I'd love to have
a true cloud where I do not have to worry about shard performance
optimization.
Hints are greatly appreciated.

Cheers,
Stephan
