On 1/8/2015 4:37 AM, Bram Van Dam wrote:
> Hmm. That is a good point. I wonder if there's some kind of middle
> ground here? Something that lets me send an update (or new document) to
> an arbitrary node/shard but which is still routed according to my
> specific requirements? Maybe this can already be achieved by messing
> with the routing?

<snip>

> That's fine. We have a lot of query (pre-)processing outside of Solr.
> It's no problem for us to send a couple of queries to a couple of shards
> and aggregate the result ourselves. It would, of course, be nice if
> everything worked in distributed mode, but at least for us it's not an
> issue. This is a side effect of our complex reporting requirements -- we
> do aggregation, filtering and other magic on data that is partially in
> Solr and partially elsewhere.

SolrCloud, when you do fully automatic document routing, does handle
everything for you. You can query any node and send updates to any node,
and they will end up in the right place.

There is currently a strong caveat: Indexing performance sucks when
updates are initially sent to the wrong node. The performance hit is far
larger than we expected it to be, so there is an issue in Jira to try
and make that better. No visible work has been done on the issue yet:

https://issues.apache.org/jira/browse/SOLR-6717

The Java client (SolrJ, specifically CloudSolrServer) sends all updates
to the correct nodes, because it can access the clusterstate and knows
where updates need to go and where the shard leaders are.

> This is a very good point. But I don't think SPLITSHARD is the magical
> answer here. If you have N shards on N boxes, and they are all getting
> nearly "full" and you decide to split one and move half to a new box,
> you'll end up with N-2 nearly full boxes and 2 half-full boxes. What
> happens if the disks fill up further? Do I have to split each shard?
> That sounds pretty nightmareish!

Planning ahead for growth is critical with SolrCloud, but there is
something you can do if you discover that you need to radically
re-shard: Create a whole new collection with the number of shards you
want, likely using the original set of Solr servers plus some new ones.
Rebuild the index into that collection.
Delete the old collection, and create a collection alias pointing the
original name at the new collection. The alias will work for both
queries and updates.

Thanks,
Shawn
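
For reference, the re-shard steps above can be sketched as a sequence of
Collections API calls. This is only a rough illustration, not a tested
procedure: the base URL and the collection names "docs" and "docs_v2"
are made up, the helper functions are hypothetical, and a real CREATE
call will probably need extra parameters for your cluster (config name,
shards per node, and so on). The script only builds the URLs and does
not contact Solr.

```python
from urllib.parse import urlencode


def collections_api_url(base_url, action, **params):
    """Build a Collections API URL. Nothing is sent to Solr."""
    query = urlencode({"action": action, **params})
    return f"{base_url}/admin/collections?{query}"


def reshard_plan(base_url, old_name, new_name, num_shards, replication_factor=1):
    """Return the Collections API calls for the re-shard, in order.

    All names and values passed in are assumptions for illustration.
    """
    return [
        # 1. Create the new collection with the shard count you want.
        collections_api_url(
            base_url,
            "CREATE",
            name=new_name,
            numShards=num_shards,
            replicationFactor=replication_factor,
        ),
        # (Rebuild the index into new_name here; not part of this sketch.)
        # 2. Delete the old collection once the rebuild has been verified.
        collections_api_url(base_url, "DELETE", name=old_name),
        # 3. Point the original name at the new collection with an alias.
        collections_api_url(
            base_url, "CREATEALIAS", name=old_name, collections=new_name
        ),
    ]


if __name__ == "__main__":
    for url in reshard_plan("http://localhost:8983/solr", "docs", "docs_v2", 8):
        print(url)
```

The delete comes before the alias because the alias reuses the original
collection name. Clients that query or update "docs" should not need to
change once the alias is in place.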