Re: How large is your solr index?

Shawn Heisey Thu, 08 Jan 2015 07:10:31 -0800

On 1/8/2015 4:37 AM, Bram Van Dam wrote:
> Hmm. That is a good point. I wonder if there's some kind of middle
> ground here? Something that lets me send an update (or new document) to
> an arbitrary node/shard but which is still routed according to my
> specific requirements? Maybe this can already be achieved by messing
> with the routing?


<snip>

> That's fine. We have a lot of query (pre-)processing outside of Solr.
> It's no problem for us to send a couple of queries to a couple of shards
> and aggregate the result ourselves. It would, of course, be nice if
> everything worked in distributed mode, but at least for us it's not an
> issue. This is a side effect of our complex reporting requirements -- we
> do aggregation, filtering and other magic on data that is partially in
> Solr and partially elsewhere.

SolrCloud, when you do fully automatic document routing, does handle
everything for you.  You can query any node and send updates to any
node, and they will end up in the right place.  There is currently a
strong caveat: Indexing performance sucks when updates are initially
sent to the wrong node.  The performance hit is far larger than we
expected it to be, so there is an issue in Jira to try and make that
better.  No visible work has been done on the issue yet:

https://issues.apache.org/jira/browse/SOLR-6717

The Java client (SolrJ, specifically CloudSolrServer) sends all updates
to the correct nodes, because it can access the clusterstate and knows
where updates need to go and where the shard leaders are.
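
To make the idea concrete, here is a rough sketch of why a client that can see the cluster state is able to pick the right shard leader on its own. This is not SolrJ code and not Solr's actual router: the class ShardRoutingSketch, the host URLs and the use of String.hashCode() are all assumptions made for illustration (Solr's default router hashes the document id with MurmurHash3). Only the general shape is the point: each shard owns a slice of the hash space, and the id's hash picks the slice.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a simplified model of hash-range document routing.
// String.hashCode() is a stand-in for Solr's real hash function.
class ShardRoutingSketch {

    /** One shard owning an inclusive slice of the 32-bit hash space. */
    static final class Shard {
        final String name;
        final String leaderUrl; // hypothetical leader address
        final int min;
        final int max;

        Shard(String name, String leaderUrl, int min, int max) {
            this.name = name;
            this.leaderUrl = leaderUrl;
            this.min = min;
            this.max = max;
        }

        boolean owns(int hash) {
            return hash >= min && hash <= max;
        }
    }

    /** Splits the full int range into numShards contiguous ranges. */
    static List<Shard> buildShards(int numShards) {
        List<Shard> shards = new ArrayList<>();
        long span = (1L << 32) / numShards;
        long start = Integer.MIN_VALUE;
        for (int i = 0; i < numShards; i++) {
            // The last shard absorbs any remainder so the ranges cover everything.
            long end = (i == numShards - 1) ? Integer.MAX_VALUE : start + span - 1;
            shards.add(new Shard("shard" + (i + 1),
                    "http://host" + (i + 1) + ":8983/solr", (int) start, (int) end));
            start = end + 1;
        }
        return shards;
    }

    /** Finds the shard whose range contains the hash of the document id. */
    static Shard route(String docId, List<Shard> shards) {
        int hash = docId.hashCode();
        for (Shard s : shards) {
            if (s.owns(hash)) {
                return s;
            }
        }
        throw new IllegalStateException("hash ranges do not cover " + hash);
    }

    public static void main(String[] args) {
        List<Shard> shards = buildShards(4);
        for (String id : new String[] {"doc-1", "doc-2", "doc-3"}) {
            Shard s = route(id, shards);
            System.out.println(id + " -> " + s.name + " (" + s.leaderUrl + ")");
        }
    }
}
```

A client without this table has to hand the document to whichever node it happens to be talking to and let that node forward it, which is the extra hop described above.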

> This is a very good point. But I don't think SPLITSHARD is the magical
> answer here. If you have N shards on N boxes, and they are all getting
> nearly "full" and you decide to split one and move half to a new box,
> you'll end up with N-1 nearly full boxes and 2 half-full boxes. What
> happens if the disks fill up further? Do I have to split each shard?
> That sounds pretty nightmarish!

Planning ahead for growth is critical with SolrCloud, but there is
something you can do if you discover that you need to radically
re-shard:  Create a whole new collection with the number of shards you
want, likely using the original set of Solr servers plus some new ones.
Rebuild the index into that collection.  Delete the old collection, and
create a collection alias pointing the original name at the new
collection.  The alias will work for both queries and updates.
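
The sequence above maps onto three Collections API calls (CREATE, DELETE, CREATEALIAS). The sketch below only builds the request URLs and prints them; it does not contact a server. The base URL, the collection names "products" and "products_v2", the config name "products_conf", and the shard and replica counts are all made up for the example, so substitute your own.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Builds Collections API URLs for a re-shard via a new collection plus alias.
// Nothing is sent over the network; all names and counts are hypothetical.
class ReshardUrlsSketch {

    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /** Step 1: create the new collection with the desired shard count. */
    static String createCollection(String baseUrl, String name, int numShards,
                                   int replicationFactor, String configName) {
        return baseUrl + "/admin/collections?action=CREATE"
                + "&name=" + enc(name)
                + "&numShards=" + numShards
                + "&replicationFactor=" + replicationFactor
                + "&collection.configName=" + enc(configName);
    }

    /** Step 3: delete the old collection once the new one is fully built. */
    static String deleteCollection(String baseUrl, String name) {
        return baseUrl + "/admin/collections?action=DELETE&name=" + enc(name);
    }

    /** Step 4: point the original name at the new collection. */
    static String createAlias(String baseUrl, String alias, String target) {
        return baseUrl + "/admin/collections?action=CREATEALIAS"
                + "&name=" + enc(alias)
                + "&collections=" + enc(target);
    }

    public static void main(String[] args) {
        String base = "http://localhost:8983/solr"; // hypothetical node
        System.out.println(createCollection(base, "products_v2", 8, 2, "products_conf"));
        // Step 2 happens here: rebuild the index into products_v2.
        System.out.println(deleteCollection(base, "products"));
        System.out.println(createAlias(base, "products", "products_v2"));
    }
}
```

The delete comes before the alias because the alias reuses the old collection's name. Expect a short window between those two calls where requests to the original name fail.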

Thanks,
Shawn
