Hi Matt,

See:
http://search-lucene.com/?q=query+routing&fc_project=Solr
https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud

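As a rough illustration of what query routing can look like with the compositeId router covered on the second page: documents are indexed with a route key prefixed to their id (`key!id`), and a query passes the same key in the `_route_` parameter so it is sent only to the shard(s) holding that key instead of being fanned out to every shard. The sketch below only builds the id and the request URL. The base URL, collection name, and the `customerA` key are placeholders I made up for illustration, not anything from your setup.

```python
from urllib.parse import urlencode


def routed_doc_id(route_key, doc_id):
    """Prefix the document id with a route key, in the key!id form used by compositeId routing."""
    return f"{route_key}!{doc_id}"


def routed_select_url(base_url, collection, query, route_key):
    """Build a select URL whose _route_ parameter restricts the query to the shard(s) for route_key."""
    params = urlencode({"q": query, "_route_": f"{route_key}!"})
    return f"{base_url}/{collection}/select?{params}"


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    print(routed_doc_id("customerA", "12345"))
    print(routed_select_url("http://localhost:8983/solr", "mycollection",
                            "title:solr", "customerA"))
```

Whether this helps depends on your data having a natural key (customer, tenant, region) that most queries can be restricted to.
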
Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/

On Thu, Feb 12, 2015 at 2:09 PM, Matt Kuiper <matt.kui...@issinc.com> wrote:

> Otis,
>
> Thanks for your reply. I see your point about too many shards and search
> efficiency. I also agree that I need to get a better handle on customer
> requirements and expected loads.
>
> Initially I figured that with the shard splitting option, I would need to
> double my Solr nodes every time I split (as I would want to split every
> shard within the collection). In fact, only the number of shards
> would double, and then I would have the opportunity to rebalance the shards
> over the existing Solr nodes plus a number of new nodes that makes sense at
> the time. This may be preferable to defining many micro shards up front.
>
> The time-based collections may be an option for this project. I am not
> familiar with query routing; can you point me to any documentation on how
> this might be implemented?
>
> Thanks,
> Matt
>
> -----Original Message-----
> From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com]
> Sent: Wednesday, February 11, 2015 9:13 PM
> To: solr-user@lucene.apache.org
> Subject: Re: How to make SolrCloud more elastic
>
> Hi Matt,
>
> You could create extra shards up front, but if your queries are fanned out
> to all of them, you can run into situations where there are too many
> concurrent queries per node, causing lots of context switching and
> ultimately being less efficient than if you had fewer shards. So while
> this is an approach to take, I'd personally first try to run tests to see
> how much a single node can handle in terms of volume, expected query rates,
> and target latency, and then use monitoring/alerting/whatever-helps tools
> to keep an eye on the cluster so that when you start approaching the target
> limits you are ready with additional nodes and shard splitting if needed.
>
> Of course, if your data and queries are such that newer documents are
> queried more, you should look into time-based collections... and if your
> queries can only query a subset of data you should look into query routing.
>
> Otis
>
> On Wed, Feb 11, 2015 at 3:32 PM, Matt Kuiper <matt.kui...@issinc.com>
> wrote:
>
> > I am starting a new project and one of the requirements is that Solr
> > must scale to handle increasing load (both search performance and
> > index size).
> >
> > My understanding is that one way to address search performance is by
> > adding more replicas.
> >
> > I am more concerned about handling a growing index size. I have
> > already been given some good input on this topic and am considering a
> > shard splitting approach, but am more focused on a rebalancing
> > approach that includes defining many shards up front and then moving
> > these existing shards on to new Solr servers as needed. I plan to
> > experiment with this approach first.
> >
> > Before I get too deep, I wondered if anyone has any tips or warnings
> > on these approaches, or has scaled Solr in a different manner.
> >
> > Thanks,
> > Matt