Re: SolrCloud Performance for High Query Volume

Niran Fajemisin Fri, 18 Jan 2013 15:25:29 -0800

Hi Otis,

Thanks for the response.


The primary differences between the schema and solrconfig files are the 
settings required for 4.0 compatibility: things like the version field, schema 
version number, auto-commit settings, etc. 

A quick note on our SolrCloud topology: we have 2 Shards with one replica per 
shard. So essentially 2 servers per shard, which makes up the 4 servers that I 
referred to below. (Sorry for not being specific)

As for the comment about the RAM, given our SolrCloud setup we felt that we 
wouldn't need an equal amount of memory given the size of the shard would be 
roughly 50% of the entire document collection...at least that was our 
rationale. We might be totally off-base here.

Our index will contain about 175 million documents, with each document having 
about 65 fields. The actual physical size of the index is estimated at about 
75GB.
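
As a rough back-of-the-envelope check of the sizing rationale above, the sketch 
below compares the estimated per-shard index size with the RAM on each 
SolrCloud server. The index size, shard count and RAM come from this thread; 
the even split across shards and the JVM heap size are assumptions made purely 
for illustration, not measured values.

```python
# Figures taken from this thread.
INDEX_SIZE_GB = 75.0      # estimated size of the full index
NUM_SHARDS = 2            # each shard lives on two servers (leader + replica)
RAM_PER_SERVER_GB = 8.0   # RAM on each SolrCloud server

# Assumption (not from the thread): heap handed to the Solr JVM.
ASSUMED_JVM_HEAP_GB = 4.0


def page_cache_coverage(index_gb, shards, ram_gb, heap_gb):
    """Return (per-shard index size in GB, fraction that could sit in the OS page cache)."""
    shard_gb = index_gb / shards           # assumes documents are split evenly
    cache_gb = max(ram_gb - heap_gb, 0.0)  # upper bound on RAM left for the page cache
    return shard_gb, min(cache_gb / shard_gb, 1.0)


shard_gb, coverage = page_cache_coverage(
    INDEX_SIZE_GB, NUM_SHARDS, RAM_PER_SERVER_GB, ASSUMED_JVM_HEAP_GB
)
print(f"per-shard index: {shard_gb:.1f} GB, cacheable fraction: {coverage:.0%}")
```

Under these assumptions each server holds roughly 37.5GB of index but can keep 
only a small part of it in memory, which is one way the disk IO concern raised 
in the quoted reply could show up once the index reaches full size.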

About 90-95% of the queries executed against the index are filter queries, as 
the site is based on faceted searches. Hence I'd say that the queries will be 
diverse, as they're based on various user-driven permutations. 
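
To make the query shape concrete, here is a minimal sketch of how one such 
faceted request might be assembled. The parameters q, fq, rows, wt, facet and 
facet.field are standard Solr request parameters; the field names and values 
(account_type, currency, region) and the helper function are hypothetical and 
not taken from our schema.

```python
from urllib.parse import urlencode


def build_facet_query(filters, facet_fields, rows=10):
    """Build a Solr query string with one fq parameter per filter."""
    params = [("q", "*:*"), ("rows", str(rows)), ("wt", "json")]
    # Each filter becomes its own fq parameter rather than one combined clause.
    params += [("fq", f"{field}:{value}") for field, value in filters.items()]
    if facet_fields:
        params.append(("facet", "true"))
        params += [("facet.field", name) for name in facet_fields]
    return urlencode(params)


# Hypothetical field names, for illustration only.
query_string = build_facet_query(
    {"account_type": "brokerage", "currency": "USD"},
    ["region"],
)
print(query_string)
```

Solr caches each fq clause separately in its filterCache, so with many 
user-driven permutations the number of distinct filters, and therefore the 
cache hit rate, is worth watching.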

We're going to need to work with our infrastructure team to compare the disk 
IO utilization of the 3.6 and 4.0 environments.

Hopefully that all makes sense.

Any immediate thoughts on any of this?

Thanks as usual.

-Niran 




>________________________________
> From: Otis Gospodnetic <otis.gospodne...@gmail.com>
>To: solr-user@lucene.apache.org; Niran Fajemisin <afa...@yahoo.com> 
>Sent: Thursday, January 17, 2013 10:12 AM
>Subject: Re: SolrCloud Performance for High Query Volume
> 
>
>Hello Niran,
>
>
>> Now with roughly the "same" schema and solrconfig configuration
>
>
>Can you be more specific about what was changed and how?
>
>
>> * 4 Solr server instances each with 4 CPUs (each 6 cores, 2.67GHz), 8GB of 
>>RAM and 150GB HDD
>
>
>That's less RAM than before.  Could it be that this causes more disk IO 
>because the index is not as well cached?
>
>
>Note that you are comparing a non-real-time master-slave setup with a 
>real-time SolrCloud setup (with an unknown number of shards, replicas, etc.)
>
>
>SSDs will help if there is a lot of disk IO (i.e. if indices are big, queries 
>diverse, and free memory scarce).  I'd start by looking at all system-level 
>indicators and metrics. SPM for Solr may help: 
>http://sematext.com/spm/solr-performance-monitoring/index.html .  Maybe you 
>can show us disk IO graphs for the old cluster vs. new cluster?
>
>
>
>Otis
>
>--
>
>Solr & ElasticSearch Support
>http://sematext.com/
>
>
>On Tue, Jan 15, 2013 at 11:54 AM, Niran Fajemisin <afa...@yahoo.com> wrote:
>
>Hi all,
>>
>>I'm currently in the process of doing some performance testing in 
>>preparation for upgrading from Solr 3.6.1 to Solr 4.0. (We're badly in need 
>>of NRT functionality.)
>>
>>Our existing deployment is not a typical deployment for Solr, as we use it to 
>>search and facet on financial data such as accounts, positions and 
>>transactions records. To make matters worse, each request could potentially 
>>return upwards of 50,000 or more records from the index. As I said, it's not 
>>an ideal use case for Solr but this is the system that is in place and it 
>>really can't be changed at this point. With this defined use case, our 
>>current 3.6.1 deployment is able to scale to about 1500 queries per minute, 
>>with an average response time in the 100-200ms range. Note that this time 
>>includes the query time and the transport time (time to stream all the 
>>documents to the calling services). At the 50,000 document mark, we're 
>>getting about 1.6-2 sec. response time. The client is willing to live with 
>>this as these types of requests are not very frequent.
>>
>>Our hardware configuration on the 3.6.1 environment is as follows:
>>        * 1 Master Server for indexing with 2 CPUs (each 6 cores, 2.67GHz), 
>>4GB of RAM and 150GB HDD
>>        * 2 Slave Servers for query only, each with 2 CPUs (each 6 cores, 
>>2.67GHz), 12GB of RAM and the same HDD space (mechanical drive)
>>Each of the servers is a virtual server in a VMware environment. 
>>
>>Now with roughly the "same" schema and solrconfig configuration, the 
>>performance on Solr 4.0 is quite bad. Running just 500 queries per minute our 
>>query performance degrades to almost 2 minute response times in some cases. 
>>The average is about 40-50 sec. response time. Note that the index at the 
>>moment is only a fraction of the size of the existing environment (about 
>>1/8th the size). 
>>
>>The hardware setup for the SolrCloud deployment is as follows:
>>        * 4 Solr server instances each with 4 CPUs (each 6 cores, 2.67GHz), 
>>8GB of RAM and 150GB HDD
>>
>>        * 3 ZooKeeper server instances. We are using each Solr server 
>>instance to run 1 ZK instance, with the 4th server not running a ZK server.
>>We haven't observed any issues with memory utilization. Additionally the 
>>virtual servers are co-located. We're wondering if upgrading to Solid State 
>>Drives would improve performance significantly?
>>
>>Are there any other pointers or configuration changes that we can make to 
>>help bring down our query times? Any tips will be greatly appreciated.
>>
>>Thanks all!
>
>
>
