Thank you, Erick. What about cache size? If we add replicas to our cluster and each replica has n GB of RAM allocated for HDFS caching, would that help performance? Specifically, the performance we want to improve is time to facet data, time to cluster data, and search time. While we index a lot of data (~4 million docs per day), we do not perform many searches (~250 searches per day).
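For context, here is a back-of-the-envelope sketch that turns the daily figures above into average per-second rates. It also converts a hypothetical "n GB" of HDFS block cache into a slab count. That conversion assumes the `solr.hdfs.blockcache.slab.count` setting, where each slab is 128 MB as described in the Solr Reference Guide's HDFS section. These are averages only and ignore bursts.

```python
# Back-of-the-envelope rates for the workload described above (averages only).
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(count_per_day: float) -> float:
    """Convert a daily count into an average per-second rate."""
    return count_per_day / SECONDS_PER_DAY


def block_cache_slabs(cache_gb: float, slab_mb: int = 128) -> int:
    """Slabs needed for cache_gb of HDFS block cache, ASSUMING 128 MB per slab."""
    return int(cache_gb * 1024 // slab_mb)


docs_per_sec = per_second(4_000_000)           # roughly 46 docs/s indexed
queries_per_sec = per_second(250)              # roughly 0.003 queries/s
seconds_between_queries = 1 / queries_per_sec  # roughly one search every ~6 minutes

print(f"indexing:  {docs_per_sec:.1f} docs/s")
print(f"searching: {queries_per_sec:.4f} QPS "
      f"(one every {seconds_between_queries / 60:.1f} min)")
print(f"8 GB of block cache = {block_cache_slabs(8)} slabs")
```

At well under one query per second, the system is nowhere near the "pegging the processors" regime in which extra replicas raise aggregate QPS. Per-query latency (faceting, clustering) is the more relevant metric here.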

-Joe

On 8/20/2015 4:21 PM, Erick Erickson wrote:
Yes. Maybe. It Depends (tm).

Details matter (tm).

If you're firing just a few QPS at the system, then improved
throughput by adding replicas is unlikely. OTOH, if you're firing lots
of simultaneous queries at Solr and are pegging the processors, then
adding replicas will increase aggregate QPS.

If your soft commit interval is very short and you're not doing proper
warming, in all probability it won't help at all.

Replication in Solr is about increasing the number of instances
available to serve queries. The two types of replication (HDFS or
Solr) are really orthogonal: the first is about data integrity and the
second is about increasing the number of Solr nodes available to
service queries.
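To make the distinction concrete, Solr-level replicas are added per shard through the Collections API's `ADDREPLICA` action, independently of HDFS's own block replication. The sketch below only builds the request URL and sends nothing. The host, collection, shard, and node names are hypothetical placeholders.

```python
from typing import Optional
from urllib.parse import urlencode


def add_replica_url(solr_base: str, collection: str, shard: str,
                    node: Optional[str] = None) -> str:
    """Build a Collections API ADDREPLICA URL (request is not sent)."""
    params = {"action": "ADDREPLICA", "collection": collection, "shard": shard}
    if node is not None:
        # Optional: pin the new replica to a specific live node.
        params["node"] = node
    return f"{solr_base.rstrip('/')}/admin/collections?{urlencode(params)}"


# Hypothetical host and collection names, for illustration only.
url = add_replica_url("http://solr1:8983/solr", "mycollection", "shard1")
print(url)
```

Each replica added this way is another Solr core that can serve queries for that shard. The HDFS replication factor, by contrast, only controls how many copies of the index files HDFS keeps.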

Best,
Erick

On Thu, Aug 20, 2015 at 9:23 AM, Joseph Obernberger
<j...@lovehorsepower.com> wrote:
Hi - we currently have a multi-shard SolrCloud setup without replication,
running on top of HDFS. Does it make sense to use replication when using
HDFS? Should we expect to see a performance increase in searches?
Thank you!

-Joe
