Re: replication and HDFS

Erick Erickson Mon, 31 Aug 2015 13:28:24 -0700

Yes, No, Maybe.

bq: Specifically the performance we want to increase is time to facet
data, time to cluster data and search time

Well, that about covers everything ;)

You cannot talk about this without also talking about cache warming. Given your
setup, I'm guessing you have very few searches on the same Solr searcher. Every
time you commit (hard with openSearcher=true, or soft), you get a new searcher
and your top-level caches are thrown away. The next request in will not get any
benefit from the caches unless you've also done autowarming. Look at the
autowarm counts for filterCache and queryResultCache, and at the newSearcher
and firstSearcher events.

So talking about significantly increasing cache size is premature until you know
you _use_ the caches.

And don't go wild with the autowarm counts for your caches; start quite low,
in the 20-30 range IMO.
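
As a rough sketch, the cache entries in the <query> section of solrconfig.xml
might look something like this. The size and initialSize values are
placeholders picked for illustration, not recommendations; only the
autowarmCount range comes from the advice above.

```xml
<query>
  <!-- size/initialSize are illustrative placeholders; tune for your own data -->
  <filterCache class="solr.FastLRUCache"
               size="512"
               initialSize="512"
               autowarmCount="20"/>

  <queryResultCache class="solr.LRUCache"
                    size="512"
                    initialSize="512"
                    autowarmCount="20"/>
</query>
```

autowarmCount is the number of entries carried over from the old searcher's
cache and regenerated against the new one, so large values make each new
searcher slower to open.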

You'll particularly want to make newSearcher searches that exercise your
faceting and reference all the fields you care about at least once.
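
A minimal sketch of such warming queries, again in the <query> section of
solrconfig.xml. The field names "category" and "author" are hypothetical
stand-ins; substitute the fields you actually facet on.

```xml
<query>
  <!-- Runs against every new searcher opened by a commit -->
  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">*:*</str>
        <str name="facet">true</str>
        <!-- hypothetical field names, replace with your own -->
        <str name="facet.field">category</str>
        <str name="facet.field">author</str>
      </lst>
    </arr>
  </listener>

  <!-- Runs once at startup, when there is no old searcher to autowarm from -->
  <listener event="firstSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">*:*</str>
        <str name="facet">true</str>
        <str name="facet.field">category</str>
        <str name="facet.field">author</str>
      </lst>
    </arr>
  </listener>
</query>
```

Keep the list short. Every query here runs on each searcher open, so with
frequent soft commits a long list adds up quickly.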

Best,
Erick

On Mon, Aug 31, 2015 at 12:41 PM, Joseph Obernberger
<j...@lovehorsepower.com> wrote:
> Thank you Erick.  What about cache size?  If we add replicas to our cluster
> and each replica has nGBytes of RAM allocated for HDFS caching, would that
> help performance?  Specifically the performance we want to increase is time
> to facet data, time to cluster data and search time.  While we index a lot
> of data (~4 million docs per day), we do not perform that many searches of
> the data (~250 searches per day).
>
> -Joe
>
> On 8/20/2015 4:21 PM, Erick Erickson wrote:
>>
>> Yes. Maybe. It Depends (tm).
>>
>> Details matter (tm).
>>
>> If you're firing just a few QPS at the system, then improved
>> throughput by adding replicas is unlikely. OTOH, if you're firing lots
>> of simultaneous queries at Solr and are pegging the processors, then
>> adding replication will increase aggregate QPS.
>>
>> If your soft commit interval is very short and you're not doing proper
>> warming, it won't help at all in all probability.
>>
>> Replication in Solr is about increasing the number of instances
>> available to serve queries. The two types of replication (HDFS or
>> Solr) are really orthogonal, the first is about data integrity and the
>> second is about increasing the number of Solr nodes available to
>> service queries.
>>
>> Best,
>> Erick
>>
>> On Thu, Aug 20, 2015 at 9:23 AM, Joseph Obernberger
>> <j...@lovehorsepower.com> wrote:
>>>
>>> Hi - we currently have a multi-shard setup running solr cloud without
>>> replication running on top of HDFS.  Does it make sense to use replication
>>> when using HDFS?  Will we expect to see a performance increase in searches?
>>> Thank you!
>>>
>>> -Joe
>
>
