Hi,

Did you say you have 150 servers in this cluster? And 10 shards for just
90M docs? If so, 150 hosts sounds like a lot for the other numbers I see
here. I'd love to see some metrics. E.g., what happens with disk IO around
those commits? How about GC time/size info? Are the JVM memory pools
close to full, and is the CPU jumping around? Can you share more info to
give us a more complete picture of your system? SPM for Solr
<http://sematext.com/spm/> will help if you don't already capture these
types of things.
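If you don't capture these yet, even basic GC logging plus iostat would
tell us a lot. A minimal sketch (the log path and sampling interval are
just placeholders; the GC flags are the usual Java 7-era ones):

    # add to the Solr JVM startup flags to get GC timing/size info
    -Xloggc:/var/log/solr/gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps

    # watch disk IO around the commit times, sampled every 5 seconds
    iostat -x 5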
Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On Thu, Feb 12, 2015 at 11:07 AM, Vijay Sekhri <sekhrivi...@gmail.com> wrote:

> Hi Erick,
> We have the following configuration for our Solr cloud:
>
>    1. 10 shards
>    2. 15 replicas per shard
>    3. 9 GB of index size per shard
>    4. A total of around 90 million documents
>    5. 2 collections, viz. search1 serving live traffic and search2 for
>    indexing. We swap the collections when indexing finishes.
>    6. On 150 hosts we have 2 JVMs running, one for the search1
>    collection and the other for the search2 collection.
>    7. Each JVM has 12 GB of heap assigned to it, while each host has
>    50 GB in total.
>    8. Each host has 16 processors.
>    9. Linux XXXXXXX 2.6.32-431.5.1.el6.x86_64 #1 SMP Wed Feb 12
>    00:41:43 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
>    10. We have two ways to index data:
>       1. Bulk indexing: all 90 million docs pumped in from 14 parallel
>       processes (on 14 different client hosts). This is done on the
>       collection that is not serving live traffic.
>       2. Incremental indexing: only delta changes (ranging from 100K
>       to 5 million docs) every two hours. This is done on the
>       collection that is also serving live traffic.
>    11. The request rate on the live collection is around 300 TPS.
>    12. Hard commit is every 30 seconds with openSearcher=false, and
>    soft commit is every 15 minutes (see the sketch just below this
>    list). We have tried a lot of different settings here, BTW.
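>
> To make that concrete, the relevant section of our solrconfig.xml
> looks roughly like this (a simplified sketch; the values are the ones
> listed above):
>
>     <autoCommit>
>       <!-- hard commit every 30 seconds, without opening a searcher -->
>       <maxTime>30000</maxTime>
>       <openSearcher>false</openSearcher>
>     </autoCommit>
>     <autoSoftCommit>
>       <!-- soft commit every 15 minutes -->
>       <maxTime>900000</maxTime>
>     </autoSoftCommit>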
>
> Now we have two issues with indexing.
>
> 1) Solr just could not keep up with the bulk indexing when replicas
> are also active. We concluded this by changing the number of replicas
> to 2, then 4, then 15: as the number of replicas increases, the bulk
> indexing time increases almost exponentially. We seem to have hit the
> same issue reported here:
> https://issues.apache.org/jira/browse/SOLR-6816
> It gets to the point that indexing even 100 docs takes the cluster 300
> seconds. It starts off indexing 100 docs in 55 milliseconds, slowly
> degrades over time, and within an hour and a half it just cannot keep
> up. We have a workaround: we stop all the replicas, do the bulk
> indexing, and bring the replicas back up one by one. This sort of
> defeats the purpose of SolrCloud, but we can live with it, because
> bulk indexing happens on the collection that is not serving live
> traffic. However, we would love a solution from SolrCloud itself,
> e.g. an API to stop replication before indexing and restart it at the
> end.
>
> 2) This issue is related to soft commits with incremental indexing.
> Incremental indexing is done on the same collection that is serving
> live traffic at 300 requests per second. Everything is fine except
> when the soft commit happens. Each time the soft commit
> (autoSoftCommit in solrconfig.xml) fires, which BTW happens at almost
> the same time throughout the cluster, there is a spike in response
> times and throughput drops to almost 150 TPS. The spike lasts about 2
> minutes and then repeats at exactly the soft commit interval. We have
> monitored the logs and found a direct correlation between the soft
> commits and the response times tanking.
>
> The latter issue is quite disturbing, because that collection is
> serving live traffic and we cannot sustain these periodic
> degradations. We have played around with different soft commit
> settings: intervals ranging from 2 minutes to 30 minutes; autowarming
> half the cache, the full cache, or only 10% of it; running warmup
> queries on every new searcher, or none at all. All the different
> settings yield the same result: as soon as the soft commit happens,
> response time tanks and throughput decreases. The difference is
> almost 50% in response times and 50% in throughput.
>
> Our workaround for this is to also do the incremental delta indexing
> on the collection not serving live traffic and swap when it is done.
> As you can see, this also defeats the purpose of SolrCloud: we cannot
> do bulk indexing because the replicas cannot keep up, and we cannot do
> incremental indexing because of the soft commit performance.
>
> Is there a way to make the cluster not do the soft commit all at the
> same time, or a way to make the soft commit not cause this
> degradation? We are open to any ideas at this time.
>
> --
> *********************************************
> Vijay Sekhri
> *********************************************
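P.S. On the "not all at the same time" question: one thing you could
experiment with (a sketch only, untested against your setup) is dropping
autoSoftCommit from solrconfig.xml and firing soft commits yourself from
a scheduler, so you control when they happen:

    # explicit soft commit via the update handler; host/port are placeholders
    curl 'http://somehost:8983/solr/search1/update?commit=true&softCommit=true&waitSearcher=false'

Keep in mind a commit is still distributed across the whole collection,
so this changes when the cost is paid, not the fact that every replica
pays it.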