Re: NRT vs TLOG bulk indexing performances

Shawn Heisey Fri, 25 Oct 2019 04:55:25 -0700

On 10/25/2019 1:16 AM, Dominique Bejean wrote:

For collection created with all replicas as NRT


* Indexing time : 22 minutes


<snip>

For collection created with all replicas as TLOG

* Indexing time : 34 minutes

NRT indexes simultaneously on all replicas. So when indexing is done on one, it is also done on all the others.

PULL and non-leader TLOG replicas must copy the index from the leader. The leader will do the indexing and the other replicas will copy the completed index from the leader. This takes time. If the index is large, it can take a LOT of time, especially if the disks or network are slow. TLOG replicas can become leader and PULL replicas cannot.

What I would do personally is set two replicas for each shard to TLOG and all the rest to PULL. When a TLOG replica is acting as leader, it will function exactly like an NRT replica.
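
As an illustration of that layout, here is a minimal Python sketch that builds a Collections API CREATE request with two TLOG replicas per shard and the rest PULL. The base URL, collection name, shard count, and the helper name create_collection_url are hypothetical. The nrtReplicas, tlogReplicas, and pullReplicas parameters are the ones the Collections API documents for Solr 7.0 and later, but check them against the reference guide for your version.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards,
                          tlog_replicas=2, pull_replicas=2):
    """Build a Collections API CREATE URL for a TLOG + PULL layout.

    base_url, name and the replica counts are illustrative assumptions;
    nothing here contacts a Solr server.
    """
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "nrtReplicas": 0,               # no NRT replicas in this layout
        "tlogReplicas": tlog_replicas,  # leader candidates, per shard
        "pullReplicas": pull_replicas,  # copy-only replicas, per shard
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Hypothetical host and collection name.
url = create_collection_url("http://localhost:8983/solr", "bulk_test", 2)
print(url)
```

The replica counts are per shard, so the example above asks for four replicas on each of the two shards. The URL can then be requested with curl or any HTTP client.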

The conclusion seems to be that by using TLOG :

* You save CPU resources on non-leader nodes at index time
* The JVM Heap and GC are the same
* Indexing performance is much lower with TLOG

Java works in such a way that it will always eventually allocate and use the entire max heap that it is allowed. It is not always possible to determine how much heap is truly needed, though analyzing large GC logs will sometimes reveal that info.

Non-leader replicas will probably require less heap if they are TLOG or PULL. I cannot say how much less; that is something that will have to be determined. Those replicas will also use less CPU.

With newer Solr versions, you can ask SolrCloud to prefer PULL replicas for querying, so queries will be targeted to those replicas unless they all go down, in which case queries will go to non-preferred replica types. I do not know how to do this, I only know that it is possible.
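
For reference, the query parameter that expresses this preference in recent Solr releases is shards.preference, with a value such as replica.type:PULL. The Python sketch below only builds such a query URL; the base URL, collection name, and the helper name pull_preferred_query_url are hypothetical, and the exact version in which the parameter became available should be checked in the reference guide.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def pull_preferred_query_url(base_url, collection, query="*:*"):
    """Build a /select URL that asks SolrCloud to prefer PULL replicas.

    base_url and collection are illustrative assumptions; nothing here
    contacts a Solr server.
    """
    params = {
        "q": query,
        # Prefer PULL replicas; other replica types are still used
        # when no PULL replica is available.
        "shards.preference": "replica.type:PULL",
    }
    return f"{base_url}/{collection}/select?{urlencode(params)}"


# Hypothetical host and collection name.
query_url = pull_preferred_query_url("http://localhost:8983/solr", "bulk_test")

# Decode the query string again to show what Solr would receive.
decoded = parse_qs(urlsplit(query_url).query)
print(decoded["shards.preference"])
```

The preference is a hint rather than a hard filter, which matches the fallback behaviour described above.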


Thanks,
Shawn
