Hi,

I ran some bulk indexing benchmarks to compare the performance
and resource usage of NRT versus TLOG replicas.

Environment:
* SolrCloud with 4 Solr nodes (8 GB RAM, 4 GB heap)
* 1 collection with 2 shards x 2 replicas (all NRT or all TLOG)
* 1 core per Solr server
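
For anyone who wants to reproduce the setup, here is a minimal sketch of how the two collection variants could be created through the Collections API. The host, port, and collection names are hypothetical placeholders, and this is not necessarily the exact command used for the tests. `numShards`, `nrtReplicas`, and `tlogReplicas` are the CREATE parameters that set the layout and the replica type.

```python
from urllib.parse import urlencode

# Hypothetical base URL; replace with one of your Solr nodes.
SOLR_BASE = "http://localhost:8983/solr"


def create_collection_url(name, replica_type):
    """Build a Collections API CREATE URL for 2 shards x 2 replicas.

    replica_type is "nrt" or "tlog" and selects which replica-count
    parameter (nrtReplicas or tlogReplicas) is sent.
    """
    if replica_type not in ("nrt", "tlog"):
        raise ValueError("replica_type must be 'nrt' or 'tlog'")
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": 2,
        f"{replica_type}Replicas": 2,  # 2 replicas per shard, all of one type
    }
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"


# Hypothetical collection names, one per benchmark run.
nrt_url = create_collection_url("bench_nrt", "nrt")
tlog_url = create_collection_url("bench_tlog", "tlog")
print(nrt_url)
print(tlog_url)
```

Requesting each URL (for example with curl) would create the corresponding collection. Only the replica-type parameter differs between the two runs.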

Indexing 10,000,000 documents from a single JSON file with the bin/post script.

Comparing NRT and TLOG indexing, I see the following.

For the collection created with all replicas as NRT:

* Indexing time: 22 minutes
* GC times: identical on all nodes
* GC count: identical on all nodes
* Heap size: identical on all nodes
* CPU load / CPU usage: identical on all nodes

For the collection created with all replicas as TLOG:

* Indexing time: 34 minutes
* GC times: identical on all nodes
* GC count: identical on all nodes
* Heap size: identical on all nodes
* CPU load / CPU usage: on the leaders, identical to the NRT leaders;
  on the non-leader TLOG replicas, divided by 4


The conclusion seems to be that with TLOG:

* You save CPU resources on non-leader nodes at index time
* JVM heap usage and GC activity are the same
* Indexing is much slower (34 minutes versus 22 minutes)
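
To put the difference in numbers, here is a small sketch that derives throughput and slowdown from the two indexing times above. It assumes the reported times cover the full 10,000,000 documents in both runs.

```python
# Figures taken from the benchmark above.
DOCS = 10_000_000
MINUTES = {"NRT": 22, "TLOG": 34}


def docs_per_second(docs, minutes):
    """Average indexing throughput over the whole run."""
    return docs / (minutes * 60)


throughput = {mode: docs_per_second(DOCS, m) for mode, m in MINUTES.items()}
slowdown = MINUTES["TLOG"] / MINUTES["NRT"]

for mode, rate in throughput.items():
    print(f"{mode}: {rate:,.0f} docs/s")
print(f"TLOG takes {slowdown:.2f}x as long as NRT")
```

This gives roughly 7,576 docs/s for NRT against 4,902 docs/s for TLOG, so the TLOG run takes about 1.55 times as long.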

I am disappointed by TLOG mode: indexing is much slower, and there is no
gain on JVM heap or GC.

Are these results in line with what we should expect?
What could explain the poor batch indexing performance in TLOG mode?

I have Grafana graphs of all these metrics for the duration of the tests.

Regards,

Dominique