Indexing is CPU bound. If you have enough RAM, SSD disks, and enough client threads, you should be able to drive CPU to over 90%.
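To make the client-thread point concrete, here is a minimal SolrJ sketch assuming ConcurrentUpdateSolrClient (the URL, collection, and field names are placeholders); it queues document batches and sends them from a pool of sender threads, sized per the advice below:

```java
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient;
import org.apache.solr.common.SolrInputDocument;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class BulkIndexer {
    public static void main(String[] args) throws SolrServerException, IOException {
        int cpus = Runtime.getRuntime().availableProcessors();

        // Two sender threads per CPU: one can be on the wire while
        // another waits for Solr to process its batch.
        ConcurrentUpdateSolrClient client = new ConcurrentUpdateSolrClient.Builder(
                "http://localhost:8983/solr/mycollection")  // placeholder URL
                .withThreadCount(2 * cpus)
                .withQueueSize(100)  // batches buffered before add() blocks
                .build();

        List<SolrInputDocument> batch = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-" + i);
            doc.addField("text_t", "example body " + i);  // placeholder field
            batch.add(doc);
        }
        client.add(batch);           // queued; sender threads do the HTTP work

        client.blockUntilFinished(); // drain the queue before closing
        client.close();
    }
}
```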
Start with two client threads per CPU. That allows one thread to be sending data over the network while another is waiting for Solr to process the batch.

A couple of years ago, I was indexing a million docs per minute into a Solr Cloud cluster. I think that was four shards on instances with 16 CPUs, so there were 64 CPUs available for indexing. That was with Java 8, G1GC, and 8 GB of heap.

Your documents average about 50 kbytes, which is pretty big. Our documents average about 3.5 kbytes. A lot of the indexing work is handling the text, so those larger documents would be at least 10X slower than ours.

Are you doing atomic updates? That would slow things down a lot.

If you want to use G1GC, use the configuration I sent earlier.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Mar 19, 2019, at 7:05 AM, Bernd Fehling <bernd.fehl...@uni-bielefeld.de> wrote:
>
> Isn't there something about large page tables which must be enabled
> in Java and also supported by the OS for such huge heaps?
>
> Just a guess.
>
> On 19.03.19 at 15:01, Jörn Franke wrote:
>> It could be an issue with JDK 8, which may not be suitable for such large
>> heaps. Use more nodes with smaller heaps (e.g. 31 GB).
>>
>>> On 18.03.2019 at 11:47, Aaron Yingcai Sun <y...@vizrt.com> wrote:
>>>
>>> Hello, Solr!
>>>
>>> We are having some performance issues when trying to send documents to
>>> Solr for indexing. The response time is very slow and unpredictable at times.
>>>
>>> The Solr server is running on quite a powerful machine: 32 CPUs, 400 GB RAM,
>>> with 300 GB reserved for Solr. While this is happening, CPU usage is around
>>> 30%, memory usage is 34%, and I/O also looks OK according to iotop. SSD disk.
>>>
>>> Our application sends 100 documents to Solr per request, JSON encoded; the
>>> size is around 5 MB each time. Sometimes the response time is under 1
>>> second, sometimes it can be 300 seconds, and the slow responses happen
>>> very often.
>>>
>>> "Soft AutoCommit: disabled", "Hard AutoCommit: if uncommitted for 3600000 ms;
>>> if 1000000 uncommitted docs"
>>>
>>> There are around 100 clients sending those documents at the same time, but
>>> each client makes a blocking call that waits for the HTTP response before
>>> sending the next request.
>>>
>>> I tried making the number of documents in one request smaller, such as 20,
>>> but I still see slow response times now and then, like 80 seconds.
>>>
>>> Could you give some hints on how to improve the response time? Solr does
>>> not seem very loaded; there must be a way to make the responses faster.
>>>
>>> BRs
>>>
>>> //Aaron
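A few sketches to anchor the settings discussed above. Aaron's quoted auto-commit settings correspond to something like the following in solrconfig.xml (openSearcher=false is my assumption, but it is the usual choice when hard commits exist only for durability):

```xml
<!-- hard commit: after one hour or one million uncommitted docs -->
<autoCommit>
  <maxTime>3600000</maxTime>
  <maxDocs>1000000</maxDocs>
  <openSearcher>false</openSearcher>  <!-- assumed: commit for durability, not visibility -->
</autoCommit>

<!-- soft auto-commit disabled -->
<autoSoftCommit>
  <maxTime>-1</maxTime>
</autoSoftCommit>
```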
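On Jörn's and Bernd's points: heaps above roughly 32 GB lose compressed ordinary object pointers, which is why 31 GB is a common ceiling, and -XX:+UseLargePages only helps if the OS actually has huge pages configured. A hypothetical solr.in.sh fragment, not the configuration Walter refers to:

```sh
# solr.in.sh (sketch): stay under the ~32 GB compressed-oops boundary
SOLR_HEAP="31g"
# Enable G1 and large pages; -XX:+UseLargePages requires huge pages
# to be configured at the OS level first
GC_TUNE="-XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+UseLargePages"
```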
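And on Walter's atomic-update question: an atomic update sends modifier maps ("set", "add", "inc") instead of plain field values, which makes Solr retrieve and rewrite the whole existing document, hence the slowdown. A minimal SolrJ sketch with hypothetical field names:

```java
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

import java.util.Collections;

public class AtomicUpdateExample {
    public static void main(String[] args) throws Exception {
        HttpSolrClient client = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/mycollection").build();  // placeholder URL

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-42");  // hypothetical document id
        // "set" is an update modifier: Solr fetches the stored document,
        // replaces this one field, and reindexes the whole thing.
        doc.addField("price_i", Collections.singletonMap("set", 99));
        client.add(doc);

        client.commit();  // make the change visible for the demo
        client.close();
    }
}
```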