(This is long; I'm just trying to be thorough.) I'm upgrading from Solr 7.3 to Solr 8.7 and am seeing a significant drop in indexing throughput during a full index reload: from ~1300 documents/sec to ~450 documents/sec.

Background:

VM hosts (these are configured identically):
- Our Solr clusters run in a virtualized environment.
- Each virtual machine has 8 CPUs and 64 GB RAM.
- The hosts are organized into two 4-host clusters: one for 7.3 and one for 8.7.
- Each cluster has its own 3-VM ZooKeeper cluster (running the version that was current at the time of install).

JVM:
- All the JVMs are set up with -Xms28G and -Xmx28G.
- The Solr 8.7 cluster is running with the default JVM settings (i.e., as configured by the Solr install script) **other than memory**.
- The Solr 7.3 cluster was configured a while ago, but I'm fairly sure it's running pretty vanilla JVM settings (if not outright default) **other than memory**.
- The most obvious difference between the JVM settings for the two environments is the garbage collector: ConcurrentMarkSweep for 7.3 and G1GC for 8.7.
- Both run Java 1.8, but 7.3 is running HotSpot and 8.7 is running OpenJDK (and a bit newer).

Solr:
- 1 shard, 1 replica per host
- All NRT (both clusters)
- Both the Solr 7.3 and 8.7 clusters are running the same schema.
- With one exception, only the most minimal changes were made to the default Solr 8.7 solrconfig.xml to keep it in line with the 7.3 solrconfig (mostly around cache settings).
- The exception: running with luceneMatchVersion=7.3.0

Data Loading:
- Data is loaded by a completely separate VM running a custom Java process that collects data from the source, generates SolrInputDocuments from it, and sends them via CloudSolrClient.
- This Java process is multi-threaded, with an upper limit on the number of simultaneous threads sending documents and on the size of the document payload.
- We are loading ~10 million documents during a full reload.
- This is a product catalog, so the documents represent data about SKUs we sell (they aren't particularly large, though the size is variable).
- The existing Solr 7.3 cluster has a full-reload time of around 2.5 hours; the Solr 8.7 cluster requires around 6.25 hours.
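
As a sanity check, the reload times can be compared against the throughput figures. This is only a back-of-the-envelope sketch: it assumes exactly 10,000,000 documents and a constant indexing rate for the whole reload, neither of which is strictly true, and the class and method names are made up for illustration.

```java
public class ReloadEstimate {
    // Hours needed to index docCount documents at a steady docsPerSecond rate.
    static double hours(long docCount, double docsPerSecond) {
        return docCount / docsPerSecond / 3600.0;
    }

    public static void main(String[] args) {
        long docs = 10_000_000L; // assumption: "~10 million" taken as exact

        // Rates are the approximate figures quoted above.
        System.out.printf("7.3 at ~1300 docs/sec: %.2f h%n", hours(docs, 1300));
        System.out.printf("8.7 at ~450 docs/sec:  %.2f h%n", hours(docs, 450));
        System.out.printf("rate ratio: %.2fx%n", 1300.0 / 450.0);
    }
}
```

This works out to roughly 2.1 hours and 6.2 hours, which is in the same ballpark as the observed ~2.5 and ~6.25 hours. That suggests the slowdown is spread across the whole reload rather than concentrated in one phase, though I haven't confirmed that with per-batch timings.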

Efforts so far:
- Checked network speed between the VM generating updates (it's the same server for both 7.3 and 8.7) and the clusters. Performance to the 8.7 cluster is actually better.
- As best as possible, controlled for VM topology (i.e., the distribution of the VMs across hosts within the VM cluster).
- Real-time JVM monitoring with VisualVM during indexing on the 8.7 cluster. It looked nice, same as I've always seen for the 7.3 cluster.
- Checked the GC logs with GCeasy, which reported them as healthy.

Thoughts/questions/considerations:
- Could running an older luceneMatchVersion affect indexing performance?
- I'm still a little concerned that the VM topology is affecting things (our VM crew split the 7.3 cluster across VM clusters in an attempt to improve resiliency in case of a VM cluster failure, and that's not something we can or want to replicate).
  - That said, the performance difference is consistent with what I've seen in our QA environment, and that environment has a less even spread of VMs across hosts (e.g., multiple Solr VMs on the same VM host).
- We have a couple of custom tokenizers and tokenFilters. Those were rebuilt using the 8.7.0 versions of solr-core and apache-core.
  - They're pretty simple and I'm not terribly concerned about this, but it is non-standard.
- Query performance is comparable between 7.3 and 8.7, and the documents returned are reasonably consistent (few really big differences, mostly just scoring differences that affect ordering).
- After watching the 8.7 JVMs in real time during indexing, I decided to drop the memory to -Xms20g and -Xmx20g.
  - This had no effect on indexing speed (or GC impacts).
  - So I think it's at least safe to say this is not memory-bound.

Final question: is it simply typical to see significantly worse indexing performance on 8.7 than on 7.3? Any suggestions on where to look would be highly appreciated.

Thanks,
Ron