Hi David, In the example of SolrZkClient.ProcessWatchWithExecutor#process, it should be possible to use ExecutorService#submit and ExecutorService#execute methods interchangeably - because either method would run the task in the background. So I went ahead and replaced `submit` with `execute`, and ran the tests. The outcome was that 200+ tests broke, and reported stacktraces like below:
> com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=68, name=zkCallback-23-thread-1, state=RUNNABLE, group=TGRP-TestConfigSetsAPI] > at __randomizedtesting.SeedInfo.seed([5ECEE82AA0B50470:5D74B82B43D74CFA]:0) > > Caused by: > org.apache.solr.common.SolrException: Error updating shard term for collection: newcollection > at __randomizedtesting.SeedInfo.seed([5ECEE82AA0B50470]:0) > at app//org.apache.solr.cloud.ZkShardTerms.refreshTerms(ZkShardTerms.java:377) > at app//org.apache.solr.cloud.ZkShardTerms.lambda$registerWatcher$8(ZkShardTerms.java:426) > at app//org.apache.solr.common.cloud.SolrZkClient$ProcessWatchWithExecutor.lambda$process$1(SolrZkClient.java:1083) > at app//org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:363) > at java.base@17.0.11 /java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) > at java.base@17.0.11 /java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) > at java.base@17.0.11/java.lang.Thread.run(Thread.java:840) > > Caused by: > org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /collections/newcollection/terms/shard1 > at app//org.apache.zookeeper.KeeperException.create(KeeperException.java:117) > at app//org.apache.zookeeper.KeeperException.create(KeeperException.java:53) > at app//org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1972) > at app//org.apache.solr.common.cloud.SolrZkClient.lambda$getData$6(SolrZkClient.java:448) > at app//org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:70) > at app//org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:448) > at app//org.apache.solr.cloud.ZkShardTerms.refreshTerms(ZkShardTerms.java:373) > ... 6 more 2> NOTE: reproduce with: gradlew test --tests TestConfigSetsAPI.testUpload -Dtests.seed=5ECEE82AA0B50470 -Dtests.locale=mer-Latn-KE -Dtests.timezone=Asia/Magadan -Dtests.asserts=true -Dtests.file.encoding=UTF-8 While the error indicates that there is some minor bug/race condition/other issue with either ZkShardTerms#refreshTerms or the test itself, my takeaway here is that this issue (and potentially other kinds of issues) was unintentionally concealed because the result of ExecutorService#submit was discarded. There is nothing wrong with the implementation of ExecutorService#submit, but you could say that the method implies "must-use-return-value", and the code didn't do that. For the schema reload scenario, please refer to the existing test "org.apache.solr.pkg.TestPackages.testSchemaPlugins". In that scenario, uploading the new package version to the package store triggers reloading of the schema: - https://github.com/apache/solr/blob/661b1dac2284ab556573605ae81a4951a5703c49/solr/core/src/java/org/apache/solr/core/SolrResourceLoader.java#L975-L983 - https://github.com/apache/solr/blob/661b1dac2284ab556573605ae81a4951a5703c49/solr/core/src/java/org/apache/solr/pkg/PackageListeners.java#L82 This test also has a similarly concealed issue, which could be revealed by replacing `submit` method with `execute` in CoreContainer#runAsync. As for the next steps, I agree with Jason that we could start by auditing the usage of ExecutorService#submit in the codebase, to see if there are any more concealed issues and address them. I don't think that there's a need to drastically change any exception handling/logging in Solr. It may suffice to just discourage/disallow the pattern where the code discards the result of ExecutorService#submit - but we can definitely discuss more. Best, Andrey Bozhko