Hi David,

In the example of SolrZkClient.ProcessWatchWithExecutor#process, it should
be possible to use ExecutorService#submit and ExecutorService#execute
methods interchangeably - because either method would run the task in the
background. So I went ahead and replaced `submit` with `execute`, and ran
the tests. The outcome was that 200+ tests broke, and reported stacktraces
like below:

> com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an
uncaught exception in thread: Thread[id=68, name=zkCallback-23-thread-1,
state=RUNNABLE, group=TGRP-TestConfigSetsAPI]
 >     at
__randomizedtesting.SeedInfo.seed([5ECEE82AA0B50470:5D74B82B43D74CFA]:0)
 >
 >     Caused by:
 >     org.apache.solr.common.SolrException: Error updating shard term for
collection: newcollection
 >         at __randomizedtesting.SeedInfo.seed([5ECEE82AA0B50470]:0)
 >         at
app//org.apache.solr.cloud.ZkShardTerms.refreshTerms(ZkShardTerms.java:377)
 >         at
app//org.apache.solr.cloud.ZkShardTerms.lambda$registerWatcher$8(ZkShardTerms.java:426)
 >         at
app//org.apache.solr.common.cloud.SolrZkClient$ProcessWatchWithExecutor.lambda$process$1(SolrZkClient.java:1083)
 >         at
app//org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:363)
 >         at java.base@17.0.11
/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
 >         at java.base@17.0.11
/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
 >         at java.base@17.0.11/java.lang.Thread.run(Thread.java:840)
 >
 >         Caused by:
 >         org.apache.zookeeper.KeeperException$NoNodeException:
KeeperErrorCode = NoNode for /collections/newcollection/terms/shard1
 >             at
app//org.apache.zookeeper.KeeperException.create(KeeperException.java:117)
 >             at
app//org.apache.zookeeper.KeeperException.create(KeeperException.java:53)
 >             at
app//org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1972)
 >             at
app//org.apache.solr.common.cloud.SolrZkClient.lambda$getData$6(SolrZkClient.java:448)
 >             at
app//org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:70)
 >             at
app//org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:448)
 >             at
app//org.apache.solr.cloud.ZkShardTerms.refreshTerms(ZkShardTerms.java:373)
 >             ... 6 more
2> NOTE: reproduce with: gradlew test --tests TestConfigSetsAPI.testUpload
-Dtests.seed=5ECEE82AA0B50470 -Dtests.locale=mer-Latn-KE
-Dtests.timezone=Asia/Magadan -Dtests.asserts=true
-Dtests.file.encoding=UTF-8

While the error indicates that there is some minor bug/race condition/other
issue with either ZkShardTerms#refreshTerms or the test itself, my takeaway
here is that this issue (and potentially other kinds of issues) was
unintentionally concealed because the result of ExecutorService#submit was
discarded. There is nothing wrong with the implementation of
ExecutorService#submit, but you could say that the method implies
"must-use-return-value", and the code didn't do that.

For the schema reload scenario, please refer to the existing test
"org.apache.solr.pkg.TestPackages.testSchemaPlugins". In that scenario,
uploading the new package version to the package store triggers reloading
of the schema:
-
https://github.com/apache/solr/blob/661b1dac2284ab556573605ae81a4951a5703c49/solr/core/src/java/org/apache/solr/core/SolrResourceLoader.java#L975-L983
-
https://github.com/apache/solr/blob/661b1dac2284ab556573605ae81a4951a5703c49/solr/core/src/java/org/apache/solr/pkg/PackageListeners.java#L82

This test also has a similarly concealed issue, which could be revealed by
replacing `submit` method with `execute` in CoreContainer#runAsync.

As for the next steps, I agree with Jason that we could start by auditing
the usage of ExecutorService#submit in the codebase, to see if there are
any more concealed issues and address them.

I don't think that there's a need to drastically change any exception
handling/logging in Solr. It may suffice to just discourage/disallow the
pattern where the code discards the result of ExecutorService#submit - but
we can definitely discuss more.

Best,
Andrey Bozhko

Reply via email to