[ 
https://issues.apache.org/jira/browse/SOLR-13781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-13781:
----------------------------
    Attachment: egrep-out.local.log.txt
                egrep-out.apache_Lucene-Solr-Tests-8.x_587.testPackageAPI.txt
                apache_Lucene-Solr-Tests-8.x_587.testPackageAPI.txt
                local.log.txt
        Status: Open  (was: Open)

I'm attaching 4 files...
 * apache_Lucene-Solr-Tests-8.x_587.testPackageAPI.txt
 ** the section of logs from the apache/Lucene-Solr-Tests-8.x/587 jenkins build 
that includes the FAIL of testPackageAPI
 * local.log.txt
 ** logs from my local PASS run of testPackageAPI using the same git SHA and 
reproduce line
 * egrep-out.apache_Lucene-Solr-Tests-8.x_587.testPackageAPI.txt
 * egrep-out.local.log.txt

...the last 2 being the output of running the following grep command on the 
first 2...
{noformat}
egrep 'Loaded cluster properties|MemClassLoader|PackageManager RequestHandlers 
being reloaded|PackageManager Registering request handlers|webapp=null 
path=/node/ext|webapp=null path=/cluster'
{noformat}
...in order to extract the lines that seem the most important to notice for 
comparison.

The key bit i'd like to point out is this section of egrep-out.local.log.txt, 
which begins with the last request client request of {{/node/ext/bar}} to 
verify it exists, and shows the initial log messages resuting from it's 
subsequent deletion...
{noformat}
6354 INFO  (qtp822543449-83) [n:127.0.0.1:36249_solr     ] o.a.s.s.HttpSolrCall 
[admin] webapp=null path=/node/ext/bar params={wt=javabin&version=2} status=0 
QTime=1
6373 DEBUG (zkCallback-49-thread-1) [     ] o.a.s.c.c.ZkStateReader Loaded 
cluster properties: {package={global={name=global, 
url=http://localhost:40725/jar3.jar, 
sha256=20e0bfaec71b2e93c4da9f2ed3745dda04dc3fc915b66cc0275863982e73b2a3}}, 
requestHandler={}} to version 3
6373 DEBUG (zkCallback-64-thread-1) [     ] o.a.s.c.c.ZkStateReader Loaded 
cluster properties: {package={global={name=global, 
url=http://localhost:40725/jar3.jar, 
sha256=20e0bfaec71b2e93c4da9f2ed3745dda04dc3fc915b66cc0275863982e73b2a3}}, 
requestHandler={}} to version 3
6373 DEBUG (zkCallback-79-thread-1) [     ] o.a.s.c.c.ZkStateReader Loaded 
cluster properties: {package={global={name=global, 
url=http://localhost:40725/jar3.jar, 
sha256=20e0bfaec71b2e93c4da9f2ed3745dda04dc3fc915b66cc0275863982e73b2a3}}, 
requestHandler={}} to version 3
6373 DEBUG (zkCallback-70-thread-1) [     ] o.a.s.c.c.ZkStateReader Loaded 
cluster properties: {package={global={name=global, 
url=http://localhost:40725/jar3.jar, 
sha256=20e0bfaec71b2e93c4da9f2ed3745dda04dc3fc915b66cc0275863982e73b2a3}}, 
requestHandler={}} to version 3
6374 DEBUG (zkCallback-63-thread-1) [     ] o.a.s.c.c.ZkStateReader Loaded 
cluster properties: {package={global={name=global, 
url=http://localhost:40725/jar3.jar, 
sha256=20e0bfaec71b2e93c4da9f2ed3745dda04dc3fc915b66cc0275863982e73b2a3}}, 
requestHandler={}} to version 3
6374 DEBUG (zkCallback-72-thread-1) [     ] o.a.s.c.c.ZkStateReader Loaded 
cluster properties: {package={global={name=global, 
url=http://localhost:40725/jar3.jar, 
sha256=20e0bfaec71b2e93c4da9f2ed3745dda04dc3fc915b66cc0275863982e73b2a3}}, 
requestHandler={}} to version 3
6375 DEBUG (zkCallback-88-thread-1) [     ] o.a.s.c.c.ZkStateReader Loaded 
cluster properties: {package={global={name=global, 
url=http://localhost:40725/jar3.jar, 
sha256=20e0bfaec71b2e93c4da9f2ed3745dda04dc3fc915b66cc0275863982e73b2a3}}, 
requestHandler={}} to version 3
6375 DEBUG (zkCallback-53-thread-1) [     ] o.a.s.c.c.ZkStateReader Loaded 
cluster properties: {package={global={name=global, 
url=http://localhost:40725/jar3.jar, 
sha256=20e0bfaec71b2e93c4da9f2ed3745dda04dc3fc915b66cc0275863982e73b2a3}}, 
requestHandler={}} to version 3
6378 DEBUG (zkCallback-45-thread-1) [     ] o.a.s.c.c.ZkStateReader Loaded 
cluster properties: {package={global={name=global, 
url=http://localhost:40725/jar3.jar, 
sha256=20e0bfaec71b2e93c4da9f2ed3745dda04dc3fc915b66cc0275863982e73b2a3}}, 
requestHandler={}} to version 3
6380 DEBUG (zkCallback-72-thread-1) [     ] o.a.s.c.PackageManager 
RequestHandlers being reloaded : []
6381 DEBUG (zkCallback-72-thread-1) [     ] o.a.s.c.PackageManager Registering 
request handlers [] 
6381 DEBUG (zkCallback-49-thread-1) [     ] o.a.s.c.PackageManager 
RequestHandlers being reloaded : []
6381 DEBUG (zkCallback-49-thread-1) [     ] o.a.s.c.PackageManager Registering 
request handlers [] 
6386 DEBUG (zkCallback-53-thread-1) [     ] o.a.s.c.PackageManager 
RequestHandlers being reloaded : []
6387 DEBUG (zkCallback-53-thread-1) [     ] o.a.s.c.PackageManager Registering 
request handlers [] 
6388 DEBUG (zkCallback-45-thread-1) [     ] o.a.s.c.PackageManager 
RequestHandlers being reloaded : []
6388 DEBUG (zkCallback-45-thread-1) [     ] o.a.s.c.PackageManager Registering 
request handlers [] 
... 
{noformat}
 * There are 9 zkCallback threads that detect the new cluster properties 
indicating an empty list of requestHandler's
 * 4 of these zkCallback threads (corrisponding to the 4 nodes in the 
MiniSolrCloudCluster) invoke the PackageManager (the others are evidently 
unrelated cluster property watchers)
 * These 4 zkCallback threads all agree that no request handlers need reloaded, 
and no request handlers need registered.

Compare that with the same section of logs from 
egrep-out.apache_Lucene-Solr-Tests-8.x_587.testPackageAPI.txt ...
{noformat}
3532248 INFO  (qtp1010395414-41961) [n:127.0.0.1:42699_solr     ] 
o.a.s.s.HttpSolrCall [admin] webapp=null path=/node/ext/bar 
params={wt=javabin&version=2} status=0 QTime=0
3532251 DEBUG (zkCallback-14907-thread-1) [     ] o.a.s.c.c.ZkStateReader 
Loaded cluster properties: {package={global={name=global, 
url=http://localhost:43314/jar3.jar, 
sha256=20e0bfaec71b2e93c4da9f2ed3745dda04dc3fc915b66cc0275863982e73b2a3}}, 
requestHandler={}} to version 3
3532251 DEBUG (zkCallback-14919-thread-1) [     ] o.a.s.c.c.ZkStateReader 
Loaded cluster properties: {package={global={name=global, 
url=http://localhost:43314/jar3.jar, 
sha256=20e0bfaec71b2e93c4da9f2ed3745dda04dc3fc915b66cc0275863982e73b2a3}}, 
requestHandler={}} to version 3
3532251 DEBUG (zkCallback-14941-thread-1) [     ] o.a.s.c.c.ZkStateReader 
Loaded cluster properties: {package={global={name=global, 
url=http://localhost:43314/jar3.jar, 
sha256=20e0bfaec71b2e93c4da9f2ed3745dda04dc3fc915b66cc0275863982e73b2a3}}, 
requestHandler={}} to version 3
3532251 DEBUG (zkCallback-14935-thread-1) [     ] o.a.s.c.c.ZkStateReader 
Loaded cluster properties: {package={global={name=global, 
url=http://localhost:43314/jar3.jar, 
sha256=20e0bfaec71b2e93c4da9f2ed3745dda04dc3fc915b66cc0275863982e73b2a3}}, 
requestHandler={}} to version 3
3532263 DEBUG (zkCallback-14892-thread-2) [     ] o.a.s.c.c.ZkStateReader 
Loaded cluster properties: {package={global={name=global, 
url=http://localhost:43314/jar3.jar, 
sha256=20e0bfaec71b2e93c4da9f2ed3745dda04dc3fc915b66cc0275863982e73b2a3}}, 
requestHandler={}} to version 3
3532263 DEBUG (zkCallback-14896-thread-2) [     ] o.a.s.c.c.ZkStateReader 
Loaded cluster properties: {package={global={name=global, 
url=http://localhost:43314/jar3.jar, 
sha256=20e0bfaec71b2e93c4da9f2ed3745dda04dc3fc915b66cc0275863982e73b2a3}}, 
requestHandler={}} to version 3
3532264 DEBUG (zkCallback-14896-thread-2) [     ] o.a.s.c.PackageManager 
RequestHandlers being reloaded : []
3532264 DEBUG (zkCallback-14896-thread-2) [     ] o.a.s.c.PackageManager 
Registering request handlers [] 
3532265 DEBUG (zkCallback-14898-thread-2) [     ] o.a.s.c.c.ZkStateReader 
Loaded cluster properties: {package={global={name=global, 
url=http://localhost:43314/jar3.jar, 
sha256=20e0bfaec71b2e93c4da9f2ed3745dda04dc3fc915b66cc0275863982e73b2a3}}, 
requestHandler={}} to version 3
3532265 DEBUG (zkCallback-14912-thread-1) [     ] o.a.s.c.c.ZkStateReader 
Loaded cluster properties: {package={global={name=global, 
url=http://localhost:43314/jar3.jar, 
sha256=20e0bfaec71b2e93c4da9f2ed3745dda04dc3fc915b66cc0275863982e73b2a3}}, 
requestHandler={}} to version 3
3532266 DEBUG (qtp1010395414-41962) [n:127.0.0.1:42699_solr     ] 
o.a.s.c.PackageManager RequestHandlers being reloaded : []
3532266 DEBUG (qtp1010395414-41962) [n:127.0.0.1:42699_solr     ] 
o.a.s.c.PackageManager Registering request handlers [] 
3532269 DEBUG (zkCallback-14927-thread-2) [     ] o.a.s.c.c.ZkStateReader 
Loaded cluster properties: {package={global={name=global, 
url=http://localhost:43314/jar3.jar, 
sha256=20e0bfaec71b2e93c4da9f2ed3745dda04dc3fc915b66cc0275863982e73b2a3}}, 
requestHandler={}} to version 3
3532321 DEBUG (zkCallback-14892-thread-2) [     ] o.a.s.c.PackageManager 
RequestHandlers being reloaded : []
3532321 DEBUG (zkCallback-14892-thread-2) [     ] o.a.s.c.PackageManager 
Registering request handlers [] 
3532340 DEBUG (zkCallback-14898-thread-2) [     ] o.a.s.c.PackageManager 
RequestHandlers being reloaded : []
3532340 DEBUG (zkCallback-14898-thread-2) [     ] o.a.s.c.PackageManager 
Registering request handlers [] 
3532371 INFO  (zkCallback-14898-thread-1) [     ] o.a.s.c.MemClassLoader 
Defining_class org.apache.solr.core.RuntimeLibReqHandler from runtime jar 
global 
3532372 INFO  (zkCallback-14896-thread-1) [     ] o.a.s.c.MemClassLoader 
Defining_class org.apache.solr.core.RuntimeLibReqHandler from runtime jar 
global 
3532372 INFO  (zkCallback-14927-thread-1) [     ] o.a.s.c.MemClassLoader 
Defining_class org.apache.solr.core.RuntimeLibReqHandler from runtime jar 
global 
3532372 DEBUG (zkCallback-14927-thread-1) [     ] o.a.s.c.PackageManager 
Registering request handlers [bar] 
3532373 DEBUG (zkCallback-14896-thread-1) [     ] o.a.s.c.PackageManager 
Registering request handlers [bar] 
3532389 DEBUG (zkCallback-14898-thread-1) [     ] o.a.s.c.PackageManager 
Registering request handlers [bar] 
3532390 INFO  (zkCallback-14892-thread-1) [     ] o.a.s.c.MemClassLoader 
Defining_class org.apache.solr.core.RuntimeLibReqHandler from runtime jar 
global 
3532390 DEBUG (zkCallback-14892-thread-1) [     ] o.a.s.c.PackageManager 
Registering request handlers [bar] 
{noformat}
 * Again, we see 9 zkCallback threads that detect the new cluster properties 
indicating an empty list of requestHandler's
 * Again, 4 of these zkCallback threads invoke the PackageManager
 * Again, these 4 zkCallback threads all agree that no request handlers need 
reloaded, and no request handlers need registered.

... BUT! ...
 * the naming of these 4 zkCallback threads indicate they are the *2nd* thread 
created from their respective threadpools
 * subsequent to them finishing, a similarly named *1st* thread from each of 
those 4 zkCallback pools calls MemClassLoader to Defining_class & 
PackageManager to register the 'bar' request handler
 * if we search backwards in the log for these threads, it seems that these are 
still left over (and still processing) the cluster property changed events that 
occured as a result of the {{update-package}} command ... they didn't finish 
their "Defining_class" & "Registering request handlers" execution by the time 
the test executed the {{delete-requesthandler}}

(FWIW: i'm also a little perplexed why the logs in the failure case show the 
jetty request handling thread {{qtp1010395414-41962}} currently processing the 
{{/cluster}} request *also* calling PackageManager directly in addition to 
whichever zkCallback thread corrisponds to that node)

> TestContainerReqHandler.testPackageAPI failures imply race condition between 
> update-package and delete-requesthandler
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-13781
>                 URL: https://issues.apache.org/jira/browse/SOLR-13781
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Hoss Man
>            Priority: Major
>         Attachments: apache_Lucene-Solr-Tests-8.x_587.testPackageAPI.txt, 
> egrep-out.apache_Lucene-Solr-Tests-8.x_587.testPackageAPI.txt, 
> egrep-out.local.log.txt, local.log.txt
>
>
> We're seeing roughly an 8% failure rate from 
> {{TestContainerReqHandler.testPackageAPI}} with failures occuring on both 
> master and branch_8x, and on various jenkins servers and various OSes.
> All of the failures occur at the same place: A V2 request to {{/node/ext}} to 
> verify that that the {{requestHandler}} List is empty after issuing a 
> {{delete-requesthandler: 'bar'}} payload to the {{/cluster}} API. The logs 
> and failure message indicate that the {{'bar'}} request handler still exists 
> even the assertion does a "sleep/retry" of the verification query 10 times.
> While i don't fully understand this test, or the underlying code being 
> tested, i spent a little time digging into the logs from some of these 
> jenkins failures, and comparing them to the logs i see generated when i get a 
> successful test run locally, and I think what's happening here - and the 
> reason that {{delete-requesthandler}} seems to "fail" frequently in this test 
> method, but not in {{testSetClusterReqHandler}} - is because the prior 
> {{update-package}} command is still in process.
> After the test code runs an {{update-package}} command, the test executes 
> requests against {{/node/ext/bar}} to verify that the {{version}} has changed 
> as a result of updating the package, but i suspect this is only looking at 
> the _metadata_ that has changed as a result of the {{update-package}} command 
> and not actaully ensuring that the request handler has fully loaded - because 
> the logs when this test fails seem to show that the zkCallback threads kicked 
> off by {{update-package}} command are still running when the zkCallback 
> threads kicked off by the subsequent {{delete-requesthandler}} command are 
> running, and finish *after* them, "re-registering" the handler that was just 
> deleted.
> ----
> It's not 100% clear to me if this is _just_ a test bug - and it should be 
> monitoring something else to know when the request handler's a finished 
> loading - or if this indicates a broader flaw in the design of how commands 
> like {{add-package}} / {{update-package}} / {{add-requesthandler}} / 
> {{delete-requesthandler}} should interact if/when they occur in close 
> temporal proximity.
> (ie: if there are zkCallback watchers loading classes and initializing 
> objects as a result of cluster property changes, shouldn't there be some sort 
> of lineraization/synchronization logic to ensure that they get executed in 
> the same order on all the nodes in the cluster?)
> ----
> More detail and log file attachments to follow...



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to