Re: PerReplicaStatesIntegrationTest

2021-09-27 Thread Mark Miller
> > I would hope there are few developers doing cloud work that don’t understand the lazy local cluster state - it’s entirely fundamental to everything. The busy waiting, I would be less surprised if someone didn’t understand, but as far as I’m concerned they are bugs too. It’s an event-driven system

Re: PerReplicaStatesIntegrationTest

2021-09-27 Thread Mark Miller
David’s issue and my response are referring to the number of zk servers in the zk cluster. His issue requires more than one zk server. The tests have always used 1. Yes, the whole system is supposed to work fine with a stale local cache of what’s in zk. That is the design. When that doesn’t work,

Re: PerReplicaStatesIntegrationTest

2021-09-27 Thread Ilan Ginzburg
I don't know about the fix for this specific test, but the way cluster state is maintained on a node does not depend on how many ZK nodes there are. When a node performs an action against ZK, it writes to ZK. When it needs to read, it reads from its local cache. The local cache of the node is upd
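A minimal sketch of the behavior described in this message, assuming a simplified model: writes go to the authoritative store and are visible there immediately, while a node's local cache only learns of them when a watcher notification arrives. None of this is Solr or ZooKeeper API. `AuthoritativeStore`, `LocalCache`, `whenKnown`, and the 100 ms notification delay are hypothetical names and values chosen for illustration. The waiter completes a future from the watcher callback rather than polling in a loop.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class StaleCacheSketch {

    /** Hypothetical stand-in for ZK: the authoritative store. Not a ZooKeeper API. */
    static class AuthoritativeStore {
        private final Set<String> collections = ConcurrentHashMap.newKeySet();
        private final List<Consumer<String>> watchers = new CopyOnWriteArrayList<>();

        void watch(Consumer<String> watcher) {
            watchers.add(watcher);
        }

        /** The write is visible in the store on return; watchers are notified later. */
        void create(String name) {
            collections.add(name);
            CompletableFuture.runAsync(() -> {
                sleep(100); // simulated notification delay (assumed value)
                watchers.forEach(w -> w.accept(name));
            });
        }

        boolean exists(String name) {
            return collections.contains(name);
        }
    }

    /** Hypothetical stand-in for a node's lazily updated local cluster state. */
    static class LocalCache {
        private final Set<String> known = ConcurrentHashMap.newKeySet();
        private final ConcurrentHashMap<String, CompletableFuture<Void>> waiters =
                new ConcurrentHashMap<>();

        LocalCache(AuthoritativeStore store) {
            store.watch(name -> {
                known.add(name); // update the cache first, then release waiters
                waiters.computeIfAbsent(name, n -> new CompletableFuture<>()).complete(null);
            });
        }

        boolean has(String name) {
            return known.contains(name);
        }

        /** Event-driven wait: the future completes when the watcher fires. */
        CompletableFuture<Void> whenKnown(String name) {
            return waiters.computeIfAbsent(name, n -> new CompletableFuture<>());
        }
    }

    static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws Exception {
        AuthoritativeStore store = new AuthoritativeStore();
        LocalCache cache = new LocalCache(store);

        store.create("c1");
        System.out.println("store has c1: " + store.exists("c1"));
        // Usually still stale here, because the notification has not arrived yet.
        System.out.println("cache has c1 right after create: " + cache.has("c1"));

        cache.whenKnown("c1").get(5, TimeUnit.SECONDS);
        System.out.println("cache has c1 after the watch fired: " + cache.has("c1"));
    }
}
```

In this model a caller that needs the final word either asks the store directly or waits on `whenKnown`; reading the cache immediately after a write may return stale data by design.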

Re: PerReplicaStatesIntegrationTest

2021-09-26 Thread Mark Miller
Okay, never mind. Somehow I cling to this idea that it’s easier not to get drawn into every test or feature that’s causing me problems, but I should have known the 30 seconds it takes to address most of these things will easily be dwarfed by the theoretical back and forth over them. I’ll put in

Re: PerReplicaStatesIntegrationTest

2021-09-26 Thread Mark Miller
I should also mention, I promise this test can be 100% reliable. It’s not code I’m going to ramp up on soon though. Also, as I said I may have a different test experience than others. What tests run together and how things run will depend on hardware, core count, etc. It’s just the most common fail

Re: PerReplicaStatesIntegrationTest

2021-09-26 Thread Mark Miller
I believe all tests still run with a 1-zk cluster; if that is still the case, zk consistency shouldn’t matter. It’s been a long while since I’ve looked into that particular doc/issue, but even with more than 1 zk instance I believe that is only an issue in a fairly specific case - when a client does so

Re: PerReplicaStatesIntegrationTest

2021-09-26 Thread David Smiley
This drives me crazy too. +1 to Ilan's point. For a CloudSolrClient, its state knowledge should merely be a hint and not the final word -- need to go to ZK for that. For the HTTP-based ClusterStateProvider, the receiving Solr side needs to use non-cached information -- must go to ZK always (may

Re: PerReplicaStatesIntegrationTest

2021-09-22 Thread Ilan Ginzburg
Not sure, Gus, that I would blame the create collection code. To the best of my recollection, when the create collection call returns the collection IS fully created. This doesn't mean though (and that's the problem IMO) that the cluster state on the node that issued the collection creation call is aware

Re: PerReplicaStatesIntegrationTest

2021-09-22 Thread Gus Heck
> > why it often can’t find the collection it’s currently supposed to be creating
This sounds like things that pestered us while writing TRA tests. IIRC the problem basically comes from 2 things: 1) we return from create collection before the collection is fully created and ready to use, 2) wat

Re: PerReplicaStatesIntegrationTest

2021-09-22 Thread Ishan Chattopadhyaya
Sure, Mark. Noble or I will get to this at the earliest, hopefully by end of this week. Unfortunately, I haven't been paying attention to test failures lately.
On Wed, Sep 22, 2021 at 8:09 PM Mark Miller wrote:
> Perhaps I just have a unique test running experience, but this test has been an o

PerReplicaStatesIntegrationTest

2021-09-22 Thread Mark Miller
Perhaps I just have a unique test running experience, but this test has been an outlier failure test in my test runs for months. Given that it’s newer than most tests, I imagine its attention-grabbing days are on a downslope, so here is a poke if someone wants to check out why it often can’t find