[ 
https://issues.apache.org/jira/browse/GEODE-8589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Burcham updated GEODE-8589:
--------------------------------
    Description: 
A recent commit added a new test: 
{{MembershipIntegrationTest.locatorsStopWaitingForLocatorWaitTimeIfAllLocatorsContacted()}}

That's a good test. But it would be better if it ran faster and was less 
susceptible to timing issues. The problem is that the logic we are trying to 
test, {{GMSJoinLeave.leave()}} uses {{System.currentTimeMillis()}} to get the 
current time and it uses {{Thread.sleep()}} to sleep:

{code:java}
        long now = System.currentTimeMillis();
...
            if (state.joinedMembersContacted <= 0 && ((now >= locatorGiveUpTime 
&&
                tries >= minimumRetriesBeforeBecomingCoordinator) ||
                state.locatorsContacted >= locators.size())) {
...
            if (System.currentTimeMillis() > giveupTime) {
              break;
            }
...
              Thread.sleep(retrySleep);
...
              giveupTime = System.currentTimeMillis() + timeout;
...
      if (!this.isJoined) {
        logger.debug("giving up attempting to join the distributed system after 
"
            + (System.currentTimeMillis() - startTime) + "ms");
      }
...
      if (!this.isJoined && state.hasContactedAJoinedLocator) {
        throw new MemberStartupException("Unable to join the distributed system 
in "
            + (System.currentTimeMillis() - startTime) + "ms");
      }
{code}

The opportunity is to _inject_ objects that handle these two functions (time 
keeping and sleeping). This will enable us to artifically control these in our 
test! Sleeping doesn't have to take any time at all. And we can make time pass 
as quickly as we want to.

For an example of how this can be done, see [PR 
#5422|https://github.com/apache/geode/pull/5422] for GEODE-6950.

This particular test 
({{locatorsStopWaitingForLocatorWaitTimeIfAllLocatorsContacted()}}) is a little 
bit more involved than that one in that more objects are involved. In addition 
to {{GMSJoinLeave}}, other classes involved in the test also need current time 
and need to sleep.

When this ticket is complete:

* functional interfaces {{MillisecondProvider}} and {{Sleeper}}, currently 
defined inside {{PrimaryHandler}} will be moved higher in the (internal) 
hierarchy for use by other membership classes
* {{GMSJoinLeave}} will take a {{MillisecondProvider}} and {{Sleeper}} in its 
constructor and will delegate to those instead of calling 
{{System.currentTimeMillis()}} and {{Thread.sleep()}} directly
* TBD other classes may require injection of {{MillisecondProvider}} and 
{{Sleeper}}
* TBD other changes may be necessary in cases where collaborating classes 
currently spin up threads or synchronize between threads e.g. calling {{wait()}}
* {{locatorsStopWaitingForLocatorWaitTimeIfAllLocatorsContacted()}} will no 
longer employ {{Thread.sleep()}} nor will it call {{CompletableFuture.get(long 
timeout, TimeUnit unit)}}—instead it will operate deterministically (ideally in 
the same thread but not necessarily) with respect to the unit under test.


  was:
A recent commit added a new test: 
{{MembershipIntegrationTest.locatorsStopWaitingForLocatorWaitTimeIfAllLocatorsContacted()}}

That's a good test. But it would be better if it ran faster and was less 
susceptible to timing issues. The problem is that the logic we are trying to 
test, {{GMSJoinLeave.leave()}} uses {{System.currentTimeMillis()}} to get the 
current time and it uses {{Thread.sleep()}} to sleep:

{code:java}
        long now = System.currentTimeMillis();
...
            if (state.joinedMembersContacted <= 0 && ((now >= locatorGiveUpTime 
&&
                tries >= minimumRetriesBeforeBecomingCoordinator) ||
                state.locatorsContacted >= locators.size())) {
...
            if (System.currentTimeMillis() > giveupTime) {
              break;
            }
...
              Thread.sleep(retrySleep);
...
              giveupTime = System.currentTimeMillis() + timeout;
...
      if (!this.isJoined) {
        logger.debug("giving up attempting to join the distributed system after 
"
            + (System.currentTimeMillis() - startTime) + "ms");
      }
...
      if (!this.isJoined && state.hasContactedAJoinedLocator) {
        throw new MemberStartupException("Unable to join the distributed system 
in "
            + (System.currentTimeMillis() - startTime) + "ms");
      }
{code}

The opportunity is to _inject_ objects that handle these two functions (time 
keeping and sleeping). This will enable us to artifically control these in our 
test! Sleeping doesn't have to take any time at all. And we can make time pass 
as quickly as we want to.

For an example of how this can be done, see [PR 
#5422|https://github.com/apache/geode/pull/5422] for GEODE-6950.

This particular test 
({{locatorsStopWaitingForLocatorWaitTimeIfAllLocatorsContacted()}}) is a little 
bit more involved than that one in that more objects are involved. In addition 
to {{GMSJoinLeave}}, other classes involved in the test also need current time 
and need to sleep.

When this ticket is complete, {{GMSJoinLeave}} and TBD other classes will have

* functional interfaces {{MillisecondProvider}} and {{Sleeper}}, currently 
defined inside {{PrimaryHandler}} will be moved higher in the (internal) 
hierarchy for use by other membership classes
* {{GMSJoinLeave}} will take a {{MillisecondProvider}} and {{Sleeper}} in its 
constructor and will delegate to those instead of calling 
{{System.currentTimeMillis()}} and {{Thread.sleep()}} directly
* TBD other classes may require injection of {{MillisecondProvider}} and 
{{Sleeper}}
* TBD other changes may be necessary in cases where collaborating classes 
currently spin up threads or synchronize between threads e.g. calling {{wait()}}
* {{locatorsStopWaitingForLocatorWaitTimeIfAllLocatorsContacted()}} will no 
longer employ {{Thread.sleep()}} nor will it call {{CompletableFuture.get(long 
timeout, TimeUnit unit)}}—instead it will operate deterministically (ideally in 
the same thread but not necessarily) with respect to the unit under test.



> make locatorsStopWaitingForLocatorWaitTimeIfAllLocatorsContacted test 
> deterministic
> -----------------------------------------------------------------------------------
>
>                 Key: GEODE-8589
>                 URL: https://issues.apache.org/jira/browse/GEODE-8589
>             Project: Geode
>          Issue Type: Improvement
>          Components: membership
>            Reporter: Bill Burcham
>            Priority: Major
>
> A recent commit added a new test: 
> {{MembershipIntegrationTest.locatorsStopWaitingForLocatorWaitTimeIfAllLocatorsContacted()}}
> That's a good test. But it would be better if it ran faster and was less 
> susceptible to timing issues. The problem is that the logic we are trying to 
> test, {{GMSJoinLeave.leave()}} uses {{System.currentTimeMillis()}} to get the 
> current time and it uses {{Thread.sleep()}} to sleep:
> {code:java}
>         long now = System.currentTimeMillis();
> ...
>             if (state.joinedMembersContacted <= 0 && ((now >= 
> locatorGiveUpTime &&
>                 tries >= minimumRetriesBeforeBecomingCoordinator) ||
>                 state.locatorsContacted >= locators.size())) {
> ...
>             if (System.currentTimeMillis() > giveupTime) {
>               break;
>             }
> ...
>               Thread.sleep(retrySleep);
> ...
>               giveupTime = System.currentTimeMillis() + timeout;
> ...
>       if (!this.isJoined) {
>         logger.debug("giving up attempting to join the distributed system 
> after "
>             + (System.currentTimeMillis() - startTime) + "ms");
>       }
> ...
>       if (!this.isJoined && state.hasContactedAJoinedLocator) {
>         throw new MemberStartupException("Unable to join the distributed 
> system in "
>             + (System.currentTimeMillis() - startTime) + "ms");
>       }
> {code}
> The opportunity is to _inject_ objects that handle these two functions (time 
> keeping and sleeping). This will enable us to artifically control these in 
> our test! Sleeping doesn't have to take any time at all. And we can make time 
> pass as quickly as we want to.
> For an example of how this can be done, see [PR 
> #5422|https://github.com/apache/geode/pull/5422] for GEODE-6950.
> This particular test 
> ({{locatorsStopWaitingForLocatorWaitTimeIfAllLocatorsContacted()}}) is a 
> little bit more involved than that one in that more objects are involved. In 
> addition to {{GMSJoinLeave}}, other classes involved in the test also need 
> current time and need to sleep.
> When this ticket is complete:
> * functional interfaces {{MillisecondProvider}} and {{Sleeper}}, currently 
> defined inside {{PrimaryHandler}} will be moved higher in the (internal) 
> hierarchy for use by other membership classes
> * {{GMSJoinLeave}} will take a {{MillisecondProvider}} and {{Sleeper}} in its 
> constructor and will delegate to those instead of calling 
> {{System.currentTimeMillis()}} and {{Thread.sleep()}} directly
> * TBD other classes may require injection of {{MillisecondProvider}} and 
> {{Sleeper}}
> * TBD other changes may be necessary in cases where collaborating classes 
> currently spin up threads or synchronize between threads e.g. calling 
> {{wait()}}
> * {{locatorsStopWaitingForLocatorWaitTimeIfAllLocatorsContacted()}} will no 
> longer employ {{Thread.sleep()}} nor will it call 
> {{CompletableFuture.get(long timeout, TimeUnit unit)}}—instead it will 
> operate deterministically (ideally in the same thread but not necessarily) 
> with respect to the unit under test.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to