Good question. I will have to look into that.

Thanks,
Mark

> On Sep 11, 2019, at 10:53 AM, Dan Smith <dsm...@pivotal.io> wrote:
>
>> The idea I am working with at the moment that Kirk pointed me at was to
>> use the pid file in the directory as an indicator. Once that file
>> disappears, the server is stopped.
>
> How will this work if stop server --member is invoked from a different
> machine than the member that is being stopped?
>
> -Dan
>
> On Wed, Sep 11, 2019 at 10:28 AM Mark Hanson <mhan...@pivotal.io> wrote:
>
>> The idea I am working with at the moment that Kirk pointed me at was to
>> use the pid file in the directory as an indicator. Once that file
>> disappears, the server is stopped.
>>
>> That seems to work in my testing.
>>
>> Thoughts?
>>
>> Thanks,
>> Mark
>>
>>> On Sep 11, 2019, at 10:23 AM, Dan Smith <dsm...@pivotal.io> wrote:
>>>
>>> It does seem like we should make stop synchronous, or at least make start
>>> wait for the old process to die as Bruce suggested. Otherwise it is
>>> difficult for someone to script the restart of a server.
>>>
>>> Looking at the code, it does look like gfsh stop is asynchronous. There
>>> are multiple ways to stop a server:
>>> * gfsh stop --dir - it looks like we write out some stop file and return
>>>   immediately. Or, if we can connect over JMX, we invoke the
>>>   MemberMBean.shutDownMember method, which launches a thread to close
>>>   the cache, which is also asynchronous.
>>> * gfsh stop --pid - this seems to be similar to --dir.
>>> * With a member name - this appears to go to the
>>>   MemberMBean.shutDownMember method as well.
>>>
>>> I think one issue is that with the JMX methods for stopping the server,
>>> it may be hard to ensure the process is really gone, because they can be
>>> invoked remotely. That may be why they are asynchronous - they need to
>>> return something to the caller before shutting down. So maybe Bruce's
>>> suggestion is better.
>>>
>>> As Jens pointed out, tests should generally just use port 0 for servers.
>>>
>>> -Dan
>>>
>>> On Wed, Sep 11, 2019 at 8:46 AM Jens Deppe <jensde...@apache.org> wrote:
>>>
>>>> To circle back to the original test failure that prompted this
>>>> discussion - the failing test was getting intermittent bind exceptions
>>>> on subsequent server restarts.
>>>>
>>>> I believe it's quite likely that a process' ports will remain
>>>> unavailable even after it is gone (I'm not sure if we create listening
>>>> sockets with SO_REUSEADDR). So, as to John's comment that gfsh is
>>>> already synchronous, I don't think that adding extra functionality to
>>>> gfsh, ultimately just to wait longer before exiting, really solves the
>>>> problem. I'd suggest you adjust the tests to always start servers with
>>>> `--server-port=0` so that there are no port conflicts and let the OS
>>>> handle it.
>>>>
>>>> --Jens
>>>>
>>>> On Wed, Sep 11, 2019 at 8:17 AM Bruce Schuchardt <bschucha...@pivotal.io>
>>>> wrote:
>>>>
>>>>> Blocking or non-blocking, I don't have a strong opinion. What I'd
>>>>> really like gfsh to ensure, though, is that no one is able to start
>>>>> a new instance of a server while the old process is still around.
>>>>> Maybe the PID file is the way to do that.
>>>>>
>>>>> On 9/10/19 3:08 PM, Mark Hanson wrote:
>>>>>> Hello All,
>>>>>>
>>>>>> I would like to propose that we make the gfsh “stop server” command
>>>>>> synchronous. It is causing issues with some tests, because the rest
>>>>>> of the calls are blocking, while stop, by comparison, returns
>>>>>> immediately. GEODE-7017 shows this issue specifically.
>>>>>>
>>>>>> GEODE-7017 CI failure:
>>>>>> org.apache.geode.launchers.ServerStartupValueRecoveryNotificationTest >
>>>>>> startupReportsOnlineOnlyAfterRedundancyRestored
>>>>>> https://issues.apache.org/jira/browse/GEODE-7017
>>>>>>
>>>>>> What do people think?
>>>>>>
>>>>>> Thanks,
>>>>>> Mark
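
Mark's pid-file idea, together with Bruce's wish that no one be able to start a new server while the old process is still around, could be sketched as a small polling loop. This is a minimal sketch under stated assumptions, not gfsh's implementation. The class and method names (`PidFileWaiter`, `awaitStopped`, `readPid`) are hypothetical, the caller supplies the pid-file path, and the file is assumed to contain only the numeric pid. The loop treats the server as gone once the pid file has disappeared and the process it named is no longer alive, as reported by the JDK's `ProcessHandle` (Java 9+). As Dan points out, this only works where the caller can see the server's working directory. It does not cover `stop server --member` issued from another machine.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;

/**
 * Hypothetical helper (not part of gfsh): waits until a server's pid file is
 * gone and the process it named is no longer alive, or until a timeout expires.
 */
public final class PidFileWaiter {

    private PidFileWaiter() {}

    /** Reads the pid, assuming the file contains only the numeric pid. */
    static Optional<Long> readPid(Path pidFile) {
        try {
            String text = new String(Files.readAllBytes(pidFile), StandardCharsets.UTF_8).trim();
            return Optional.of(Long.parseLong(text));
        } catch (IOException | NumberFormatException e) {
            // Missing or unparseable file: there is no pid to track.
            return Optional.empty();
        }
    }

    /** True while the pid file exists or the recorded process is still running. */
    static boolean serverStillPresent(Path pidFile, Optional<Long> pid) {
        boolean alive = pid.flatMap(ProcessHandle::of)
                .map(ProcessHandle::isAlive)
                .orElse(false);
        return alive || Files.exists(pidFile);
    }

    /** Polls until the server is gone; returns false if the timeout expires first. */
    public static boolean awaitStopped(Path pidFile, Duration timeout, Duration pollInterval)
            throws InterruptedException {
        // Capture the pid up front: the file may be deleted before the process exits.
        Optional<Long> pid = readPid(pidFile);
        Instant deadline = Instant.now().plus(timeout);
        while (serverStillPresent(pidFile, pid)) {
            if (Instant.now().isAfter(deadline)) {
                return false;
            }
            Thread.sleep(pollInterval.toMillis());
        }
        return true;
    }
}
```

A restart script or test could call `awaitStopped` after `stop server` returns and refuse to start a new server if it returns `false`. Operating systems reuse pids, so in rare cases the liveness check could mistake a newer, unrelated process for the old server.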
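
Jens's suggestion relies on standard socket behavior: binding to port 0 asks the OS for any free ephemeral port, so successive restarts do not compete for one fixed port. The sketch below shows that behavior with plain `java.net.ServerSocket`. It does not show how Geode itself opens its listeners, and the class name `EphemeralPortDemo` is hypothetical. It also sets `SO_REUSEADDR` explicitly before binding, which is the option Jens mentions. Whether Geode sets that option on its own sockets is the open question in the thread, and this sketch does not answer it.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

/** Hypothetical demo class: lets the OS choose a free port instead of a fixed one. */
public final class EphemeralPortDemo {

    private EphemeralPortDemo() {}

    /** Binds an unbound socket to port 0 on loopback and returns the port the OS chose. */
    static int bindEphemeral(ServerSocket socket) throws IOException {
        // Set SO_REUSEADDR before bind(); its effect after binding is undefined.
        socket.setReuseAddress(true);
        socket.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
        return socket.getLocalPort();
    }

    public static void main(String[] args) throws IOException {
        // Two listeners open at the same time still receive distinct ports.
        try (ServerSocket first = new ServerSocket();
             ServerSocket second = new ServerSocket()) {
            System.out.println("first port:  " + bindEphemeral(first));
            System.out.println("second port: " + bindEphemeral(second));
        }
    }
}
```

In the tests under discussion, the gfsh-level equivalent is Jens's `--server-port=0`. With a fixed port, a restart can instead hit a bind exception while the previous process, or its lingering connections, still holds that port.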