To circle back to the original test failure that prompted this discussion:
the failing test was getting intermittent bind exceptions on subsequent
server restarts.

I believe it's quite likely that a process's ports will remain unavailable
even after it is gone (I'm not sure whether we create listening sockets with
SO_REUSEADDR). So, regarding John's comment that gfsh is already synchronous:
I don't think that adding extra functionality to gfsh, ultimately just to
wait longer before exiting, really solves the problem. I'd suggest you
adjust the tests to always start servers with `--server-port=0`, so that
the OS picks a free port and there are no port conflicts.
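
To illustrate what I mean by letting the OS handle it, here is a minimal
sketch in plain Java. It is not Geode code: the class and method names are
made up for the example, and I'm only assuming the server's listening socket
behaves like an ordinary `java.net.ServerSocket`. Binding to port 0 asks the
OS for any free port, which can then be read back from the socket.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical example class, not part of Geode.
public class EphemeralPortSketch {

    // Binds a listening socket to port 0 so the OS chooses a free port.
    public static ServerSocket bindToAnyFreePort() throws IOException {
        ServerSocket socket = new ServerSocket();
        // SO_REUSEADDR has to be set before bind() to have a defined effect.
        socket.setReuseAddress(true);
        socket.bind(new InetSocketAddress(0));
        return socket;
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket first = bindToAnyFreePort();
             ServerSocket second = bindToAnyFreePort()) {
            // Each socket reports the port the OS actually assigned.
            System.out.println("first=" + first.getLocalPort()
                + " second=" + second.getLocalPort());
        }
    }
}
```

A test would then read the assigned port back rather than hard-coding one, so
a restarted server never has to compete for the port its predecessor held.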

--Jens

On Wed, Sep 11, 2019 at 8:17 AM Bruce Schuchardt <bschucha...@pivotal.io>
wrote:

> Blocking or non-blocking, I don't have a strong opinion.  What I'd
> really like to have gfsh ensure, though, is that no-one is able to start
> a new instance of a server while the old process is still around.  Maybe
> the PID file is the way to do that.
>
> On 9/10/19 3:08 PM, Mark Hanson wrote:
> > Hello All,
> >
> > I would like to propose that we make the gfsh “stop server” command
> > synchronous. It is causing some issues with some tests as the rest of the
> > calls are blocking. Stop on the other hand immediately returns by
> > comparison.
> > This causes issues as shown in GEODE-7017 specifically.
> >
> > GEODE-7017 CI failure:
> > org.apache.geode.launchers.ServerStartupValueRecoveryNotificationTest >
> > startupReportsOnlineOnlyAfterRedundancyRestored
> > https://issues.apache.org/jira/browse/GEODE-7017
> >
> >
> > What do people think?
> >
> > Thanks,
> > Mark
>
