Re: [DISCUSS] Improvements on client function execution API

Anilkumar Gingade Mon, 16 Sep 2019 12:37:57 -0700

Alberto,

Sorry for late response....Currently geode (java client) does provide
ability to set function timeout; but its through internal system property
"gemfire.CLIENT_FUNCTION_TIMEOUT"....Some of the tests using this property
are Tests extending "FunctionRetryTestBase".


Since this is through internal system property; we need a cleaner api to
achieve the timeout behavior.

+1 for the option a) proposed.

-Anil




On Mon, Sep 16, 2019 at 9:03 AM Dan Smith <dsm...@pivotal.io> wrote:

> Thanks for following up on this!
>
> -Dan
>
> On Mon, Sep 16, 2019 at 3:07 AM Alberto Gomez <alberto.go...@est.tech>
> wrote:
>
> > Thanks for the feedback. I also give a +1 to option a) including Dan's
> > comments.
> >
> > I'll move the RFC to the Development state and will open a ticket to
> > follow up on the implementation.
> >
> > -Alberto G.
> >
> > On 12/9/19 8:15, Jacob Barrett wrote:
> > > +1
> > >
> > > I echo Dan’s comments as well.
> > >
> > > Thanks for tackling this.
> > >
> > > -jake
> > >
> > >
> > >> On Sep 11, 2019, at 2:36 PM, Dan Smith <dsm...@pivotal.io> wrote:
> > >>
> > >> +1 - Ok, I think I've come around to option (a). We can go head and
> > add a
> > >> new execute(timeout, TimeUnit) method to the java API that is
> blocking.
> > We
> > >> can leave the existing execute() method alone, except for documenting
> > what
> > >> it is doing.
> > >>
> > >> I would like implement execute(timeout,  TimeUnit) on the server side
> as
> > >> well. Since this Execution class is shared between both client and
> > server
> > >> APIs, it would be unfortunate to have a method on Execution that
> simply
> > >> doesn't work on the server side.
> > >>
> > >> -Dan
> > >>
> > >>
> > >>> On Thu, Sep 5, 2019 at 9:25 AM Alberto Gomez <alberto.go...@est.tech
> >
> > wrote:
> > >>>
> > >>> Hi all,
> > >>>
> > >>> First of all, thanks a lot Dan and Jacob for your feedback.
> > >>>
> > >>> As we are getting close to the deadline I am adding here some
> > conclusions
> > >>> and a refined proposal in order to get some more feedback and if
> > possible
> > >>> some voting on the two alternatives proposed (or any other in between
> > if
> > >>> you feel any of them is lacking something).
> > >>>
> > >>> I also add some draft code to try to clarify a bit the more complex
> of
> > the
> > >>> alternatives.
> > >>>
> > >>>
> > >>> Proposal summary (needs a decision on which option to implement):
> > >>>
> > >>>
> >
> -------------------------------------------------------------------------------------------
> > >>>
> > >>> In order to make the API more coherent two alternatives are proposed:
> > >>>
> > >>> a) Remove the timeout from the ResultCollector::getResult() /
> document
> > >>> that the timeout has no effect, taking into account that
> > >>> Execution::execute() is always blocking.
> > >>> Additionally we could add the timeout parameter to the
> > >>> Execution::execute() method of the Java API in order to align it with
> > the
> > >>> native client APIs. This timeout would not be the read timeout on the
> > >>> socket but a timeout for the execution of the operation.
> > >>>
> > >>> b) Change the implementation of the Execution::execute() method
> without
> > >>> timeout to be non-blocking on both the Java and native APIs. This
> > change
> > >>> has backward compatibility implications, would probably bring some
> > >>> performance decrease and could pose some difficulties in the
> > implementation
> > >>> on the C++ side (in the  handling of timed out operations that hold
> > >>> resources).
> > >>>
> > >>>
> > >>> The first option (a) is less risky and does not have impacts
> regarding
> > >>> backward compatibility and performance.
> > >>>
> > >>> The second one (b) is the preferred alternative in terms of the
> > expected
> > >>> behavior from the users of the API. This option is more complex to
> > >>> implement and as mentioned above has performance and backward
> > compatibility
> > >>> issues not easy to be solved.
> > >>>
> > >>> Following is a draft version of the implementation of b) on the Java
> > >>> client:
> > >>>
> > >>>
> >
> https://github.com/Nordix/geode/commit/507a795e34c6083c129bda7e976b9223d1a893da
> > >>>
> > >>> Following is a draft version of the implementation of b) on the C++
> > native
> > >>> client:
> > >>>
> > >>>
> >
> https://github.com/apache/geode-native/commit/a03a56f229bb8d75ee71044cf6196df07f43150d
> > >>>
> > >>> Note that the above implementation of b) in the C++ client implies
> that
> > >>> the Execution object returned by the FunctionService cannot be
> > destroyed
> > >>> until the thread executing the function asynchronously has finished.
> > If the
> > >>> function times out, the Execution object must be kept until the
> thread
> > >>> finishes.
> > >>>
> > >>>
> > >>> Other considerations
> > >>> -------------------------
> > >>>
> > >>> * Currently, in the function execution Java client there is not a
> > >>> possibility to set a timeout for the execution of functions. The
> > closest to
> > >>> this is the read timeout that may be set globally for function
> > executions
> > >>> but this is not really a timeout for operations.
> > >>>
> > >>> * Even if the API for function execution is the same on clients and
> > >>> servers, the implementation is different between them. On the
> clients,
> > the
> > >>> execute() methods are blocking while on the servers it is
> non-blocking
> > and
> > >>> the invoker of the function blocks on the getResult() method of the
> > >>> ResultCollector returned by the execute() method.
> > >>> Even if having both blocking and non-blocking implementation of
> > execute()
> > >>> in both clients and servers sounds desirable from the point of view
> of
> > >>> orthogonality, this  could bring complications in terms of backward
> > >>> compatibility. Besides, a need for a blocking version of function
> > execution
> > >>> has not been found.
> > >>>
> > >>> -Alberto G.
> > >>>
> > >>> On 29/8/19 16:48, Alberto Gomez wrote:
> > >>>
> > >>> Sorry, some corrections on my comments after revisiting the native
> > >>> client code.
> > >>>
> > >>> When I said that the timeout used in the execution() method (by means
> > of
> > >>> a system property) was to set a read timeout on the socket, I was
> only
> > >>> talking about the Java client. In the case of the native clients, the
> > >>> timeout set in the execute() method is not translated into a socket
> > >>> timeout but it is the time to wait for the operation to complete,
> i.e.,
> > >>> to get all the results back.
> > >>>
> > >>> Things being so, I would change my proposal to:
> > >>>
> > >>> - Change the implementation of execute() on both Java and native
> > clients
> > >>> to be non-blocking (having the blocking/non-blocking behavior
> > >>> configurable in the release this is introduced and leaving only the
> > >>> non-blocking behavior in the next release).
> > >>>
> > >>> - Either remove the execute() with timeout methods in the native
> > clients
> > >>> (with a deprecation release) or implement the execute(timeout) method
> > in
> > >>> the Java client to be blocking (to work as the native client does
> > >>> today). In case the method times out, the connection will not be
> > closed.
> > >>> If the operation times out due to the socket timeout (system
> property),
> > >>> then the connection will be closed as it is now done in the Java
> > client.
> > >>>
> > >>> - Do not implement the blocking execute(timeout) method on the server
> > >>> and leave the current execute() implementation on the server as it is
> > >>> (non-blocking)
> > >>>
> > >>> Does this make sense?
> > >>>
> > >>> -Alberto
> > >>>
> > >>> On 29/8/19 12:56, Alberto Gómez wrote:
> > >>>
> > >>>
> > >>> Hi Dan,
> > >>>
> > >>> Discussing these matters by e-mail is getting tricky.
> > >>>
> > >>> Let's see if I understand you correctly and also if I am being clear
> > >>> enough.
> > >>>
> > >>> Please, see my comments inline.
> > >>>
> > >>> On 29/8/19 0:49, Dan Smith wrote:
> > >>>
> > >>>
> > >>> Sorry for the slow response, I've been trying to decide what I think
> > >>> is the
> > >>> right approach here.
> > >>>
> > >>> For (1) - conceptually, I don't have a problem with having both
> > blocking
> > >>> and non blocking methods on Execution. So adding blocking versions of
> > >>> execute() with a timeout seems ok. But I do think if we add them we
> > >>> need to
> > >>> implement them on both the client and the server to behave the same
> > way.
> > >>> That shouldn't be too hard on the server since execute(timeout) can
> > just
> > >>> call getResult(timeout) internally.
> > >>>
> > >>>
> > >>>
> > >>> We have to take into account that, currently, the timeout in
> execute()
> > >>> is not the same thing as the timeout in getResult().
> > >>>
> > >>> On the one hand, the timeout set in execute() (via System property in
> > >>> the Java client, and with a parameter in the native client) sets a
> > >>> readtimeout on the socket which just means that if nothing is read
> > >>> from the socket after sending the request to the server for the given
> > >>> timeout, the corresponding exception will be thrown. It looks to me
> > >>> more like a protection against possible communication failures rather
> > >>> than a mechanism to decide if results took too long to be provided.
> So
> > >>> I would not link the presence of the timeout parameter in the method
> > >>> to the nature of the method (blocking or non-blocking). I think we
> > >>> could have this read timeout set and at the same time keep that
> method
> > >>> as non-blocking.
> > >>>
> > >>> On the other hand, the timeout set in getResult() is a timeout to
> wait
> > >>> for all the results to be received from the moment the method is
> > invoked.
> > >>>
> > >>> Therefore, I would not implement the blocking version of execute() on
> > >>> the server by calling getResult() at the end.
> > >>>
> > >>> Apart from that, I doubt if it would make sense to set this
> > >>> readTimeout in the execute() methods from servers given that the
> > >>> communication is very different to the one done with clients. I also
> > >>> doubt that anyone would be interested in the blocking version of
> > >>> execute() on the server.
> > >>>
> > >>> My proposal was to add the readtimeout to the execute() methods in
> the
> > >>> Java client in order to align the Java client and the native client.
> > >>> This change would be independent to the decision we make regarding
> the
> > >>> change of execute() to non-blocking. To achieve this alignment,
> > >>> alternatively, we could remove the timeout parameter in execute()
> from
> > >>> the native clients and have it as a global property for the client to
> > >>> be set by whatever mechanism available as it is done in the Java
> > >>> client today.
> > >>>
> > >>> Were you proposing that the execute() methods with timeout were
> > >>> blocking and the ones without timeout non-blocking? Not sure if this
> > >>> is something you meant.
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> For (2) - Although I think the original authors of this API probably
> > did
> > >>> intend for execute() to be non-blocking, the fact is that it does
> > >>> block on
> > >>> the client and most users are probably calling execute from a client.
> > >>> So I
> > >>> do agree we probably shouldn't change the behavior at this point.
> > >>> Perhaps
> > >>> we can just clearly document the current behavior of execute() as
> > >>> part of
> > >>> adding these new methods. Going forward we can add new methods to
> > >>> Execution
> > >>> that are clearly non-blocking (submit?, invoke?) and implement them
> > >>> consistently on *both* the client in the server, but that doesn't
> > >>> have to
> > >>> be in the scope of this proposal.
> > >>>
> > >>>
> > >>>
> > >>> The problem I see with adding new non-blocking methods
> (new/submit...)
> > >>> is that it would be a solution for the current users of the client
> > >>> regarding backwards compatibility. But, on the server side, we would
> > >>> have to move the current logic of execute() which is non-blocking to
> > >>> the new methods and change the current execute() behavior to
> blocking.
> > >>> We would not impact the users of the client but we would impact the
> > >>> users of the server.
> > >>>
> > >>> Again, I would propose to aim at:
> > >>>
> > >>> a) either leave execute() on the client as blocking
> > >>>
> > >>> b) or change execute() on the client to be non-blocking but on the
> > >>> Geode release this is introduced, have it configurable. The default
> > >>> behavior would be blocking (deprecated behavior) but could be set to
> > >>> non-blocking with a system property. On the next release, the
> blocking
> > >>> behavior would be removed.
> > >>>
> > >>> - Alberto G.
> > >>>
> > >>>
> > >>>
> > >>> -Dan
> > >>>
> > >>> On Fri, Aug 23, 2019 at 4:28 AM Alberto Gomez <alberto.go...@est.tech
> > >>>> <mailto:alberto.go...@est.tech>
> > >>> wrote:
> > >>>
> > >>>
> > >>>
> > >>> Hi Jake,
> > >>>
> > >>> Please, see my answers below.
> > >>>
> > >>> On 22/8/19 21:16, Jacob Barrett wrote:
> > >>>
> > >>>
> > >>> On Aug 21, 2019, at 8:49 AM, Alberto Gomez <alberto.go...@est.tech
> > >>>> <mailto:alberto.go...@est.tech>
> > >>>
> > >>> wrote:
> > >>>
> > >>>
> > >>> 2. Timeout in ResultCollector::getResult() and Execution::execute()
> > >>>
> > >>>
> > >>> blocking
> > >>>
> > >>>
> > >>> Regarding the timeout in the ResultCollector::getResult() method
> > >>>
> > >>>
> > >>> problem and the blocking/non-blocking confusion for
> > >>> Execution::execute()
> > >>> two alternatives are considered:
> > >>>
> > >>>
> > >>> a) Remove the possibility of setting a timeout on the
> > >>>
> > >>>
> > >>> ResultCollector::getResult() method on the client side as with the
> > >>> current
> > >>> client implementation it is useless. This could be done by removing
> the
> > >>> method with the timeout parameter from the public API.
> > >>>
> > >>>
> > >>> It would be advisable to make explicit in the documentation that the
> > >>>
> > >>>
> > >>> getResult() method does not wait for results to arrive as that
> > >>> should have
> > >>> already been done in the Execution::execute() invocation.
> > >>>
> > >>>
> > >>> This alternative is very simple and would keep things pretty much as
> > >>>
> > >>>
> > >>> they are today.
> > >>>
> > >>>
> > >>> To be honest I think approach would go against what a user “thinks”
> is
> > >>>
> > >>>
> > >>> going on. Given that there hasn’t been a timeout on execute I
> > >>> assumed it
> > >>> was asynchronous and that the getResult blocked until timeout or
> > >>> results
> > >>> arrived. Typically these two calls were done back to back.
> > >>>
> > >>> You are right if you look at the Java client. But if you look at the
> > >>> native clients, the timeout is there, both in the C++ and the C#
> cases
> > >>> which would indicate that it is a blocking call.
> > >>>
> > >>> See
> > >>>
> > >>>
> > >>>
> >
> https://geode.apache.org/releases/latest/cppdocs/a00725.html#aa918a5e193745950e12ca4feb9c5d776
> > >>>
> > >>> and
> > >>>
> > >>>
> > >>>
> >
> https://geode.apache.org/releases/latest/dotnetdocs/a00882.html#ae0a814049482ca424f89c13ab1099c3d
> > >>>
> > >>>
> > >>>
> > >>> b) Transform the Execution::execute() method on the client side
> > >>> into a
> > >>>
> > >>>
> > >>> non-blocking method.
> > >>>
> > >>>
> > >>> This alternative is more complex and requires changes in all the
> > >>>
> > >>>
> > >>> clients. Apart from that it has implications on the public client
> > >>> API it
> > >>> requires moving the exceptions offered currently by the
> > >>> Execution::execute() method to the ResultCollector::getResult() and
> new
> > >>> threads will have to be managed.
> > >>>
> > >>>
> > >>> I think this is more in line with what users expect is going on
> > >>> based on
> > >>>
> > >>>
> > >>> the current API, I know I have. If were are going to make any change
> I
> > >>> think this is the one. I don’t think the behavior change is a
> > >>> problem since
> > >>> it's what is expected to be happening anyway.
> > >>>
> > >>>
> > >>> An outline of a possible implementation for option b) would be:
> > >>>
> > >>>    *   Instead of invoking the ServerRegionProxy::executeFunction()
> > >>>
> > >>>
> > >>> directly as it is done today, create a Future that invokes this
> > >>> method and
> > >>> returns the resultCollector passed as parameter.
> > >>>
> > >>>
> > >>> Do you really think we need to introduce Futures here into the API? I
> > >>>
> > >>>
> > >>> feel like the ResultCollector acts as the Future. I don’t think any
> > >>> change
> > >>> needs to be made to the API in this regard. The ResultCollector
> > >>> implementation would just simply block as implied by the api for the
> > >>> timeout period. I would change the method to have units though and
> > >>> deprecate the method without units.
> > >>>
> > >>> I did not mean to introduce Futures in the API. My idea was to use
> Java
> > >>> Futures internally so that the ResultCollector returned by the
> > >>> getResult() method would wrap the Java Future with the
> ResultCollector
> > >>> that would actually hold the result.
> > >>>
> > >>> An alternative would be to leave the logic of blocking to each
> > >>> implementation ResultCollector. In the case of the
> > >>> DefaultResultCollector, we could use a CountDownLatch that would be
> > >>> decremented when endResults() is called and that would make
> getResult()
> > >>> block by using CountDown.await(...).
> > >>>
> > >>> The advantage of using Futures internally is that the blocking logic
> > >>> would not have to be implemented on every ResultCollector
> > >>> implementation.
> > >>>
> > >>>
> > >>>
> > >>> -Jake
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> - Alberto
> > >>>
> > >>>
> > >>>
> >
>

Re: [DISCUSS] Improvements on client function execution API

Reply via email to