Alberto, Sorry for late response....Currently geode (java client) does provide ability to set function timeout; but its through internal system property "gemfire.CLIENT_FUNCTION_TIMEOUT"....Some of the tests using this property are Tests extending "FunctionRetryTestBase".
Since this is through internal system property; we need a cleaner api to achieve the timeout behavior. +1 for the option a) proposed. -Anil On Mon, Sep 16, 2019 at 9:03 AM Dan Smith <dsm...@pivotal.io> wrote: > Thanks for following up on this! > > -Dan > > On Mon, Sep 16, 2019 at 3:07 AM Alberto Gomez <alberto.go...@est.tech> > wrote: > > > Thanks for the feedback. I also give a +1 to option a) including Dan's > > comments. > > > > I'll move the RFC to the Development state and will open a ticket to > > follow up on the implementation. > > > > -Alberto G. > > > > On 12/9/19 8:15, Jacob Barrett wrote: > > > +1 > > > > > > I echo Dan’s comments as well. > > > > > > Thanks for tackling this. > > > > > > -jake > > > > > > > > >> On Sep 11, 2019, at 2:36 PM, Dan Smith <dsm...@pivotal.io> wrote: > > >> > > >> +1 - Ok, I think I've come around to option (a). We can go head and > > add a > > >> new execute(timeout, TimeUnit) method to the java API that is > blocking. > > We > > >> can leave the existing execute() method alone, except for documenting > > what > > >> it is doing. > > >> > > >> I would like implement execute(timeout, TimeUnit) on the server side > as > > >> well. Since this Execution class is shared between both client and > > server > > >> APIs, it would be unfortunate to have a method on Execution that > simply > > >> doesn't work on the server side. > > >> > > >> -Dan > > >> > > >> > > >>> On Thu, Sep 5, 2019 at 9:25 AM Alberto Gomez <alberto.go...@est.tech > > > > wrote: > > >>> > > >>> Hi all, > > >>> > > >>> First of all, thanks a lot Dan and Jacob for your feedback. > > >>> > > >>> As we are getting close to the deadline I am adding here some > > conclusions > > >>> and a refined proposal in order to get some more feedback and if > > possible > > >>> some voting on the two alternatives proposed (or any other in between > > if > > >>> you feel any of them is lacking something). > > >>> > > >>> I also add some draft code to try to clarify a bit the more complex > of > > the > > >>> alternatives. > > >>> > > >>> > > >>> Proposal summary (needs a decision on which option to implement): > > >>> > > >>> > > > ------------------------------------------------------------------------------------------- > > >>> > > >>> In order to make the API more coherent two alternatives are proposed: > > >>> > > >>> a) Remove the timeout from the ResultCollector::getResult() / > document > > >>> that the timeout has no effect, taking into account that > > >>> Execution::execute() is always blocking. > > >>> Additionally we could add the timeout parameter to the > > >>> Execution::execute() method of the Java API in order to align it with > > the > > >>> native client APIs. This timeout would not be the read timeout on the > > >>> socket but a timeout for the execution of the operation. > > >>> > > >>> b) Change the implementation of the Execution::execute() method > without > > >>> timeout to be non-blocking on both the Java and native APIs. This > > change > > >>> has backward compatibility implications, would probably bring some > > >>> performance decrease and could pose some difficulties in the > > implementation > > >>> on the C++ side (in the handling of timed out operations that hold > > >>> resources). > > >>> > > >>> > > >>> The first option (a) is less risky and does not have impacts > regarding > > >>> backward compatibility and performance. > > >>> > > >>> The second one (b) is the preferred alternative in terms of the > > expected > > >>> behavior from the users of the API. This option is more complex to > > >>> implement and as mentioned above has performance and backward > > compatibility > > >>> issues not easy to be solved. > > >>> > > >>> Following is a draft version of the implementation of b) on the Java > > >>> client: > > >>> > > >>> > > > https://github.com/Nordix/geode/commit/507a795e34c6083c129bda7e976b9223d1a893da > > >>> > > >>> Following is a draft version of the implementation of b) on the C++ > > native > > >>> client: > > >>> > > >>> > > > https://github.com/apache/geode-native/commit/a03a56f229bb8d75ee71044cf6196df07f43150d > > >>> > > >>> Note that the above implementation of b) in the C++ client implies > that > > >>> the Execution object returned by the FunctionService cannot be > > destroyed > > >>> until the thread executing the function asynchronously has finished. > > If the > > >>> function times out, the Execution object must be kept until the > thread > > >>> finishes. > > >>> > > >>> > > >>> Other considerations > > >>> ------------------------- > > >>> > > >>> * Currently, in the function execution Java client there is not a > > >>> possibility to set a timeout for the execution of functions. The > > closest to > > >>> this is the read timeout that may be set globally for function > > executions > > >>> but this is not really a timeout for operations. > > >>> > > >>> * Even if the API for function execution is the same on clients and > > >>> servers, the implementation is different between them. On the > clients, > > the > > >>> execute() methods are blocking while on the servers it is > non-blocking > > and > > >>> the invoker of the function blocks on the getResult() method of the > > >>> ResultCollector returned by the execute() method. > > >>> Even if having both blocking and non-blocking implementation of > > execute() > > >>> in both clients and servers sounds desirable from the point of view > of > > >>> orthogonality, this could bring complications in terms of backward > > >>> compatibility. Besides, a need for a blocking version of function > > execution > > >>> has not been found. > > >>> > > >>> -Alberto G. > > >>> > > >>> On 29/8/19 16:48, Alberto Gomez wrote: > > >>> > > >>> Sorry, some corrections on my comments after revisiting the native > > >>> client code. > > >>> > > >>> When I said that the timeout used in the execution() method (by means > > of > > >>> a system property) was to set a read timeout on the socket, I was > only > > >>> talking about the Java client. In the case of the native clients, the > > >>> timeout set in the execute() method is not translated into a socket > > >>> timeout but it is the time to wait for the operation to complete, > i.e., > > >>> to get all the results back. > > >>> > > >>> Things being so, I would change my proposal to: > > >>> > > >>> - Change the implementation of execute() on both Java and native > > clients > > >>> to be non-blocking (having the blocking/non-blocking behavior > > >>> configurable in the release this is introduced and leaving only the > > >>> non-blocking behavior in the next release). > > >>> > > >>> - Either remove the execute() with timeout methods in the native > > clients > > >>> (with a deprecation release) or implement the execute(timeout) method > > in > > >>> the Java client to be blocking (to work as the native client does > > >>> today). In case the method times out, the connection will not be > > closed. > > >>> If the operation times out due to the socket timeout (system > property), > > >>> then the connection will be closed as it is now done in the Java > > client. > > >>> > > >>> - Do not implement the blocking execute(timeout) method on the server > > >>> and leave the current execute() implementation on the server as it is > > >>> (non-blocking) > > >>> > > >>> Does this make sense? > > >>> > > >>> -Alberto > > >>> > > >>> On 29/8/19 12:56, Alberto Gómez wrote: > > >>> > > >>> > > >>> Hi Dan, > > >>> > > >>> Discussing these matters by e-mail is getting tricky. > > >>> > > >>> Let's see if I understand you correctly and also if I am being clear > > >>> enough. > > >>> > > >>> Please, see my comments inline. > > >>> > > >>> On 29/8/19 0:49, Dan Smith wrote: > > >>> > > >>> > > >>> Sorry for the slow response, I've been trying to decide what I think > > >>> is the > > >>> right approach here. > > >>> > > >>> For (1) - conceptually, I don't have a problem with having both > > blocking > > >>> and non blocking methods on Execution. So adding blocking versions of > > >>> execute() with a timeout seems ok. But I do think if we add them we > > >>> need to > > >>> implement them on both the client and the server to behave the same > > way. > > >>> That shouldn't be too hard on the server since execute(timeout) can > > just > > >>> call getResult(timeout) internally. > > >>> > > >>> > > >>> > > >>> We have to take into account that, currently, the timeout in > execute() > > >>> is not the same thing as the timeout in getResult(). > > >>> > > >>> On the one hand, the timeout set in execute() (via System property in > > >>> the Java client, and with a parameter in the native client) sets a > > >>> readtimeout on the socket which just means that if nothing is read > > >>> from the socket after sending the request to the server for the given > > >>> timeout, the corresponding exception will be thrown. It looks to me > > >>> more like a protection against possible communication failures rather > > >>> than a mechanism to decide if results took too long to be provided. > So > > >>> I would not link the presence of the timeout parameter in the method > > >>> to the nature of the method (blocking or non-blocking). I think we > > >>> could have this read timeout set and at the same time keep that > method > > >>> as non-blocking. > > >>> > > >>> On the other hand, the timeout set in getResult() is a timeout to > wait > > >>> for all the results to be received from the moment the method is > > invoked. > > >>> > > >>> Therefore, I would not implement the blocking version of execute() on > > >>> the server by calling getResult() at the end. > > >>> > > >>> Apart from that, I doubt if it would make sense to set this > > >>> readTimeout in the execute() methods from servers given that the > > >>> communication is very different to the one done with clients. I also > > >>> doubt that anyone would be interested in the blocking version of > > >>> execute() on the server. > > >>> > > >>> My proposal was to add the readtimeout to the execute() methods in > the > > >>> Java client in order to align the Java client and the native client. > > >>> This change would be independent to the decision we make regarding > the > > >>> change of execute() to non-blocking. To achieve this alignment, > > >>> alternatively, we could remove the timeout parameter in execute() > from > > >>> the native clients and have it as a global property for the client to > > >>> be set by whatever mechanism available as it is done in the Java > > >>> client today. > > >>> > > >>> Were you proposing that the execute() methods with timeout were > > >>> blocking and the ones without timeout non-blocking? Not sure if this > > >>> is something you meant. > > >>> > > >>> > > >>> > > >>> > > >>> For (2) - Although I think the original authors of this API probably > > did > > >>> intend for execute() to be non-blocking, the fact is that it does > > >>> block on > > >>> the client and most users are probably calling execute from a client. > > >>> So I > > >>> do agree we probably shouldn't change the behavior at this point. > > >>> Perhaps > > >>> we can just clearly document the current behavior of execute() as > > >>> part of > > >>> adding these new methods. Going forward we can add new methods to > > >>> Execution > > >>> that are clearly non-blocking (submit?, invoke?) and implement them > > >>> consistently on *both* the client in the server, but that doesn't > > >>> have to > > >>> be in the scope of this proposal. > > >>> > > >>> > > >>> > > >>> The problem I see with adding new non-blocking methods > (new/submit...) > > >>> is that it would be a solution for the current users of the client > > >>> regarding backwards compatibility. But, on the server side, we would > > >>> have to move the current logic of execute() which is non-blocking to > > >>> the new methods and change the current execute() behavior to > blocking. > > >>> We would not impact the users of the client but we would impact the > > >>> users of the server. > > >>> > > >>> Again, I would propose to aim at: > > >>> > > >>> a) either leave execute() on the client as blocking > > >>> > > >>> b) or change execute() on the client to be non-blocking but on the > > >>> Geode release this is introduced, have it configurable. The default > > >>> behavior would be blocking (deprecated behavior) but could be set to > > >>> non-blocking with a system property. On the next release, the > blocking > > >>> behavior would be removed. > > >>> > > >>> - Alberto G. > > >>> > > >>> > > >>> > > >>> -Dan > > >>> > > >>> On Fri, Aug 23, 2019 at 4:28 AM Alberto Gomez <alberto.go...@est.tech > > >>>> <mailto:alberto.go...@est.tech> > > >>> wrote: > > >>> > > >>> > > >>> > > >>> Hi Jake, > > >>> > > >>> Please, see my answers below. > > >>> > > >>> On 22/8/19 21:16, Jacob Barrett wrote: > > >>> > > >>> > > >>> On Aug 21, 2019, at 8:49 AM, Alberto Gomez <alberto.go...@est.tech > > >>>> <mailto:alberto.go...@est.tech> > > >>> > > >>> wrote: > > >>> > > >>> > > >>> 2. Timeout in ResultCollector::getResult() and Execution::execute() > > >>> > > >>> > > >>> blocking > > >>> > > >>> > > >>> Regarding the timeout in the ResultCollector::getResult() method > > >>> > > >>> > > >>> problem and the blocking/non-blocking confusion for > > >>> Execution::execute() > > >>> two alternatives are considered: > > >>> > > >>> > > >>> a) Remove the possibility of setting a timeout on the > > >>> > > >>> > > >>> ResultCollector::getResult() method on the client side as with the > > >>> current > > >>> client implementation it is useless. This could be done by removing > the > > >>> method with the timeout parameter from the public API. > > >>> > > >>> > > >>> It would be advisable to make explicit in the documentation that the > > >>> > > >>> > > >>> getResult() method does not wait for results to arrive as that > > >>> should have > > >>> already been done in the Execution::execute() invocation. > > >>> > > >>> > > >>> This alternative is very simple and would keep things pretty much as > > >>> > > >>> > > >>> they are today. > > >>> > > >>> > > >>> To be honest I think approach would go against what a user “thinks” > is > > >>> > > >>> > > >>> going on. Given that there hasn’t been a timeout on execute I > > >>> assumed it > > >>> was asynchronous and that the getResult blocked until timeout or > > >>> results > > >>> arrived. Typically these two calls were done back to back. > > >>> > > >>> You are right if you look at the Java client. But if you look at the > > >>> native clients, the timeout is there, both in the C++ and the C# > cases > > >>> which would indicate that it is a blocking call. > > >>> > > >>> See > > >>> > > >>> > > >>> > > > https://geode.apache.org/releases/latest/cppdocs/a00725.html#aa918a5e193745950e12ca4feb9c5d776 > > >>> > > >>> and > > >>> > > >>> > > >>> > > > https://geode.apache.org/releases/latest/dotnetdocs/a00882.html#ae0a814049482ca424f89c13ab1099c3d > > >>> > > >>> > > >>> > > >>> b) Transform the Execution::execute() method on the client side > > >>> into a > > >>> > > >>> > > >>> non-blocking method. > > >>> > > >>> > > >>> This alternative is more complex and requires changes in all the > > >>> > > >>> > > >>> clients. Apart from that it has implications on the public client > > >>> API it > > >>> requires moving the exceptions offered currently by the > > >>> Execution::execute() method to the ResultCollector::getResult() and > new > > >>> threads will have to be managed. > > >>> > > >>> > > >>> I think this is more in line with what users expect is going on > > >>> based on > > >>> > > >>> > > >>> the current API, I know I have. If were are going to make any change > I > > >>> think this is the one. I don’t think the behavior change is a > > >>> problem since > > >>> it's what is expected to be happening anyway. > > >>> > > >>> > > >>> An outline of a possible implementation for option b) would be: > > >>> > > >>> * Instead of invoking the ServerRegionProxy::executeFunction() > > >>> > > >>> > > >>> directly as it is done today, create a Future that invokes this > > >>> method and > > >>> returns the resultCollector passed as parameter. > > >>> > > >>> > > >>> Do you really think we need to introduce Futures here into the API? I > > >>> > > >>> > > >>> feel like the ResultCollector acts as the Future. I don’t think any > > >>> change > > >>> needs to be made to the API in this regard. The ResultCollector > > >>> implementation would just simply block as implied by the api for the > > >>> timeout period. I would change the method to have units though and > > >>> deprecate the method without units. > > >>> > > >>> I did not mean to introduce Futures in the API. My idea was to use > Java > > >>> Futures internally so that the ResultCollector returned by the > > >>> getResult() method would wrap the Java Future with the > ResultCollector > > >>> that would actually hold the result. > > >>> > > >>> An alternative would be to leave the logic of blocking to each > > >>> implementation ResultCollector. In the case of the > > >>> DefaultResultCollector, we could use a CountDownLatch that would be > > >>> decremented when endResults() is called and that would make > getResult() > > >>> block by using CountDown.await(...). > > >>> > > >>> The advantage of using Futures internally is that the blocking logic > > >>> would not have to be implemented on every ResultCollector > > >>> implementation. > > >>> > > >>> > > >>> > > >>> -Jake > > >>> > > >>> > > >>> > > >>> > > >>> - Alberto > > >>> > > >>> > > >>> > > >