Thanks Anil. This is how this started, trying to pop the timeout up from a system property to a parameter in the Execution methods.
-Alberto G. On 16/9/19 21:37, Anilkumar Gingade wrote: > Alberto, > > Sorry for late response....Currently geode (java client) does provide > ability to set function timeout; but its through internal system property > "gemfire.CLIENT_FUNCTION_TIMEOUT"....Some of the tests using this property > are Tests extending "FunctionRetryTestBase". > > Since this is through internal system property; we need a cleaner api to > achieve the timeout behavior. > > +1 for the option a) proposed. > > -Anil > > > > > On Mon, Sep 16, 2019 at 9:03 AM Dan Smith <dsm...@pivotal.io> wrote: > >> Thanks for following up on this! >> >> -Dan >> >> On Mon, Sep 16, 2019 at 3:07 AM Alberto Gomez <alberto.go...@est.tech> >> wrote: >> >>> Thanks for the feedback. I also give a +1 to option a) including Dan's >>> comments. >>> >>> I'll move the RFC to the Development state and will open a ticket to >>> follow up on the implementation. >>> >>> -Alberto G. >>> >>> On 12/9/19 8:15, Jacob Barrett wrote: >>>> +1 >>>> >>>> I echo Dan’s comments as well. >>>> >>>> Thanks for tackling this. >>>> >>>> -jake >>>> >>>> >>>>> On Sep 11, 2019, at 2:36 PM, Dan Smith <dsm...@pivotal.io> wrote: >>>>> >>>>> +1 - Ok, I think I've come around to option (a). We can go head and >>> add a >>>>> new execute(timeout, TimeUnit) method to the java API that is >> blocking. >>> We >>>>> can leave the existing execute() method alone, except for documenting >>> what >>>>> it is doing. >>>>> >>>>> I would like implement execute(timeout, TimeUnit) on the server side >> as >>>>> well. Since this Execution class is shared between both client and >>> server >>>>> APIs, it would be unfortunate to have a method on Execution that >> simply >>>>> doesn't work on the server side. >>>>> >>>>> -Dan >>>>> >>>>> >>>>>> On Thu, Sep 5, 2019 at 9:25 AM Alberto Gomez <alberto.go...@est.tech >>> wrote: >>>>>> Hi all, >>>>>> >>>>>> First of all, thanks a lot Dan and Jacob for your feedback. >>>>>> >>>>>> As we are getting close to the deadline I am adding here some >>> conclusions >>>>>> and a refined proposal in order to get some more feedback and if >>> possible >>>>>> some voting on the two alternatives proposed (or any other in between >>> if >>>>>> you feel any of them is lacking something). >>>>>> >>>>>> I also add some draft code to try to clarify a bit the more complex >> of >>> the >>>>>> alternatives. >>>>>> >>>>>> >>>>>> Proposal summary (needs a decision on which option to implement): >>>>>> >>>>>> >> ------------------------------------------------------------------------------------------- >>>>>> In order to make the API more coherent two alternatives are proposed: >>>>>> >>>>>> a) Remove the timeout from the ResultCollector::getResult() / >> document >>>>>> that the timeout has no effect, taking into account that >>>>>> Execution::execute() is always blocking. >>>>>> Additionally we could add the timeout parameter to the >>>>>> Execution::execute() method of the Java API in order to align it with >>> the >>>>>> native client APIs. This timeout would not be the read timeout on the >>>>>> socket but a timeout for the execution of the operation. >>>>>> >>>>>> b) Change the implementation of the Execution::execute() method >> without >>>>>> timeout to be non-blocking on both the Java and native APIs. This >>> change >>>>>> has backward compatibility implications, would probably bring some >>>>>> performance decrease and could pose some difficulties in the >>> implementation >>>>>> on the C++ side (in the handling of timed out operations that hold >>>>>> resources). >>>>>> >>>>>> >>>>>> The first option (a) is less risky and does not have impacts >> regarding >>>>>> backward compatibility and performance. >>>>>> >>>>>> The second one (b) is the preferred alternative in terms of the >>> expected >>>>>> behavior from the users of the API. This option is more complex to >>>>>> implement and as mentioned above has performance and backward >>> compatibility >>>>>> issues not easy to be solved. >>>>>> >>>>>> Following is a draft version of the implementation of b) on the Java >>>>>> client: >>>>>> >>>>>> >> https://github.com/Nordix/geode/commit/507a795e34c6083c129bda7e976b9223d1a893da >>>>>> Following is a draft version of the implementation of b) on the C++ >>> native >>>>>> client: >>>>>> >>>>>> >> https://github.com/apache/geode-native/commit/a03a56f229bb8d75ee71044cf6196df07f43150d >>>>>> Note that the above implementation of b) in the C++ client implies >> that >>>>>> the Execution object returned by the FunctionService cannot be >>> destroyed >>>>>> until the thread executing the function asynchronously has finished. >>> If the >>>>>> function times out, the Execution object must be kept until the >> thread >>>>>> finishes. >>>>>> >>>>>> >>>>>> Other considerations >>>>>> ------------------------- >>>>>> >>>>>> * Currently, in the function execution Java client there is not a >>>>>> possibility to set a timeout for the execution of functions. The >>> closest to >>>>>> this is the read timeout that may be set globally for function >>> executions >>>>>> but this is not really a timeout for operations. >>>>>> >>>>>> * Even if the API for function execution is the same on clients and >>>>>> servers, the implementation is different between them. On the >> clients, >>> the >>>>>> execute() methods are blocking while on the servers it is >> non-blocking >>> and >>>>>> the invoker of the function blocks on the getResult() method of the >>>>>> ResultCollector returned by the execute() method. >>>>>> Even if having both blocking and non-blocking implementation of >>> execute() >>>>>> in both clients and servers sounds desirable from the point of view >> of >>>>>> orthogonality, this could bring complications in terms of backward >>>>>> compatibility. Besides, a need for a blocking version of function >>> execution >>>>>> has not been found. >>>>>> >>>>>> -Alberto G. >>>>>> >>>>>> On 29/8/19 16:48, Alberto Gomez wrote: >>>>>> >>>>>> Sorry, some corrections on my comments after revisiting the native >>>>>> client code. >>>>>> >>>>>> When I said that the timeout used in the execution() method (by means >>> of >>>>>> a system property) was to set a read timeout on the socket, I was >> only >>>>>> talking about the Java client. In the case of the native clients, the >>>>>> timeout set in the execute() method is not translated into a socket >>>>>> timeout but it is the time to wait for the operation to complete, >> i.e., >>>>>> to get all the results back. >>>>>> >>>>>> Things being so, I would change my proposal to: >>>>>> >>>>>> - Change the implementation of execute() on both Java and native >>> clients >>>>>> to be non-blocking (having the blocking/non-blocking behavior >>>>>> configurable in the release this is introduced and leaving only the >>>>>> non-blocking behavior in the next release). >>>>>> >>>>>> - Either remove the execute() with timeout methods in the native >>> clients >>>>>> (with a deprecation release) or implement the execute(timeout) method >>> in >>>>>> the Java client to be blocking (to work as the native client does >>>>>> today). In case the method times out, the connection will not be >>> closed. >>>>>> If the operation times out due to the socket timeout (system >> property), >>>>>> then the connection will be closed as it is now done in the Java >>> client. >>>>>> - Do not implement the blocking execute(timeout) method on the server >>>>>> and leave the current execute() implementation on the server as it is >>>>>> (non-blocking) >>>>>> >>>>>> Does this make sense? >>>>>> >>>>>> -Alberto >>>>>> >>>>>> On 29/8/19 12:56, Alberto Gómez wrote: >>>>>> >>>>>> >>>>>> Hi Dan, >>>>>> >>>>>> Discussing these matters by e-mail is getting tricky. >>>>>> >>>>>> Let's see if I understand you correctly and also if I am being clear >>>>>> enough. >>>>>> >>>>>> Please, see my comments inline. >>>>>> >>>>>> On 29/8/19 0:49, Dan Smith wrote: >>>>>> >>>>>> >>>>>> Sorry for the slow response, I've been trying to decide what I think >>>>>> is the >>>>>> right approach here. >>>>>> >>>>>> For (1) - conceptually, I don't have a problem with having both >>> blocking >>>>>> and non blocking methods on Execution. So adding blocking versions of >>>>>> execute() with a timeout seems ok. But I do think if we add them we >>>>>> need to >>>>>> implement them on both the client and the server to behave the same >>> way. >>>>>> That shouldn't be too hard on the server since execute(timeout) can >>> just >>>>>> call getResult(timeout) internally. >>>>>> >>>>>> >>>>>> >>>>>> We have to take into account that, currently, the timeout in >> execute() >>>>>> is not the same thing as the timeout in getResult(). >>>>>> >>>>>> On the one hand, the timeout set in execute() (via System property in >>>>>> the Java client, and with a parameter in the native client) sets a >>>>>> readtimeout on the socket which just means that if nothing is read >>>>>> from the socket after sending the request to the server for the given >>>>>> timeout, the corresponding exception will be thrown. It looks to me >>>>>> more like a protection against possible communication failures rather >>>>>> than a mechanism to decide if results took too long to be provided. >> So >>>>>> I would not link the presence of the timeout parameter in the method >>>>>> to the nature of the method (blocking or non-blocking). I think we >>>>>> could have this read timeout set and at the same time keep that >> method >>>>>> as non-blocking. >>>>>> >>>>>> On the other hand, the timeout set in getResult() is a timeout to >> wait >>>>>> for all the results to be received from the moment the method is >>> invoked. >>>>>> Therefore, I would not implement the blocking version of execute() on >>>>>> the server by calling getResult() at the end. >>>>>> >>>>>> Apart from that, I doubt if it would make sense to set this >>>>>> readTimeout in the execute() methods from servers given that the >>>>>> communication is very different to the one done with clients. I also >>>>>> doubt that anyone would be interested in the blocking version of >>>>>> execute() on the server. >>>>>> >>>>>> My proposal was to add the readtimeout to the execute() methods in >> the >>>>>> Java client in order to align the Java client and the native client. >>>>>> This change would be independent to the decision we make regarding >> the >>>>>> change of execute() to non-blocking. To achieve this alignment, >>>>>> alternatively, we could remove the timeout parameter in execute() >> from >>>>>> the native clients and have it as a global property for the client to >>>>>> be set by whatever mechanism available as it is done in the Java >>>>>> client today. >>>>>> >>>>>> Were you proposing that the execute() methods with timeout were >>>>>> blocking and the ones without timeout non-blocking? Not sure if this >>>>>> is something you meant. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> For (2) - Although I think the original authors of this API probably >>> did >>>>>> intend for execute() to be non-blocking, the fact is that it does >>>>>> block on >>>>>> the client and most users are probably calling execute from a client. >>>>>> So I >>>>>> do agree we probably shouldn't change the behavior at this point. >>>>>> Perhaps >>>>>> we can just clearly document the current behavior of execute() as >>>>>> part of >>>>>> adding these new methods. Going forward we can add new methods to >>>>>> Execution >>>>>> that are clearly non-blocking (submit?, invoke?) and implement them >>>>>> consistently on *both* the client in the server, but that doesn't >>>>>> have to >>>>>> be in the scope of this proposal. >>>>>> >>>>>> >>>>>> >>>>>> The problem I see with adding new non-blocking methods >> (new/submit...) >>>>>> is that it would be a solution for the current users of the client >>>>>> regarding backwards compatibility. But, on the server side, we would >>>>>> have to move the current logic of execute() which is non-blocking to >>>>>> the new methods and change the current execute() behavior to >> blocking. >>>>>> We would not impact the users of the client but we would impact the >>>>>> users of the server. >>>>>> >>>>>> Again, I would propose to aim at: >>>>>> >>>>>> a) either leave execute() on the client as blocking >>>>>> >>>>>> b) or change execute() on the client to be non-blocking but on the >>>>>> Geode release this is introduced, have it configurable. The default >>>>>> behavior would be blocking (deprecated behavior) but could be set to >>>>>> non-blocking with a system property. On the next release, the >> blocking >>>>>> behavior would be removed. >>>>>> >>>>>> - Alberto G. >>>>>> >>>>>> >>>>>> >>>>>> -Dan >>>>>> >>>>>> On Fri, Aug 23, 2019 at 4:28 AM Alberto Gomez <alberto.go...@est.tech >>>>>>> <mailto:alberto.go...@est.tech> >>>>>> wrote: >>>>>> >>>>>> >>>>>> >>>>>> Hi Jake, >>>>>> >>>>>> Please, see my answers below. >>>>>> >>>>>> On 22/8/19 21:16, Jacob Barrett wrote: >>>>>> >>>>>> >>>>>> On Aug 21, 2019, at 8:49 AM, Alberto Gomez <alberto.go...@est.tech >>>>>>> <mailto:alberto.go...@est.tech> >>>>>> wrote: >>>>>> >>>>>> >>>>>> 2. Timeout in ResultCollector::getResult() and Execution::execute() >>>>>> >>>>>> >>>>>> blocking >>>>>> >>>>>> >>>>>> Regarding the timeout in the ResultCollector::getResult() method >>>>>> >>>>>> >>>>>> problem and the blocking/non-blocking confusion for >>>>>> Execution::execute() >>>>>> two alternatives are considered: >>>>>> >>>>>> >>>>>> a) Remove the possibility of setting a timeout on the >>>>>> >>>>>> >>>>>> ResultCollector::getResult() method on the client side as with the >>>>>> current >>>>>> client implementation it is useless. This could be done by removing >> the >>>>>> method with the timeout parameter from the public API. >>>>>> >>>>>> >>>>>> It would be advisable to make explicit in the documentation that the >>>>>> >>>>>> >>>>>> getResult() method does not wait for results to arrive as that >>>>>> should have >>>>>> already been done in the Execution::execute() invocation. >>>>>> >>>>>> >>>>>> This alternative is very simple and would keep things pretty much as >>>>>> >>>>>> >>>>>> they are today. >>>>>> >>>>>> >>>>>> To be honest I think approach would go against what a user “thinks” >> is >>>>>> >>>>>> going on. Given that there hasn’t been a timeout on execute I >>>>>> assumed it >>>>>> was asynchronous and that the getResult blocked until timeout or >>>>>> results >>>>>> arrived. Typically these two calls were done back to back. >>>>>> >>>>>> You are right if you look at the Java client. But if you look at the >>>>>> native clients, the timeout is there, both in the C++ and the C# >> cases >>>>>> which would indicate that it is a blocking call. >>>>>> >>>>>> See >>>>>> >>>>>> >>>>>> >> https://geode.apache.org/releases/latest/cppdocs/a00725.html#aa918a5e193745950e12ca4feb9c5d776 >>>>>> and >>>>>> >>>>>> >>>>>> >> https://geode.apache.org/releases/latest/dotnetdocs/a00882.html#ae0a814049482ca424f89c13ab1099c3d >>>>>> >>>>>> >>>>>> b) Transform the Execution::execute() method on the client side >>>>>> into a >>>>>> >>>>>> >>>>>> non-blocking method. >>>>>> >>>>>> >>>>>> This alternative is more complex and requires changes in all the >>>>>> >>>>>> >>>>>> clients. Apart from that it has implications on the public client >>>>>> API it >>>>>> requires moving the exceptions offered currently by the >>>>>> Execution::execute() method to the ResultCollector::getResult() and >> new >>>>>> threads will have to be managed. >>>>>> >>>>>> >>>>>> I think this is more in line with what users expect is going on >>>>>> based on >>>>>> >>>>>> >>>>>> the current API, I know I have. If were are going to make any change >> I >>>>>> think this is the one. I don’t think the behavior change is a >>>>>> problem since >>>>>> it's what is expected to be happening anyway. >>>>>> >>>>>> >>>>>> An outline of a possible implementation for option b) would be: >>>>>> >>>>>> * Instead of invoking the ServerRegionProxy::executeFunction() >>>>>> >>>>>> >>>>>> directly as it is done today, create a Future that invokes this >>>>>> method and >>>>>> returns the resultCollector passed as parameter. >>>>>> >>>>>> >>>>>> Do you really think we need to introduce Futures here into the API? I >>>>>> >>>>>> >>>>>> feel like the ResultCollector acts as the Future. I don’t think any >>>>>> change >>>>>> needs to be made to the API in this regard. The ResultCollector >>>>>> implementation would just simply block as implied by the api for the >>>>>> timeout period. I would change the method to have units though and >>>>>> deprecate the method without units. >>>>>> >>>>>> I did not mean to introduce Futures in the API. My idea was to use >> Java >>>>>> Futures internally so that the ResultCollector returned by the >>>>>> getResult() method would wrap the Java Future with the >> ResultCollector >>>>>> that would actually hold the result. >>>>>> >>>>>> An alternative would be to leave the logic of blocking to each >>>>>> implementation ResultCollector. In the case of the >>>>>> DefaultResultCollector, we could use a CountDownLatch that would be >>>>>> decremented when endResults() is called and that would make >> getResult() >>>>>> block by using CountDown.await(...). >>>>>> >>>>>> The advantage of using Futures internally is that the blocking logic >>>>>> would not have to be implemented on every ResultCollector >>>>>> implementation. >>>>>> >>>>>> >>>>>> >>>>>> -Jake >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> - Alberto >>>>>> >>>>>> >>>>>>