+1 I echo Dan’s comments as well.
Thanks for tackling this. -jake > On Sep 11, 2019, at 2:36 PM, Dan Smith <dsm...@pivotal.io> wrote: > > +1 - Ok, I think I've come around to option (a). We can go head and add a > new execute(timeout, TimeUnit) method to the java API that is blocking. We > can leave the existing execute() method alone, except for documenting what > it is doing. > > I would like implement execute(timeout, TimeUnit) on the server side as > well. Since this Execution class is shared between both client and server > APIs, it would be unfortunate to have a method on Execution that simply > doesn't work on the server side. > > -Dan > > >> On Thu, Sep 5, 2019 at 9:25 AM Alberto Gomez <alberto.go...@est.tech> wrote: >> >> Hi all, >> >> First of all, thanks a lot Dan and Jacob for your feedback. >> >> As we are getting close to the deadline I am adding here some conclusions >> and a refined proposal in order to get some more feedback and if possible >> some voting on the two alternatives proposed (or any other in between if >> you feel any of them is lacking something). >> >> I also add some draft code to try to clarify a bit the more complex of the >> alternatives. >> >> >> Proposal summary (needs a decision on which option to implement): >> >> ------------------------------------------------------------------------------------------- >> >> In order to make the API more coherent two alternatives are proposed: >> >> a) Remove the timeout from the ResultCollector::getResult() / document >> that the timeout has no effect, taking into account that >> Execution::execute() is always blocking. >> Additionally we could add the timeout parameter to the >> Execution::execute() method of the Java API in order to align it with the >> native client APIs. This timeout would not be the read timeout on the >> socket but a timeout for the execution of the operation. >> >> b) Change the implementation of the Execution::execute() method without >> timeout to be non-blocking on both the Java and native APIs. This change >> has backward compatibility implications, would probably bring some >> performance decrease and could pose some difficulties in the implementation >> on the C++ side (in the handling of timed out operations that hold >> resources). >> >> >> The first option (a) is less risky and does not have impacts regarding >> backward compatibility and performance. >> >> The second one (b) is the preferred alternative in terms of the expected >> behavior from the users of the API. This option is more complex to >> implement and as mentioned above has performance and backward compatibility >> issues not easy to be solved. >> >> Following is a draft version of the implementation of b) on the Java >> client: >> >> https://github.com/Nordix/geode/commit/507a795e34c6083c129bda7e976b9223d1a893da >> >> Following is a draft version of the implementation of b) on the C++ native >> client: >> >> https://github.com/apache/geode-native/commit/a03a56f229bb8d75ee71044cf6196df07f43150d >> >> Note that the above implementation of b) in the C++ client implies that >> the Execution object returned by the FunctionService cannot be destroyed >> until the thread executing the function asynchronously has finished. If the >> function times out, the Execution object must be kept until the thread >> finishes. >> >> >> Other considerations >> ------------------------- >> >> * Currently, in the function execution Java client there is not a >> possibility to set a timeout for the execution of functions. The closest to >> this is the read timeout that may be set globally for function executions >> but this is not really a timeout for operations. >> >> * Even if the API for function execution is the same on clients and >> servers, the implementation is different between them. On the clients, the >> execute() methods are blocking while on the servers it is non-blocking and >> the invoker of the function blocks on the getResult() method of the >> ResultCollector returned by the execute() method. >> Even if having both blocking and non-blocking implementation of execute() >> in both clients and servers sounds desirable from the point of view of >> orthogonality, this could bring complications in terms of backward >> compatibility. Besides, a need for a blocking version of function execution >> has not been found. >> >> -Alberto G. >> >> On 29/8/19 16:48, Alberto Gomez wrote: >> >> Sorry, some corrections on my comments after revisiting the native >> client code. >> >> When I said that the timeout used in the execution() method (by means of >> a system property) was to set a read timeout on the socket, I was only >> talking about the Java client. In the case of the native clients, the >> timeout set in the execute() method is not translated into a socket >> timeout but it is the time to wait for the operation to complete, i.e., >> to get all the results back. >> >> Things being so, I would change my proposal to: >> >> - Change the implementation of execute() on both Java and native clients >> to be non-blocking (having the blocking/non-blocking behavior >> configurable in the release this is introduced and leaving only the >> non-blocking behavior in the next release). >> >> - Either remove the execute() with timeout methods in the native clients >> (with a deprecation release) or implement the execute(timeout) method in >> the Java client to be blocking (to work as the native client does >> today). In case the method times out, the connection will not be closed. >> If the operation times out due to the socket timeout (system property), >> then the connection will be closed as it is now done in the Java client. >> >> - Do not implement the blocking execute(timeout) method on the server >> and leave the current execute() implementation on the server as it is >> (non-blocking) >> >> Does this make sense? >> >> -Alberto >> >> On 29/8/19 12:56, Alberto Gómez wrote: >> >> >> Hi Dan, >> >> Discussing these matters by e-mail is getting tricky. >> >> Let's see if I understand you correctly and also if I am being clear >> enough. >> >> Please, see my comments inline. >> >> On 29/8/19 0:49, Dan Smith wrote: >> >> >> Sorry for the slow response, I've been trying to decide what I think >> is the >> right approach here. >> >> For (1) - conceptually, I don't have a problem with having both blocking >> and non blocking methods on Execution. So adding blocking versions of >> execute() with a timeout seems ok. But I do think if we add them we >> need to >> implement them on both the client and the server to behave the same way. >> That shouldn't be too hard on the server since execute(timeout) can just >> call getResult(timeout) internally. >> >> >> >> We have to take into account that, currently, the timeout in execute() >> is not the same thing as the timeout in getResult(). >> >> On the one hand, the timeout set in execute() (via System property in >> the Java client, and with a parameter in the native client) sets a >> readtimeout on the socket which just means that if nothing is read >> from the socket after sending the request to the server for the given >> timeout, the corresponding exception will be thrown. It looks to me >> more like a protection against possible communication failures rather >> than a mechanism to decide if results took too long to be provided. So >> I would not link the presence of the timeout parameter in the method >> to the nature of the method (blocking or non-blocking). I think we >> could have this read timeout set and at the same time keep that method >> as non-blocking. >> >> On the other hand, the timeout set in getResult() is a timeout to wait >> for all the results to be received from the moment the method is invoked. >> >> Therefore, I would not implement the blocking version of execute() on >> the server by calling getResult() at the end. >> >> Apart from that, I doubt if it would make sense to set this >> readTimeout in the execute() methods from servers given that the >> communication is very different to the one done with clients. I also >> doubt that anyone would be interested in the blocking version of >> execute() on the server. >> >> My proposal was to add the readtimeout to the execute() methods in the >> Java client in order to align the Java client and the native client. >> This change would be independent to the decision we make regarding the >> change of execute() to non-blocking. To achieve this alignment, >> alternatively, we could remove the timeout parameter in execute() from >> the native clients and have it as a global property for the client to >> be set by whatever mechanism available as it is done in the Java >> client today. >> >> Were you proposing that the execute() methods with timeout were >> blocking and the ones without timeout non-blocking? Not sure if this >> is something you meant. >> >> >> >> >> For (2) - Although I think the original authors of this API probably did >> intend for execute() to be non-blocking, the fact is that it does >> block on >> the client and most users are probably calling execute from a client. >> So I >> do agree we probably shouldn't change the behavior at this point. >> Perhaps >> we can just clearly document the current behavior of execute() as >> part of >> adding these new methods. Going forward we can add new methods to >> Execution >> that are clearly non-blocking (submit?, invoke?) and implement them >> consistently on *both* the client in the server, but that doesn't >> have to >> be in the scope of this proposal. >> >> >> >> The problem I see with adding new non-blocking methods (new/submit...) >> is that it would be a solution for the current users of the client >> regarding backwards compatibility. But, on the server side, we would >> have to move the current logic of execute() which is non-blocking to >> the new methods and change the current execute() behavior to blocking. >> We would not impact the users of the client but we would impact the >> users of the server. >> >> Again, I would propose to aim at: >> >> a) either leave execute() on the client as blocking >> >> b) or change execute() on the client to be non-blocking but on the >> Geode release this is introduced, have it configurable. The default >> behavior would be blocking (deprecated behavior) but could be set to >> non-blocking with a system property. On the next release, the blocking >> behavior would be removed. >> >> - Alberto G. >> >> >> >> -Dan >> >> On Fri, Aug 23, 2019 at 4:28 AM Alberto Gomez <alberto.go...@est.tech >>> <mailto:alberto.go...@est.tech> >> wrote: >> >> >> >> Hi Jake, >> >> Please, see my answers below. >> >> On 22/8/19 21:16, Jacob Barrett wrote: >> >> >> On Aug 21, 2019, at 8:49 AM, Alberto Gomez <alberto.go...@est.tech >>> <mailto:alberto.go...@est.tech> >> >> >> wrote: >> >> >> 2. Timeout in ResultCollector::getResult() and Execution::execute() >> >> >> blocking >> >> >> Regarding the timeout in the ResultCollector::getResult() method >> >> >> problem and the blocking/non-blocking confusion for >> Execution::execute() >> two alternatives are considered: >> >> >> a) Remove the possibility of setting a timeout on the >> >> >> ResultCollector::getResult() method on the client side as with the >> current >> client implementation it is useless. This could be done by removing the >> method with the timeout parameter from the public API. >> >> >> It would be advisable to make explicit in the documentation that the >> >> >> getResult() method does not wait for results to arrive as that >> should have >> already been done in the Execution::execute() invocation. >> >> >> This alternative is very simple and would keep things pretty much as >> >> >> they are today. >> >> >> To be honest I think approach would go against what a user “thinks” is >> >> >> going on. Given that there hasn’t been a timeout on execute I >> assumed it >> was asynchronous and that the getResult blocked until timeout or >> results >> arrived. Typically these two calls were done back to back. >> >> You are right if you look at the Java client. But if you look at the >> native clients, the timeout is there, both in the C++ and the C# cases >> which would indicate that it is a blocking call. >> >> See >> >> >> https://geode.apache.org/releases/latest/cppdocs/a00725.html#aa918a5e193745950e12ca4feb9c5d776 >> >> and >> >> >> https://geode.apache.org/releases/latest/dotnetdocs/a00882.html#ae0a814049482ca424f89c13ab1099c3d >> >> >> >> b) Transform the Execution::execute() method on the client side >> into a >> >> >> non-blocking method. >> >> >> This alternative is more complex and requires changes in all the >> >> >> clients. Apart from that it has implications on the public client >> API it >> requires moving the exceptions offered currently by the >> Execution::execute() method to the ResultCollector::getResult() and new >> threads will have to be managed. >> >> >> I think this is more in line with what users expect is going on >> based on >> >> >> the current API, I know I have. If were are going to make any change I >> think this is the one. I don’t think the behavior change is a >> problem since >> it's what is expected to be happening anyway. >> >> >> An outline of a possible implementation for option b) would be: >> >> * Instead of invoking the ServerRegionProxy::executeFunction() >> >> >> directly as it is done today, create a Future that invokes this >> method and >> returns the resultCollector passed as parameter. >> >> >> Do you really think we need to introduce Futures here into the API? I >> >> >> feel like the ResultCollector acts as the Future. I don’t think any >> change >> needs to be made to the API in this regard. The ResultCollector >> implementation would just simply block as implied by the api for the >> timeout period. I would change the method to have units though and >> deprecate the method without units. >> >> I did not mean to introduce Futures in the API. My idea was to use Java >> Futures internally so that the ResultCollector returned by the >> getResult() method would wrap the Java Future with the ResultCollector >> that would actually hold the result. >> >> An alternative would be to leave the logic of blocking to each >> implementation ResultCollector. In the case of the >> DefaultResultCollector, we could use a CountDownLatch that would be >> decremented when endResults() is called and that would make getResult() >> block by using CountDown.await(...). >> >> The advantage of using Futures internally is that the blocking logic >> would not have to be implemented on every ResultCollector >> implementation. >> >> >> >> -Jake >> >> >> >> >> - Alberto >> >> >>