Hi all, First of all, thanks a lot Dan and Jacob for your feedback.
As we are getting close to the deadline I am adding here some conclusions and a refined proposal in order to get some more feedback and if possible some voting on the two alternatives proposed (or any other in between if you feel any of them is lacking something). I also add some draft code to try to clarify a bit the more complex of the alternatives. Proposal summary (needs a decision on which option to implement): ------------------------------------------------------------------------------------------- In order to make the API more coherent two alternatives are proposed: a) Remove the timeout from the ResultCollector::getResult() / document that the timeout has no effect, taking into account that Execution::execute() is always blocking. Additionally we could add the timeout parameter to the Execution::execute() method of the Java API in order to align it with the native client APIs. This timeout would not be the read timeout on the socket but a timeout for the execution of the operation. b) Change the implementation of the Execution::execute() method without timeout to be non-blocking on both the Java and native APIs. This change has backward compatibility implications, would probably bring some performance decrease and could pose some difficulties in the implementation on the C++ side (in the handling of timed out operations that hold resources). The first option (a) is less risky and does not have impacts regarding backward compatibility and performance. The second one (b) is the preferred alternative in terms of the expected behavior from the users of the API. This option is more complex to implement and as mentioned above has performance and backward compatibility issues not easy to be solved. Following is a draft version of the implementation of b) on the Java client: https://github.com/Nordix/geode/commit/507a795e34c6083c129bda7e976b9223d1a893da Following is a draft version of the implementation of b) on the C++ native client: https://github.com/apache/geode-native/commit/a03a56f229bb8d75ee71044cf6196df07f43150d Note that the above implementation of b) in the C++ client implies that the Execution object returned by the FunctionService cannot be destroyed until the thread executing the function asynchronously has finished. If the function times out, the Execution object must be kept until the thread finishes. Other considerations ------------------------- * Currently, in the function execution Java client there is not a possibility to set a timeout for the execution of functions. The closest to this is the read timeout that may be set globally for function executions but this is not really a timeout for operations. * Even if the API for function execution is the same on clients and servers, the implementation is different between them. On the clients, the execute() methods are blocking while on the servers it is non-blocking and the invoker of the function blocks on the getResult() method of the ResultCollector returned by the execute() method. Even if having both blocking and non-blocking implementation of execute() in both clients and servers sounds desirable from the point of view of orthogonality, this could bring complications in terms of backward compatibility. Besides, a need for a blocking version of function execution has not been found. -Alberto G. On 29/8/19 16:48, Alberto Gomez wrote: Sorry, some corrections on my comments after revisiting the native client code. When I said that the timeout used in the execution() method (by means of a system property) was to set a read timeout on the socket, I was only talking about the Java client. In the case of the native clients, the timeout set in the execute() method is not translated into a socket timeout but it is the time to wait for the operation to complete, i.e., to get all the results back. Things being so, I would change my proposal to: - Change the implementation of execute() on both Java and native clients to be non-blocking (having the blocking/non-blocking behavior configurable in the release this is introduced and leaving only the non-blocking behavior in the next release). - Either remove the execute() with timeout methods in the native clients (with a deprecation release) or implement the execute(timeout) method in the Java client to be blocking (to work as the native client does today). In case the method times out, the connection will not be closed. If the operation times out due to the socket timeout (system property), then the connection will be closed as it is now done in the Java client. - Do not implement the blocking execute(timeout) method on the server and leave the current execute() implementation on the server as it is (non-blocking) Does this make sense? -Alberto On 29/8/19 12:56, Alberto Gómez wrote: Hi Dan, Discussing these matters by e-mail is getting tricky. Let's see if I understand you correctly and also if I am being clear enough. Please, see my comments inline. On 29/8/19 0:49, Dan Smith wrote: Sorry for the slow response, I've been trying to decide what I think is the right approach here. For (1) - conceptually, I don't have a problem with having both blocking and non blocking methods on Execution. So adding blocking versions of execute() with a timeout seems ok. But I do think if we add them we need to implement them on both the client and the server to behave the same way. That shouldn't be too hard on the server since execute(timeout) can just call getResult(timeout) internally. We have to take into account that, currently, the timeout in execute() is not the same thing as the timeout in getResult(). On the one hand, the timeout set in execute() (via System property in the Java client, and with a parameter in the native client) sets a readtimeout on the socket which just means that if nothing is read from the socket after sending the request to the server for the given timeout, the corresponding exception will be thrown. It looks to me more like a protection against possible communication failures rather than a mechanism to decide if results took too long to be provided. So I would not link the presence of the timeout parameter in the method to the nature of the method (blocking or non-blocking). I think we could have this read timeout set and at the same time keep that method as non-blocking. On the other hand, the timeout set in getResult() is a timeout to wait for all the results to be received from the moment the method is invoked. Therefore, I would not implement the blocking version of execute() on the server by calling getResult() at the end. Apart from that, I doubt if it would make sense to set this readTimeout in the execute() methods from servers given that the communication is very different to the one done with clients. I also doubt that anyone would be interested in the blocking version of execute() on the server. My proposal was to add the readtimeout to the execute() methods in the Java client in order to align the Java client and the native client. This change would be independent to the decision we make regarding the change of execute() to non-blocking. To achieve this alignment, alternatively, we could remove the timeout parameter in execute() from the native clients and have it as a global property for the client to be set by whatever mechanism available as it is done in the Java client today. Were you proposing that the execute() methods with timeout were blocking and the ones without timeout non-blocking? Not sure if this is something you meant. For (2) - Although I think the original authors of this API probably did intend for execute() to be non-blocking, the fact is that it does block on the client and most users are probably calling execute from a client. So I do agree we probably shouldn't change the behavior at this point. Perhaps we can just clearly document the current behavior of execute() as part of adding these new methods. Going forward we can add new methods to Execution that are clearly non-blocking (submit?, invoke?) and implement them consistently on *both* the client in the server, but that doesn't have to be in the scope of this proposal. The problem I see with adding new non-blocking methods (new/submit...) is that it would be a solution for the current users of the client regarding backwards compatibility. But, on the server side, we would have to move the current logic of execute() which is non-blocking to the new methods and change the current execute() behavior to blocking. We would not impact the users of the client but we would impact the users of the server. Again, I would propose to aim at: a) either leave execute() on the client as blocking b) or change execute() on the client to be non-blocking but on the Geode release this is introduced, have it configurable. The default behavior would be blocking (deprecated behavior) but could be set to non-blocking with a system property. On the next release, the blocking behavior would be removed. - Alberto G. -Dan On Fri, Aug 23, 2019 at 4:28 AM Alberto Gomez <alberto.go...@est.tech><mailto:alberto.go...@est.tech> wrote: Hi Jake, Please, see my answers below. On 22/8/19 21:16, Jacob Barrett wrote: On Aug 21, 2019, at 8:49 AM, Alberto Gomez <alberto.go...@est.tech><mailto:alberto.go...@est.tech> wrote: 2. Timeout in ResultCollector::getResult() and Execution::execute() blocking Regarding the timeout in the ResultCollector::getResult() method problem and the blocking/non-blocking confusion for Execution::execute() two alternatives are considered: a) Remove the possibility of setting a timeout on the ResultCollector::getResult() method on the client side as with the current client implementation it is useless. This could be done by removing the method with the timeout parameter from the public API. It would be advisable to make explicit in the documentation that the getResult() method does not wait for results to arrive as that should have already been done in the Execution::execute() invocation. This alternative is very simple and would keep things pretty much as they are today. To be honest I think approach would go against what a user “thinks” is going on. Given that there hasn’t been a timeout on execute I assumed it was asynchronous and that the getResult blocked until timeout or results arrived. Typically these two calls were done back to back. You are right if you look at the Java client. But if you look at the native clients, the timeout is there, both in the C++ and the C# cases which would indicate that it is a blocking call. See https://geode.apache.org/releases/latest/cppdocs/a00725.html#aa918a5e193745950e12ca4feb9c5d776 and https://geode.apache.org/releases/latest/dotnetdocs/a00882.html#ae0a814049482ca424f89c13ab1099c3d b) Transform the Execution::execute() method on the client side into a non-blocking method. This alternative is more complex and requires changes in all the clients. Apart from that it has implications on the public client API it requires moving the exceptions offered currently by the Execution::execute() method to the ResultCollector::getResult() and new threads will have to be managed. I think this is more in line with what users expect is going on based on the current API, I know I have. If were are going to make any change I think this is the one. I don’t think the behavior change is a problem since it's what is expected to be happening anyway. An outline of a possible implementation for option b) would be: * Instead of invoking the ServerRegionProxy::executeFunction() directly as it is done today, create a Future that invokes this method and returns the resultCollector passed as parameter. Do you really think we need to introduce Futures here into the API? I feel like the ResultCollector acts as the Future. I don’t think any change needs to be made to the API in this regard. The ResultCollector implementation would just simply block as implied by the api for the timeout period. I would change the method to have units though and deprecate the method without units. I did not mean to introduce Futures in the API. My idea was to use Java Futures internally so that the ResultCollector returned by the getResult() method would wrap the Java Future with the ResultCollector that would actually hold the result. An alternative would be to leave the logic of blocking to each implementation ResultCollector. In the case of the DefaultResultCollector, we could use a CountDownLatch that would be decremented when endResults() is called and that would make getResult() block by using CountDown.await(...). The advantage of using Futures internally is that the blocking logic would not have to be implemented on every ResultCollector implementation. -Jake - Alberto