Re: [DISCUSS] Improvements on client function execution API

Alberto Gomez Tue, 17 Sep 2019 00:48:45 -0700

Thanks Anil. This is how this started, trying to pop the timeout up from 
a system property to a parameter in the Execution methods.


-Alberto G.

On 16/9/19 21:37, Anilkumar Gingade wrote:
> Alberto,
>
> Sorry for late response....Currently geode (java client) does provide
> ability to set function timeout; but its through internal system property
> "gemfire.CLIENT_FUNCTION_TIMEOUT"....Some of the tests using this property
> are Tests extending "FunctionRetryTestBase".
>
> Since this is through internal system property; we need a cleaner api to
> achieve the timeout behavior.
>
> +1 for the option a) proposed.
>
> -Anil
>
>
>
>
> On Mon, Sep 16, 2019 at 9:03 AM Dan Smith <[email protected]> wrote:
>
>> Thanks for following up on this!
>>
>> -Dan
>>
>> On Mon, Sep 16, 2019 at 3:07 AM Alberto Gomez <[email protected]>
>> wrote:
>>
>>> Thanks for the feedback. I also give a +1 to option a) including Dan's
>>> comments.
>>>
>>> I'll move the RFC to the Development state and will open a ticket to
>>> follow up on the implementation.
>>>
>>> -Alberto G.
>>>
>>> On 12/9/19 8:15, Jacob Barrett wrote:
>>>> +1
>>>>
>>>> I echo Dan’s comments as well.
>>>>
>>>> Thanks for tackling this.
>>>>
>>>> -jake
>>>>
>>>>
>>>>> On Sep 11, 2019, at 2:36 PM, Dan Smith <[email protected]> wrote:
>>>>>
>>>>> +1 - Ok, I think I've come around to option (a). We can go head and
>>> add a
>>>>> new execute(timeout, TimeUnit) method to the java API that is
>> blocking.
>>> We
>>>>> can leave the existing execute() method alone, except for documenting
>>> what
>>>>> it is doing.
>>>>>
>>>>> I would like implement execute(timeout,  TimeUnit) on the server side
>> as
>>>>> well. Since this Execution class is shared between both client and
>>> server
>>>>> APIs, it would be unfortunate to have a method on Execution that
>> simply
>>>>> doesn't work on the server side.
>>>>>
>>>>> -Dan
>>>>>
>>>>>
>>>>>> On Thu, Sep 5, 2019 at 9:25 AM Alberto Gomez <[email protected]
>>> wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> First of all, thanks a lot Dan and Jacob for your feedback.
>>>>>>
>>>>>> As we are getting close to the deadline I am adding here some
>>> conclusions
>>>>>> and a refined proposal in order to get some more feedback and if
>>> possible
>>>>>> some voting on the two alternatives proposed (or any other in between
>>> if
>>>>>> you feel any of them is lacking something).
>>>>>>
>>>>>> I also add some draft code to try to clarify a bit the more complex
>> of
>>> the
>>>>>> alternatives.
>>>>>>
>>>>>>
>>>>>> Proposal summary (needs a decision on which option to implement):
>>>>>>
>>>>>>
>> -------------------------------------------------------------------------------------------
>>>>>> In order to make the API more coherent two alternatives are proposed:
>>>>>>
>>>>>> a) Remove the timeout from the ResultCollector::getResult() /
>> document
>>>>>> that the timeout has no effect, taking into account that
>>>>>> Execution::execute() is always blocking.
>>>>>> Additionally we could add the timeout parameter to the
>>>>>> Execution::execute() method of the Java API in order to align it with
>>> the
>>>>>> native client APIs. This timeout would not be the read timeout on the
>>>>>> socket but a timeout for the execution of the operation.
>>>>>>
>>>>>> b) Change the implementation of the Execution::execute() method
>> without
>>>>>> timeout to be non-blocking on both the Java and native APIs. This
>>> change
>>>>>> has backward compatibility implications, would probably bring some
>>>>>> performance decrease and could pose some difficulties in the
>>> implementation
>>>>>> on the C++ side (in the  handling of timed out operations that hold
>>>>>> resources).
>>>>>>
>>>>>>
>>>>>> The first option (a) is less risky and does not have impacts
>> regarding
>>>>>> backward compatibility and performance.
>>>>>>
>>>>>> The second one (b) is the preferred alternative in terms of the
>>> expected
>>>>>> behavior from the users of the API. This option is more complex to
>>>>>> implement and as mentioned above has performance and backward
>>> compatibility
>>>>>> issues not easy to be solved.
>>>>>>
>>>>>> Following is a draft version of the implementation of b) on the Java
>>>>>> client:
>>>>>>
>>>>>>
>> https://github.com/Nordix/geode/commit/507a795e34c6083c129bda7e976b9223d1a893da
>>>>>> Following is a draft version of the implementation of b) on the C++
>>> native
>>>>>> client:
>>>>>>
>>>>>>
>> https://github.com/apache/geode-native/commit/a03a56f229bb8d75ee71044cf6196df07f43150d
>>>>>> Note that the above implementation of b) in the C++ client implies
>> that
>>>>>> the Execution object returned by the FunctionService cannot be
>>> destroyed
>>>>>> until the thread executing the function asynchronously has finished.
>>> If the
>>>>>> function times out, the Execution object must be kept until the
>> thread
>>>>>> finishes.
>>>>>>
>>>>>>
>>>>>> Other considerations
>>>>>> -------------------------
>>>>>>
>>>>>> * Currently, in the function execution Java client there is not a
>>>>>> possibility to set a timeout for the execution of functions. The
>>> closest to
>>>>>> this is the read timeout that may be set globally for function
>>> executions
>>>>>> but this is not really a timeout for operations.
>>>>>>
>>>>>> * Even if the API for function execution is the same on clients and
>>>>>> servers, the implementation is different between them. On the
>> clients,
>>> the
>>>>>> execute() methods are blocking while on the servers it is
>> non-blocking
>>> and
>>>>>> the invoker of the function blocks on the getResult() method of the
>>>>>> ResultCollector returned by the execute() method.
>>>>>> Even if having both blocking and non-blocking implementation of
>>> execute()
>>>>>> in both clients and servers sounds desirable from the point of view
>> of
>>>>>> orthogonality, this  could bring complications in terms of backward
>>>>>> compatibility. Besides, a need for a blocking version of function
>>> execution
>>>>>> has not been found.
>>>>>>
>>>>>> -Alberto G.
>>>>>>
>>>>>> On 29/8/19 16:48, Alberto Gomez wrote:
>>>>>>
>>>>>> Sorry, some corrections on my comments after revisiting the native
>>>>>> client code.
>>>>>>
>>>>>> When I said that the timeout used in the execution() method (by means
>>> of
>>>>>> a system property) was to set a read timeout on the socket, I was
>> only
>>>>>> talking about the Java client. In the case of the native clients, the
>>>>>> timeout set in the execute() method is not translated into a socket
>>>>>> timeout but it is the time to wait for the operation to complete,
>> i.e.,
>>>>>> to get all the results back.
>>>>>>
>>>>>> Things being so, I would change my proposal to:
>>>>>>
>>>>>> - Change the implementation of execute() on both Java and native
>>> clients
>>>>>> to be non-blocking (having the blocking/non-blocking behavior
>>>>>> configurable in the release this is introduced and leaving only the
>>>>>> non-blocking behavior in the next release).
>>>>>>
>>>>>> - Either remove the execute() with timeout methods in the native
>>> clients
>>>>>> (with a deprecation release) or implement the execute(timeout) method
>>> in
>>>>>> the Java client to be blocking (to work as the native client does
>>>>>> today). In case the method times out, the connection will not be
>>> closed.
>>>>>> If the operation times out due to the socket timeout (system
>> property),
>>>>>> then the connection will be closed as it is now done in the Java
>>> client.
>>>>>> - Do not implement the blocking execute(timeout) method on the server
>>>>>> and leave the current execute() implementation on the server as it is
>>>>>> (non-blocking)
>>>>>>
>>>>>> Does this make sense?
>>>>>>
>>>>>> -Alberto
>>>>>>
>>>>>> On 29/8/19 12:56, Alberto Gómez wrote:
>>>>>>
>>>>>>
>>>>>> Hi Dan,
>>>>>>
>>>>>> Discussing these matters by e-mail is getting tricky.
>>>>>>
>>>>>> Let's see if I understand you correctly and also if I am being clear
>>>>>> enough.
>>>>>>
>>>>>> Please, see my comments inline.
>>>>>>
>>>>>> On 29/8/19 0:49, Dan Smith wrote:
>>>>>>
>>>>>>
>>>>>> Sorry for the slow response, I've been trying to decide what I think
>>>>>> is the
>>>>>> right approach here.
>>>>>>
>>>>>> For (1) - conceptually, I don't have a problem with having both
>>> blocking
>>>>>> and non blocking methods on Execution. So adding blocking versions of
>>>>>> execute() with a timeout seems ok. But I do think if we add them we
>>>>>> need to
>>>>>> implement them on both the client and the server to behave the same
>>> way.
>>>>>> That shouldn't be too hard on the server since execute(timeout) can
>>> just
>>>>>> call getResult(timeout) internally.
>>>>>>
>>>>>>
>>>>>>
>>>>>> We have to take into account that, currently, the timeout in
>> execute()
>>>>>> is not the same thing as the timeout in getResult().
>>>>>>
>>>>>> On the one hand, the timeout set in execute() (via System property in
>>>>>> the Java client, and with a parameter in the native client) sets a
>>>>>> readtimeout on the socket which just means that if nothing is read
>>>>>> from the socket after sending the request to the server for the given
>>>>>> timeout, the corresponding exception will be thrown. It looks to me
>>>>>> more like a protection against possible communication failures rather
>>>>>> than a mechanism to decide if results took too long to be provided.
>> So
>>>>>> I would not link the presence of the timeout parameter in the method
>>>>>> to the nature of the method (blocking or non-blocking). I think we
>>>>>> could have this read timeout set and at the same time keep that
>> method
>>>>>> as non-blocking.
>>>>>>
>>>>>> On the other hand, the timeout set in getResult() is a timeout to
>> wait
>>>>>> for all the results to be received from the moment the method is
>>> invoked.
>>>>>> Therefore, I would not implement the blocking version of execute() on
>>>>>> the server by calling getResult() at the end.
>>>>>>
>>>>>> Apart from that, I doubt if it would make sense to set this
>>>>>> readTimeout in the execute() methods from servers given that the
>>>>>> communication is very different to the one done with clients. I also
>>>>>> doubt that anyone would be interested in the blocking version of
>>>>>> execute() on the server.
>>>>>>
>>>>>> My proposal was to add the readtimeout to the execute() methods in
>> the
>>>>>> Java client in order to align the Java client and the native client.
>>>>>> This change would be independent to the decision we make regarding
>> the
>>>>>> change of execute() to non-blocking. To achieve this alignment,
>>>>>> alternatively, we could remove the timeout parameter in execute()
>> from
>>>>>> the native clients and have it as a global property for the client to
>>>>>> be set by whatever mechanism available as it is done in the Java
>>>>>> client today.
>>>>>>
>>>>>> Were you proposing that the execute() methods with timeout were
>>>>>> blocking and the ones without timeout non-blocking? Not sure if this
>>>>>> is something you meant.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> For (2) - Although I think the original authors of this API probably
>>> did
>>>>>> intend for execute() to be non-blocking, the fact is that it does
>>>>>> block on
>>>>>> the client and most users are probably calling execute from a client.
>>>>>> So I
>>>>>> do agree we probably shouldn't change the behavior at this point.
>>>>>> Perhaps
>>>>>> we can just clearly document the current behavior of execute() as
>>>>>> part of
>>>>>> adding these new methods. Going forward we can add new methods to
>>>>>> Execution
>>>>>> that are clearly non-blocking (submit?, invoke?) and implement them
>>>>>> consistently on *both* the client in the server, but that doesn't
>>>>>> have to
>>>>>> be in the scope of this proposal.
>>>>>>
>>>>>>
>>>>>>
>>>>>> The problem I see with adding new non-blocking methods
>> (new/submit...)
>>>>>> is that it would be a solution for the current users of the client
>>>>>> regarding backwards compatibility. But, on the server side, we would
>>>>>> have to move the current logic of execute() which is non-blocking to
>>>>>> the new methods and change the current execute() behavior to
>> blocking.
>>>>>> We would not impact the users of the client but we would impact the
>>>>>> users of the server.
>>>>>>
>>>>>> Again, I would propose to aim at:
>>>>>>
>>>>>> a) either leave execute() on the client as blocking
>>>>>>
>>>>>> b) or change execute() on the client to be non-blocking but on the
>>>>>> Geode release this is introduced, have it configurable. The default
>>>>>> behavior would be blocking (deprecated behavior) but could be set to
>>>>>> non-blocking with a system property. On the next release, the
>> blocking
>>>>>> behavior would be removed.
>>>>>>
>>>>>> - Alberto G.
>>>>>>
>>>>>>
>>>>>>
>>>>>> -Dan
>>>>>>
>>>>>> On Fri, Aug 23, 2019 at 4:28 AM Alberto Gomez <[email protected]
>>>>>>> <mailto:[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hi Jake,
>>>>>>
>>>>>> Please, see my answers below.
>>>>>>
>>>>>> On 22/8/19 21:16, Jacob Barrett wrote:
>>>>>>
>>>>>>
>>>>>> On Aug 21, 2019, at 8:49 AM, Alberto Gomez <[email protected]
>>>>>>> <mailto:[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> 2. Timeout in ResultCollector::getResult() and Execution::execute()
>>>>>>
>>>>>>
>>>>>> blocking
>>>>>>
>>>>>>
>>>>>> Regarding the timeout in the ResultCollector::getResult() method
>>>>>>
>>>>>>
>>>>>> problem and the blocking/non-blocking confusion for
>>>>>> Execution::execute()
>>>>>> two alternatives are considered:
>>>>>>
>>>>>>
>>>>>> a) Remove the possibility of setting a timeout on the
>>>>>>
>>>>>>
>>>>>> ResultCollector::getResult() method on the client side as with the
>>>>>> current
>>>>>> client implementation it is useless. This could be done by removing
>> the
>>>>>> method with the timeout parameter from the public API.
>>>>>>
>>>>>>
>>>>>> It would be advisable to make explicit in the documentation that the
>>>>>>
>>>>>>
>>>>>> getResult() method does not wait for results to arrive as that
>>>>>> should have
>>>>>> already been done in the Execution::execute() invocation.
>>>>>>
>>>>>>
>>>>>> This alternative is very simple and would keep things pretty much as
>>>>>>
>>>>>>
>>>>>> they are today.
>>>>>>
>>>>>>
>>>>>> To be honest I think approach would go against what a user “thinks”
>> is
>>>>>>
>>>>>> going on. Given that there hasn’t been a timeout on execute I
>>>>>> assumed it
>>>>>> was asynchronous and that the getResult blocked until timeout or
>>>>>> results
>>>>>> arrived. Typically these two calls were done back to back.
>>>>>>
>>>>>> You are right if you look at the Java client. But if you look at the
>>>>>> native clients, the timeout is there, both in the C++ and the C#
>> cases
>>>>>> which would indicate that it is a blocking call.
>>>>>>
>>>>>> See
>>>>>>
>>>>>>
>>>>>>
>> https://geode.apache.org/releases/latest/cppdocs/a00725.html#aa918a5e193745950e12ca4feb9c5d776
>>>>>> and
>>>>>>
>>>>>>
>>>>>>
>> https://geode.apache.org/releases/latest/dotnetdocs/a00882.html#ae0a814049482ca424f89c13ab1099c3d
>>>>>>
>>>>>>
>>>>>> b) Transform the Execution::execute() method on the client side
>>>>>> into a
>>>>>>
>>>>>>
>>>>>> non-blocking method.
>>>>>>
>>>>>>
>>>>>> This alternative is more complex and requires changes in all the
>>>>>>
>>>>>>
>>>>>> clients. Apart from that it has implications on the public client
>>>>>> API it
>>>>>> requires moving the exceptions offered currently by the
>>>>>> Execution::execute() method to the ResultCollector::getResult() and
>> new
>>>>>> threads will have to be managed.
>>>>>>
>>>>>>
>>>>>> I think this is more in line with what users expect is going on
>>>>>> based on
>>>>>>
>>>>>>
>>>>>> the current API, I know I have. If were are going to make any change
>> I
>>>>>> think this is the one. I don’t think the behavior change is a
>>>>>> problem since
>>>>>> it's what is expected to be happening anyway.
>>>>>>
>>>>>>
>>>>>> An outline of a possible implementation for option b) would be:
>>>>>>
>>>>>>     *   Instead of invoking the ServerRegionProxy::executeFunction()
>>>>>>
>>>>>>
>>>>>> directly as it is done today, create a Future that invokes this
>>>>>> method and
>>>>>> returns the resultCollector passed as parameter.
>>>>>>
>>>>>>
>>>>>> Do you really think we need to introduce Futures here into the API? I
>>>>>>
>>>>>>
>>>>>> feel like the ResultCollector acts as the Future. I don’t think any
>>>>>> change
>>>>>> needs to be made to the API in this regard. The ResultCollector
>>>>>> implementation would just simply block as implied by the api for the
>>>>>> timeout period. I would change the method to have units though and
>>>>>> deprecate the method without units.
>>>>>>
>>>>>> I did not mean to introduce Futures in the API. My idea was to use
>> Java
>>>>>> Futures internally so that the ResultCollector returned by the
>>>>>> getResult() method would wrap the Java Future with the
>> ResultCollector
>>>>>> that would actually hold the result.
>>>>>>
>>>>>> An alternative would be to leave the logic of blocking to each
>>>>>> implementation ResultCollector. In the case of the
>>>>>> DefaultResultCollector, we could use a CountDownLatch that would be
>>>>>> decremented when endResults() is called and that would make
>> getResult()
>>>>>> block by using CountDown.await(...).
>>>>>>
>>>>>> The advantage of using Futures internally is that the blocking logic
>>>>>> would not have to be implemented on every ResultCollector
>>>>>> implementation.
>>>>>>
>>>>>>
>>>>>>
>>>>>> -Jake
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> - Alberto
>>>>>>
>>>>>>
>>>>>>

Re: [DISCUSS] Improvements on client function execution API

Reply via email to