lupko opened a new issue, #43819:
URL: https://github.com/apache/arrow/issues/43819

   ### Describe the enhancement requested
   
   The non-async Arrow Flight RPC infrastructure (in C++ and Python at least) 
currently does not expose the mechanisms needed for server-side 
cancellation of requests. Such a capability would be really useful to 
safeguard requests such as DoPut and DoExchange, where the server has to 
wait for data from the client, which blocks the server thread.
   
   In the current state, an incorrectly coded (or rogue) client can initiate, 
say, a bunch of DoPut requests, keep the connection open, and never send (or 
never finish sending) the data. The server thread would then block forever 
trying to read the data.
   
   This can be really detrimental to server well-being, especially if the 
implementation of the operation (say DoPut) acquires some resources before 
it gets blocked on the infinite read. 
   
   _Real-life example: an implementation of DoPut writes data to S3 via the 
Arrow S3 Filesystem. Each S3 output stream comes with a minimum overhead (5MB 
for its internal accumulation buffers), so the server protects its memory usage 
by using a semaphore. There is therefore an implicit limit on the number of 
concurrent DoPut requests. A subtle bug in client-side code (Java) started 
DoPuts but never sent data and never closed the DoPut. This was intermittent; 
over time (a couple of hours) the server's resources were all reserved and 
writes were no longer possible - hilarity ensues._
   
   While the problem with 'zombie' DoPut requests (and I believe DoExchange 
is the same) starts with the client, I think the server code should have the 
necessary mechanisms to safeguard itself. Not to mention that the client may 
not always be under direct control.
   
   What do you think about exposing the `TryCancel` method on the 
`ServerCallContext` for the non-async server? I did a quick PoC + test here: 
https://github.com/lupko/arrow/commit/29a7441e5b77264a8e518f7d6b58f365a24ee78e 
- the initial tests show that it can work.
   
   With TryCancel available on the context, the actual DoPut implementation 
(i.e. code outside of the Arrow codebase) could register a per-call record 
(strictly for the duration of the call or, better yet, of the blocking read) 
with some kind of 'monitor' (again, outside of the Arrow codebase). The monitor 
would run in a separate thread and cancel old calls or calls that have shown no 
activity for some time. 
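
   A minimal sketch of such a monitor, to make the idea concrete. Everything 
here is hypothetical and lives outside Arrow: the `CallMonitor` class, its 
method names and the `cancel` callback are assumptions made for illustration. 
The callback stands in for the proposed `TryCancel`, which the non-async server 
context does not currently expose.

```python
import threading
import time
from typing import Callable, Dict, List, Tuple


class CallMonitor:
    """Hypothetical watchdog: cancels calls that show no activity for too long."""

    def __init__(self, idle_timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._idle_timeout = idle_timeout
        self._clock = clock
        self._lock = threading.Lock()
        # call_id -> (time of last activity, cancel callback)
        self._calls: Dict[int, Tuple[float, Callable[[], None]]] = {}
        self._next_id = 0

    def register(self, cancel: Callable[[], None]) -> int:
        """Start tracking a call; `cancel` is assumed to wrap TryCancel."""
        with self._lock:
            call_id = self._next_id
            self._next_id += 1
            self._calls[call_id] = (self._clock(), cancel)
            return call_id

    def touch(self, call_id: int) -> None:
        """Record activity, e.g. after each chunk read from the client."""
        with self._lock:
            entry = self._calls.get(call_id)
            if entry is not None:
                self._calls[call_id] = (self._clock(), entry[1])

    def unregister(self, call_id: int) -> None:
        """Stop tracking a call that finished on its own."""
        with self._lock:
            self._calls.pop(call_id, None)

    def sweep(self) -> List[int]:
        """Cancel and drop every idle call; return the cancelled ids."""
        now = self._clock()
        with self._lock:
            stale = [(cid, cancel)
                     for cid, (last, cancel) in self._calls.items()
                     if now - last > self._idle_timeout]
            for cid, _ in stale:
                del self._calls[cid]
        # Invoke callbacks outside the lock so they cannot deadlock the monitor.
        for _, cancel in stale:
            cancel()
        return [cid for cid, _ in stale]

    def run(self, stop: threading.Event, interval: float) -> None:
        """Body of the monitor thread: sweep periodically until `stop` is set."""
        while not stop.wait(interval):
            self.sweep()
```

   A DoPut handler would call `register` just before the blocking read, 
`touch` after each chunk it receives, and `unregister` in a `finally` block, so 
a record never outlives its call.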
   
   If exposing TryCancel is viable from your perspective, I could pick up 
my PoC, finish it, and open a PR.
   
   ### Component(s)
   
   C++, FlightRPC, Python


