Apache9 commented on PR #7363:
URL: https://github.com/apache/hbase/pull/7363#issuecomment-3381440722

   > > In general, all RPC requests, such as locating a region, should be 
asynchronous calls. Do you have more details on what makes the table.batch call 
block for a long time?
   > 
   > I assume we're not waiting for any RPC responses on the `internalFlush` 
thread, but with large enough buffers and large enough numbers of mutations and 
high enough concurrency on incoming mutations, it seems even 
`AsyncBatchRpcRetryingCaller#groupAndSend` can take long enough (milliseconds?) 
to decrease overall throughput.
   > 
   > If the region location is in the cache, then the future completes 
synchronously in `AsyncNonMetaRegionLocator#getRegionLocationsInternal`, which 
allows more work under the `synchronized` block, which prevents further 
mutations from being accepted.
   > 
   > 
https://github.com/apache/hbase/blob/d0b94780f509156c66ea9d297a89e73940257eb1/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncNonMetaRegionLocator.java#L504-L513
   > 
   > When the number of mutations could be >100,000 and number of region 
locations could be >10,000, and most of those locations are in the cache, 
`groupAndSend` of those Multi RPCs yields a non-trivial amount of work.
   
   So maybe we should check the depth of the call stack? Netty has a trick for 
this: if futures keep completing synchronously and the call stack gets very 
deep, it force-schedules an asynchronous task instead of invoking the callback 
inline, which prevents stack overflow and also reduces the blocking execution 
time.
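
   A rough sketch of the idea, not taken from Netty or HBase: callbacks run 
inline only while a per-thread depth counter stays under a limit, and are 
handed to an executor otherwise. The class name `DepthLimitedNotifier`, the 
limit of 8, and the `currentDepth` helper are all hypothetical and chosen only 
for illustration.

```java
import java.util.concurrent.Executor;

/**
 * Hypothetical helper (not an HBase or Netty class): runs callbacks inline
 * until a per-thread nesting limit is reached, then defers to an executor.
 */
public class DepthLimitedNotifier {
  // Assumed limit for illustration; a real value would need tuning.
  private static final int MAX_INLINE_DEPTH = 8;

  // Per-thread nesting depth of inline callback invocations.
  private static final ThreadLocal<int[]> DEPTH =
    ThreadLocal.withInitial(() -> new int[1]);

  private final Executor executor;

  public DepthLimitedNotifier(Executor executor) {
    this.executor = executor;
  }

  /** Current inline nesting depth on the calling thread. */
  public static int currentDepth() {
    return DEPTH.get()[0];
  }

  /** Runs the callback inline if the stack is shallow, otherwise defers it. */
  public void run(Runnable callback) {
    int[] depth = DEPTH.get();
    if (depth[0] < MAX_INLINE_DEPTH) {
      depth[0]++;
      try {
        callback.run();
      } finally {
        depth[0]--;
      }
    } else {
      // Too deep: break the synchronous chain so the caller can return.
      executor.execute(callback);
    }
  }
}
```

   With something like this, a long chain of synchronously completed futures 
would return control to the caller after a bounded amount of inline work, so a 
`synchronized` section around the caller would be held for a shorter time.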


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
