[
https://issues.apache.org/jira/browse/HBASE-29141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Umesh Kumar Kumawat updated HBASE-29141:
----------------------------------------
Description:
*TLDR*
In HBase's SimpleRpcScheduler, we have call queues and handlers. Each queue
has a subset of the handlers assigned to it. The config
{{hbase.ipc.server.max.callqueue.length}} controls the maximum length of the
queue. If this config is not set, HBase calculates a default max length for
the queue using {{DEFAULT_MAX_CALLQUEUE_LENGTH_PER_HANDLER}}. The issue is
that each queue is served by only a portion of the handlers, but the default
max queue length is calculated using the count of all the handlers.
*Description using configs and code links*
_____________________________________________________________________________________________________________________
HBase has the following configs for handlers and queues.
* {{hbase.regionserver.handler.count}} => Number of handlers. Default is 30.
* {{hbase.ipc.server.callqueue.handler.factor}} => Queue-to-handler ratio.
Default is 0.1, which means 10 handlers per queue.
** {{number of call queues}} = {{handlerCount}} *
{{callQueuesHandlersFactor(0.1)}}
([code|https://github.com/apache/hbase/blob/777010361abb203b8b17673d84acf4f7f1d0283a/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcExecutor.java#L195])
* {{hbase.ipc.server.max.callqueue.length}} => Max callqueue Length. Default
is {{DEFAULT_MAX_CALLQUEUE_LENGTH_PER_HANDLER(10) * {color:#ff0000}*total
handlerCount*{color}}}
([code|https://github.com/apache/hbase/blob/777010361abb203b8b17673d84acf4f7f1d0283a/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/SimpleRpcScheduler.java#L71])
** *Issue*: I think there’s a small issue here. We currently treat this as
the length per queue, but it is calculated using the _total handler count_,
not the number of handlers per queue.
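To make the mismatch concrete, here is a minimal sketch of the arithmetic described above. It is not the actual HBase code: the class and method names are hypothetical, the constants are the defaults quoted in this description, and rounding the queue count to at least 1 is an assumption.

```java
// Hypothetical helper; names are illustrative and not part of HBase.
final class CallQueueDefaults {
  // Default quoted above for DEFAULT_MAX_CALLQUEUE_LENGTH_PER_HANDLER.
  static final int MAX_CALLQUEUE_LENGTH_PER_HANDLER = 10;

  // number of call queues = handlerCount * callQueuesHandlersFactor.
  // Assumption: the result is rounded and never drops below 1.
  static int numCallQueues(int handlerCount, float factor) {
    return Math.max(1, Math.round(handlerCount * factor));
  }

  // Current default: per-handler length times the TOTAL handler count.
  static int currentDefaultMaxLength(int handlerCount) {
    return handlerCount * MAX_CALLQUEUE_LENGTH_PER_HANDLER;
  }

  // Alternative: per-handler length times the handlers serving ONE queue.
  static int perQueueMaxLength(int handlerCount, float factor) {
    int handlersPerQueue =
        Math.max(1, handlerCount / numCallQueues(handlerCount, factor));
    return handlersPerQueue * MAX_CALLQUEUE_LENGTH_PER_HANDLER;
  }

  public static void main(String[] args) {
    int handlers = 30;   // default hbase.regionserver.handler.count
    float factor = 0.1f; // default hbase.ipc.server.callqueue.handler.factor
    System.out.println("queues = " + numCallQueues(handlers, factor));
    System.out.println("current default per queue = "
        + currentDefaultMaxLength(handlers));
    System.out.println("scaled by handlers per queue = "
        + perQueueMaxLength(handlers, factor));
  }
}
```

With the defaults (30 handlers, factor 0.1) this gives 3 queues, a current default of 300 per queue, and 100 if the length were scaled by the 10 handlers that serve each queue.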
_____________________________________________________________________________________________________________________
*Other concerns*
With the CoDel queue type, there is a config
{{hbase.ipc.server.callqueue.codel.lifo.threshold}} (default 0.8). With 100
handlers, each of the 10 queues has a default size of 1000, so queue
processing will not switch to LIFO until 800 calls are waiting in a single
queue. That seems too high.
# I believe we may have unintentionally created a dependency between the
*total number of handlers* and the call queue length. Should we reconsider this
logic?
# Regarding the hard limit: should we be enforcing a _minimum_ value for the
queue size or a _maximum_ value for the queue size?
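The CoDel numbers above can be reproduced with a short sketch. It assumes, as the description implies, that the LIFO threshold is applied as a fraction of the per-queue max length; the class and method names are hypothetical and this is not HBase code.

```java
// Hypothetical helper; not HBase code.
final class CodelLifoThreshold {
  // Number of queued calls at which a queue would switch to LIFO,
  // assuming the threshold is a fraction of the queue's max length.
  static int lifoSwitchPoint(int maxQueueLength, double lifoThreshold) {
    return (int) Math.round(maxQueueLength * lifoThreshold);
  }

  public static void main(String[] args) {
    int handlers = 100;
    int perHandler = 10;    // DEFAULT_MAX_CALLQUEUE_LENGTH_PER_HANDLER
    double threshold = 0.8; // hbase.ipc.server.callqueue.codel.lifo.threshold

    int currentMax = handlers * perHandler;        // 1000 per queue today
    int handlersPerQueue = 10;                     // 100 handlers over 10 queues
    int scaledMax = handlersPerQueue * perHandler; // 100 if scaled per queue

    System.out.println("current switch point = "
        + lifoSwitchPoint(currentMax, threshold));
    System.out.println("scaled switch point = "
        + lifoSwitchPoint(scaledMax, threshold));
  }
}
```

Under these assumptions the switch to LIFO happens at 800 queued calls with the current default, and at 80 if the max length were derived from the handlers per queue.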
was:
*TLDR*
In HBase's SimpleRpcScheduler, we have call queues and handlers. Each queue
has a subset of the handlers assigned to it. The config
{{hbase.ipc.server.max.callqueue.length}} controls the maximum length of the
queue. If this config is not set, HBase calculates a default max length for
the queue using {{DEFAULT_MAX_CALLQUEUE_LENGTH_PER_HANDLER}}. The issue is
that each queue is served by only a portion of the handlers, but the default
max queue length is calculated using the count of all the handlers.
*Description using configs and code links*
_____________________________________________________________________________________________________________________
HBase has the following configs for handlers and queues.
* {{hbase.regionserver.handler.count}} => Number of handlers. Default is 30.
* {{hbase.ipc.server.callqueue.handler.factor}} => Queue-to-handler ratio.
Default is 0.1, which means 10 handlers per queue.
** {{number of call queues}} = {{handlerCount}} *
{{callQueuesHandlersFactor(0.1)}}
([code|https://github.com/apache/hbase/blob/777010361abb203b8b17673d84acf4f7f1d0283a/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcExecutor.java#L195])
* {{hbase.ipc.server.max.callqueue.length}} => Max callqueue Length. Default
is {{DEFAULT_MAX_CALLQUEUE_LENGTH_PER_HANDLER(10) * {color:#ff0000}*total
handlerCount*{color}}}
([code|https://github.com/apache/hbase/blob/777010361abb203b8b17673d84acf4f7f1d0283a/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/SimpleRpcScheduler.java#L71])
** *Issue*: I think there’s a small issue here. We currently treat this as
the length per queue, but it is calculated using the _total handler count_,
not the number of handlers per queue.
* In addition, there is a hard limit
({{DEFAULT_CALL_QUEUE_SIZE_HARD_LIMIT}}) of 250
([code|https://github.com/apache/hbase/blob/777010361abb203b8b17673d84acf4f7f1d0283a/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcExecutor.java#L228]).
This is a hard limit on the *minimum* value of the queue size: the queue size
cannot be less than 250.
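Read as a floor, the hard limit described above could be sketched as follows. This only illustrates the statement that the queue size cannot be less than 250; it is not the actual HBase code, and the class and method names are hypothetical.

```java
// Hypothetical helper; not HBase code.
final class CallQueueHardLimit {
  // Value quoted above for DEFAULT_CALL_QUEUE_SIZE_HARD_LIMIT.
  static final int CALL_QUEUE_SIZE_HARD_LIMIT = 250;

  // The effective queue size never drops below the hard limit.
  static int effectiveQueueSize(int maxQueueLength) {
    return Math.max(maxQueueLength, CALL_QUEUE_SIZE_HARD_LIMIT);
  }

  public static void main(String[] args) {
    System.out.println(effectiveQueueSize(100));  // below the floor
    System.out.println(effectiveQueueSize(1000)); // above the floor
  }
}
```

Under this reading, a computed max length of 100 would still yield a queue of 250, while 1000 is left unchanged.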
_____________________________________________________________________________________________________________________
*Other concerns*
With the CoDel queue type, there is a config
{{hbase.ipc.server.callqueue.codel.lifo.threshold}} (default 0.8). With 100
handlers, each of the 10 queues has a default size of 1000, so queue
processing will not switch to LIFO until 800 calls are waiting in a single
queue. That seems too high.
# I believe we may have unintentionally created a dependency between the
*total number of handlers* and the call queue length. Should we reconsider this
logic?
# Regarding the hard limit: should we be enforcing a _minimum_ value for the
queue size or a _maximum_ value for the queue size?
> Default maxQueueLength used for call queues is too high
> -------------------------------------------------------
>
> Key: HBASE-29141
> URL: https://issues.apache.org/jira/browse/HBASE-29141
> Project: HBase
> Issue Type: Bug
> Components: master, regionserver
> Affects Versions: 3.0.0-beta-1, 2.5.11, 2.6.2
> Reporter: Umesh Kumar Kumawat
> Assignee: Umesh Kumar Kumawat
> Priority: Major
> Labels: pull-request-available