[ https://issues.apache.org/jira/browse/HBASE-28608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Roudnitsky updated HBASE-28608:
--------------------------------------
    Description:
The HBase reference documents that the client meta operation timeout {{hbase.client.meta.operation.timeout}} defaults to the configured client operation timeout, but the implementation defaults it to the default client operation timeout of 20 minutes.

From "Timeout settings" in the HBase reference:
{panel}
A higher-level timeout is hbase.client.operation.timeout which is valid for each client call. When an RPC call fails for instance for a timeout due to hbase.rpc.timeout it will be retried until hbase.client.operation.timeout is reached. Client operation timeout for system tables can be fine tuned by setting hbase.client.meta.operation.timeout configuration value. When this is not set its value will use hbase.client.operation.timeout
{panel}

There seem to be two very different dependencies on the meta operation timeout:
# End-to-end operation timeout for system table operations (2.x and 3)
# Timeout to acquire {{userRegionLock}} to initiate the meta scan in {{locateRegionInMeta}} (2.x blocking client only, see HBASE-28730 for more detail/related work)

For case 1, I believe it makes sense from a user perspective that the meta operation timeout, which is meant to apply to a specific subset of operations, respects the 'general' operation timeout that is configured if the more specific meta operation timeout is not explicitly set.

For case 2 (blocking client), the default meta timeout value defeats the purpose of the {{userRegionLock}} timeout in a typical setup where {{hbase.client.operation.timeout}} << 20 minutes and {{hbase.client.meta.operation.timeout}} is not explicitly set. On 2.x this can lead to operations taking much longer than the configured operation timeout to actually time out if there is, e.g., meta slowness and/or contention around {{userRegionLock}}; see HBASE-28730 for more detail/related work.
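The gap between the documented and the implemented default can be sketched in plain Java. This is a minimal illustration under stated assumptions, not HBase code: the class and method names are hypothetical, a {{Map}} stands in for the client configuration, and 1,200,000 ms is the 20-minute default operation timeout mentioned above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration class; not part of HBase.
class MetaTimeoutDefaults {
    static final String OPERATION_TIMEOUT_KEY = "hbase.client.operation.timeout";
    static final String META_OPERATION_TIMEOUT_KEY = "hbase.client.meta.operation.timeout";
    // 20 minutes in milliseconds, the default operation timeout cited in the issue.
    static final int DEFAULT_OPERATION_TIMEOUT_MS = 20 * 60 * 1000;

    // Stand-in for a configuration lookup: return the parsed value, or the fallback if unset.
    static int getInt(Map<String, String> conf, String key, int fallback) {
        String value = conf.get(key);
        return value == null ? fallback : Integer.parseInt(value);
    }

    // Behavior as reported: an unset meta timeout falls back to the fixed 20-minute default,
    // ignoring whatever operation timeout the user configured.
    static int metaTimeoutAsImplemented(Map<String, String> conf) {
        return getInt(conf, META_OPERATION_TIMEOUT_KEY, DEFAULT_OPERATION_TIMEOUT_MS);
    }

    // Behavior as documented: an unset meta timeout falls back to the configured
    // operation timeout, which itself falls back to the 20-minute default.
    static int metaTimeoutAsDocumented(Map<String, String> conf) {
        int operationTimeout = getInt(conf, OPERATION_TIMEOUT_KEY, DEFAULT_OPERATION_TIMEOUT_MS);
        return getInt(conf, META_OPERATION_TIMEOUT_KEY, operationTimeout);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(OPERATION_TIMEOUT_KEY, "30000"); // user wants 30 s operations
        System.out.println("as implemented: " + metaTimeoutAsImplemented(conf) + " ms");
        System.out.println("as documented:  " + metaTimeoutAsDocumented(conf) + " ms");
    }
}
```

With only the operation timeout set to 30 seconds, the first method yields 1,200,000 ms while the second yields 30,000 ms, which is the mismatch this issue describes. Setting the meta timeout explicitly makes both methods agree.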
  was:
Documented behavior in the HBase reference for client meta operation timeout {{hbase.client.meta.operation.timeout}} default is that it will be set to the configured client operation timeout, but implementation is that it defaults to the default client operation timeout of 20 minutes.

There seem to be two very different dependencies on meta operation timeout:
# End to end operation timeout for system table operations (2.x and 3)
# Timeout to acquire {{userRegionLock}} to initiate meta scan in {{locateRegionInMeta}} (2.x blocking client only), this is the dependence on the meta operation timeout property that brought the 20 min default to my attention.

For case 2 blocking client, the default meta timeout value defeats the purpose of the {{userRegionLock}} timeout if one has a typical setup where {{hbase.client.operation.timeout}} << 20 minutes and {{hbase.client.meta.operation.timeout}} is not explicitly set, which can lead to operations taking much longer than the configured operation timeout to actually time out if there is, e.g., meta slowness and/or contention around {{userRegionLock}} on 2.x.

From "Timeout settings" in the HBase reference:
{panel}
A higher-level timeout is hbase.client.operation.timeout which is valid for each client call. When an RPC call fails for instance for a timeout due to hbase.rpc.timeout it will be retried until hbase.client.operation.timeout is reached. Client operation timeout for system tables can be fine tuned by setting hbase.client.meta.operation.timeout configuration value.
When this is not set its value will use hbase.client.operation.timeout
{panel}

> More sensible client meta operation timeout default
> ---------------------------------------------------
>
>                 Key: HBASE-28608
>                 URL: https://issues.apache.org/jira/browse/HBASE-28608
>             Project: HBase
>          Issue Type: Improvement
>          Components: Client
>    Affects Versions: 2.6.0, 2.4.17, 3.0.0-beta-1, 2.5.8
>            Reporter: Daniel Roudnitsky
>            Assignee: Daniel Roudnitsky
>            Priority: Major
>              Labels: pull-request-available, timeout