[
https://issues.apache.org/jira/browse/HBASE-30219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
rstest updated HBASE-30219:
---------------------------
Description:
h1. Summary
clone_table_schema / createTable / truncateTable return success (exitValue=0)
when the issuing client's node is partitioned from the RegionServer hosting
hbase:meta, but no CreateTableProcedure is ever registered on the master and
the table is never created.
*Environment*: Three-node cluster (1 master + 2 RegionServers), single
HBase version, ZooKeeper quorum on all three nodes. A network partition is
active between the two RegionServers (so a client on the RegionServer that is
cut off from the meta-hosting RegionServer cannot reach hbase:meta), while the
master and the ZK quorum remain healthy. Found by differential fault-injection
testing (the same command sequence run on independent clusters under the same
partition, with results compared) and reproduced locally with full HMaster logs
captured.
h2. Description
When a client issues a DDL operation (e.g. {{clone_table_schema}},
{{create}}, {{truncate_preserve}}) from a node that cannot reach the
RegionServer currently hosting {{hbase:meta}}, the operation:
* blocks for ~60 seconds, then
* returns *success* to the caller ({{exitValue = 0}} from the shell; no
exception from {{HBaseAdmin}}),
even though:
* the HMaster log contains *no CreateTableProcedure* (or
TruncateTableProcedure) for that table at all - the operation never reached the
master as a procedure, and
* the table is *never created*: {{exists 'T'}} returns false and {{scan
'hbase:meta'}} has zero rows for it.
So a DDL call that reports success silently does nothing. There is no error
surfaced to the client, no entry in {{hbase:meta}}, and no procedure on the
master - the failure is completely invisible to the application, which believes
the table now exists.
The decisive condition is *reachability of the meta-hosting RegionServer from
the client's node*, not the health of the master or ZooKeeper. In our
reproduction the master and the ZK quorum were both reachable from the client
throughout; only the path from the client's RegionServer to the RegionServer
hosting {{hbase:meta}} was cut. A DDL issued from the other side (the node that
CAN reach the meta host) completes normally in ~1 second with a clean
{{CreateTableProcedure ... ADD_TO_META -> ASSIGN -> state=ENABLED ->
SUCCESS}}.
h2. Evidence
In a run where {{hbase:meta}} was hosted on RegionServer R1 and the partition
cut R1 from R2:
* A {{clone_table_schema}} issued from a client on *R2* (cannot reach
R1/meta): client returns success after ~60 s; the complete HMaster log over the
whole window contains *zero* mentions of the target table; {{exists}} = false;
0 rows in {{hbase:meta}}. -> silent loss.
* A {{clone_table_schema}} issued from a client on *R1* (meta is local):
HMaster log shows {{Client=... create 'T'}} -> {{CreateTableProcedure pid=NNN:
CREATE_TABLE_PRE_OPERATION -> WRITE_FS_LAYOUT -> ADD_TO_META -> ASSIGN ->
state=ENABLED -> SUCCESS}} in ~1 s; table exists. -> normal.
The two cases differ only in which side of the partition the issuing client
sits on relative to the meta host. Whichever DDL is issued from the
meta-unreachable side is silently lost with a success return code; whichever is
issued from the meta-reachable side succeeds.
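The log comparison above can be automated. The following is a minimal Java sketch, not part of the captured evidence: {{MasterLogCheck}} and its method names are hypothetical, and the token-boundary matching is an assumption, since the exact HMaster log layout varies by version.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Sketch: decide whether a master log ever mentions a table or a CreateTableProcedure for it. */
final class MasterLogCheck {
    private MasterLogCheck() {}

    /**
     * Builds a pattern that matches the table name only as a whole token,
     * so that a short name such as T does not match inside other words.
     * The boundary character classes are an assumption, not an HBase rule.
     */
    private static Pattern tableToken(String table) {
        return Pattern.compile("(?<![\\w.:-])" + Pattern.quote(table) + "(?![\\w.-])");
    }

    /** Returns true if the table name appears as a token anywhere in the log. */
    static boolean mentionsTable(List<String> logLines, String table) {
        Pattern p = tableToken(table);
        return logLines.stream().anyMatch(line -> p.matcher(line).find());
    }

    /** Returns true if some line names both CreateTableProcedure and the table. */
    static boolean hasCreateProcedureFor(List<String> logLines, String table) {
        Pattern p = tableToken(table);
        for (String line : logLines) {
            if (line.contains("CreateTableProcedure") && p.matcher(line).find()) {
                return true;
            }
        }
        return false;
    }
}
```

For the silent-loss case both methods would be expected to return false for the target table; for the control case {{hasCreateProcedureFor}} would be expected to return true.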
h2. What is established vs. the one open detail
Established (reproduced, with logs):
* A DDL call returns success while no procedure is ever registered on the
master and the table is never created.
* The trigger is the issuing client's node being unable to reach the
RegionServer hosting {{hbase:meta}} (a RegionServer-to-RegionServer partition),
with master and ZK quorum healthy.
* The ~60 s duration indicates the client retries an operation that needs meta
access (unreachable) and ultimately returns success instead of surfacing the
failure.
Open detail (the only thing not yet pinned down): the exact point in the
client/Admin path that converts the meta-unreachable failure into a success
return - e.g. the {{clone_table_schema}} source-descriptor read or the
{{createTable}} pre-flight giving up after retries and returning normally
rather than throwing. The observable contract violation (success returned,
nothing created) does not depend on which it is.
h2. Steps to Reproduce
# Start a 3-node cluster (node0 = master, node1 + node2 = RegionServers), any
single version (reproduced on 2.5.13 and 2.6.4).
# Ensure {{hbase:meta}} is hosted on node1 (e.g. {{move}} it there if needed;
confirm with {{scan 'hbase:meta'}} / the master UI).
# Create a source table to clone: {{create 'S', {NAME => 'cf'}}}.
# Partition node1 <-> node2 only (e.g. {{iptables -A INPUT -s <peer> -j DROP}}
both directions). Do NOT touch node0, so the master and the ZK quorum stay
healthy.
# From a client/shell running on *node2* (the side cut off from meta on node1):
{{clone_table_schema 'S', 'T'}} (or {{create 'T', {NAME => 'cf'}}}).
Observe:
* The command blocks ~60 s, then returns success / {{exitValue = 0}} (no
exception).
* The HMaster log (node0) contains *no* CreateTableProcedure for {{T}}.
* {{exists 'T'}} => false; {{scan 'hbase:meta'}} has no rows for {{T}}.
Control: issuing the same {{clone_table_schema 'S', 'T2'}} from a client on
*node1* (meta is local) creates {{T2}} normally in ~1 s. Healing the partition
does not retroactively create {{T}} - the loss is durable.
Expected: a DDL operation must not return success unless the table was actually
created and is durably present in {{hbase:meta}}. If it cannot reach the
metadata it needs, it must fail visibly (timeout/exception), not return success.
Actual: success is returned with no procedure registered and no table created.
h2. Root cause pointers
* Shell DDL is a thin wrapper over {{Admin}}: {{clone_table_schema}} ->
{{HBaseAdmin.cloneTableSchema}} (reads the source descriptor, then calls
{{createTable}}); {{create}} -> {{HBaseAdmin.createTableAsync}} ->
{{MasterRpcServices.createTable}} -> {{HMaster.createTable}} ->
{{CreateTableProcedure}}; {{truncate_preserve}} ->
{{HBaseAdmin.truncateTable(name, true)}}.
* The {{createTable}}/{{truncateTable}} futures are specified to wait
for the table to be enabled and all regions online before returning. In the
failing case the future returns success though no procedure ran - so the
success path is reachable without the procedure ever being
submitted/acknowledged.
* Suspect area: the client/Admin retry path for the meta-dependent step
(source-descriptor read or createTable submission) when the meta-hosting
RegionServer is unreachable - it appears to exhaust retries (~60 s) and return
normally instead of throwing. A maintainer with the captured HMaster + client
logs (available on request) can confirm the exact branch.
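To make the suspected pattern concrete, here is a minimal Java sketch of a retry helper that surfaces exhaustion as an exception rather than returning normally. It is illustrative only and is not HBase client code: {{RetrySketch}}, {{IoCall}} and {{callWithRetries}} are hypothetical names, and the actual branch in the client/Admin path has not been identified.

```java
import java.io.IOException;

/** Hypothetical functional interface for an operation that may fail with an I/O error. */
@FunctionalInterface
interface IoCall<T> {
    T call() throws IOException;
}

/** Sketch of a retry loop that never converts exhaustion into a normal return. */
final class RetrySketch {
    private RetrySketch() {}

    /**
     * Runs op up to maxAttempts times and returns its first successful result.
     * If every attempt fails, the last failure is rethrown wrapped in an
     * IOException, so the caller cannot mistake exhaustion for success.
     */
    static <T> T callWithRetries(IoCall<T> op, int maxAttempts) throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (IOException e) {
                // Remember the failure; a real client would also back off here.
                last = e;
            }
        }
        throw new IOException("gave up after " + maxAttempts + " attempts", last);
    }
}
```

The behaviour reported in this issue would correspond to a loop of this shape whose final statement returns a default value (or nothing) instead of throwing; whether that is what actually happens is the open detail noted above.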
h2. Suggested fixes
# A DDL Admin call must surface failure (timeout/exception) when the
underlying meta-dependent step cannot complete, rather than returning success.
Returning success implies the table is durably created and visible in
{{hbase:meta}}.
# Verify the post-condition before returning success: confirm the table exists
in {{hbase:meta}} (and reached the expected enabled/region state), not merely
that the local call sequence returned.
# Add a fault-injection regression test: 3-node cluster, partition a
RegionServer from the meta-hosting RegionServer, issue create/clone/truncate
from the cut-off side, and assert the call FAILS (or the table is durably
created) - it must never return success with no table created.
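Fixes 2 and 3 can be sketched together as a post-condition check that a regression test could assert against. This is a sketch under stated assumptions, not a proposed patch: {{TableAdmin}} is a hypothetical, reduced stand-in for the create/exists/enabled calls of the real {{Admin}} interface, and {{VerifiedDdl}} is a hypothetical helper.

```java
import java.io.IOException;

/** Hypothetical minimal view of the admin calls this sketch needs. */
interface TableAdmin {
    void createTable(String table) throws IOException;
    boolean tableExists(String table) throws IOException;
    boolean isTableEnabled(String table) throws IOException;
}

/** Sketch: never report success for a create unless the table is observably there. */
final class VerifiedDdl {
    private VerifiedDdl() {}

    /**
     * Issues the create and then verifies the post-condition.
     * Throws IOException if the call returned normally but the table
     * is missing or not enabled afterwards.
     */
    static void createTableVerified(TableAdmin admin, String table) throws IOException {
        admin.createTable(table);
        if (!admin.tableExists(table)) {
            throw new IOException(
                "createTable returned normally but table '" + table + "' does not exist");
        }
        if (!admin.isTableEnabled(table)) {
            throw new IOException(
                "createTable returned normally but table '" + table + "' is not enabled");
        }
    }
}
```

A regression test along the lines of fix 3 could call {{createTableVerified}} from the cut-off side and assert that it throws. Note that on the cut-off side the verification reads may themselves fail because they need meta access; that would still be a visible failure, which is the desired outcome.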
h2. Additional context
* The bug requires no version change and no special configuration - it
reproduces on a single-version cluster (Steps to Reproduce above). It was
originally surfaced by a differential test harness that compares independent
clusters running the same plan under the same partition; because the partition
silently dropped DDL from one side, the clusters ended with divergent table
sets (one cluster missing table A, another missing table B, a truncated table's
enabled state differing) - all explained by the single rule above: a DDL from
the meta-unreachable side is silently lost with a success return code.
* We can provide the captured per-lane HMaster logs (meta-unreachable side:
zero mentions of the table; meta-reachable side: clean CreateTableProcedure
SUCCESS), the RegionServer log showing the meta location, and the client-side
timing showing the ~60 s success return.
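The divergence check performed by the differential harness can be approximated as a set difference over the final table lists. The sketch below is not the harness itself; {{TableSetDiff}} and the cluster labels used as map keys are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Sketch: report, per cluster, the tables that exist elsewhere but not on that cluster. */
final class TableSetDiff {
    private TableSetDiff() {}

    /**
     * Computes the union of all table sets, then subtracts each cluster's own set.
     * Clusters that are missing nothing are omitted from the result.
     */
    static Map<String, Set<String>> missingPerCluster(Map<String, Set<String>> tablesByCluster) {
        Set<String> union = new TreeSet<>();
        for (Set<String> tables : tablesByCluster.values()) {
            union.addAll(tables);
        }
        Map<String, Set<String>> missing = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : tablesByCluster.entrySet()) {
            Set<String> absent = new TreeSet<>(union);
            absent.removeAll(e.getValue());
            if (!absent.isEmpty()) {
                missing.put(e.getKey(), absent);
            }
        }
        return missing;
    }
}
```

An empty result means the clusters agree on their table sets; a non-empty result is the kind of divergence described above, although it does not by itself show which DDL was lost.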
> Admin DDL silently returns success (no CreateTableProcedure, no table) when
> hbase:meta RegionServer is unreachable
> ------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-30219
> URL: https://issues.apache.org/jira/browse/HBASE-30219
> Project: HBase
> Issue Type: Bug
> Components: Admin, Client, proc-v2
> Affects Versions: 2.6.4, 2.5.13
> Reporter: rstest
> Priority: Major