rstest created HBASE-30219:
------------------------------
Summary: Admin DDL silently returns success (no
CreateTableProcedure, no table) when hbase:meta RegionServer is unreachable
Key: HBASE-30219
URL: https://issues.apache.org/jira/browse/HBASE-30219
Project: HBase
Issue Type: Bug
Components: Admin, Client, proc-v2
Affects Versions: 2.5.13, 2.6.4
Reporter: rstest
h1. Summary
clone_table_schema / createTable / truncateTable return success (exitValue=0)
when the issuing client's node is partitioned from the RegionServer hosting
hbase:meta, but no CreateTableProcedure is ever registered on the master and
the table is never created.
*Environment*: Three-node cluster (1 master + 2 RegionServers), single HBase
version, ZooKeeper quorum on all three nodes. A network partition is active
between the two RegionServers (so a client on the RegionServer that is cut off
from the meta-hosting RegionServer cannot reach hbase:meta), while the master
and the ZK quorum remain healthy. Found by differential fault-injection testing
(the same command sequence run on independent clusters under the same
partition, with results compared) and reproduced locally with full HMaster logs
captured.
h2. Description
When a client issues a DDL operation (e.g. {{clone_table_schema}}, {{create}},
{{truncate_preserve}}) from a node that cannot reach the RegionServer currently
hosting {{hbase:meta}}, the operation:
* blocks for ~60 seconds, then
* returns *success* to the caller ({{exitValue = 0}} from the shell; no
exception from {{HBaseAdmin}}),
even though:
* the HMaster log contains *no CreateTableProcedure* (or
TruncateTableProcedure) for that table at all - the operation never reached the
master as a procedure, and
* the table is *never created*: {{exists 'T'}} returns false and {{scan
'hbase:meta'}} has zero rows for it.
So a DDL call that reports success silently does nothing. There is no error
surfaced to the client, no entry in {{hbase:meta}}, and no procedure on the
master - the failure is completely invisible to the application, which believes
the table now exists.
The decisive condition is *reachability of the meta-hosting RegionServer from
the client's node*, not the health of the master or ZooKeeper. In our
reproduction the master and the ZK quorum were both reachable from the client
throughout; only the path from the client's RegionServer to the RegionServer
hosting {{hbase:meta}} was cut. A DDL issued from the other side (the node that
CAN reach the meta host) completes normally in ~1 second with a clean
{{CreateTableProcedure ... ADD_TO_META -> ASSIGN -> state=ENABLED -> SUCCESS}}.
h2. Evidence (reproduced locally, HMaster logs captured)
In a run where {{hbase:meta}} was hosted on RegionServer R1 and the partition
cut R1 from R2:
* A {{clone_table_schema}} issued from a client on *R2* (cannot reach R1/meta):
client returns success after ~60 s; the complete HMaster log over the whole
window contains *zero* mentions of the target table; {{exists}} = false; 0 rows
in {{hbase:meta}}. -> silent loss.
* A {{clone_table_schema}} issued from a client on *R1* (meta is local):
HMaster log shows {{Client=... create 'T'}} -> {{CreateTableProcedure pid=NNN:
CREATE_TABLE_PRE_OPERATION -> WRITE_FS_LAYOUT -> ADD_TO_META -> ASSIGN ->
state=ENABLED -> SUCCESS}} in ~1 s; table exists. -> normal.
The two cases differ only in which side of the partition the issuing client
sits on relative to the meta host. Whichever DDL is issued from the
meta-unreachable side is silently lost with a success return code; whichever is
issued from the meta-reachable side succeeds.
h2. What is established vs. the one open detail
Established (reproduced, with logs):
* A DDL call returns success while no procedure is ever registered on the
master and the table is never created.
* The trigger is the issuing client's node being unable to reach the
RegionServer hosting {{hbase:meta}} (a RegionServer-to-RegionServer partition),
with master and ZK quorum healthy.
* The ~60 s duration indicates the client retries an operation that needs meta
access (unreachable) and ultimately returns success instead of surfacing the
failure.
Open detail (the only thing not yet pinned down): the exact point in the
client/Admin path that converts the meta-unreachable failure into a success
return, e.g. the {{clone_table_schema}} source-descriptor read or the
{{createTable}} pre-flight giving up after retries and returning normally
rather than throwing. The observable contract violation (success returned,
nothing created) does not depend on which it is.
h2. Steps to Reproduce (single version, no special build)
# Start a 3-node cluster (node0 = master, node1 + node2 = RegionServers), any
single version (reproduced on 2.5.13 and 2.6.4).
# Ensure {{hbase:meta}} is hosted on node1 (e.g. {{move}} it there if needed;
confirm with {{scan 'hbase:meta'}} / the master UI).
# Create a source table to clone: {{create 'S', {NAME => 'cf'}}}.
# Partition node1 <-> node2 only (e.g. {{iptables -A INPUT -s <peer> -j DROP}},
applied in both directions). Do NOT touch node0, so the master and the ZK
quorum stay healthy.
# From a client/shell running on *node2* (the side cut off from meta on node1):
{{clone_table_schema 'S', 'T'}} (or {{create 'T', {NAME => 'cf'}}}).
Observe:
* The command blocks ~60 s, then returns success / {{exitValue = 0}} (no
exception).
* The HMaster log (node0) contains *no* CreateTableProcedure for {{T}}.
* {{exists 'T'}} => false; {{scan 'hbase:meta'}} has no rows for {{T}}.
Control: issuing the same {{clone_table_schema 'S', 'T2'}} from a client on
*node1* (meta is local) creates {{T2}} normally in ~1 s. Healing the partition
does not retroactively create {{T}} - the loss is durable.
Expected: a DDL operation must not return success unless the table was actually
created and is durably present in {{hbase:meta}}. If it cannot reach the
metadata it needs, it must fail visibly (timeout/exception), not return success.
Actual: success is returned with no procedure registered and no table created.
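The three observations above can be combined into one mechanical check, which is roughly what a test harness would do per DDL attempt. The following is a minimal sketch, not part of HBase: the class {{SilentDdlLossCheck}}, the {{Outcome}} names, and the "whole-token mention of the table name" rule are all assumptions made for illustration. It deliberately matches on any mention of the table in the master log rather than on a specific log line format.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper: classifies one DDL attempt from the observations above. */
final class SilentDdlLossCheck {

    enum Outcome { CREATED, FAILED_VISIBLY, SILENT_LOSS, INCONSISTENT }

    /**
     * @param exitValue      exit value reported to the caller (0 = success)
     * @param masterLogLines HMaster log lines covering the whole window
     * @param tableName      target table, e.g. "T"
     * @param tableExists    result of a later existence check for the table
     */
    static Outcome classify(int exitValue, List<String> masterLogLines,
                            String tableName, boolean tableExists) {
        boolean reportedSuccess = exitValue == 0;

        // Assumption: "the master saw the table" means the name appears as a
        // whole token, so "T" does not match inside "T2" or "CreateTableProcedure".
        Pattern mention = Pattern.compile(
            "(?<![\\w.:-])" + Pattern.quote(tableName) + "(?![\\w.:-])");
        boolean masterSawTable =
            masterLogLines.stream().anyMatch(line -> mention.matcher(line).find());

        if (reportedSuccess && tableExists) {
            return Outcome.CREATED;
        }
        if (!reportedSuccess && !tableExists) {
            return Outcome.FAILED_VISIBLY;
        }
        if (reportedSuccess && !masterSawTable) {
            // Success reported, no table, and the master never heard of it.
            return Outcome.SILENT_LOSS;
        }
        // Any other combination needs a human to look at the logs.
        return Outcome.INCONSISTENT;
    }

    public static void main(String[] args) {
        // Synthetic input mirroring the failing case: exit 0, no mention, no table.
        System.out.println(classify(0, List.of("unrelated master log line"), "T", false));
    }
}
```

Only {{SILENT_LOSS}} corresponds to the bug reported here; {{FAILED_VISIBLY}} is the behaviour the Expected line asks for when metadata cannot be reached.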
h2. Root cause pointers
* Shell DDL is a thin wrapper over {{Admin}}: {{clone_table_schema}} ->
{{HBaseAdmin.cloneTableSchema}} (reads the source descriptor, then calls
{{createTable}}); {{create}} -> {{HBaseAdmin.createTableAsync}} ->
{{MasterRpcServices.createTable}} -> {{HMaster.createTable}} ->
{{CreateTableProcedure}}; {{truncate_preserve}} ->
{{HBaseAdmin.truncateTable(name, true)}}.
* The {{createTable}}/{{truncateTable}} futures are specified to wait for the
table to be enabled and all regions online before returning. In the failing
case the future returns success though no procedure ran - so the success path
is reachable without the procedure ever being submitted/acknowledged.
* Suspect area: the client/Admin retry path for the meta-dependent step
(source-descriptor read or createTable submission) when the meta-hosting
RegionServer is unreachable - it appears to exhaust retries (~60 s) and return
normally instead of throwing. A maintainer with the captured HMaster + client
logs (available on request) can confirm the exact branch.
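To make the suspected failure mode concrete, the sketch below shows the contract a retrying step should honour: once attempts are exhausted, the last failure is rethrown instead of control falling through to a normal return. This is not HBase code and does not claim to match the actual client retry implementation; {{RetryThenFail}}, {{callWithRetries}}, and its parameters are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical retry helper illustrating "exhausted retries must throw". */
final class RetryThenFail {

    static <T> T callWithRetries(Callable<T> step, int maxAttempts, long pauseMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return step.call();          // success: return the step's result
            } catch (InterruptedException e) {
                throw e;                     // never swallow interruption
            } catch (Exception e) {
                last = e;                    // remember why this attempt failed
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis);
                }
            }
        }
        // Retries exhausted: surface the failure to the caller. Returning
        // normally here is the pattern that would produce a false success.
        throw new IOException("step failed after " + maxAttempts + " attempts", last);
    }
}
```

With a step that always fails (for example because the meta-hosting RegionServer is unreachable), the caller sees an {{IOException}} whose cause is the last underlying error, which is the visible failure the Expected behaviour asks for.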
h2. Suggested fixes
# A DDL Admin call must surface failure (timeout/exception) when the underlying
meta-dependent step cannot complete, rather than returning success. Returning
success implies the table is durably created and visible in {{hbase:meta}}.
# Verify the post-condition before returning success: confirm the table exists
in {{hbase:meta}} (and reached the expected enabled/region state), not merely
that the local call sequence returned.
# Add a fault-injection regression test: 3-node cluster, partition a
RegionServer from the meta-hosting RegionServer, issue create/clone/truncate
from the cut-off side, and assert the call FAILS (or the table is durably
created) - it must never return success with no table created.
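Fixes 2 and 3 can be sketched together: a guard that verifies the post-condition before reporting success, and a test double that reproduces the "returns normally, creates nothing" behaviour so the guard can be asserted against it. This is a minimal sketch under stated assumptions. {{TableAdmin}} is a hypothetical two-method interface that only borrows the method names {{createTable}} and {{tableExists}} from {{Admin}} (with plain {{String}} table names); {{VerifiedDdl}}, {{createTableVerified}}, and {{FakeAdmin}} are likewise invented for illustration.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical post-condition guard for a create-table call. */
final class VerifiedDdl {

    /** Assumed stand-in for the two Admin operations the guard needs. */
    interface TableAdmin {
        void createTable(String tableName) throws IOException;
        boolean tableExists(String tableName) throws IOException;
    }

    /** Reports success only if the table is actually visible afterwards. */
    static void createTableVerified(TableAdmin admin, String tableName)
            throws IOException {
        admin.createTable(tableName);
        // If the existence check itself cannot reach the metadata it needs,
        // its exception propagates, which is also a visible failure.
        if (!admin.tableExists(tableName)) {
            throw new IOException("createTable(" + tableName
                + ") returned without error, but the table does not exist");
        }
    }

    /** Test double: optionally drops the create silently, like the reported bug. */
    static final class FakeAdmin implements TableAdmin {
        private final Set<String> tables = new HashSet<>();
        private final boolean dropSilently;

        FakeAdmin(boolean dropSilently) {
            this.dropSilently = dropSilently;
        }

        @Override
        public void createTable(String tableName) {
            if (!dropSilently) {
                tables.add(tableName);
            }
        }

        @Override
        public boolean tableExists(String tableName) {
            return tables.contains(tableName);
        }
    }
}
```

A regression test in this style would assert that {{createTableVerified(new FakeAdmin(true), "T")}} throws, and that the same call against {{new FakeAdmin(false)}} completes. A real fault-injection test would replace {{FakeAdmin}} with a client on the cut-off side of the partition.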
h2. Additional context
* The bug requires no version change and no special configuration: it
reproduces on a single-version cluster (see Steps to Reproduce above). It was
originally surfaced by a differential test harness that compares independent
clusters running the same plan under the same partition. Because the partition
silently dropped DDL from one side, the clusters ended with divergent table
sets (one cluster missing table A, another missing table B, and a truncated
table's enabled state differing). All of these are explained by the single rule
above: a DDL from the meta-unreachable side is silently lost with a success
return code.
* We can provide the captured per-lane HMaster logs (meta-unreachable side:
zero mentions of the table; meta-reachable side: clean CreateTableProcedure
SUCCESS), the RegionServer log showing the meta location, and the client-side
timing showing the ~60 s success return.
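For readers unfamiliar with the differential comparison mentioned above, the table-set part of it amounts to computing, per cluster, which tables exist elsewhere but not there. The sketch below is an illustration only and is not the harness's actual code; {{TableSetDiff}}, {{missingPerCluster}}, and the cluster labels are hypothetical, and it does not cover the enabled-state comparison.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch of comparing final table sets across clusters. */
final class TableSetDiff {

    /**
     * For each cluster, returns the tables that exist on at least one other
     * cluster but are absent on this one. Clusters missing nothing are omitted.
     */
    static Map<String, Set<String>> missingPerCluster(
            Map<String, Set<String>> tablesByCluster) {
        // Union of every table seen on any cluster.
        Set<String> union = new TreeSet<>();
        tablesByCluster.values().forEach(union::addAll);

        Map<String, Set<String>> missing = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : tablesByCluster.entrySet()) {
            Set<String> absent = new TreeSet<>(union);
            absent.removeAll(entry.getValue());
            if (!absent.isEmpty()) {
                missing.put(entry.getKey(), absent);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Synthetic example: each cluster silently lost a different table.
        System.out.println(missingPerCluster(Map.of(
            "cluster1", Set.of("S", "B"),
            "cluster2", Set.of("S", "A"))));
    }
}
```

A non-empty result for clusters that ran the same plan is the divergence signal; the per-cluster HMaster logs are then what distinguish a silently lost DDL from a visibly failed one.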