Hi, Sujatha! Note, this is a review of a combined diff dc92235f21e + f1db957c5da
On Mar 14, Sujatha wrote: > revision-id: f1db957c5da (mariadb-10.5.2-247-gf1db957c5da) > parent(s): dc92235f21e > author: Sujatha <[email protected]> > committer: Andrei Elkin <[email protected]> > timestamp: 2020-12-21 16:10:46 +0200 > message: > > MDEV-21469: Implement crash-safe binary logging of the user XA > > Make XA PREPARE, XA COMMIT and XA ROLLBACK statements > crash-safe when --log-bin is specified. > > At execution of XA PREPARE, XA COMMIT and XA ROLLBACK their replication > events are made written into the binary log prior to execution > of the commands in storage engines. > In case the server crashes, after writing to binary log but before all > involved engines have processed the command, the following recovery > will execute the command's replication events to equalize the states > of involved engines with that of binlog. > That applies to all XA PREPARE *group* and XA COMMIT or ROLLBACK. > > On the implementation level the recovery time binary log parsing > is augmented to pay attention to > the user XA xids to identify the XA transactions' state:s in binary log, and > eventually match them against their states in engines, see > MYSQL_BIN_LOG::recover_explicit_xa_prepare(). > In discrepancy cases the outdated state in the engines is corrected with > resubmitting the transaction prepare group of events, or completion > ones. The multi-engine partly prepared XA PREPARE case > the XA is rolled back first. > The fact of multiple-engine involved is registered into > Gtid_log_event::flags2 as one bit. The boolean value is > sufficient and precise to deal with two engines in XA transaction. > With more than 2 recoverable engines the flag method is still correct though > may be pessimistic, as it treats all recoverable engines as XA > participants. So when the number of such Engines exceeds the number > of prepared engines of the XA that XA is treated > as partially completed, with all that ensued. > As an optimization no new bit is allocated in flags2, instead a > pre-existing ones (of MDEV-742) are reused, observing that > A. XA "COMPLETE" does not require multi-engine hint for its recovery and that > B. the MDEV-742 XA-completion bit is not anyhow used by > XA-PREPARE and its GTID log event. > > Notice the multi-engine flagging is superceded by MDEV-21117 extension > in Gtid log event so this part should be taken from there. > diff --git > a/mysql-test/suite/binlog/t/binlog_xa_multi_binlog_crash_recovery.test > b/mysql-test/suite/binlog/t/binlog_xa_multi_binlog_crash_recovery.test > new file mode 100644 > index 00000000000..d0852de2fcc > --- /dev/null > +++ b/mysql-test/suite/binlog/t/binlog_xa_multi_binlog_crash_recovery.test > @@ -0,0 +1,86 @@ > +# ==== Purpose ==== > +# > +# Test verifies that XA crash recovery works fine across multiple binary > logs. > +# > +# ==== Implementation ==== > +# > +# Steps: > +# 0 - Generate an explicit XA transaction. Using debug simulation hold the "debug sync" != "debug sim" :) > +# execution of XA PREPARE statement after the XA PREPARE is written to > +# the binary log. With this the prepare will not be done in engine. > +# 1 - By executing FLUSH LOGS generate multiple binary logs. > +# 2 - Now make the server to disappear at this point. > +# 3 - Restart the server. During recovery the XA PREPARE from the binary > +# log will be read. It is cross checked with engine. Since it is not > +# present in engine it will be executed once again. > +# 4 - When server is up execute XA RECOVER to check that the XA is > +# prepared in engine as well. > +# 5 - XA COMMIT the transaction and check the validity of the data. > +# > +# ==== References ==== > +# > +# MDEV-21469: Implement crash-safe logging of the user XA > +# > + > +--source include/have_innodb.inc > +--source include/have_debug.inc > +--source include/have_debug_sync.inc > +--source include/have_log_bin.inc > + > +RESET MASTER; > + > +CREATE TABLE t1 (a INT PRIMARY KEY, b MEDIUMTEXT) ENGINE=Innodb; > +CALL mtr.add_suppression("Found 1 prepared XA transactions"); > + > +connect(con1,localhost,root,,); > +XA START 'xa1'; > +INSERT INTO t1 SET a=1; > +SET DEBUG_SYNC= "simulate_hang_after_binlog_prepare SIGNAL con1_ready > WAIT_FOR con1_go"; > +SET GLOBAL DEBUG_DBUG="d,simulate_crash_after_binlog_prepare"; > +XA END 'xa1'; > +--send XA PREPARE 'xa1'; > + > +connection default; > +SET DEBUG_SYNC= "now WAIT_FOR con1_ready"; > +FLUSH LOGS; > +FLUSH LOGS; > +FLUSH LOGS; > + > +--source include/show_binary_logs.inc > +--let $binlog_file= master-bin.000004 > +--let $binlog_start= 4 > +--source include/show_binlog_events.inc > + > +--write_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect > +wait I don't see why you'd need to wait first, it'd be simpler to restart right away. > +EOF > + > +--error 0,2013 > +SET DEBUG_SYNC= "now SIGNAL con1_go"; > +--source include/wait_until_disconnected.inc > + > +--connection con1 > +--error 2013 > +--reap > +--source include/wait_until_disconnected.inc > + > +# > +# Server restart > +# > +--append_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect > +restart > +EOF > + > +connection default; > +--enable_reconnect > +--source include/wait_until_connected_again.inc > + > +XA RECOVER; > +XA COMMIT 'xa1'; > + > +SELECT * FROM t1; > + > +# Clean up. > +connection default; > +DROP TABLE t1; > + > diff --git a/mysql-test/suite/rpl/t/rpl_xa_commit_crash_safe.test > b/mysql-test/suite/rpl/t/rpl_xa_commit_crash_safe.test > new file mode 100644 > index 00000000000..e972e3f09de > --- /dev/null > +++ b/mysql-test/suite/rpl/t/rpl_xa_commit_crash_safe.test > @@ -0,0 +1,98 @@ > +# ==== Purpose ==== > +# > +# Test verifies that XA COMMIT statements are crash safe. > +# > +# ==== Implementation ==== > +# > +# Steps: > +# 0 - Generate 2 explicit XA transactions. 'xa1' and 'xa2'. > +# 'xa1' will be prepared and committed. > +# 1 - For 'xa2' let the XA COMMIT be done in binary log and crash the > +# server so that it is not committed in engine. > +# 2 - Restart the server. The recovery code should successfully recover > +# 'xa2'. The COMMIT should be executed during recovery. > +# 3 - Check the data in table. Both rows should be present in table. > +# 4 - Trying to commit 'xa2' should report unknown 'XA' error as COMMIT is > +# already complete during recovery. > +# > +# ==== References ==== > +# > +# MDEV-21469: Implement crash-safe logging of the user XA > + > + > +--source include/have_innodb.inc > +--source include/master-slave.inc master-slave must be always included last > +--source include/have_debug.inc > + > +connect (master2,localhost,root,,); why do you need master2 connection? > +--connection master > +CALL mtr.add_suppression("Found 1 prepared XA transactions"); > + > +CREATE TABLE t ( f INT ) ENGINE=INNODB; > +XA START 'xa1'; > +INSERT INTO t VALUES (20); > +XA END 'xa1'; > +XA PREPARE 'xa1'; > +XA COMMIT 'xa1'; > +--sync_slave_with_master > +--source include/stop_slave.inc > + > +--connection master1 > +XA START 'xa2'; > +INSERT INTO t VALUES (40); > +XA END 'xa2'; > +XA PREPARE 'xa2'; > + > +--write_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect > +wait > +EOF Same. Why do you wait? > + > +SET GLOBAL DEBUG_DBUG="d,simulate_crash_after_binlog_commit_or_rollback"; > +--error 2013 # CR_SERVER_LOST > +XA COMMIT 'xa2'; > +--source include/wait_until_disconnected.inc > + > +--connection master1 > +--source include/wait_until_disconnected.inc > + > +--connection master > +--source include/wait_until_disconnected.inc > + > +# > +# Server restart > +# > +--append_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect > +restart > +EOF > + > +connection default; > +--enable_reconnect > +--source include/wait_until_connected_again.inc > + > +# rpl_end.inc needs to use the connection server_1 > +connection server_1; > +--enable_reconnect > +--source include/wait_until_connected_again.inc > + > +--connection master > +--enable_reconnect > +--source include/wait_until_connected_again.inc > + > +--connection slave > +--source include/start_slave.inc > +--sync_with_master > + > +--connection master > +SELECT * FROM t; > +XA RECOVER; > +--error 1397 # ER_XAER_NOTA > +XA COMMIT 'xa2'; > +SELECT * FROM t; > +--sync_slave_with_master > + > +SELECT * FROM t; > + > +--connection master > +DROP TABLE t; > +--sync_slave_with_master > +--source include/rpl_end.inc > diff --git a/mysql-test/suite/rpl/t/rpl_xa_event_apply_failure.test > b/mysql-test/suite/rpl/t/rpl_xa_event_apply_failure.test > new file mode 100644 > index 00000000000..14cebbd9b13 > --- /dev/null > +++ b/mysql-test/suite/rpl/t/rpl_xa_event_apply_failure.test > @@ -0,0 +1,119 @@ > +# ==== Purpose ==== > +# > +# Test verifies that if for some reason an event cannot be applied during > +# recovery, appropriate error is reported. > +# > +# ==== Implementation ==== > +# > +# Steps: > +# 0 - Generate 2 explicit XA transactions. 'xa1' and 'xa2'. > +# 'xa1' will be prepared and committed. > +# 1 - For 'xa2' let the XA COMMIT be done in binary log and crash the > +# server so that it is not committed in engine. > +# 2 - Restart the server. Using debug simulation point make XA COMMIT > 'xa2' > +# execution to fail. The server will resume anyway > +# to leave the error in the errlog (see "Recovery: Error.."). > +# 3 - Work around the simulated failure with Commit once again > +# from a connection that turns OFF binlogging. > +# Slave must catch up with the master. > +# > +# ==== References ==== > +# > +# MDEV-21469: Implement crash-safe logging of the user XA > + > + > +--source include/have_innodb.inc > +--source include/master-slave.inc master-slave must be always included last > +--source include/have_debug.inc > + > +connect (master2,localhost,root,,); why do you need master2 connection? > +--connection master > +CALL mtr.add_suppression("Found 1 prepared XA transactions"); > +CALL mtr.add_suppression("Failed to execute binlog query event"); > +CALL mtr.add_suppression("Recovery: Error .Out of memory.."); > +CALL mtr.add_suppression("Crash recovery failed."); > +CALL mtr.add_suppression("Can.t init tc log"); > +CALL mtr.add_suppression("Aborting"); > + > +CREATE TABLE t ( f INT ) ENGINE=INNODB; > +XA START 'xa1'; > +INSERT INTO t VALUES (20); > +XA END 'xa1'; > +XA PREPARE 'xa1'; > +XA COMMIT 'xa1'; > +--sync_slave_with_master > +--source include/stop_slave.inc > + > +--connection master1 > +XA START 'xa2'; > +INSERT INTO t VALUES (40); > +XA END 'xa2'; > +XA PREPARE 'xa2'; > + > +--write_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect > +wait ditto > +EOF > + > +SET GLOBAL DEBUG_DBUG="d,simulate_crash_after_binlog_commit_or_rollback"; > +--error 2013 # CR_SERVER_LOST > +XA COMMIT 'xa2'; > +--source include/wait_until_disconnected.inc > + > +--connection master1 > +--source include/wait_until_disconnected.inc > + > +--connection master > +--source include/wait_until_disconnected.inc > + > +# > +# Server restart > +# > +--append_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect > +restart: --debug-dbug=d,trans_xa_commit_fail > +EOF > + > +connection default; > +--source include/wait_until_disconnected.inc > + > +connection default; > +--enable_reconnect > +--source include/wait_until_connected_again.inc > + > +--connection master > +--enable_reconnect > +--echo *** must be no 'xa2' commit seen, as it's still prepared: > +SELECT * FROM t; > +XA RECOVER; > + > +# Commit it manually now to work around the extra binlog record > +# by turning binlogging OFF by the connection. > + > +SET GLOBAL DEBUG_DBUG=""; > +SET SQL_LOG_BIN=0; > +--error 0 why error 0? What will happen if you don't disable binlog? > +XA COMMIT 'xa2'; > +SET SQL_LOG_BIN=1; > + > + > +# rpl_end.inc needs to use the connection server_1 > +connection server_1; > +--enable_reconnect > +--source include/wait_until_connected_again.inc > + > +--connection master > +--source include/wait_until_connected_again.inc > + > +--connection slave > +--source include/start_slave.inc > +--sync_with_master > + > +--connection master > +SELECT * FROM t; > + > +--sync_slave_with_master > +SELECT * FROM t; > + > +--connection master > +DROP TABLE t; > +--sync_slave_with_master > +--source include/rpl_end.inc > diff --git a/mysql-test/suite/rpl/t/rpl_xa_prepare_commit_prepare.test > b/mysql-test/suite/rpl/t/rpl_xa_prepare_commit_prepare.test > new file mode 100644 > index 00000000000..7b987c7f29b > --- /dev/null > +++ b/mysql-test/suite/rpl/t/rpl_xa_prepare_commit_prepare.test > @@ -0,0 +1,95 @@ > +# ==== Purpose ==== > +# > +# Test verifies that XA PREPARE transactions are crash safe. > +# > +# ==== Implementation ==== > +# > +# Steps: > +# 0 - Generate 2 explicit XA transactions. 'xa1' and 'xa2'. > +# 'xa1' will be prepared and committed. > +# 1 - For 'xa2' let the XA PREPARE be done in binary log and crash the > +# server so that it is not prepared in engine. > +# 2 - Restart the server. The recovery code should successfully recover > +# 'xa2'. > +# 3 - When server is up, execute XA RECOVER and verify that 'xa2' is > +# present. > +# 4 - Commit the XA transaction and verify its correctness. > +# > +# ==== References ==== > +# > +# MDEV-21469: Implement crash-safe logging of the user XA > + > +--source include/have_innodb.inc > +--source include/master-slave.inc ditto > +--source include/have_debug.inc > + > +connect (master2,localhost,root,,); ditto > +--connection master > +CALL mtr.add_suppression("Found 1 prepared XA transactions"); > + > +CREATE TABLE t ( f INT ) ENGINE=INNODB; > +XA START 'xa1'; > +INSERT INTO t VALUES (20); > +XA END 'xa1'; > +XA PREPARE 'xa1'; > +XA COMMIT 'xa1'; > +--sync_slave_with_master > +--source include/stop_slave.inc > + > +--connection master1 > +XA START 'xa2'; > +INSERT INTO t VALUES (40); > +XA END 'xa2'; > + > +--write_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect > +wait ditto > +EOF > + > +SET GLOBAL DEBUG_DBUG="d,simulate_crash_after_binlog_prepare"; > +--error 2013 # CR_SERVER_LOST > +XA PREPARE 'xa2'; > +--source include/wait_until_disconnected.inc > + > +--connection master1 > +--source include/wait_until_disconnected.inc > + > +--connection master > +--source include/wait_until_disconnected.inc > + > +# > +# Server restart > +# > +--append_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect > +restart > +EOF > + > +connection default; > +--enable_reconnect > +--source include/wait_until_connected_again.inc > + > +# rpl_end.inc needs to use the connection server_1 > +connection server_1; > +--enable_reconnect > +--source include/wait_until_connected_again.inc > + > +--connection master > +--enable_reconnect > +--source include/wait_until_connected_again.inc > + > +--connection slave > +--source include/start_slave.inc > +--sync_with_master > + > +--connection master > +SELECT * FROM t; > +XA RECOVER; > +XA COMMIT 'xa2'; > +SELECT * FROM t; > +--sync_slave_with_master > + > +SELECT * FROM t; > + > +--connection master > +DROP TABLE t; > +--sync_slave_with_master > +--source include/rpl_end.inc > diff --git a/mysql-test/suite/rpl/t/rpl_xa_prepare_crash_safe.test > b/mysql-test/suite/rpl/t/rpl_xa_prepare_crash_safe.test > new file mode 100644 > index 00000000000..9d2c5cce528 > --- /dev/null > +++ b/mysql-test/suite/rpl/t/rpl_xa_prepare_crash_safe.test > @@ -0,0 +1,117 @@ > +# ==== Purpose ==== > +# > +# Test verifies that XA PREPARE transactions are crash safe. > +# > +# ==== Implementation ==== > +# > +# Steps: > +# 0 - Generate 3 explicit XA transactions. 'xa1', 'xa2' and 'xa3'. > +# Using debug simulation hold the execution of second XA PREPARE > +# statement after the XA PREPARE is written to the binary log. > +# With this the prepare will not be done in engine. > +# 1 - For 'xa3' allow the PREPARE statement to be written to binary log > and > +# simulate server crash. > +# 2 - Restart the server. The recovery code should successfully recover > +# 'xa2' and 'xa3'. > +# 3 - When server is up, execute XA RECOVER and verify that 'xa2' and > 'xa3' > +# are present along with 'xa1'. > +# 4 - Commit all the XA transactions and verify their correctness. > +# > +# ==== References ==== > +# > +# MDEV-21469: Implement crash-safe logging of the user XA > + > + > +--source include/have_innodb.inc > +--source include/master-slave.inc ditto > +--source include/have_debug.inc > + > +connect (master2,localhost,root,,); ok, in this test you actually use master2 :) > +--connection master > +CALL mtr.add_suppression("Found 1 prepared XA transactions"); > +CALL mtr.add_suppression("Found 2 prepared XA transactions"); > +CALL mtr.add_suppression("Found 3 prepared XA transactions"); > + > +CREATE TABLE t ( f INT ) ENGINE=INNODB; > +XA START 'xa1'; > +INSERT INTO t VALUES (20); > +XA END 'xa1'; > +XA PREPARE 'xa1'; > +--sync_slave_with_master > +--source include/stop_slave.inc > + > +--connection master1 > +use test; > +xa start 'xa2'; > +insert into t values (30); > +xa end 'xa2'; > +SET DEBUG_SYNC="simulate_hang_after_binlog_prepare SIGNAL reached WAIT_FOR > go"; > +send xa prepare 'xa2'; > + > +--connection master2 > +let $wait_condition= > + SELECT COUNT(*) = 1 FROM INFORMATION_SCHEMA.PROCESSLIST > + WHERE STATE like "debug sync point: simulate_hang_after_binlog_prepare%"; > +--source include/wait_condition.inc eh? you can just 'WAIT_FOR signal', couldn't you? > + > +XA START 'xa3'; > +INSERT INTO t VALUES (40); > +XA END 'xa3'; > + > +--write_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect > +wait ditto > +EOF > + > +SET GLOBAL DEBUG_DBUG="d,simulate_crash_after_binlog_prepare"; > +--error 2013 # CR_SERVER_LOST > +XA PREPARE 'xa3'; > +--source include/wait_until_disconnected.inc > + > +--connection master1 > +--error 2013 > +--reap > +--source include/wait_until_disconnected.inc > + > +--connection master > +--source include/wait_until_disconnected.inc > + > +# > +# Server restart > +# > +--append_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect > +restart > +EOF > + > +connection default; > +--enable_reconnect > +--source include/wait_until_connected_again.inc > + > +# rpl_end.inc needs to use the connection server_1 > +connection server_1; > +--enable_reconnect > +--source include/wait_until_connected_again.inc > + > +--connection master > +--enable_reconnect > +--source include/wait_until_connected_again.inc > + > + > +--connection slave > +--source include/start_slave.inc > +--sync_with_master > + > +--connection master > +SELECT * FROM t; > +XA RECOVER; > +XA COMMIT 'xa1'; > +XA COMMIT 'xa2'; > +XA COMMIT 'xa3'; > +SELECT * FROM t; > +--sync_slave_with_master > + > +SELECT * FROM t; > + > +--connection master > +DROP TABLE t; > +--sync_slave_with_master > +--source include/rpl_end.inc > diff --git a/mysql-test/suite/rpl/t/rpl_xa_rollback_commit_crash_safe.test > b/mysql-test/suite/rpl/t/rpl_xa_rollback_commit_crash_safe.test > new file mode 100644 > index 00000000000..1d19f96116e > --- /dev/null > +++ b/mysql-test/suite/rpl/t/rpl_xa_rollback_commit_crash_safe.test > @@ -0,0 +1,97 @@ > +# ==== Purpose ==== > +# > +# Test verifies that XA COMMIT statements are crash safe. XA ROLLBACK > +# > +# ==== Implementation ==== > +# > +# Steps: > +# 0 - Generate 2 explicit XA transactions. 'xa1' and 'xa2'. > +# 'xa1' will be prepared and committed. > +# 1 - For 'xa2' let the XA ROLLBACK be done in binary log and crash the > +# server so that it is not committed in engine. not rolled back > +# 2 - Restart the server. The recovery code should successfully recover > +# 'xa2'. The ROLLBACK should be executed during recovery. > +# 3 - Check the data in table. Only one row should be present in table. > +# 4 - Trying to rollback 'xa2' should report unknown 'XA' error as > rollback > +# is already complete during recovery. > +# > +# ==== References ==== > +# > +# MDEV-21469: Implement crash-safe logging of the user XA > + > +--source include/have_innodb.inc > +--source include/master-slave.inc ditto > +--source include/have_debug.inc > + > +connect (master2,localhost,root,,); unused again > +--connection master > +CALL mtr.add_suppression("Found 1 prepared XA transactions"); > + > +CREATE TABLE t ( f INT ) ENGINE=INNODB; > +XA START 'xa1'; > +INSERT INTO t VALUES (20); > +XA END 'xa1'; > +XA PREPARE 'xa1'; > +XA COMMIT 'xa1'; > +--sync_slave_with_master > +--source include/stop_slave.inc > + > +--connection master1 > +XA START 'xa2'; > +INSERT INTO t VALUES (40); > +XA END 'xa2'; > +XA PREPARE 'xa2'; > + > +--write_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect > +wait ditto > +EOF > + > +SET GLOBAL DEBUG_DBUG="d,simulate_crash_after_binlog_commit_or_rollback"; > +--error 2013 # CR_SERVER_LOST > +XA ROLLBACK 'xa2'; > +--source include/wait_until_disconnected.inc > + > +--connection master1 > +--source include/wait_until_disconnected.inc > + > +--connection master > +--source include/wait_until_disconnected.inc > + > +# > +# Server restart > +# > +--append_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect > +restart > +EOF > + > +connection default; > +--enable_reconnect > +--source include/wait_until_connected_again.inc > + > +# rpl_end.inc needs to use the connection server_1 > +connection server_1; > +--enable_reconnect > +--source include/wait_until_connected_again.inc > + > +--connection master > +--enable_reconnect > +--source include/wait_until_connected_again.inc > + > +--connection slave > +--source include/start_slave.inc > +--sync_with_master > + > +--connection master > +SELECT * FROM t; > +XA RECOVER; > +--error 1397 # ER_XAER_NOTA > +XA ROLLBACK 'xa2'; > +SELECT * FROM t; > +--sync_slave_with_master > + > +SELECT * FROM t; > + > +--connection master > +DROP TABLE t; > +--sync_slave_with_master > +--source include/rpl_end.inc > diff --git > a/storage/rocksdb/mysql-test/rocksdb_rpl/t/rpl_rocksdb_xa_recover.test > b/storage/rocksdb/mysql-test/rocksdb_rpl/t/rpl_rocksdb_xa_recover.test > new file mode 100644 > index 00000000000..c7cefbda4bf > --- /dev/null > +++ b/storage/rocksdb/mysql-test/rocksdb_rpl/t/rpl_rocksdb_xa_recover.test > @@ -0,0 +1,46 @@ > +# MDEV-742, MDEV-21469 XA replication, and xa crash-safe. > +# Tests prove xa state is recovered to a prepared or completed state > +# upon post-crash recovery, incl a multi-engine case. > + > +--source include/have_rocksdb.inc > +--source include/have_innodb.inc > +--source include/have_debug.inc > +--source include/master-slave.inc ditto > +--source include/have_binlog_format_row.inc > + > +--connection slave > +--source include/stop_slave.inc > + > +--connection master > +CALL mtr.add_suppression("Found . prepared XA transactions"); > +CALL mtr.add_suppression("Failed to execute binlog query event"); > +CREATE TABLE t1 (a INT PRIMARY KEY) ENGINE=rocksdb; > +CREATE TABLE t2 (a INT PRIMARY KEY) ENGINE=innodb; > +INSERT INTO t1 SET a = 1; > +INSERT INTO t2 SET a = 1; > + > +--let $xa=xa1 > +--let $when=after_binlog > +--source rpl_rocksdb_xa_recover.inc > + > +--let $xa=xa2 > +--let $when=after_first_engine > +--let $finally_expected=$xa in the list > +--source rpl_rocksdb_xa_recover.inc > + > +--connection slave > +--source include/start_slave.inc > + > +--connection master > +--sync_slave_with_master > +let $diff_tables= master:t1, slave:t1; > +source include/diff_tables.inc; > +let $diff_tables= master:t2, slave:t2; > +source include/diff_tables.inc; > + > +--connection master > +SET SESSION DEBUG_DBUG=""; > +drop table t1,t2; > +--sync_slave_with_master > + > +--source include/rpl_end.inc > diff --git > a/storage/rocksdb/mysql-test/rocksdb_rpl/t/rpl_rocksdb_xa_recover.inc > b/storage/rocksdb/mysql-test/rocksdb_rpl/t/rpl_rocksdb_xa_recover.inc > new file mode 100644 > index 00000000000..312b8b6a19e > --- /dev/null > +++ b/storage/rocksdb/mysql-test/rocksdb_rpl/t/rpl_rocksdb_xa_recover.inc > @@ -0,0 +1,67 @@ > +# callee of rpl_rocksdb_xa_recover.test > +# requires t1,t2 as defined in the caller. > + > +--let $at=_prepare > +--let $finally_expected=$xa in the list > +--let $init_val=`SELECT MAX(a) from t1 as t_1` why as t_1 ? > +--eval XA START '$xa' > +--eval INSERT INTO t1 SET a=$init_val + 1 > +--eval INSERT INTO t2 SET a=$init_val + 1 > +--eval XA END '$xa' > +--eval SET SESSION DEBUG_DBUG="d,simulate_crash_$when$at" > + > +--exec echo "restart" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect > +--error 2013 # CR_SERVER_LOST > +--eval XA PREPARE '$xa' > +--source include/wait_until_disconnected.inc > +--let $rpl_server_number = 1 > +--source include/rpl_reconnect.inc > +--echo "*** $finally_expected" > +XA RECOVER; > +--eval XA ROLLBACK '$xa' > +--eval SELECT MAX(a) - $init_val as zero FROM t1 > +--eval SELECT MAX(a) - $init_val as zero FROM t2 > + > + > +--let $at=_commit_or_rollback > +--let $finally_expected=empty list (rolled back) > +--eval XA START '$xa' > +--eval INSERT INTO t1 SET a=$init_val + 2 > +--eval INSERT INTO t2 SET a=$init_val + 2 > +--eval XA END '$xa' > +--eval XA PREPARE '$xa' > + > +--eval SET SESSION DEBUG_DBUG="d,simulate_crash_$when$at" > +--exec echo "restart" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect > +--error 2013 # CR_SERVER_LOST > +--eval XA ROLLBACK '$xa' > +--source include/wait_until_disconnected.inc > +--let $rpl_server_number = 1 > +--source include/rpl_reconnect.inc > + > +--echo "*** $finally_expected" > +XA RECOVER; > +--eval SELECT MAX(a) - $init_val as zero FROM t1 > +--eval SELECT MAX(a) - $init_val as zero FROM t2 > + > + > +--let $at=_commit_or_rollback > +--let $finally_expected=empty list (committed away) > +--eval XA START '$xa' > +--eval INSERT INTO t1 SET a=$init_val + 3 > +--eval INSERT INTO t2 SET a=$init_val + 3 > +--eval XA END '$xa' > +--eval XA PREPARE '$xa' > + > +--eval SET SESSION DEBUG_DBUG="d,simulate_crash_$when$at" > +--exec echo "restart" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect > +--error 2013 # CR_SERVER_LOST > +--eval XA COMMIT '$xa' > +--source include/wait_until_disconnected.inc > +--let $rpl_server_number = 1 > +--source include/rpl_reconnect.inc > + > +--echo "*** $finally_expected" > +XA RECOVER; > +--eval SELECT MAX(a) - $init_val as three FROM t1 > +--eval SELECT MAX(a) - $init_val as three FROM t2 > diff --git a/sql/log_event.h b/sql/log_event.h > index 639cbfbe7aa..e1880339e85 100644 > --- a/sql/log_event.h > +++ b/sql/log_event.h > @@ -3614,6 +3616,12 @@ class Gtid_log_event: public Log_event > static const uchar FL_PREPARED_XA= 64; > /* FL_"COMMITTED or ROLLED-BACK"_XA is set for XA transaction. */ > static const uchar FL_COMPLETED_XA= 128; > + /* > + To mark the fact of multiple transactional engine participants > + in the prepared XA. The FL_COMPLETED_XA bit is reused by > XA_PREPARE_LOG_EVENT, What do you mean by that? XA_prepare_log_event does not inherit from Gtid_log_event, it cannot use Gtid_log_event flags. And, please, explain in a comment why you can reuse a bit for two different flags. Like "the ambiguity between FL_COMPLETED_XA and FL_MULTI_ENGINE_XA is resolved by the flag FL_PREPARED_XA. if it's set, then 128 means FL_MULTI_ENGINE_XA, if it's not set, then 128 means FL_COMPLETED_XA" > + oth the XA completion events do not need such marking. other > + */ > + static const uchar FL_MULTI_ENGINE_XA= 128; > > #ifdef MYSQL_SERVER > Gtid_log_event(THD *thd_arg, uint64 seq_no, uint32 domain_id, bool > standalone, > diff --git a/sql/handler.h b/sql/handler.h > index 0eff7bd930d..9acdd47bfc2 100644 > --- a/sql/handler.h > +++ b/sql/handler.h > @@ -1880,8 +1891,18 @@ class Ha_trx_info > m_ht= ht_arg; > m_flags= (int) TRX_READ_ONLY; /* Assume read-only at start. */ > > - m_next= trans->ha_list; > - trans->ha_list= this; > + if (likely(!past_head)) > + { > + m_next= trans->ha_list; > + trans->ha_list= this; > + } > + else > + { > + DBUG_ASSERT(trans->ha_list); > + > + m_next= trans->ha_list->m_next; > + trans->ha_list->m_next= this; > + } I don't like this fake generalization. There's no point for an arbitrary engine to be put at a specific place in the list. Only binlog is special. At least, remove this new argument and check for binlog_hton here. But really, think of this - binlog is so special, may be it'd be easier not to pretend it's a storage engine at all? No need for a fake binlog_hton, no need to register it, etc. It kind of made sense in 2005, but it doesn't necessarily make it now. > } > > /** Clear, prepare for reuse. */ > diff --git a/sql/handler.cc b/sql/handler.cc > index ffb89b30d92..b981294f0d6 100644 > --- a/sql/handler.cc > +++ b/sql/handler.cc > @@ -1338,6 +1345,9 @@ int ha_prepare(THD *thd) > handlerton *ht= ha_info->ht(); > if (ht->prepare) > { > + DBUG_EXECUTE_IF("simulate_crash_after_first_engine_prepare", > + if (!ha_info->next()) DBUG_SUICIDE();); looks like not simulate_crash_after_first_engine_prepare, but more like simulate_crash_before_last_engine_prepare. it's the same point in time if you have only two engines, but the code still implements "before last", not "after first" > + > if (unlikely(prepare_or_error(ht, thd, all))) > { > ha_rollback_trans(thd, all); > @@ -1368,22 +1378,22 @@ int ha_prepare(THD *thd) > } > > /* > - Like ha_check_and_coalesce_trx_read_only to return counted number of > - read-write transaction participants limited to two, but works in the 'all' > - context. > - Also returns the last found rw ha_info through the 2nd argument. > + Returns counted number of > + read-write recoverable transaction participants optionally limited to two. > + Also optionally returns the last found rw ha_info through the 2nd argument. > */ > -uint ha_count_rw_all(THD *thd, Ha_trx_info **ptr_ha_info) > +uint ha_count_rw_all(THD *thd, Ha_trx_info **ptr_ha_info, bool count_through) I don't understand why you need a new argument. ptr_ha_info==NULL means don't return a value there, this is a standard convention used everywhere. also, it's not an "count rw all" it's are there more rw-all htons than N where N can be 1 or 2. So: * either take N as an argument and return as soon as rw_ha_count > N * or don't overoptimize here and just return full count every time > { > unsigned rw_ha_count= 0; > > for (auto ha_info= thd->transaction.all.ha_list; ha_info; > ha_info= ha_info->next()) > { > - if (ha_info->is_trx_read_write()) > + if (ha_info->is_trx_read_write() && ha_info->ht()->recover) > { > - *ptr_ha_info= ha_info; > - if (++rw_ha_count > 1) > + if (ptr_ha_info) > + *ptr_ha_info= ha_info; > + if (++rw_ha_count > 1 && !count_through) > break; > } > } > @@ -1876,6 +1886,10 @@ commit_one_phase_2(THD *thd, bool all, THD_TRANS > *trans, bool is_real_trans) > { > int err; > handlerton *ht= ha_info->ht(); > + > + DBUG_EXECUTE_IF("simulate_crash_after_first_engine_commit_or_rollback", > + if (!ha_info->next()) DBUG_SUICIDE();); 1. same as above 2. why not to use *different* names for these two crashes? in case, you know, you'll want to crash at a specific point in the code, not just somewhere? And in a case you'll actually want to crash just somewhere you can always use "+d,simulate_crash_after_first_engine_commit,simulate_crash_after_first_engine_rollback" (and it should be "before_last", of course, not "after_first") > + > if ((err= ht->commit(ht, thd, all))) > { > my_error(ER_ERROR_DURING_COMMIT, MYF(0), err); > @@ -1987,6 +2001,10 @@ int ha_rollback_trans(THD *thd, bool all) > { > int err; > handlerton *ht= ha_info->ht(); > + > + DBUG_EXECUTE_IF("simulate_crash_after_first_engine_commit_or_rollback", > + if (!ha_info->next()) DBUG_SUICIDE();); ditto > + > if ((err= ht->rollback(ht, thd, all))) > { // cannot happen > my_error(ER_ERROR_DURING_ROLLBACK, MYF(0), err); > @@ -2098,8 +2116,24 @@ int ha_commit_or_rollback_by_xid(XID *xid, bool commit) > xaop.xid= xid; > xaop.result= 1; > > + if (binlog_hton->recover) > + { > + /* > + When the binlogging service is enabled complete the transaction > + by it first. > + */ > + if (commit) > + binlog_hton->commit_by_xid(binlog_hton, xid); > + else > + binlog_hton->rollback_by_xid(binlog_hton, xid); > + } > plugin_foreach(NULL, commit ? xacommit_handlerton : xarollback_handlerton, > MYSQL_STORAGE_ENGINE_PLUGIN, &xaop); > + if (binlog_hton->recover) > + { > + THD *thd= current_thd; > + thd->reset_binlog_completed_by_xid(); > + } 1. this, precisely, shows that binlog does not need a hton anymoreo 2. how can binlog_hton->recover be NULL here? > > return xaop.result; > } > @@ -2222,6 +2258,7 @@ static my_bool xarecover_handlerton(THD *unused, > plugin_ref plugin, > > if (hton->recover) > { > + info->recover_htons++; don't forget to subtract 1 for binlog_hton->recover, otherwise it'll make you to rollback everything with a FL_MULTI_ENGINE_XA flag > while ((got= hton->recover(hton, info->list, info->len)) > 0 ) > { > sql_print_information("Found %d prepared transaction(s) in %s", > @@ -2263,7 +2300,21 @@ static my_bool xarecover_handlerton(THD *unused, > plugin_ref plugin, > _db_doprnt_("ignore xid %s", xid_to_str(buf, info->list[i])); > }); > xid_cache_insert(info->list + i); > + XID *foreign_xid= info->list + i; swap those two lines ^^^ then you can write xid_cache_insert(foreign_xid); > info->found_foreign_xids++; > + > + /* > + For each foreign xid prepraed in engine, check if it is present > in > + xa_prepared_list of binlog. > + */ > + if (info->xa_prepared_list) for simplicity I'd rather always initialize xa_prepared_list and remove this if() > + { > + struct xa_recovery_member *member= NULL; > + if ((member= (xa_recovery_member *) > + my_hash_search(info->xa_prepared_list, foreign_xid->key(), > + foreign_xid->key_length()))) > + member->in_engine_prepare++; > + } > continue; > } > if (IF_WSREP(!(wsrep_emulate_bin_log && > @@ -2362,7 +2424,7 @@ int ha_recover(HASH *commit_list) > info.found_my_xids, opt_tc_log_file); > DBUG_RETURN(1); > } > - if (info.commit_list) > + if (info.commit_list && !info.found_foreign_xids) Why? after the finished crash recovery one can still have transactions in doubt. > sql_print_information("Crash recovery finished."); > DBUG_RETURN(0); > } > diff --git a/sql/log.cc b/sql/log.cc > index 731bb3e98f0..c69b8518cf4 100644 > --- a/sql/log.cc > +++ b/sql/log.cc > @@ -2019,28 +2027,49 @@ static int binlog_xa_recover_dummy(handlerton *hton > __attribute__((unused)), > return 0; > } > > - > +/* > + The function invokes binlog_commit() and returns its result > + when it has not yet called it already. > + binlog_cache_mngr::completed_by_xid remembers the fact of > + the 1st of maximum two subsequent calls. > +*/ > static int binlog_commit_by_xid(handlerton *hton, XID *xid) > { > + int rc= 0; > THD *thd= current_thd; > - > - (void) thd->binlog_setup_trx_data(); > + binlog_cache_mngr *cache_mngr= thd->binlog_setup_trx_data(); > > DBUG_ASSERT(thd->lex->sql_command == SQLCOM_XA_COMMIT); > > - return binlog_commit(hton, thd, TRUE); > -} > + if (!cache_mngr->completed_by_xid) On what code path can you have completed_by_xid == false here? > + { > + rc= binlog_commit(hton, thd, TRUE); > + cache_mngr->completed_by_xid= true; > + } > > + return rc; > +} > > +/* > + The function invokes binlog_rollback() and returns its results similarly > + to a commit branch. > +*/ > static int binlog_rollback_by_xid(handlerton *hton, XID *xid) > { > + int rc= 0; > THD *thd= current_thd; > - > - (void) thd->binlog_setup_trx_data(); > + binlog_cache_mngr *cache_mngr= thd->binlog_setup_trx_data(); > > DBUG_ASSERT(thd->lex->sql_command == SQLCOM_XA_ROLLBACK || > (thd->transaction.xid_state.get_state_code() == > XA_ROLLBACK_ONLY)); > - return binlog_rollback(hton, thd, TRUE); > + > + if (!cache_mngr->completed_by_xid) > + { > + rc= binlog_rollback(hton, thd, TRUE); > + cache_mngr->completed_by_xid= true; > + } > + > + return rc; > } > > > @@ -2195,6 +2224,12 @@ static int binlog_commit(handlerton *hton, THD *thd, > bool all) > if (!all) > cache_mngr->trx_cache.set_prev_position(MY_OFF_T_UNDEF); > > + DBUG_EXECUTE_IF("simulate_crash_after_binlog_commit_or_rollback", > + DBUG_SUICIDE();); > + DEBUG_SYNC(thd, "simulate_hang_after_binlog_prepare"); > + DBUG_EXECUTE_IF("simulate_crash_after_binlog_prepare", > + DBUG_SUICIDE();); wouldn't you say there's one DBUG_EXECUTE_IF too many? Just keep one, "simulate_crash_after_binlog_commit" > + > THD_STAGE_INFO(thd, org_stage); > DBUG_RETURN(error); > } > @@ -2298,6 +2333,9 @@ static int binlog_rollback(handlerton *hton, THD *thd, > bool all) > cache_mngr->trx_cache.set_prev_position(MY_OFF_T_UNDEF); > thd->reset_binlog_for_next_statement(); > > + DBUG_EXECUTE_IF("simulate_crash_after_binlog_commit_or_rollback", same as above, use unique dbug keywords, like "simulate_crash_after_binlog_rollback" (because it's rollback here) > + DBUG_SUICIDE();); > + > DBUG_RETURN(error); > } > > @@ -5749,6 +5794,14 @@ binlog_cache_mngr *THD::binlog_setup_trx_data() > DBUG_RETURN(cache_mngr); > } > > +/* > + XA completion flag resetter for ha_commit_or_rollback_by_xid(). > +*/ > +void THD::reset_binlog_completed_by_xid() > +{ > + binlog_setup_trx_data()->reset_completed_by_xid(); > +} I don't think this makes any sense as a THD method > + > /* > Function to start a statement and optionally a transaction for the > binary log. > @@ -10333,14 +10386,108 @@ start_binlog_background_thread() > return 0; > } > > +#ifdef HAVE_REPLICATION > +/** > + Auxiliary function for TC_LOG::recover(). > + @returns a successfully created and inserted @c xa_recovery_member > + into hash @c hash_arg, > + or NULL. > +*/ > +static xa_recovery_member* > +xa_member_insert(HASH *hash_arg, xid_t *xid_arg, xa_binlog_state state_arg, > + MEM_ROOT *ptr_mem_root) > +{ > + xa_recovery_member *member= (xa_recovery_member*) > + alloc_root(ptr_mem_root, sizeof(xa_recovery_member)); > + if (!member) > + return NULL; > + > + member->xid.set(xid_arg); > + member->state= state_arg; > + member->in_engine_prepare= 0; > + return my_hash_insert(hash_arg, (uchar*) member) ? NULL : member; > +} > + > +/* Inserts or updates an existing hash member with a proper state */ > +static bool xa_member_replace(HASH *hash_arg, xid_t *xid_arg, bool > is_prepare, > + MEM_ROOT *ptr_mem_root) > +{ > + if(is_prepare) > + { > + if (!(xa_member_insert(hash_arg, xid_arg, XA_PREPARE, ptr_mem_root))) > + return true; > + } > + else > + { > + /* > + Search if XID is already present in recovery_list. If found > + and the state is 'XA_PREPRAED' mark it as XA_COMPLETE. > + Effectively, there won't be XA-prepare event group replay. > + */ > + xa_recovery_member* member; > + if ((member= (xa_recovery_member *) > + my_hash_search(hash_arg, xid_arg->key(), xid_arg->key_length()))) > + { > + if (member->state == XA_PREPARE) > + member->state= XA_COMPLETE; > + } > + else // We found only XA COMMIT during recovery insert to list > + { > + if (!(member= xa_member_insert(hash_arg, > + xid_arg, XA_COMPLETE, ptr_mem_root))) > + return true; > + } > + } > + return false; > +} > +#endif > > +extern "C" uchar *xid_get_var_key(xid_t *entry, size_t *length, define the first argument as const char*, then you won't need to cast in my_hash_init, and you can be sure you pass the correct function with the correct number of arguments, even if my_hash_get_key changes in the future. and cast entry to xid_t* below explicitly. > + my_bool not_used __attribute__((unused))) > +{ > + *length= entry->key_length(); > + return (uchar*) entry->key(); > +} > + > +/** > + Performs recovery based on transaction coordinator log for 2pc. At the > + time of crash, if the binary log was in active state, then recovery for > + "implicit" 'xid's and explicit 'XA' transactions is initiated, > + otherwise merely the gtid binlog state is updated. > + For 'xid' and 'XA' based recovery the following steps are performed. > + > + Identify the active binlog checkpoint file. > + Scan the binary log from the beginning. > + From GTID_LIST and GTID_EVENTs reconstruct the gtid binlog state. > + Prepare a list of 'xid's for recovery. > + Prepare a list of explicit 'XA' transactions for recovery. > + Recover the 'xid' transactions. > + The explicit 'XA' transaction recovery is initiated once all the server > + components are initialized. Please check 'execute_xa_for_recovery()'. why explicit XA recovery is delayed? > + > + Called from @c MYSQL_BIN_LOG::do_binlog_recovery() > + > + @param linfo Store here the found log file name and position to > + the NEXT log file name in the index file. > + > + @param last_log_name Name of the last active binary log at the time of > + crash. > + > + @param first_log Pointer to IO_CACHE of active binary log > + @param fdle Format_description_log_event of active binary log > + @param do_xa Is 2pc recovery needed for 'xid's and explicit XA > + transactions. > + @return indicates success or failure of recovery. > + @retval 0 success > + @retval 1 failure > + > +*/ > int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name, > IO_CACHE *first_log, > Format_description_log_event *fdle, bool do_xa) > { > Log_event *ev= NULL; > HASH xids; > - MEM_ROOT mem_root; > char binlog_checkpoint_name[FN_REFLEN]; > bool binlog_checkpoint_found; > bool first_round; > @@ -10533,10 +10693,22 @@ int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const > char *last_log_name, > > if (do_xa) > { > - if (ha_recover(&xids)) > + if (ha_recover(&xids, &xa_recover_list, &xa_recover_htons)) > goto err2; > > - free_root(&mem_root, MYF(0)); > + DBUG_ASSERT(!xa_recover_list.records || > + (binlog_checkpoint_found && binlog_checkpoint_name[0] != 0)); why is that? > + > + if (!xa_recover_list.records) > + { > + free_root(&mem_root, MYF(0)); > + my_hash_free(&xa_recover_list); > + } > + else > + { > + xa_binlog_checkpoint_name= strmake_root(&mem_root, > binlog_checkpoint_name, > + > strlen(binlog_checkpoint_name)); > + } > my_hash_free(&xids); > } > return 0; > @@ -10561,6 +10734,219 @@ int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const > char *last_log_name, > return 1; > } > > +void MYSQL_BIN_LOG::execute_xa_for_recovery() > +{ > + if (xa_recover_list.records) > + (void) recover_explicit_xa_prepare(); > + free_root(&mem_root, MYF(0)); > + my_hash_free(&xa_recover_list); > +}; > + > +/** > + Performs recovery of user XA transactions. > + 'xa_recover_list' contains the list of XA transactions to be recovered. > + with possible replaying replication event from the binary log. > + > + @return indicates success or failure of recovery. > + @retval false success > + @retval true failure > + > +*/ > +bool MYSQL_BIN_LOG::recover_explicit_xa_prepare() why is it bool, if you only use it in execute_xa_for_recovery() and ignore the return value? > +{ > +#ifndef HAVE_REPLICATION > + /* Can't be supported without replication applier built in. */ > + return false; > +#else > + bool err= true; > + int error=0; > + Relay_log_info *rli= NULL; > + rpl_group_info *rgi; > + THD *thd= new THD(0); /* Needed by start_slave_threads */ > + thd->thread_stack= (char*) &thd; > + thd->store_globals(); > + thd->security_ctx->skip_grants(); > + IO_CACHE log; > + const char *errmsg; > + File file; > + bool enable_apply_event= false; > + Log_event *ev = 0; > + LOG_INFO linfo; > + int recover_xa_count= xa_recover_list.records; > + xa_recovery_member *member= NULL; > + > + if (!(rli= thd->rli_fake= new Relay_log_info(FALSE, "Recovery"))) > + { > + my_error(ER_OUTOFMEMORY, MYF(ME_FATAL), 1); > + goto err2; > + } > + rli->sql_driver_thd= thd; > + static LEX_CSTRING connection_name= { STRING_WITH_LEN("Recovery") }; > + rli->mi= new Master_info(&connection_name, false); > + if (!(rgi= thd->rgi_fake)) > + rgi= thd->rgi_fake= new rpl_group_info(rli); > + rgi->thd= thd; > + thd->system_thread_info.rpl_sql_info= > + new rpl_sql_thread_info(rli->mi->rpl_filter); > + > + if (rli && !rli->relay_log.description_event_for_exec) > + { > + rli->relay_log.description_event_for_exec= > + new Format_description_log_event(4); > + } > + if (find_log_pos(&linfo, xa_binlog_checkpoint_name, 1)) > + { > + sql_print_error("Binlog file '%s' not found in binlog index, needed " > + "for recovery. Aborting.", xa_binlog_checkpoint_name); > + goto err2; > + } > + > + tmp_disable_binlog(thd); > + thd->variables.pseudo_slave_mode= TRUE; > + for (;;) > + { > + if ((file= open_binlog(&log, linfo.log_file_name, &errmsg)) < 0) > + { > + sql_print_error("%s", errmsg); > + goto err1; > + } > + while (recover_xa_count > 0 && > + (ev= Log_event::read_log_event(&log, > + > rli->relay_log.description_event_for_exec, > + opt_master_verify_checksum))) > + { > + if (!ev->is_valid()) > + { > + sql_print_error("Found invalid binlog query event %s" > + " at %s:%llu; error %d %s", ev->get_type_str(), > + linfo.log_file_name, > + (ev->log_pos - ev->data_written)); > + goto err1; > + } > + enum Log_event_type typ= ev->get_type_code(); > + ev->thd= thd; > + > + if (typ == FORMAT_DESCRIPTION_EVENT) > + enable_apply_event= true; > + > + if (typ == GTID_EVENT) > + { > + Gtid_log_event *gev= (Gtid_log_event *)ev; > + if (gev->flags2 & > + (Gtid_log_event::FL_PREPARED_XA | > Gtid_log_event::FL_COMPLETED_XA)) > + { > + if ((member= > + (xa_recovery_member*) my_hash_search(&xa_recover_list, > + gev->xid.key(), > + gev->xid.key_length()))) > + { > + /* > + When XA PREPARE group of events (as flagged so) check > + its actual binlog state which may be COMPLETED. If the > + state is also PREPARED then analyze through > + in_engine_prepare whether the transaction needs replay. > + */ > + if (gev->flags2 & Gtid_log_event::FL_PREPARED_XA) > + { > + if (member->state == XA_PREPARE) > + { > + // XA prepared is not present in (some) engine then apply it > + if (member->in_engine_prepare == 0) > + enable_apply_event= true; > + else if (gev->flags2 & Gtid_log_event::FL_MULTI_ENGINE_XA && > + xa_recover_htons > member->in_engine_prepare) This needs a comment, something like "FL_MULTI_ENGINE_XA says the transaction must be prepared in more than one engine. We don't know in how many, so if it's not prepared in *all* engines we'll replay it just in case" I presume that (a bit silly) logic will do away when you will port this patch to use a counter instead of a flag. Please, don't push if before that. > + { > + enable_apply_event= true; > + // partially engine-prepared XA is first cleaned out prior > replay > + thd->lex->sql_command= SQLCOM_XA_ROLLBACK; > + ha_commit_or_rollback_by_xid(&gev->xid, 0); > + } > + else > + --recover_xa_count; > + } > + } > + else if (gev->flags2 & Gtid_log_event::FL_COMPLETED_XA) > + { > + if (member->state == XA_COMPLETE && > + member->in_engine_prepare > 0) > + enable_apply_event= true; why? you cannot replay a fully prepared and partially committed transaction > + else > + --recover_xa_count; > + } > + } > + } > + } > + > + if (enable_apply_event) > + { > + if ((err= ev->apply_event(rgi))) > + { > + sql_print_error("Failed to execute binlog query event of type: > %s," > + " at %s:%llu; error %d %s", ev->get_type_str(), > + linfo.log_file_name, > + (ev->log_pos - ev->data_written), > + thd->get_stmt_da()->sql_errno(), > + thd->get_stmt_da()->message()); > + delete ev; > + goto err1; > + } > + else if (typ == FORMAT_DESCRIPTION_EVENT) > + enable_apply_event=false; how can that happen? > + else if (thd->lex->sql_command == SQLCOM_XA_PREPARE || > + thd->lex->sql_command == SQLCOM_XA_COMMIT || > + thd->lex->sql_command == SQLCOM_XA_ROLLBACK) > + { > + --recover_xa_count; > + enable_apply_event=false; > + > + sql_print_information("Binlog event %s at %s:%llu" > + " successfully applied", > + typ == XA_PREPARE_LOG_EVENT ? > + static_cast<XA_prepare_log_event *>(ev)->get_query() : > + static_cast<Query_log_event *>(ev)->query, > + linfo.log_file_name, (ev->log_pos - ev->data_written)); > + } > + } > + if (typ != FORMAT_DESCRIPTION_EVENT) > + delete ev; > + } > + end_io_cache(&log); > + mysql_file_close(file, MYF(MY_WME)); > + file= -1; > + if (unlikely((error= find_next_log(&linfo, 1)))) > + { > + if (error != LOG_INFO_EOF) > + sql_print_error("find_log_pos() failed (error: %d)", error); > + else > + break; > + } > + } > +err1: > + reenable_binlog(thd); > + /* > + There should be no more XA transactions to recover upon successful > + completion. > + */ > + if (recover_xa_count > 0) > + goto err2; > + sql_print_information("Crash recovery finished."); > + err= false; > +err2: > + if (file >= 0) > + { > + end_io_cache(&log); > + mysql_file_close(file, MYF(MY_WME)); > + } > + thd->variables.pseudo_slave_mode= FALSE; > + delete rli->mi; > + delete thd->system_thread_info.rpl_sql_info; > + rgi->slave_close_thread_tables(thd); > + thd->reset_globals(); > + delete thd; > + > + return err; > +#endif /* !HAVE_REPLICATION */ > +} > > int > MYSQL_BIN_LOG::do_binlog_recovery(const char *opt_name, bool do_xa_recovery) Regards, Sergei VP of MariaDB Server Engineering and [email protected] _______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : [email protected] Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp

