Hi Sergei,

Your suggestion does not work. There is more than one problem:

(1) wsrep_abort_transaction does not release the MDL lock.
(2) innobase_kill_one_trx crashes at wsrep->abort_pre_commit() because the
    transaction registered inside wsrep has disappeared (this does not
    happen if THD::LOCK_thd_data is locked).

==> I will use Seppo's KILL as TOI and remove all unrelated changes from
mdl.cc and wsrep_close_connections; they are not related to the problems
we need to fix. Let's take one problem at a time. TOI is a very powerful
solution to the problem we are trying to fix.

R: Jan

On Thu, Oct 21, 2021 at 7:52 AM Jan Lindström <[email protected]> wrote:

> Hi Sergei,
>
> This does not seem to work. Consider the following:
>
> CREATE TABLE t1 (id INT PRIMARY KEY) ENGINE=InnoDB;
> INSERT INTO t1 VALUES (1);
> connection node_2;
> SET AUTOCOMMIT=OFF;
> START TRANSACTION;
> INSERT INTO t1 VALUES (2);
> connection node_2a;
> ALTER TABLE t1 ADD COLUMN f2 INTEGER, LOCK=EXCLUSIVE;
>
> The problem seems to be that the MDL lock held by the thread executing
> the INSERT is not released, even though the thread executing the ALTER
> wants it released by killing its holder. We do kill the query inside
> InnoDB. This MDL code is not familiar to me and I do not yet understand
> why the MDL lock is not released.
>
> R: Jan
>
> On Thu, Oct 14, 2021 at 9:49 PM Sergei Golubchik <[email protected]> wrote:
>
>> Hi, Jan!
>>
>> Here's an idea for the fix:
>>
>> Let's always use the KILL mutex locking order, that is
>>
>> victim_thread->LOCK_thd_data -> lock_sys->mutex -> victim_trx->mutex
>>
>> For this we need to fix wsrep_abort_transaction(), which is called from
>> the server, and wsrep_innobase_kill_one_trx(), which is called from the
>> BF thread.
>>
>> wsrep_abort_transaction() can be fixed by not invoking
>> wsrep_innobase_kill_one_trx() at all, always using the KILL code path
>> (that is, wsrep_thd_awake) and forcing a rollback after the kill.
>>
>> wsrep_innobase_kill_one_trx() can be fixed by simply not locking
>> LOCK_thd_data at all.
>> We know that the victim waits on a lock inside InnoDB and that we have
>> locked the trx mutex and the lock_sys mutex. The victim cannot go away
>> or modify its data; it cannot do anything. So LOCK_thd_data doesn't
>> seem to be necessary at that point.
>>
>> I've attached a demo patch. It compiles, but I didn't try to run it;
>> it's only meant to show the idea, not to be a working fix (I already
>> suspect I removed too much from wsrep_abort_transaction()). Note that
>> it's a patch for 10.2 at commit 29bbcac0ee8^, that is, one commit
>> before my fix.
>>
>> On Oct 12, Jan Lindström wrote:
>> > Hi Sergei,
>> >
>> > An update on the wsrep_close_connections problem. My suggestion for
>> > fixing this issue is at
>> >
>> > https://github.com/MariaDB/server/commit/99cbe03a44cc95e6f548550df51e7201ebea3b9d
>> >
>> > If you have a better solution, please advise.
>> >
>> > R: Jan
>>
>> Regards,
>> Sergei
>> VP of MariaDB Server Engineering
>> and [email protected]
>>
>
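To make the proposed KILL locking order concrete, here is a minimal C++ sketch under stated assumptions: the three locks are modeled as plain std::mutex objects, and Victim, kill_path_abort(), bf_path_abort() and race_kill_and_bf_abort() are hypothetical names, not MariaDB code. It only illustrates why taking locks in one global order (or a subset of that order in the same relative sequence, as the BF path does when it skips LOCK_thd_data) cannot produce a deadlock cycle. It says nothing about whether skipping LOCK_thd_data is actually safe, which is what problem (2) in the reply above disputes.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical stand-ins, NOT the real MariaDB types: plain std::mutex
// objects playing the roles of the locks named in the thread.
std::mutex lock_sys_mutex;  // role of lock_sys->mutex (global)

struct Victim {
  std::mutex lock_thd_data;  // role of victim_thread->LOCK_thd_data
  std::mutex trx_mutex;      // role of victim_trx->mutex
  bool aborted = false;      // guarded by trx_mutex
};

// Caller must hold v.trx_mutex. Returns true only for the first aborter.
static bool mark_aborted(Victim& v) {
  if (v.aborted) return false;
  v.aborted = true;
  return true;
}

// Model of the KILL path: takes all three locks in the agreed order
// LOCK_thd_data -> lock_sys mutex -> trx mutex.
bool kill_path_abort(Victim& v) {
  std::lock_guard<std::mutex> thd(v.lock_thd_data);
  std::lock_guard<std::mutex> sys(lock_sys_mutex);
  std::lock_guard<std::mutex> trx(v.trx_mutex);
  return mark_aborted(v);
}

// Model of the proposed BF path: skips LOCK_thd_data entirely and takes the
// remaining two locks in the same relative order, so it never waits for a
// lock that comes earlier in the order than one it already holds.
bool bf_path_abort(Victim& v) {
  std::lock_guard<std::mutex> sys(lock_sys_mutex);
  std::lock_guard<std::mutex> trx(v.trx_mutex);
  return mark_aborted(v);
}

// Races the two paths on a fresh victim `rounds` times and returns the
// total number of aborts performed; each round must have exactly one winner.
int race_kill_and_bf_abort(int rounds) {
  int performed = 0;
  for (int i = 0; i < rounds; ++i) {
    Victim v;
    bool by_kill = false;
    bool by_bf = false;
    std::thread kill_thread([&] { by_kill = kill_path_abort(v); });
    std::thread bf_thread([&] { by_bf = bf_path_abort(v); });
    kill_thread.join();
    bf_thread.join();
    assert(by_kill != by_bf);  // exactly one path performed the abort
    performed += static_cast<int>(by_kill) + static_cast<int>(by_bf);
  }
  return performed;
}
```

Calling race_kill_and_bf_abort(200) should return 200 and terminate: while the KILL path holds LOCK_thd_data and waits for the lock_sys mutex, the BF path can only be waiting for the trx mutex, which nobody else holds, so no cycle of waiters can form.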
_______________________________________________
Mailing list: https://launchpad.net/~maria-developers
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp

