Hi Jan,

Ramesh said that he has not observed any "WSREP: BF lock wait long for trx" messages while running tests. This would suggest that the diagnostic output code is essentially untested.
I would say that the diagnostic output is related to MDEV-23328 or MDEV-25114, because it involves the same mutexes as the MDEV-23328 hang. Furthermore, the MDEV-23328 scenario is the forced abort of lock-holding lower-priority transactions due to applying certified transactions (called "BF" or "brute force" in Galera).

I am glad that you will reconsider my request to remove wsrep_trx_print_locking() and the mutex operations around the call.

Marko

On Mon, Oct 25, 2021 at 9:04 AM Jan Lindström <[email protected]> wrote:
>
> Hi Marko,
>
>> I am sad to see that my comment regarding wsrep_is_BF_lock_timeout()
>> that I made in
>> https://github.com/MariaDB/server/commit/b74b53f0515b360bb5cddec1a506a2f4d4dc21b3#r52293813
>> (June 17) has not been addressed. Do we really need that output? Do we
>> see that output in our internal testing? If not, then we have not
>> tested that that code is free from race conditions or hangs. (It
>> should be a lot safer to avoid such unnecessary unlock/lock exercises
>> involving multiple mutexes.) If yes, then why have we not added source
>> code comments to document when such scenarios would occur? I believe
>> that it is better to rely on some operating system features (such as
>> stack traces from a debugger) rather than to try to implement partial
>> logging.
>
> I know, this is not related to MDEV-23328 or MDEV-25114 but ok, I can remove
> most of this stupid code.
>
> R: Jan

--
Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation

