Hi Jan,

Ramesh said that he has not observed any "WSREP: BF lock wait long for
trx" messages while running tests. This would suggest that the
diagnostic output code is essentially untested.

I would say that the diagnostic output is related to MDEV-23328 or
MDEV-25114, because it involves the same mutexes as the MDEV-23328
hang. Furthermore, the MDEV-23328 scenario is the forced abort of
lock-holding lower-priority transactions when certified transactions
are applied (called "BF" or "brute force" in Galera).

I am glad that you will reconsider my request to remove
wsrep_trx_print_locking() and the mutex operations around the call.

Marko

On Mon, Oct 25, 2021 at 9:04 AM Jan Lindström <[email protected]> wrote:
>
> Hi Marko,
>>
>> I am sad to see that my comment regarding wsrep_is_BF_lock_timeout()
>> that I made in 
>> https://github.com/MariaDB/server/commit/b74b53f0515b360bb5cddec1a506a2f4d4dc21b3#r52293813
>> (June 17) has not been addressed. Do we really need that output? Do we
>> see that output in our internal testing? If not, then we have not
>> tested that that code is free from race conditions or hangs. (It
>> should be a lot safer to avoid such unnecessary unlock/lock exercises
>> involving multiple mutexes.) If yes, then why have we not added source
>> code comments to document when such scenarios would occur? I believe
>> that it is better to rely on some operating system features (such as
>> stack traces from a debugger) rather than to try to implement partial
>> logging.
>
>
> I know, this is not related to MDEV-23328 or MDEV-25114 but ok, I can remove 
> most of this stupid code.
>
> R: Jan



-- 
Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation