Hi Sergei, Answers to your questions below:
On Wed, Oct 6, 2021 at 5:03 PM Sergei Golubchik <[email protected]> wrote: > Hi, Jan! > > On Oct 06, Jan Lindström wrote: > > > > > > > > > +/* This is wrapper for wsrep_break_lock in thr_lock.c */ > > > > > > +static int wsrep_thr_abort_thd(void *bf_thd_ptr, void > *victim_thd_ptr, my_bool signal) > > > > > > +{ > > > > > > + THD* victim_thd= (THD *) victim_thd_ptr; > > > > > > + /* We need to lock THD::LOCK_thd_data to protect victim > > > > > > + from concurrent usage or disconnect or delete. */ > > > > > > > > > > How do you know victim_thd wasn't deleted before you locked > > > > > LOCK_thd_data below? > > > > > > > > I must say the thr_lock code is not familiar to me but there are > > > > mysql_mutex_lock() calls to lock->mutex. After code review it is not > > > > clear to me what that mutex is. > > > > > > where are mysql_mutex_lock() calls to lock->mutex? > > > > mysys/thr_lock.c there is function thr_lock() there is call to > > mysql_mutex_lock(&lock->mutex); this is before wsrep_break_lock where we > > call wsrep_thd_abort > > this is for table locks. `lock` is `data->lock` where `data` is THR_LOCK > structure somewhere in the table share. It locks tables and handlers, > not threads. And InnoDB isn't using it at all anyway. > I do not know how to change this one so that thd is protected, can I even do so as this code does not know THD internals... > > > > > > > > > if (victim_trx) { > > > > > > + wsrep_thd_UNLOCK(victim_thd); > > > > > > > > > > what keeps victim_trx from disappearing here? > > > > > > > > Nothing. Do you have suggestions ? > > > > > > A couple of thoughts: > > > > > > * Why do you unlock LOCK_thd_data here at all? I believed the whole > > > point of using TOI was to make sure that even if you lock mutexes in > > > a different order, it will not cause a deadlock. > > > > This is DDL-case when we have a MDL-conflict not user KILL, we need to > > release it in my opinion because we need to take mutexes in > > lock_sys -> trx -> THD::LOCK_thd_data order, > > if I do not release I can see easily safe_mutex warnings > > I don't understand. First, lock_sys and trx mutexes are not covered by > safe_mutex, so it cannot possibly warn about them. > > Second, the reason for taking mutexes always in the same order is to > avoid deadlocks. And yor TOI trick should archieve that. > static void wsrep_abort_transaction( /*====================*/ handlerton* hton, THD *bf_thd, THD *victim_thd, my_bool signal) { DBUG_ENTER("wsrep_abort_transaction"); trx_t* victim_trx = thd_to_trx(victim_thd); trx_t* bf_trx = (bf_thd) ? thd_to_trx(bf_thd) : NULL; WSREP_DEBUG("abort transaction: BF: %s victim: %s victim conf: %d", wsrep_thd_query(bf_thd), wsrep_thd_query(victim_thd), wsrep_thd_conflict_state(victim_thd, FALSE)); if (victim_trx) { const trx_id_t victim_trx_id= victim_trx->id; const longlong victim_thread= thd_get_thread_id(victim_thd); WSREP_DEBUG("wsrep_abort_transaction: Victim thread %ld " "transaction " TRX_ID_FMT " trx_state %d", thd_get_thread_id(victim_thd), victim_trx->id, victim_trx->state); /* This is necessary as correct mutexing order is lock_sys -> trx -> THD::LOCK_thd_data and below function assumes we have lock_sys and trx locked and takes THD::LOCK_thd_data for THD state check. */ wsrep_thd_UNLOCK(victim_thd); // GAP where thd or trx is not protected DEBUG_SYNC(bf_thd, "wsrep_abort_victim_unlocked"); DBUG_EXECUTE_IF("wsrep_abort_replicated_sleep", WSREP_DEBUG("wsrep_abort_transaction: sleeping " "for thread %ld ", thd_get_thread_id(victim_thd)); lock_mutex_enter(); if (trx_t* victim= trx_rw_is_active(victim_trx_id, NULL, true)) { // As trx is now referenced it can't go away trx_mutex_enter(victim); ut_ad(victim->mysql_thd ? thd_get_thread_id(victim->mysql_thd) == victim_thread :1); // In below we take THD::LOCK_thd_data wsrep_innobase_kill_one_trx(bf_thd, bf_trx, victim, signal); trx_mutex_exit(victim); victim->release_reference(); wsrep_srv_conc_cancel_wait(victim); } lock_mutex_exit(); DBUG_VOID_RETURN; } else { WSREP_DEBUG("victim does not have transaction"); wsrep_thd_LOCK(victim_thd); wsrep_thd_set_conflict_state(victim_thd, MUST_ABORT); wsrep_thd_awake(victim_thd, signal); wsrep_thd_UNLOCK(victim_thd); } DBUG_VOID_RETURN; } > > > > By the way, how long can WSREP_TO_ISOLATION_BEGIN() take? > > > Does it need to wait for everything else being replicated? > > > > As long as it takes to start it on all nodes in the cluster. > > So, KILL basically won't work when a cluster is starting. > Can bf aborts happen during a cluster startup? > Not sure. R: Jan
_______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : [email protected] Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp

