Hi David,

On 01/07/2019 11.27, David Miller wrote:
> From: Gerd Rausch <gerd.rau...@oracle.com>
> Date: Mon, 1 Jul 2019 09:39:44 -0700
> 
>> +                    /* Memory regions make it onto the "clean_list" via
>> +                     * "rds_ib_flush_mr_pool", after the memory region has
>> +                     * been posted for invalidation via "rds_ib_post_inv".
>> +                     *
>> +                     * At that point in time, "fr_state" may still be
>> +                     * in state "FRMR_IS_INUSE", since the only place where
>> +                     * "fr_state" transitions to "FRMR_IS_FREE" is in
>> +                     * is in "rds_ib_mr_cqe_handler", which is
>> +                     * triggered by a tasklet.
>> +                     *
>> +                     * So in case we notice that
>> +                     * "fr_state != FRMR_IS_FREE" (see below), * we wait for
>> +                     * "fr_inv_done" to trigger with a maximum of 10msec.
>> +                     * Then we check again, and only put the memory region
>> +                     * onto the drop_list (via "rds_ib_free_frmr")
>> +                     * in case the situation remains unchanged.
>> +                     *
>> +                     * This avoids the problem of memory-regions bouncing
>> +                     * between "clean_list" and "drop_list" before they
>> +                     * even have a chance to be properly invalidated.
>> +                     */
>> +                    frmr = &ibmr->u.frmr;
>> +                    wait_event_timeout(frmr->fr_inv_done,
>> +                                       frmr->fr_state == FRMR_IS_FREE,
>> +                                       msecs_to_jiffies(10));
>> +                    if (frmr->fr_state == FRMR_IS_FREE)
>> +                            break;
> 
> If we see FRMR_IS_FREE after the timeout, what cleans this up?
> 

In that case, the memory-region is subjected to the
"rds_ib_free_frmr(ibmr, true)" call that follows:
In essence making it onto the "drop_list".

It's the same as if it wouldn't transition to FRMR_IS_FREE at all.
In both cases, the memory region should get dropped, and the application
would have been penalized by an extra 10msec wait-time (for having tried to 
invalidate it).

> Also, why 10msec?

It's empirical.
I had added some debugging code (not part of this submission) that traced
the return value of "wait_event_timeout" in order to see the out-lier in terms
of processing the "IB_WR_LOCAL_INV" request.

On my test-systems the majority of requests were done in less than 1msec.
I saw an outlier at almost 2msec once.
So I gave it an extra order-of-magnitude multiplier for extra buffer / paranoia.

> Why that specific value and not some other value?

I looked around to find what Mellanox or any other reference material had
so say about the expected turn--around time of an "IB_WR_LOCAL_INV" ought to be.
I wasn't able to find any.

Please note that even if there was an upper-bound specified, such as minutes:
It wouldn't necessarily be a good idea to penalize an application by wait-times
up to one minute, if the alternative is to just put this memory region on a
drop-list and pick another one (which is suggested here).

> Why not wait for however long it takes for the tasklet to run and clean it up?

Two reasons I can think of:
1) The penalty of long wait-times would go to the application
2) If there were a firmware-bug, the "wait_event" would not terminate

Thanks,

  Gerd

Reply via email to