On Thu, Sep 16, 2021 at 02:30:39PM +0200, Jan Beulich wrote:
> On 16.09.2021 13:10, Dmitry Isaikin wrote:
> > From: Dmitry Isaykin <[email protected]>
> >
> > This significantly speeds up concurrent destruction of multiple domains on
> > x86.
>
> This effectively is a simplistic revert of 228ab9992ffb ("domctl:
> improve locking during domain destruction"). There it was found to
> actually improve things; now you're claiming the opposite. It'll
> take more justification, clearly identifying that you actually
> revert an earlier change, and an explanation why then you don't
> revert that change altogether. You will want to specifically also
> consider the cleaning up of huge VMs, where use of the (global)
> domctl lock may hamper progress of other (parallel) operations on
> the system.
>
> > I identify the place taking the most time:
> >
> >     do_domctl(case XEN_DOMCTL_destroydomain)
> >       -> domain_kill()
> >         -> domain_relinquish_resources()
> >           -> relinquish_memory(d, &d->page_list, PGT_l4_page_table)
> >
> > My reference setup: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz, Xen 4.14.
> >
> > I use this command for test:
> >
> >     for i in $(seq 1 5) ; do xl destroy test-vm-${i} & done
> >
> > Without holding the lock all calls of `relinquish_memory(d, &d->page_list,
> > PGT_l4_page_table)`
> > took on my setup (for HVM with 2GB of memory) about 3 seconds for each
> > destroying domain.
> >
> > With holding the lock it took only 100 ms.
>
> I'm further afraid I can't make the connection. Do you have an
> explanation for why there would be such a massive difference?
> What would prevent progress of relinquish_memory() with the
> domctl lock not held?

As a starting point, I would recommend that Dmitry use lock profiling
with and without this change applied and try to spot which lock is
causing the contention. That should be fairly easy and could shed some
light.

Regards, Roger.
