Hi,
(+ some AWS folks)
On 17/09/2021 11:17, Jan Beulich wrote:
> On 16.09.2021 19:52, Andrew Cooper wrote:
>> On 16/09/2021 13:30, Jan Beulich wrote:
>>> On 16.09.2021 13:10, Dmitry Isaikin wrote:
>>>> From: Dmitry Isaykin <[email protected]>
>>>> This significantly speeds up concurrent destruction of multiple domains on x86.
>>> This effectively is a simplistic revert of 228ab9992ffb ("domctl:
>>> improve locking during domain destruction"). There it was found to
>>> actually improve things;
>> Was it? I recall that it was simply an expectation that performance
>> would be better...
> My recollection is that it was, for one of our customers.
>> Amazon previously identified 228ab9992ffb as a massive perf hit, too.
> Interesting. I don't recall any mail to that effect.
Here we go:
https://lore.kernel.org/xen-devel/de46590ad566d9be55b26eaca0bc4dc7fbbada59.1585063311.git.hongy...@amazon.com/
We have been using the revert for quite a while in production and didn't
notice any regression.
>> Clearly some of the reasoning behind 228ab9992ffb was flawed and/or
>> incomplete, and it appears as if it wasn't necessarily a wise move in
>> hindsight.
> Possible; I continue to think though that the present observation wants
> properly understanding instead of more or less blindly undoing that
> change.
To be honest, I think this is the other way around. You wrote and merged
a patch with the following justification:
"
There is no need to hold the global domctl lock across domain_kill() -
the domain lock is fully sufficient here, and parallel cleanup after
multiple domains performs quite a bit better this way.
"
Clearly, the original commit message is lacking details on the exact
setups and numbers. But we now have two stakeholders with proof that
your patch is harmful to the setup you claim performs better with your patch.
To me, this is enough justification to revert the original patch. Anyone
against the revert should provide clear details of why the patch should
not be reverted.
Cheers,
--
Julien Grall