Al Viro writes:

> On Tue, Dec 19, 2017 at 03:49:24PM -0600, Eric W. Biederman wrote:
>> > what would you be delaying? kmem_cache_alloc() for struct mount and
>> > assignments to its fields? That's noise; if anything, I would expect
>> > the main cost with plenty of containers to be in sget() scanning the
>> > list of mqueue superblocks. And we can get rid of that, while we are
>> > at it - to hell with mount_ns(), with that approach we can just use
>> > mount_nodev() instead. The logic in mq_internal_mount() will deal with
>> > multiple instances - if somebody has already triggered creation of
>> > internal mount, all subsequent calls in that ipcns will end up avoiding
>> > kern_mount_data() entirely. And if you have two callers racing - sure,
>> > you will get two superblocks. Not for long, though - the first one to
>> > get to setting ->mq_mnt (serialized on mq_lock) wins, the second loses
>> > and promptly destroys his vfsmount and superblock. I seriously suspect
>> > that variant below would cut down on the cost a whole lot more - as it
>> > is, we have the total of O(N^2) spent in the loop inside of
>> > sget_userns() when we create N ipcns and mount in each of those; this
>> > patch should cut that to O(N)...
>>
>> If that is where the cost is, is there any point in delaying creating
>> the internal mount at all?
>
> We won't know without the profiles... Incidentally, is there any point in
> using mount_ns() for procfs? Similar scheme (with ->proc_mnt instead of
> ->mq_mnt, of course) would live with mount_nodev() just fine, and it's
> definitely less costly - we don't bother with the loop in sget_userns()
> at all that way.
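The race-resolution scheme described for mq_internal_mount() (build the mount without holding the lock, then let the first caller to set the cached pointer under the lock win while the loser throws its copy away) can be illustrated with a small userspace sketch. This is not the kernel code: struct fake_ns, struct fake_mount, build_mount() and get_internal_mount() are hypothetical stand-ins, a pthread mutex plays the role of mq_lock, malloc()/free() stand in for creating and destroying a vfsmount and superblock, and the created/destroyed counters exist only for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for an internal vfsmount + superblock. */
struct fake_mount {
	int id;
};

/* Hypothetical stand-in for an ipc namespace caching its mount. */
struct fake_ns {
	pthread_mutex_t lock;    /* plays the role of mq_lock */
	struct fake_mount *mnt;  /* plays the role of ->mq_mnt */
	int created;             /* mounts ever built (illustration only) */
	int destroyed;           /* losing copies discarded (illustration only) */
};

/* The "expensive" step; called without ns->lock held. */
static struct fake_mount *build_mount(struct fake_ns *ns)
{
	struct fake_mount *m = malloc(sizeof(*m));

	if (m) {
		pthread_mutex_lock(&ns->lock);
		m->id = ++ns->created;
		pthread_mutex_unlock(&ns->lock);
	}
	return m;
}

/*
 * Return the namespace's internal mount, creating it on first use.
 * Racing callers may each build a mount, but only the first one to
 * publish it under the lock keeps it; the loser frees its copy.
 */
struct fake_mount *get_internal_mount(struct fake_ns *ns)
{
	struct fake_mount *m, *new_m;

	pthread_mutex_lock(&ns->lock);
	m = ns->mnt;
	pthread_mutex_unlock(&ns->lock);
	if (m)
		return m;                 /* already created: no build at all */

	new_m = build_mount(ns);          /* slow path, lock not held */
	if (!new_m)
		return NULL;

	pthread_mutex_lock(&ns->lock);
	if (!ns->mnt) {
		ns->mnt = new_m;          /* we won the race */
		new_m = NULL;
	} else {
		ns->destroyed++;          /* we lost; discard our copy below */
	}
	m = ns->mnt;
	pthread_mutex_unlock(&ns->lock);

	free(new_m);                      /* free(NULL) is a no-op */
	return m;
}
```

The lock is never held across the expensive construction, so the only cost of losing a race is one extra build that is immediately discarded, and once the pointer is cached every later caller skips construction entirely.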
The mechanisms mqueuefs and proc use for dealing with a filesystem that continues to be mounted/referenced after the namespace exits are different. Proc actually takes a reference on the pid namespace, so it is easier to work with. pid_ns_prepare_proc and pid_ns_release_proc are the namespace side of that dependency.

So yes, we could look at a local cache in the namespace and all would be well for proc. I don't know what we would use for locking when we start allowing more than one path to set it. atomic_cmpxchg(&proc_mnt, NULL)? That makes me suspect we could have a common helper that does the work.

I do know that the reason I moved proc to mount_ns is that it had simply been open coding that function.

Eric
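The compare-and-exchange idea and the "common helper" can be sketched together as a lock-free publish-once routine. This is a userspace illustration using C11 <stdatomic.h>, not kernel API: get_or_publish(), build_mount(), destroy_mount() and struct fake_mount are hypothetical names. In kernel terms the pointer swap would correspond roughly to cmpxchg() on the pointer field, since atomic_cmpxchg() operates on atomic_t rather than on pointers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for an internal mount. */
struct fake_mount {
	int id;
};

/*
 * Hypothetical common helper: lazily build an object and publish it
 * into *slot exactly once. build() may run in several racing callers;
 * the first successful compare-and-exchange wins, and every loser
 * releases its own copy with destroy() and returns the winner's.
 */
void *get_or_publish(_Atomic(void *) *slot,
		     void *(*build)(void *), void (*destroy)(void *),
		     void *arg)
{
	void *cur = atomic_load(slot);
	if (cur)
		return cur;                  /* already published */

	void *fresh = build(arg);
	if (!fresh)
		return NULL;

	void *expected = NULL;
	if (atomic_compare_exchange_strong(slot, &expected, fresh))
		return fresh;                /* we installed our copy */

	destroy(fresh);                      /* someone else won the race */
	return expected;                     /* CAS stored the winner here */
}

/* Counts build_mount() calls, for illustration only. */
static atomic_int builds;

/* Example build callback: arg points to the id to record. */
static void *build_mount(void *arg)
{
	struct fake_mount *m = malloc(sizeof(*m));

	if (m)
		m->id = *(int *)arg;
	atomic_fetch_add(&builds, 1);
	return m;
}

/* Example destroy callback for a losing copy. */
static void destroy_mount(void *p)
{
	free(p);
}
```

Compared with a mutex, no lock is ever held; the trade-off is the same, since a lost race still costs one wasted build. The same helper could serve both an mq_mnt-style and a proc_mnt-style cache by passing a different slot and callbacks.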

