On Thu, Oct 22, 2015 at 11:55:42AM +0100, Alan Burlison wrote:
> On 22/10/2015 05:21, Al Viro wrote:
>
> >>Most of the work on using a file descriptor is local to the thread.
> >
> >Using - sure, but what of cacheline dirtied every time you resolve a
> >descriptor to file reference?
>
> Don't you have to do that anyway, to do anything useful with the file?

Dirtying the cacheline that contains struct file itself is different, but
that's not per-descriptor.

> >In case of Linux we have two bitmaps and an array of pointers associated
> >with descriptor table. They grow on demand (in parallel)
> >    * reserving a descriptor is done under ->file_lock (dropped/regained
> >around memory allocation if we end up expanding the sucker, actual
> >reassignment of pointers to array/bitmaps is under that spinlock)
> >    * installing a pointer is lockless (we wait for ongoing resize to
> >settle, RCU takes care of the rest)
> >    * grabbing a file by index is lockless as well
> >    * removing a pointer is under ->file_lock, so's replacing it by dup2().
>
> Is that table per-process or global?

Usually it's per-process, but any thread could ask for a private instance
to work with (and then spawn more threads sharing that instance - or getting
independent copies).

It's common for Plan 9-inspired models - basically, you treat every thread
as a machine that consists of
    * memory
    * file descriptor table
    * namespace
    * signal handlers
    ...
    * CPU (i.e. actual thread of execution).
The last part can't be shared; anything else can. fork(2) variant used to
start new threads (clone(2) in case of Linux, rfork(2) in Plan 9 and *BSD)
is told which components should be copies of parent's ones and which should
be shared with the parent. fork(2) is simply "copy everything except for the
namespace". It's fairly common to have "share everything", but intermediate
variants are also possible. There are constraints (e.g. you can't share
signal handlers without sharing the memory space), but descriptor table
can be shared independently from memory space just fine. There's also a
way to say "unshare this, this and that components" - mapped to unshare(2) in
Linux and to rfork(2) in Plan 9.

Best way to think of that is to consider descriptor table as a first-class
object a thread can be connected to.
Usually you have one for each process, with all threads belonging to that
process connected to the same thing, but that's just the most common use.

> I don't think that it's possible to claim that a non-atomic dup2()
> is POSIX-compliant.

Except that it's in non-normative part of dup2(2), AFAICS. I certainly
agree that it would be standard lawyering beyond reason, but "not
possible to claim" is too optimistic. Maybe I'm just more cynical...

> ThreadA remains sat in accept on fd1 which is now a plain file, not
> a socket.

No. accept() is not an operation on file descriptors; it's an operation on
file descriptions (pardon the terminology). They are specified by passing
descriptors, but there's a hell of a difference between e.g. dup() or
fcntl(,F_SETFD,) (operations on descriptors) and read() or lseek()
(operations on descriptions).

Lookups are done once per syscall; the only exception is F_SETLK{,W}, where
we recheck that descriptor is still referring to the same thing before
granting the lock.

Again, POSIX is still underspecifying the semantics of shared descriptor
tables; back when the bulk of it had been written there had been no way
to have a descriptor -> description mapping changed under a syscall by
action of another thread. Hell, they still hadn't picked up on some things
that happened in early 80s, let alone early-to-mid 90s...

Linux and Solaris happen to cover these gaps differently; FreeBSD and
OpenBSD are probably closer to Linux variant, NetBSD - to Solaris one.
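
For what it's worth, the descriptor vs. description split is easy to see
from userland. A minimal sketch, in Python only for brevity; the function
name is made up for illustration, everything else is the usual os/fcntl
wrappers around dup(), lseek() and fcntl(,F_SETFD,). It assumes a POSIX
system. Two descriptors obtained via dup() share one description, so the
offset moves for both, while the close-on-exec flag stays per-descriptor.

```python
import fcntl
import os
import tempfile


def descriptor_vs_description():
    """Return (offset seen via fd2, FD_CLOEXEC on fd1, FD_CLOEXEC on fd2)."""
    fd1, path = tempfile.mkstemp()
    os.unlink(path)              # the file lives on while descriptors are open
    fd2 = os.dup(fd1)            # second descriptor, same open file description
    try:
        os.write(fd1, b"0123456789")

        # lseek() is an operation on the description: an offset set through
        # fd1 is what fd2 sees as its current position.
        os.lseek(fd1, 3, os.SEEK_SET)
        offset_via_fd2 = os.lseek(fd2, 0, os.SEEK_CUR)

        # F_SETFD is an operation on the descriptor: set close-on-exec on
        # fd1 explicitly, clear it on fd2, and the two flags stay independent.
        fcntl.fcntl(fd1, fcntl.F_SETFD, fcntl.FD_CLOEXEC)
        fcntl.fcntl(fd2, fcntl.F_SETFD, 0)
        cloexec_fd1 = bool(fcntl.fcntl(fd1, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)
        cloexec_fd2 = bool(fcntl.fcntl(fd2, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)
        return offset_via_fd2, cloexec_fd1, cloexec_fd2
    finally:
        os.close(fd1)
        os.close(fd2)
```

Calling descriptor_vs_description() returns (3, True, False): the offset
is shared through the description, the flag is not.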