>Ho-hum... It could even be made lockless in fast path; the problems I see
>are
> * descriptor-to-file lookup becomes unsafe in a lot of locking
>conditions. Sure, most of that happens on the entry to some syscall, with
>very light locking environment, but... auditing every sodding ioctl that
>might be doing such lookups is an interesting exercise, and then there are
>->mount() instances doing the same thing. And procfs accesses. Probably
>nothing impossible to deal with, but nothing pleasant either.
In the Solaris kernel code, the ioctl code is generally not handed a file
descriptor but instead a file pointer (i.e., the lookup is done early in
the system call).
In those specific cases where a system call needs to convert a file
descriptor to a file pointer, there is only one routine which can be used.
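Roughly, the shape is like this; a sketch only, not the actual Solaris
source (getf()/releasef() are the routines I mean, do_ioctl() is made up
to stand in for the handler that gets called):

#include <errno.h>

struct file;                                    /* opaque here */
extern struct file *getf(int fd);               /* fd -> held file pointer */
extern void releasef(int fd);                   /* drop the hold again */
extern int do_ioctl(struct file *, int, long);  /* hypothetical handler */

int
ioctl_syscall(int fd, int cmd, long arg)
{
        struct file *fp;
        int error;

        if ((fp = getf(fd)) == NULL)    /* the single fd-to-fp routine */
                return (EBADF);

        error = do_ioctl(fp, cmd, arg); /* the handler only ever sees fp */
        releasef(fd);                   /* always paired with getf() */
        return (error);
}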
> * memory footprint. In case of Linux on amd64 or sparc64,
>main()
>{
> int i;
> for (i = 0; i < 1<<24; dup2(0, i++)) // 16M descriptors
> ;
>}
>will chew 132Mb of kernel data (16Mpointer + 32Mbit, assuming sufficient
>ulimit -n,
>of course). How much will Solaris eat on the same?
Yeah, that is a large amount of memory. Of course, the table is only
grown when it actually needs to be extended, and there is a reason why
there is a limit on file descriptors. But we're using more data per
file descriptor entry.
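For comparison, some back-of-the-envelope arithmetic (my numbers; the
64 bytes per entry is an assumption to make the point, not the real
layout):

#include <stdio.h>

int main(void)
{
        long n = 1L << 24;                      /* 16M descriptors        */

        long ptr_and_bitmaps = n * 8            /* one pointer per fd     */
                             + 2 * (n / 8);     /* two bitmaps, 1 bit/fd  */
        long line_per_entry  = n * 64;          /* one cache line per fd  */

        printf("pointer + bitmaps:   %ld MB\n", ptr_and_bitmaps >> 20); /* 132  */
        printf("cacheline per entry: %ld MB\n", line_per_entry >> 20);  /* 1024 */
        return 0;
}

So a cache-line-sized entry is roughly an order of magnitude more than
the 132Mb figure above at that descriptor count.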
> * related to the above - how much cacheline sharing will that involve?
>These per-descriptor use counts are bitch to pack, and giving each a cacheline
>of its own... <shudder>
As I said, we do actually use a lock, and yes, that means that you really
want a single cache line for each and every entry. It does make it easy
to have non-racy file descriptor updates. You certainly do not want
false sharing when there is a lot of contention.
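Something like this is what I have in mind; illustrative only, not the
real Solaris per-fd entry (the names, the sizes and the user-level mutex
are just there to keep the sketch self-contained):

#include <pthread.h>

struct file;                            /* stands in for the real thing */

struct fd_entry {
        pthread_mutex_t lock;           /* per-descriptor lock           */
        struct file    *fp;             /* the file pointer itself       */
        int             refcnt;         /* reference/activity count      */
        int             flags;          /* e.g. close-on-exec            */
} __attribute__((aligned(64)));         /* one entry per cache line      */

/* A non-racy update takes the entry's own lock, never a table-wide one,
 * and because of the alignment it never bounces a neighbour's line. */
void
fd_install(struct fd_entry *e, struct file *fp)
{
        pthread_mutex_lock(&e->lock);
        e->fp = fp;
        pthread_mutex_unlock(&e->lock);
}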
Other data is used to make sure that it only takes O(log(n)) to find the
lowest available file descriptor entry (where n, I think, is the
returned descriptor).
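One way to get that O(log(n)) behaviour is a small binary tree over the
table in which every node remembers whether its subtree still has a free
slot; I'm not claiming this is the exact Solaris data structure, it just
shows the idea:

#include <stdio.h>

#define NFD 16                  /* table size, power of two for brevity   */
static int has_free[2 * NFD];   /* node i: does its range have a free fd? */

static void
mark(int fd, int used)
{
        int i = NFD + fd;
        has_free[i] = !used;
        for (i /= 2; i >= 1; i /= 2)            /* fix the path to the root */
                has_free[i] = has_free[2 * i] | has_free[2 * i + 1];
}

static int
lowest_free(void)
{
        int i = 1;
        if (!has_free[i])
                return (-1);                    /* table is full */
        while (i < NFD)                         /* walk down, prefer left */
                i = has_free[2 * i] ? 2 * i : 2 * i + 1;
        return (i - NFD);                       /* O(log n) steps in all */
}

int
main(void)
{
        int fd;

        for (fd = 0; fd < NFD; fd++)
                mark(fd, 0);                    /* everything free */
        mark(0, 1); mark(1, 1); mark(3, 1);     /* 0, 1 and 3 in use */
        printf("lowest free fd: %d\n", lowest_free());  /* prints 2 */
        return (0);
}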
Uncontended locks aren't expensive, and all of it is done on a single
cache line.
One question about the Linux implementation: what happens when a socket
being waited on in select() is closed? I'm assuming that the kernel waits
until "shutdown" is given or until a connection comes in?
Is it a problem that you can "hide" your listening socket with a thread in
accept()? I would think so (it would be visible in netstat, but you can't
easily find out who has it).
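For what it's worth, this is the situation I mean; the sketch only sets
the scenario up, and what the blocked thread then sees is exactly the
question (compile with -pthread):

#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/socket.h>

static int lsock;

static void *
acceptor(void *arg)
{
        int c;

        (void)arg;
        c = accept(lsock, NULL, NULL);          /* parks here, "hiding" lsock */
        printf("accept() returned %d\n", c);    /* does it return at all?     */
        return (NULL);
}

int
main(void)
{
        struct sockaddr_in sin;
        pthread_t t;

        lsock = socket(AF_INET, SOCK_STREAM, 0);
        memset(&sin, 0, sizeof (sin));
        sin.sin_family = AF_INET;
        sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        sin.sin_port = 0;                       /* any port will do */
        bind(lsock, (struct sockaddr *)&sin, sizeof (sin));
        listen(lsock, 5);

        pthread_create(&t, NULL, acceptor, NULL);
        sleep(1);                               /* let it block in accept()   */
        close(lsock);                           /* close the listening socket */
        sleep(1);                               /* did the accept() wake up?  */
        return (0);
}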
Casper