James,

Sorry, perhaps I was indirect, but I thought I had responded to that in 
https://lore.kernel.org/linux-fsdevel/de6adce76b534310975e4d3c4a4fa...@garmin.com/.

I really hope I do not come off as complaining about this issue. We identified 
what seemed to be something that was overlooked with the various APIs around 
creating child processes. Rather than fixing it ourselves and moving on, we 
chose to invest more time and effort into it by engaging the community (first 
POSIX, and now this one) in a discussion. I humbly and sincerely ask if you 
would help me understand, if we could turn back the clock, how our application 
could have been written to avoid this issue:

*A parent process forks a child. Another thread in the parent process closes 
and attempts to reopen a socket, file, or other resource it needs exclusive 
access to. This fails because the -operating system- still has a reference to 
that resource that it is keeping on behalf of the child. The child eventually 
calls exec and the resource is closed because the close-on-exec flag is set.*

Our first attempt, which was to use the pthread_atfork() handlers, failed 
because system() is not required to call the handlers.

Most of the feedback we're getting on this seems to say "don't use system(), it 
is unsafe for threaded applications". Is that documented anywhere? The man page 
says it is "MT-Safe".

Aside from that, even if we remove all uses of system() from our application 
(which we already have), then our application, like many other applications, 
needs to use third-party shared libraries. There is nothing that prevents those 
libraries from using system(). We can audit those libraries and go back to 
the vendor with a request to replace system() with a standard fork/exec, but 
they will also want documentation supporting that.
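
For reference, a rough sketch of what such a fork/exec replacement might look like. The name run_shell is hypothetical, and this is not a drop-in equivalent of system(): it does not ignore SIGINT and SIGQUIT or block SIGCHLD while waiting, as system() is specified to do. Because it calls fork() directly, any handlers registered with pthread_atfork() are run.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs cmd via /bin/sh -c using fork() and exec().
 * Returns the command's exit status, or -1 on error or if the child
 * did not exit normally. */
int run_shell(const char *cmd)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: descriptors marked close-on-exec are closed here. */
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127); /* exec failed */
    }

    /* Parent: wait for the child, retrying if interrupted by a signal. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A call such as run_shell("exit 7") returns 7. The window between fork() and exec() still exists here; the difference is only that the atfork handlers get a chance to run.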

We can also take steps to change or remove system() from our standard library. 
It fixes our issue, but still leaves the community with an API that is 
broken/flawed/poorly-documented (depending on how one looks at it).

If the feedback from the community is truly and finally that system() should 
not be used in these applications, then is there support for updating the man 
page to better communicate that?

Thanks for your help with this.

Nate

-----Original Message-----
From: James Bottomley <james.bottom...@hansenpartnership.com>
Sent: Friday, May 15, 2020 11:26
To: Karstens, Nate <nate.karst...@garmin.com>; Matthew Wilcox 
<wi...@infradead.org>
Cc: Alexander Viro <v...@zeniv.linux.org.uk>; Jeff Layton <jlay...@kernel.org>; 
J. Bruce Fields <bfie...@fieldses.org>; Arnd Bergmann <a...@arndb.de>; Richard 
Henderson <r...@twiddle.net>; Ivan Kokshaysky <i...@jurassic.park.msu.ru>; Matt 
Turner <matts...@gmail.com>; Helge Deller <del...@gmx.de>; David S. Miller 
<da...@davemloft.net>; Jakub Kicinski <k...@kernel.org>; Eric Dumazet 
<eduma...@google.com>; David Laight <david.lai...@aculab.com>; 
linux-fsde...@vger.kernel.org; linux-a...@vger.kernel.org; 
linux-al...@vger.kernel.org; linux-par...@vger.kernel.org; 
sparcli...@vger.kernel.org; netdev@vger.kernel.org; 
linux-ker...@vger.kernel.org; Changli Gao <xiao...@gmail.com>; 
a.jo...@opengroup.org
Subject: Re: [PATCH v2] Implement close-on-fork


On Fri, 2020-05-15 at 16:07 +0000, Karstens, Nate wrote:
> Matthew,
>
> What alternative would you suggest?
>
> From an earlier email:
>
> > ...nothing else addresses the underlying issue: there is no way to
> > prevent a fork() from duplicating the resource. The close-on-exec
> > flag partially-addresses this by allowing the parent process to mark
> > a file descriptor as exclusive to itself, but there is still a
> > period of time the failure can occur because the auto-close only
> > occurs during the exec(). Perhaps this would not be an issue with a
> > different process/threading model, but that is another discussion
> > entirely.
>
> Do you disagree there is an issue?

Oh good grief that's a leading question: When I write bad code and it crashes, 
most people would agree there is an issue; very few would agree the kernel 
should be changed to fix it. Several of us have already said the problem seems 
to be with the way your application is written.  You didn't even answer emails 
like this speculating about the cause being the way your application counts 
resources:

https://lore.kernel.org/linux-fsdevel/1587569663.3485.18.ca...@hansenpartnership.com/

The bottom line is that we think you could rewrite this one application not to 
have the problem you're complaining about rather than introduce a new kernel 
API to "fix" it.

James



