Luke,

Sure, adjustment at run time would work just fine; the issue is that the limit
is currently baked in at compile time, so there is no way to adjust it
(re-building R is not an option in the production environments where this
usually comes up).

That said, I'm still not sure that a connection limit is a good way to guard
against the fd limit, since there are so many other ways to use up descriptors
(DLLs, sockets, pipes, etc. - packages and 3rd-party libraries). Apparently we
are already fiddling with the soft limit - we have R_EnsureFDLimit() and
R_GetFDLimit(), which are used at startup to raise it to 1024 by default
regardless of the ulimit -n setting (the comments say this is for DLLs). Based
on that we at least know what to expect, so we could trivially warn if the new
setting is larger than the user's limit.
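
For illustration, that boils down to something like the following - just a
sketch of the getrlimit()/setrlimit() logic, not the actual code in the R
sources (ensure_fd_limit() and main() are made up here):

#include <stdio.h>
#include <sys/resource.h>

/* Raise the soft fd limit to at least `want' (capped at the hard limit)
   and return the resulting soft limit, or -1 on error. */
static long ensure_fd_limit(long want)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) return -1;
    if (rl.rlim_cur != RLIM_INFINITY && (long) rl.rlim_cur < want) {
        rlim_t target = (rlim_t) want;
        if (rl.rlim_max != RLIM_INFINITY && target > rl.rlim_max)
            target = rl.rlim_max;      /* cannot exceed the hard limit */
        rl.rlim_cur = target;
        if (setrlimit(RLIMIT_NOFILE, &rl) != 0) return -1;
    }
    return rl.rlim_cur == RLIM_INFINITY ? want : (long) rl.rlim_cur;
}

int main(void)
{
    long fd_limit = ensure_fd_limit(1024);  /* what startup does by default */
    long new_limit = 4096;                  /* hypothetical user request */
    if (fd_limit > 0 && new_limit > fd_limit)
        fprintf(stderr, "note: connection limit %ld exceeds the fd limit %ld\n",
                new_limit, fd_limit);
    return 0;
}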

Cheers,
Simon


> On Aug 25, 2021, at 1:45 PM, luke-tier...@uiowa.edu wrote:
> 
> We do need to be careful about using too many file descriptors.  The
> standard soft limit on Linux is fairly low (1024; the hard limit is
> usually quite a bit higher). Hitting that limit, e.g. with runaway
> code allocating lots of connections, can cause other things, like
> loading packages, to fail with hard-to-diagnose error messages. A
> static connection limit is a crude way to guard against that. Doing
> anything substantially better is probably a lot of work. A simple
> option that may be worth pursuing is to allow the limit to be adjusted
> at runtime. Users who want to go higher would do so at their own risk
> and may need to know how to adjust the soft limit on the process.
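> 
> For example, something along these lines (just a sketch with made-up
> names; the fd soft limit could come from getrlimit(RLIMIT_NOFILE, ...)):
> 
>     #include <stdio.h>
> 
>     static int max_connections = 128;  /* currently the static NCONNECTION */
> 
>     /* Adjust the cap at runtime; exceeding the fd soft limit is allowed
>        but warned about, since users then proceed at their own risk. */
>     static int set_max_connections(int n, long fd_soft_limit)
>     {
>         if (n < 3) return -1;  /* stdin/stdout/stderr use the first slots */
>         if (fd_soft_limit > 0 && n > fd_soft_limit)
>             fprintf(stderr, "note: %d connections, but the fd limit is %ld\n",
>                     n, fd_soft_limit);
>         max_connections = n;
>         return 0;
>     }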
> 
> Best,
> 
> luke
> 
> On Wed, 25 Aug 2021, Simon Urbanek wrote:
> 
>> 
>> Martin,
>> 
>> I don't think a static connection limit is sensible. Recall that connections
>> can be anything, not necessarily just sockets or file descriptors, so they
>> are not tied to the system fd limit. For example, if you use a codec (say, a
>> compressing connection layered on top of a file connection) you will need
>> twice as many connections as fds. To be honest, the connection limit is one
>> of the main reasons why in our big-data applications we have always avoided
>> R connections and used C-level sockets instead (another was the lack of
>> control over the socket flags, but that has been addressed in the last
>> release). So I'd vote for at the very least increasing the limit
>> significantly (at least 1k if not more) and, ideally, making it dynamic if
>> memory footprint is an issue.
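>> 
>> Making it dynamic would be a small change in itself - roughly something
>> like this (just a sketch; the actual table in connections.c may well be
>> organized differently):
>> 
>>     #include <stdlib.h>
>>     #include <string.h>
>> 
>>     typedef struct Rconn *Rconnection;   /* as in R_ext/Connections.h */
>> 
>>     static Rconnection *Connections = NULL;  /* table of pointers */
>>     static int NConnections = 0;             /* current capacity */
>> 
>>     /* Grow the table to hold at least `need' entries; each slot is one
>>        pointer, so even 1024 slots only cost a few KB. */
>>     static int grow_connection_table(int need)
>>     {
>>         if (need <= NConnections) return 0;
>>         int newsize = NConnections > 0 ? NConnections : 128;
>>         while (newsize < need) newsize *= 2;
>>         Rconnection *p = realloc(Connections, newsize * sizeof *p);
>>         if (p == NULL) return -1;
>>         memset(p + NConnections, 0,
>>                (newsize - NConnections) * sizeof *p);
>>         Connections = p;
>>         NConnections = newsize;
>>         return 0;
>>     }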
>> 
>> Cheers,
>> Simon
>> 
>> 
>>> On Aug 25, 2021, at 8:53 AM, Martin Maechler <maech...@stat.math.ethz.ch> 
>>> wrote:
>>> 
>>>>>>>> GILLIBERT, Andre
>>>>>>>>   on Tue, 24 Aug 2021 09:49:52 +0000 writes:
>>> 
>>>> Rconnection is a pointer to an Rconn structure. The Rconn
>>>> structure must be allocated independently (e.g. by
>>>> malloc() in R_new_custom_connection).  Therefore,
>>>> increasing NCONNECTION to 1024 should only use 8
>>>> kilobytes on 64-bit platforms and 4 kilobytes on 32-bit
>>>> platforms.
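>>>> (That is, 1024 slots x 8 bytes per pointer = 8192 bytes,
>>>> or x 4 bytes = 4096 bytes on 32-bit.)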
>>> 
>>> You are right indeed, and I was wrong.
>>> 
>>>> Ideally, it should be dynamically allocated: either as
>>>> a linked list or as a dynamic array
>>>> (malloc/realloc). However, a simple change of
>>>> NCONNECTION to 1024 should be enough for most uses.
>>> 
>>> There is one other important problem I've been made aware of
>>> (similar to the number of open DLLs, an issue 1-2
>>> years ago):
>>> 
>>> The OS itself has limits on the number of open files
>>> (yes, I know that there are connections other than files), and
>>> these limits may differ considerably from platform to platform.
>>> 
>>> On my Linux laptop, in a shell, I see
>>> 
>>> $ ulimit -n
>>> 1024
>>> 
>>> which barely accommodates your proposed NCONNECTION of 1024.
>>> 
>>> Now if NCONNECTION is larger than the maximum allowed number of
>>> open files and R opens more files than the OS allows, the user
>>> may see quite unpleasant behavior, e.g. R being terminated
>>> abruptly (or misbehaving) without useful R-level warning or
>>> error messages.
>>> 
>>> It's also not sufficient to check the open-files limit at compile
>>> time; it needs to be checked at R process startup.
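>>>
>>> E.g., at startup, something along these lines (a sketch only;
>>> warning() as in R's C error-handling API):
>>>
>>>     #include <unistd.h>
>>>
>>>     long maxfd = sysconf(_SC_OPEN_MAX);  /* fd limit; -1 if unknown */
>>>     if (maxfd > 0 && maxfd < NCONNECTION)
>>>         warning("fd limit (%ld) is below the connection limit (%d)",
>>>                 maxfd, NCONNECTION);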
>>> 
>>> So this may need considerably more work than you/we had hoped,
>>> and it's probably hard to find a safe number that is considerably
>>> larger than 128 yet smaller than every non-crazy platform's
>>> open-files limit.
>>> 
>>>> Sincerely
>>>> André GILLIBERT
>>> 
>>> [............]
>>> 
> 
> -- 
> Luke Tierney
> Ralph E. Wareham Professor of Mathematical Sciences
> University of Iowa                  Phone:             319-335-3386
> Department of Statistics and        Fax:               319-335-3017
>   Actuarial Science
> 241 Schaeffer Hall                  email:   luke-tier...@uiowa.edu
> Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
