Control: severity -1 important

Hi,

On Mon, Jun 03, 2024 at 04:31:39PM +0200, Pierre-Elliott Bécue wrote:
> Assigning to fakeroot for now, but not sure it's not something for ftp.d.o (a
> binNMU) or libc6.
> 
> Today, after an upgrade, I am not able to build packages with sbuild as
> it hangs with this process tree:

I suspect the cause may be something else and that we are looking at
some red herrings, at least in part. In the meantime, the buildd
network is reporting successful builds, and neither Jochen Sprickerhof
nor I can readily reproduce the problem. I'm relatively certain that it
doesn't affect everyone at this time and am hence downgrading the
report.

> In parallel, one can find a faked-sysv process eating all of a CPU's resources.
> 
> peb       225847  100  0.0   2440   648 pts/4    R+   16:25   0:02  \_ 
> /usr/bin/faked-sysv

I think I have a plausible explanation for the CPU consumption.

> strace: Process 230857 attached
> close(200453)                           = -1 EBADF (Bad file descriptor)
> close(200454)                           = -1 EBADF (Bad file descriptor)
> close(200455)                           = -1 EBADF (Bad file descriptor)
...
> close(200511)                           = -1 EBADF (Bad file descriptor)
> ... and so on

What we see here is a loop closing file descriptors in increasing
order. I looked into fakeroot's source code, and indeed there is such a
loop that may correspond to this trace.

https://sources.debian.org/src/fakeroot-ng/0.18-4.1/daemon.cpp/?hl=411#L414
|         int fd_limit=getdtablesize();
|         for( int i=0; i<fd_limit; ++i ) {
|             if( i!=skip_fd1 && i!=skip_fd2 && i!=fd )
|                 close(i);
|         }
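
To check that this loop can produce the trace above, here is a
standalone sketch in C (my own code, not fakeroot's) running the same
kind of loop; running it as strace -e close ./a.out should yield the
same stream of close(...) = -1 EBADF calls:

  /* Standalone sketch, not fakeroot's code: the same style of close()
   * loop, for comparison with the strace output above. It starts at 3
   * to keep stdin/stdout/stderr, where the quoted loop instead skips
   * specific descriptors. */
  #include <unistd.h>

  int main(void) {
      int fd_limit = getdtablesize();
      for (int i = 3; i < fd_limit; ++i)
          close(i);  /* fds that were never opened fail with EBADF */
      return 0;
  }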

For those who also learned about getdtablesize today: it is a libc
function that reports the maximum number of file descriptors a process
may have open, and its output depends on the soft resource limit
(RLIMIT_NOFILE). You can check yours:

$ ulimit -n
1024
$ python3 -c 'print(__import__("ctypes").CDLL("").getdtablesize())'
1024
$ ulimit -n $((1024*1024))
$ python3 -c 'print(__import__("ctypes").CDLL("").getdtablesize())'
1048576
$
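
The same can be checked from C. A minimal sketch (again my own code)
comparing getdtablesize against the soft RLIMIT_NOFILE limit:

  #include <stdio.h>
  #include <sys/resource.h>
  #include <unistd.h>

  int main(void) {
      struct rlimit rl;
      if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
          perror("getrlimit");
          return 1;
      }
      /* On glibc, getdtablesize() reports the soft RLIMIT_NOFILE limit. */
      printf("getdtablesize()    = %d\n", getdtablesize());
      printf("RLIMIT_NOFILE soft = %llu\n", (unsigned long long)rl.rlim_cur);
      printf("RLIMIT_NOFILE hard = %llu\n", (unsigned long long)rl.rlim_max);
      return 0;
  }

Both numbers should match and move together when the limit is changed
with ulimit -n.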

I note that the default file descriptor limits are 1e6 hard and 1e3
soft. This is what I see on multiple systems, even though I'm not
exactly sure where it comes from. What applies here is the soft limit,
and yours isn't 1e3. I tried raising mine to 1e6 and then building
hostname (which happens not to be R³: no), but I couldn't reproduce the
hanging behaviour.

> An hypothesis is that a rebuild against the current sid could solve the issue.
> I will try that and report back.

This hypothesis no longer feels plausible to me. I have a few others:

A. Your machine is relatively slow and closing 1e6 fds (for whatever
   reason) takes it a very long time, leaving the impression of
   hanging.

B. Your file descriptor limit is even higher than 1e6, in which case
   the close loop really can take very long (see the timing sketch
   after this list).

C. You captured part of the close loop (with a higher than usual file
   descriptor limit), but it is not the cause of the hang. Rather, it
   hangs for other reasons after having closed all those file
   descriptors.
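
To put numbers on hypotheses A and B, here is a timing sketch (my own
code; it assumes the soft limit was raised beforehand, e.g. with
ulimit -n 1048576):

  /* Timing sketch: measure how long a full close() loop takes at the
   * current file descriptor limit. */
  #include <stdio.h>
  #include <time.h>
  #include <unistd.h>

  int main(void) {
      int fd_limit = getdtablesize();
      struct timespec t0, t1;
      clock_gettime(CLOCK_MONOTONIC, &t0);
      for (int i = 3; i < fd_limit; ++i)
          close(i);
      clock_gettime(CLOCK_MONOTONIC, &t1);
      double secs = (t1.tv_sec - t0.tv_sec)
                  + (t1.tv_nsec - t0.tv_nsec) / 1e9;
      printf("%d close() calls took %.3f s\n", fd_limit - 3, secs);
      return 0;
  }

Note that running a process under strace slows every syscall down
considerably, so observing the loop with strace attached stretches it
all by itself.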

> Feel free to reassign.

From what I can tell, fakeroot would be better served by using
close_range(2). That would reduce the CPU consumption under a high
resource limit and either make the real problem more apparent or make
it disappear.
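
For illustration, a sketch of what such a loop could look like with
close_range(2) (Linux >= 5.9, glibc >= 2.34). The function and variable
names are mine, not fakeroot's:

  #define _GNU_SOURCE
  #include <stdlib.h>
  #include <unistd.h>

  static int cmp_int(const void *a, const void *b) {
      return *(const int *)a - *(const int *)b;
  }

  /* Close every fd except those listed in keep[], closing whole ranges
   * between the kept fds instead of iterating over every possible fd. */
  static void close_all_except(int *keep, size_t n) {
      qsort(keep, n, sizeof *keep, cmp_int);
      unsigned int next = 0;
      for (size_t i = 0; i < n; ++i) {
          if ((unsigned int)keep[i] > next)
              close_range(next, (unsigned int)keep[i] - 1, 0);
          next = (unsigned int)keep[i] + 1;
      }
      close_range(next, ~0U, 0);  /* everything above the highest kept fd */
  }

  int main(void) {
      /* Stand-ins for fd, skip_fd1 and skip_fd2 from the quoted loop. */
      int keep[] = { 0, 1, 2 };
      close_all_except(keep, sizeof keep / sizeof keep[0]);
      return 0;
  }

This way the number of syscalls is proportional to the number of kept
file descriptors rather than to the resource limit.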

Hope this helps, but I am fairly convinced now that this is not a glibc
regression.

Helmut
