On Wednesday, 5 October 2022 11:38:39 CEST Christian Schoenebeck wrote:
> On Tuesday, 4 October 2022 14:54:16 CEST Christian Schoenebeck wrote:
> > On Tuesday, 4 October 2022 12:41:21 CEST Linus Heckemann wrote:
> > > The previous implementation would iterate over the fid table for
> > > lookup operations, resulting in an operation with O(n) complexity on
> > > the number of open files and poor cache locality -- for every open,
> > > stat, read, write, etc. operation.
> > >
> > > This change uses a hashtable for this instead, significantly improving
> > > the performance of the 9p filesystem. The runtime of NixOS's simple
> > > installer test, which copies ~122k files totalling ~1.8GiB from 9p,
> > > decreased by a factor of about 10.
> > >
> > > Signed-off-by: Linus Heckemann <[email protected]>
> > > Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
> > > Reviewed-by: Greg Kurz <[email protected]>
> > > Message-Id: <[email protected]>
> > > [CS: - Retain BUG_ON(f->clunked) in get_fid().
> > >      - Add TODO comment in clunk_fid(). ]
> > > Signed-off-by: Christian Schoenebeck <[email protected]>
> > > ---
> >
> > In general: LGTM now, but I will definitely go for some longer test runs
> > before queuing this patch. Some minor side notes below ...
>
> So I was running a compilation marathon on 9p as root fs this night. The
> first couple of hours went smoothly, but after about 12 hours 9p became
> unusable with the error:
>
>     Too many open files
>
> The question is: is this a new issue introduced by this patch, i.e. does
> it break the fd reclaim code? Or is it unrelated to this patch and a
> problem we already had?
>
> Linus, could you look at this?
> It would probably make sense to force getting into this situation much
> earlier, like:
>
> diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> index aebadeaa03..0c104b81e1 100644
> --- a/hw/9pfs/9p.c
> +++ b/hw/9pfs/9p.c
> @@ -4330,6 +4330,6 @@ static void __attribute__((__constructor__)) v9fs_set_fd_limit(void)
>          error_report("Failed to get the resource limit");
>          exit(1);
>      }
> -    open_fd_hw = rlim.rlim_cur - MIN(400, rlim.rlim_cur / 3);
> +    open_fd_hw = rlim.rlim_cur - MIN(50, rlim.rlim_cur / 3);
>      open_fd_rc = rlim.rlim_cur / 2;
>  }
>
> I can't remember us having this issue before, so there might still be
> something wrong with this GHashTable patch.
Much easier reproducer, and no source changes required whatsoever:

    prlimit --nofile=140 -- qemu-system-x86_64 ...

And I actually get this error without this patch as well, which suggests
that we already had a bug in the fd reclaim code before? :/

Anyway, as it seems this bug was not introduced by this particular patch,
and with the unnecessary `goto` and `out:` label removed:

Queued on 9p.next:
https://github.com/cschoenebeck/qemu/commits/9p.next

Best regards,
Christian Schoenebeck
