On Thu, Jan 19, 2023 at 7:05 PM Paul Eggert <egg...@cs.ucla.edu> wrote: > > On 1/19/23 15:41, Bruno Haible wrote: > > Jim or Paul, what should we state > > — either in the 'fts' module description, or in the .texi documentation? > > The quick thing is to say in both that the description/documentation is > incomplete, and that people need to read the source code. > > Jim may be able to fill in a bit here, since I think he wrote most of > that stuff. (I haven't checked this though; sorry, I'm a bit crunched > for time today.)
Thanks for caring/documenting. Here's a quick summary (for more detail, see the comments in fts_.h). This started when I found glibc's fts was insufficiently robust to meet GNU rm's needs (rm was merely the first user; now, many others use it): - O(N^2) behavior in the number of file name components due to cycle detection - max hierarchy depth was 64k due to type of fts_level being a "short" - subject to O(N^2) effects for directories with many entries (poor locality of reference, for which the fix was to process entries in sorted-inode order (per a heuristic), delaying any "stat" until operating on the entry) Re fts's cycle detection: - contrast glibc's O(depth) time algorithm vs our O(1) implementation - our cheap-but-lazy O(1)-memory approach is ok for most applications, but - there's an optional, slightly more costly detect-ASAP approach required for du (uses O(max-depth-of-hierarchy) memory) Fixing those things required ABI changes and nontrivial redesign.