On 10/31/2016 04:14 PM, Chet Ramey wrote:
>> Nice, thanks for the modifications.
>
> Here's the modified version.
>
> Chet
> --
> ``The lyf so short, the craft so long to lerne.'' - Chaucer
> ``Ars longa, vita brevis'' - Hippocrates
> Chet Ramey, UTech, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/
> if ((dir = opendir(dirname)))
>   {
>     while ((dp = readdir(dir)))

Ouch. This version has O(n^2) complexity when dealing with deep hierarchies, because each entry's full path is rebuilt from the top of the tree on every visit. GNU Coreutils intentionally prefers openat()/fdopendir() and friends in order to recurse through the tree with O(n) complexity instead of O(n^2).

>       {
>         if (*dp->d_name == '.' && (dp->d_name[1] == 0
>             || (dp->d_name[1] == '.' && dp->d_name[2] == 0)))
>           continue;
>
>         fnsize = dirlen + 1 + strlen(dp->d_name) + 1;
>         fname = xmalloc (fnsize);
>         snprintf(fname, fnsize, "%s/%s", dirname, dp->d_name);

And here is part of the HUGE speed penalty you pay in large directories. You are mallocing a LOT of memory and wasting a LOT of work, since the majority of the time (while within a single directory) the prefix is unchanged from the previous iteration, and it grows longer at every level of a deep hierarchy; doing everything directory-relative completely avoids the need to construct ever-deeper names.

>
>         if (rm_file (fname) && force == 0)
>           err = 1;
>         free (fname);
>       }

Another speed sink - your implementation removes names in readdir() order; GNU Coreutils has benchmark numbers proving that unlink()ing files sorted by inode number is measurably faster than unlink()ing in readdir() order for directories with a large number of files. Yes, it is more complex to set up the tracking structures needed to sort things, but it pays off in the long run.

If you are going to make this a builtin, you REALLY ought to consider having the builtin fall back to the NORMAL rm utility (which has already taken care of the speed issues you have so naively regressed on) whenever heuristics detect that you are descending more than a few levels deep, or that you have encountered more than a certain number of files in a given directory.

> while ((opt = internal_getopt (list, "rf")) != -1)
>   {
>     switch (opt)
>       {
>       case 'r':
>         recursive = 1;
>         break;

Per POSIX, 'rm -R' and 'rm -r' are identical; a couple of tweaks to this patch will make support for 'rm -R' work out of the box.
>       case 'f':
>         force = 1;
>         break;
>       CASE_HELPOPT;
>       default:
>         builtin_usage ();
>         return (EX_USAGE);
>       }

You do NOT support the POSIX-mandated 'rm -i'. I don't necessarily blame you (it is complex), but what you REALLY should do is silently call the real rm utility any time you don't recognize an option, rather than failing. That way, your builtin can be a faster drop-in when it recognizes the simple cases, but remains POSIX-compliant (assuming the fallback app is POSIX-compliant), and offers ALL the extensions the normal fallback understands, rather than being a crippled version that causes failures in code expecting POSIX or particular extension behaviors.

>     }
>   list = loptend;
>
>   if (list == 0)
>     return (EXECUTION_SUCCESS);

This does not match the behavior of GNU coreutils: 'rm' without arguments prints a usage error and exits with nonzero status. Note, however, that POSIX requires 'rm -f' without arguments to silently exit with success. POSIX only specifies the behavior of 'rm -f' without arguments; for plain 'rm' you can do what you want (so you still comply with POSIX), but it is still nice to match what other implementations do.

While I think it is neat that this builtin can be used to speed up configure scripts, I don't think it is ready for prime time yet. It's tough to recommend that autoconf be taught to use this sub-standard implementation.

--
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org