-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 According to Paul Eggert on 11/7/2005 2:02 PM: >>Should I add a dependency on the verify module to ensure that drive >>letter prefixes are only used on ASCII platforms? > > I wouldn't bother, unless there's a serious possibility that we'll see > drive letter prefixes on an EBCDIC platform. I think that unlikely. > Besides, if we ever port to OS/360, we'll need to rewrite all this > code anyway -- their file name syntax is _weird_.
OK. VMS also has weird file names, if anyone ever wants to port there. > >>Should I still move FILE_SYSTEM_PREFIX_LEN out of config.h and into >>dirname.h? > > I'd rather move it out of config.h, yes. I have some qualms with > config.h containing ifdefs. What about ISSLASH? I did not touch that in dos.m4, but it is another #ifdef that could reasonably be moved to dirname.h. >>POSIX poses the rule as: if "$string" is a valid filename, then 'cd >>"$(dirname "$string")"; ls "$(basename "$string")"' lists the same >>file > > Yes, that's a better way to formulate the constraint (though it's not > a complete spec). > A similar formulation also appears in the comments of filenamecat.c. > >>(although POSIX fails to mention that this will fail if the >>filename contains a trailing newline). > > That problem doesn't occur for dir_name and base_name though, right? > At least on Unix-like platforms. The problem with trailing newlines is not in the library dirname(3) and basename(3), but with the command line utilities dirname(1) and basename(1) in the presence of `` or $() of /bin/sh. This is because the command line utilities must add a newline, then the command substitution quotes strip all consecutive trailing newlines whether they were part of the filename or added by the utility. > >>My patch makes base_name("c:/") return "". > > Is "" a valid file name? If not, then we're violating the constraint. > I don't know the DOS rules, but I suspect that the only suffix of > "c:/" that identifies the same file is "c:/" itself; in that case, > base_name("c:/") should return "c:/". A quick test shows that DOS treats "" as ".", but cygwin rejects it as invalid (as do all POSIX platforms). I'm not sure about DJGPP or mingw. Consistency argues that for roots (/, //, and c:/), we should either always return the entire root or the empty string. Then there is the question that when returning the empty string, should it be done by returning the beginning of the original string with base_len of 0, or by returning the end of the original string. > > >>For that matter, would it help to redefine our base_name to ALWAYS return >>the empty string if the argument is a root directory? > > That might be simpler, yes, but due to backward-compatibility concerns > I might be inclined to give that function a different name -- > base_component perhaps. > ... > > I'd like someone to review the uses of base_name, dir_name, etc. in > coreutils and tar. This will give us a better feeling for how these > changes would affect real code. Perhaps our backward-compatibility > concerns are unwarranted.... > OK, here's my audit. gnulib: lib/backupfile.c: check_extension and numbered_backup - both are static, and only invoked by find_backup_file_name; which in turn is only used by coreutils and tar on non-directories, so immune to whether base_name("c:/") returns "" or "c:/". lib/dirname.c: dir_len - logic works whether base_name("c:/") returns "" or "c:/", so immune. lib/filnamecat.c: file_name_concat - uses base_name on DIR argument; works if base_name("c:/") returns "c:/" length 3, or returns &"c:/"[3] length 0, but fails if it returns "c:/" length 0. Also see the comment in the test code below about the possibility of changing this file to warn if ABASE is a root. lib/same.c: same_name - used twice to compare two filenames, so immune since both calls have same semantics. lib/stripslash.c: strip_trailing_slashes - already has an issue with "///" not being shortened to "/", so easy to fix to whatever semantics we give base_name. coreutils: src/basename.c: main - currently uses what base_name returns, but easy to patch to print original string if base_name returns empty string. src/cp.c: ASSIGN_BASENAME_STRDUPA - uses alloca, walks entire string twice (by calling strip_trailing_slashes before base_name); but only used as an argument to file_name_concat which already had the comment about passing a root as the ABASE argument. target_directory_operand - followed up by stat of full filename, so immune. src/dircolors.c: guess_shell_syntax - setting SHELL as a root would fail anyways, so immune. src/install.c: target_directory_operand - this should be factored out and shared with cp.c; followed up by stat of full filename, so immune. install_file_in_dir - only used as an argument to file_name_concat which already has the comment about passing a root as the ABASE argument. src/ln.c: target_directory_operand - again, this could be factored out; followed up by stat of full filename, so immune. main - only used as an argument to file_name_concat which already has the comment about passing a root as the ABASE argument. src/ls.c: basename_is_dot_or_dotdot - only used for string comparison to . or .., so immune. However, base_name does not strip trailing slashes, and system.h DOT_OR_DOTDOT does not accomodate trailing slashes. src/mv.c: target_directory_operand - again, this could be factored out; followed up by stat of full filename, so immune. movefile - only used as an argument to file_name_concat which already has the comment about passing a root as the ABASE argument. src/remove.c: rm_1 - only used for string comparison to . or .., so immune. src/rm.c: usage - unlike all the other coreutils, this only uses base_name(program_name) in the usage string. Either they all should, or this should be changed to print the full program_name. src/shred.c: wipename - only invoked on non-directories, so immune. src/split.c: next_file_name - only used in checking for filename truncation of long names, but roots are short, so immune. src/su.c: log_su - based on program_name, so immune. run_shell - setting SHELL as a root would fail anyways, so immune. Besides, there is talk of removing su from coreutils. paxutils: lib/rtapelib.c: rmt_open__ - setting SHELL as a root would fail anyways, so immune. paxlib/rtape.c: rmt_open - setting SHELL as a root would fail anyways, so immune. tar: src/extract.c: create_placeholder_file - only used on filenames with multiple components, so immune. src/xheader.c: xheader_formate_name - comments claim that %f is treated as the results of the basename utility, but the code neglects to strip trailing slashes (unless this was already done when building the tar_stat_info). I'm not sure if tar_stat_info can ever name a root. findutils: find/find.c: at_top - only used after filtering out roots, so immune. find/pred.c: pref_fprintf - I'm not sure if pathname will ever be a root. pred_iname and pred_name - used as argument to fnmatch; I'm not sure if pathname will ever be a root, or if it can have trailing slashes that should have been stripped. locate/locate.c: visit_basename - I'm not sure if pathname will ever be a root, or if it can have trailing slashes that should have been stripped. In summary, if we can verify that tar and findutils' usage of base_name will not be done on root directories, then it looks like changing the semantics of base_name to return the address of the \0 in the original string naming a root directory is probably a safe semantic change. - -- Life is short - so eat dessert first! Eric Blake [EMAIL PROTECTED] -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.1 (Cygwin) Comment: Public key at home.comcast.net/~ericblake/eblake.gpg Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFDcLRM84KuGfSFAYARAtd3AKCgKidtbkbJEdmNhMrjRXRC3708ngCg1MKr BJ+h6FlqTvMAlI4CofkowZM= =X4aw -----END PGP SIGNATURE----- _______________________________________________ bug-gnulib mailing list bug-gnulib@gnu.org http://lists.gnu.org/mailman/listinfo/bug-gnulib