-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

According to Paul Eggert on 5/16/2005 6:14 PM:
> This isn't just dd; it's cat, md5sum, split, etc.  And I don't really
> understand how it works, or why some programs use binary modes and not
> others.  For example, POSIX says that the input to "head" must be a
> text file, so why does GNU "head" set binary mode?  Why does (say)
> "unexpand" use binary mode, but "uniq" uses text mode?  Why does
> md5sum invoke setmode (..., O_TEXT) on a file that has just been
> fopened with "r" (doesn't that mean text?).  None of this stuff really
> makes sense to me, and this makes the code hard to maintain.

I agree that a lot of programs have become ad hoc in deciding when text
vs. binary vs. default mode is needed, often as bugs are reported that
the behavior wasn't intuitive (such as the recent report on dd).

First, some background:  POSIX fopen(,"r") and
open(,<neither O_BINARY nor O_TEXT>) default to whatever mode the
underlying mount point has.  Cygwin recommends binary mount points,
since POSIX requires "r" and "rb" to be identical, but there are users
who have text mount points for better interoperability with Windows
programs on text files; text mode mounts make the most sense when all
files in that mount point are text files.  POSIX fopen(,"rb") and the
extension open(,O_BINARY) force binary mode.  And the extensions to
POSIX, fopen(,"rt") and open(,O_TEXT), force text mode.  Furthermore,
terminals have strange behavior, where forcing binary or text mode on a
terminal is almost always the wrong thing to do, hence coreutils'
SET_BINARY macro, which ensures that the mode is only changed on a
non-terminal.

As to POSIX requirements, you bring up a valid point on utilities that
are required to operate only on text files, and that a script that
tries to run such a utility on a non-text file is non-portable.  I claim
that logically, programs that need to operate only on text files should
defer to the mount point mode, and programs that must operate on any
file type should always default to binary.  Forcing text mode without a
user option is almost always wrong.  Output from many POSIX programs is
human-readable text, but since the utility inherits rather than opens
stdout, it shouldn't be changing the mode of stdout in that case.
Helper files (such as uptime opening /proc/uptime under the hood) are
not user-specified stdin or filenames on the command line, and as such,
should probably be opened in binary mode.
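
To make the terminal caveat concrete, here is a minimal sketch of a guard
in the spirit of SET_BINARY.  It is not the coreutils macro itself: the
helper name set_binary_unless_tty is hypothetical, the zero fallbacks for
O_BINARY and O_TEXT are there only so the snippet compiles on hosts
without the distinction, and I am assuming setmode is declared in <io.h>
on hosts where O_BINARY is non-zero.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Assumption: hosts without a text/binary distinction get both flags
   defined as 0, so the mode change below compiles away.  */
#ifndef O_BINARY
# define O_BINARY 0
#endif
#ifndef O_TEXT
# define O_TEXT 0
#endif

#if O_BINARY
# include <io.h>   /* assumed home of setmode on DOS-like hosts */
#endif

/* Hypothetical helper: put FD in binary mode unless it is a terminal.
   Returns 1 if a mode change was attempted, 0 if FD was left alone.  */
static int
set_binary_unless_tty (int fd)
{
  assert (fd >= 0);

  /* Nothing to do without the distinction; never touch a terminal.  */
  if (O_BINARY == 0 || isatty (fd))
    return 0;

#if O_BINARY
  setmode (fd, O_BINARY);
#endif
  return 1;
}
```

A utility that must accept arbitrary input (cat, cksum, and so on) would
call set_binary_unless_tty (STDIN_FILENO) before reading, while a
text-only utility would skip the call and inherit the mount point mode.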

I welcome feedback on whether the following list of desired behavior
sounds correct, before I then try to see whether coreutils is actually
doing that behavior:

[ - doesn't open files
basename - doesn't open files
cat - POSIX requires binary input and output, and this already has -B
  option to fine-tune mode
chgrp - doesn't open files
chmod - doesn't open files
chown - doesn't open files
chroot - doesn't open files
cksum - POSIX requires binary input
comm - POSIX requires text input
cp - POSIX requires binary input and output
csplit - POSIX requires text input
cut - POSIX requires text input
date - doesn't open files
dd - POSIX requires binary input and output, and the [io]flag=text
  option was just added
df - doesn't open files
dir - doesn't open files
dircolors - non-standard, but operates on text input
dirname - doesn't open files
du - doesn't open files
echo - doesn't open files
env - doesn't open files
expand - POSIX requires text input
expr - doesn't open files
factor - doesn't open files
false - doesn't open files
fmt - non-standard, but operates on text input
fold - POSIX requires text input
groups - doesn't open files
head - POSIX requires text input, but compare to tail -c on binary input
hostid - doesn't open files
hostname - doesn't open files
id - doesn't open files
install - non-standard, but operates on binary input and output
kill - doesn't open files
link - doesn't open files
ln - doesn't open files
logname - doesn't open files
ls - doesn't open files
md5sum - non-standard, but like cksum needs binary input
mkdir - doesn't open files
mkfifo - doesn't open files
mknod - doesn't open files
mv - POSIX requires binary input and output
nice - doesn't open files
nl - POSIX requires text input
nohup - POSIX requires that stdout from utility may go to nohup.out, so
  nohup.out should probably be opened in same mode as nohup's stdout
  (if it exists)
od - POSIX requires binary input; but as this is a formatter, we
  probably want options to fine-tune the mode
paste - POSIX requires text input and output
pathchk - doesn't open files
pinky - doesn't open files
pr - doesn't open files
printenv - doesn't open files
printf - doesn't open files
ptx - non-standard, but operates on text input
pwd - doesn't open files
readlink - doesn't open files
rm - doesn't open files
rmdir - doesn't open files
seq - doesn't open files
sha1sum - non-standard, but like cksum needs binary input
shred - non-standard, but needs binary input if it is going to affect
  the same number of bytes on disk as it erases
sleep - doesn't open files
sort - POSIX requires text input
split - POSIX requires binary input
stat - doesn't open files
stty - doesn't open files
su - doesn't open files (cygwin doesn't support su; and the question was
  raised earlier whether coreutils should drop su or add newgrp)
sum - non-standard, but like cksum needs binary input
sync - doesn't open files
tac - non-standard, but like cat operates on binary input
tail - POSIX requires that -c operates on binary input, otherwise on
  text input
tee - POSIX requires binary input and output
test - doesn't open files
touch - doesn't open files
tr - POSIX requires binary input
true - doesn't open files
tsort - POSIX requires text input
tty - doesn't open files
uname - doesn't open files
unexpand - POSIX requires text input
uniq - POSIX requires text input
unlink - doesn't open files
uptime - doesn't open files
users - non-standard, but /var/run/utmp should probably be opened in
  binary mode
vdir - doesn't open files
wc - POSIX requires binary input
who - doesn't open files
whoami - doesn't open files
yes - doesn't open files

>
> Is there some way that we can simplify this by using wrapper functions
> on DOS-like hosts?  I'd rather get rid of the SETMODE and SET_BINARY
> macros entirely.  If Cygwin open or fcntl doesn't do the obvious thing
> with O_TEXT and O_BINARY, let's define a wrapper function, used only
> on cygwin, that does the right thing.

open does the right thing.
The problem is that
fcntl(fd,F_SETFL,fcntl(fd,F_GETFL)|O_BINARY) will not work, since
O_BINARY is not an additive property, but a mutually exclusive property
with O_TEXT.  Whether you used O_BINARY, O_TEXT, or nothing with the
original open(), fcntl(F_GETFL) will always return O_BINARY or O_TEXT in
its list of flags.  And even if cygwin is patched to let
fcntl(F_SETFL,O_BINARY) change the mode to binary, it will have to
reject fcntl(F_SETFL,O_BINARY|O_TEXT).  Hence the current use of
setmode(fd,mode), which fails with EINVAL unless mode is exactly
O_BINARY, O_TEXT, or 0 (meaning no change).  I agree that a wrapper
might help, but the wrapper would need slightly different semantics than
how fcntl(F_SETFL) is used in dd.c, because of the mutually exclusive
nature of O_BINARY and O_TEXT.

> Your patch assumes that (O_BINARY != 0 && O_TEXT != 0); is this really
> true on all platforms?  It seems to me that one could be zero.

system.h keys solely off of O_BINARY - if O_BINARY is non-zero, then
O_TEXT is required to also exist (and it is probably also non-zero).  If
O_BINARY doesn't exist or is 0, then system.h makes both O_BINARY and
O_TEXT be 0, to avoid later #ifdef'ery.  My patch always treated the
combination of (O_BINARY|O_TEXT), which should easily be optimized out
as 0 on platforms without O_BINARY; and should work fine even if there
is a platform with non-zero O_BINARY but zero O_TEXT.

- --
Life is short - so eat dessert first!

Eric Blake             [EMAIL PROTECTED]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFCifIF84KuGfSFAYARArZZAKCDotpvCtmF64M5CSizVfBCTWuycwCeK559
OxEtCVcmNQIVzS+dc9DmvBg=
=drvs
-----END PGP SIGNATURE-----
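
P.S.  A rough sketch of the wrapper semantics discussed above, in case it
helps the discussion.  This is not code from the patch or from
coreutils: the name fd_set_text_or_binary is hypothetical, the
preprocessor fallback only mimics what system.h is described as doing,
and I am again assuming setmode comes from <io.h> where O_BINARY is
non-zero.  The point is that the wrapper replaces the mode instead of
OR-ing a flag into the existing ones, and rejects any request that is
not exactly O_BINARY, O_TEXT, or 0.

```c
#include <errno.h>
#include <fcntl.h>

/* Key solely off of O_BINARY: if it is missing or 0, make both flags 0
   so that callers need no further #ifdef'ery.  */
#if !defined O_BINARY || !O_BINARY
# undef O_BINARY
# undef O_TEXT
# define O_BINARY 0
# define O_TEXT 0
#endif

#if O_BINARY
# include <io.h>   /* assumed home of setmode on DOS-like hosts */
#endif

/* Hypothetical wrapper: MODE must be exactly O_BINARY, O_TEXT, or 0
   (0 meaning no change).  Returns 0 on success; returns -1 with errno
   set to EINVAL for any other request, including a contradictory
   O_BINARY|O_TEXT on hosts where both flags are non-zero.  */
static int
fd_set_text_or_binary (int fd, int mode)
{
  if (mode != 0 && mode != O_BINARY && mode != O_TEXT)
    {
      errno = EINVAL;
      return -1;
    }

  /* No change requested; always the case when both flags are 0.  */
  if (mode == 0)
    return 0;

#if O_BINARY
  return setmode (fd, mode) == -1 ? -1 : 0;
#else
  (void) fd;
  return 0;
#endif
}
```

A caller such as dd could then write fd_set_text_or_binary (fd, O_BINARY)
instead of OR-ing O_BINARY into the result of fcntl (fd, F_GETFL); on
hosts without O_BINARY the call reduces to a validity check on MODE.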