Re: Handling files with CRLF line ending
Valid question. I believe a major goal of bash should be to interoperate with other tools. In this case, being able to read text files generated by Python when running under WSL seems like something bash should do.

On the question of minimal changes: I believe many bash users (some are not hard-core developers, just devops) are tasked with transferring existing solutions to WSL, and that those users are underrepresented in this forum. I admit I have no hard data to support either belief.

On Mon, Dec 5, 2022, 15:36 Chet Ramey wrote:

> On 12/3/22 8:53 AM, Yair Lenga wrote:
> > Thank you for suggestions. I want to emphasize: I do not need help in
> > stripping the CR from the input files - it's simple.
> >
> > The challenge is executing a working bash/python solution from Linux on
> > WSL, with MINIMAL changes to the scripts.
>
> That's certainly your priority. But is it a compelling enough reason to
> change bash to accomplish it?
>
> It seems easy enough to set up a pipeline on WSL to provide input in the
> form the script authors assume.
>
> --
> ``The lyf so short, the craft so long to lerne.'' - Chaucer
> ``Ars longa, vita brevis'' - Hippocrates
> Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
Re: Handling files with CRLF line ending
It seems to me that there's more going on than first meets the eye.

My understanding is that POSIX's file open function allows specifying if the file is text or binary, and in text mode, if the underlying system natively uses CRLF for EOL, CRLF in the file is turned into LF for the code in a transparent way. And so I'd expect that Bash considers the file that it is reading to execute to be text, and Bash's command parser wouldn't see CRs if it was running on a system that uses CRLF on disk for EOL. And conversely, if you use "echo" to write a line, it goes to stdout, which presumably has been opened in text mode.

Generally, when a command has redirection, Bash doesn't have to think about this, since Bash only opens an FD; it's the command that is going to execute an fdopen() that wraps a POSIX FILE* around the open FD, and to do that, will specify the I/O mode as text or binary.

So far, everybody is happy -- things automatically work as intuition expects.

The trouble happens when a Bash built-in command reads or writes an FD. Then Bash needs to implicitly or explicitly handle the text/binary decision, in parallel to when a C command starts up and the C startup code does an fdopen() on FD 0 to create the FILE* "stdin".

Looking at the code of Bash 5.2 -- and I am no expert, and I didn't study it deeply -- it looks like "readarray/mapfile" (builtins/mapfile.def) uses "zgetline" (lib/sh/zgetline.c) to read input rather than the underlying POSIX implementation's fdopen(). And that function's comment says:

    /* Derived from GNU libc's getline.
       The behavior is almost the same as getline. See man getline.
       The differences are
            (1) using file descriptor instead of FILE *;
            (2) the order of arguments: the file descriptor comes first;
            (3) the addition of a fourth argument, DELIM; sets the delimiter to
                be something other than newline if desired.  If setting DELIM,
                the next argument should be 1; and
            (4) the addition of a fifth argument, UNBUFFERED_READ; this argument
                controls whether get_line uses buffering or not to get a byte data
                from FD. get_line uses zreadc if UNBUFFERED_READ is zero; and
                uses zread if UNBUFFERED_READ is non-zero.

       Returns number of bytes read or -1 on error. */

And zgetline() doesn't have a "mode" argument for setting the text/binary mode. (getline() doesn't have such an argument either, but it takes a FILE*, not an FD.)

Dale
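The point about mapfile reading bytes without any text-mode translation can be checked with a short sketch. It assumes Bash 4.0 or later (for mapfile) and a platform where process substitution works; the variable names are illustrative and do not come from the thread.

```bash
#!/usr/bin/env bash
# Sketch only: shows that mapfile treats LF as the sole line terminator.
# Assumes Bash 4.0+ (mapfile) and a platform with process substitution.

# Two lines of sample data with Windows-style CRLF endings.
crlf_data=$'alpha\r\nbeta\r\n'

# -t removes the trailing delimiter (LF) from each element; CR is left in place.
mapfile -t crlf_lines < <(printf '%s' "$crlf_data")

# %q prints the element in shell-quoted form so the CR becomes visible.
printf 'element: %q (length %d)\n' "${crlf_lines[0]}" "${#crlf_lines[0]}"

# Check the last character of the first element explicitly.
if [[ ${crlf_lines[0]} == *$'\r' ]]; then
    echo "first element still ends with CR"
fi
```

On a Linux or WSL Bash the first element is six characters long ("alpha" plus the CR), which is the behavior the original report describes.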
Re: Handling files with CRLF line ending
On 06/12/2022 16:00, Dale R. Worley wrote:
> It seems to me that there's more going on than first meets the eye.

Yes. Yair is trying to process text files written on a Windows system (line ending \r\n) on a Linux system (line ending \n). That Python wrote them is neither here nor there.

Windows text files have to be converted to Linux format before processing, either inline (tr -d '\r') or en masse (dos2unix). Expecting bash to cope is a non-starter.

Yair, how about using the Python installed in the WSL instance?

--
Chris Elvidge
England
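The inline conversion mentioned above might look like the following sketch. The helper name strip_cr and the sample data are hypothetical; the only tools assumed are Bash 4.0 or later and the standard tr utility.

```bash
#!/usr/bin/env bash
# Sketch only: normalize CRLF input before a Bash builtin reads it.
# strip_cr is a hypothetical helper name, not something from the thread.

# Delete every CR byte from stdin. Note that this also removes a CR that
# is not part of a line ending.
strip_cr() {
    tr -d '\r'
}

# Feed CRLF sample data through the filter, then read it with mapfile.
mapfile -t clean_lines < <(printf 'alpha\r\nbeta\r\n' | strip_cr)

# %q prints the element in shell-quoted form; no CR should remain.
printf 'element: %q (length %d)\n' "${clean_lines[0]}" "${#clean_lines[0]}"
```

Because the filter sits in front of the reader, the consuming script itself stays unchanged; only the way its input is supplied differs.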
Re: Handling files with CRLF line ending
On 2022/12/06 10:57, Chris Elvidge wrote:
> Yair, how about using the Python installed in the WSL instance.
---
Oh, I wondered why Python used CRLF, but nothing else did.

What version of Python are you using? The Python for WSL, the Python for Cygwin, or the Python for Windows?

If you are using Python for Windows, I'd *sorta* expect it to use CRLF, but would expect WSL or Cygwin versions to use just 'LF'.

Similarly w/bash -- I haven't tested it, but I'd expect bash compiled for Windows (using the mingw toolchain) to use CRLF, but LF for WSL or Cygwin.

Are you using both tools for the same OS and subsys and having them conflict?
Re: Handling files with CRLF line ending
On Wed, Dec 7, 2022 at 8:40, L A Walsh wrote:
> [...] Similarly w/bash --
> I haven't tested it, but I'd expect bash compiled for windows
> (using mingw toolchain) to use CRLF, but LF for WSL or Cygwin.

I think there is actually no Bash compiled for Windows (i.e., the pure Windows API on the Windows subsystem). The Bash that comes with the MinGW toolchain is linked with msys-2.0.dll (in the case of MSYS2), which means that the POSIX layer Bash relies on is provided by MSYS, which is a minimized fork of Cygwin. The MSYS Bash treats LF, but not CRLF, as the newline.

> Are you using both tools for the same OS and subsys and having
> them conflict?

I think so. I think this means that the reported configuration is wrong or, at least, very unusual. I don't think we should add to Bash an option that is only meaningful on a specific non-Unix-like operating system, for a heterogeneous amalgam of programs from different subsystems. That option would be practically useless on all of the major Unix-like systems.

If something were to be changed on the Bash side, maybe there is a chance that the Bash of the Cygwin/MSYS packages could be patched, as was done for `shopt -s completion_strip_exe'. But even in that case, the question remains why filtering with `tr' is not an option. The answer seemed to be that the programs should work unmodified, but I don't think we should expect a combination of programs from different subsystems to work unmodified in general.

--
Koichi