Re: Handling files with CRLF line ending

2022-12-06 Thread Yair Lenga
 Valid question.

I believe a major goal of bash should be to interoperate with other tools.
In this case, being able to read text files generated by Python when
running under WSL seems like something bash should do.

On the question of minimal changes: I believe many bash users (some are not
hard-core developers, just devops) are tasked with transferring existing
solutions to WSL, and I believe they are underrepresented in this forum.

I admit I have no hard data to support any of this.

On Mon, Dec 5, 2022, 15:36 Chet Ramey wrote:

> On 12/3/22 8:53 AM, Yair Lenga wrote:
> > Thank you for the suggestions. I want to emphasize: I do not need help in
> > stripping the CR from the input files - it's simple.
> >
> > The challenge is executing a working bash/python solution from Linux on
> > WSL, with MINIMAL changes to the scripts.
>
> That's certainly your priority. But is it a compelling enough reason to
> change bash to accomplish it?
>
> It seems easy enough to set up a pipeline on WSL to provide input in the
> form the script authors assume.
>
>
> --
> ``The lyf so short, the craft so long to lerne.'' - Chaucer
>  ``Ars longa, vita brevis'' - Hippocrates
> Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
>
>


Re: Handling files with CRLF line ending

2022-12-06 Thread Dale R. Worley
It seems to me that there's more going on than first meets the eye.

My understanding is that the stdio open functions (fopen(), fdopen())
allow specifying whether the file is text or binary, and in text mode, if
the underlying system natively uses CRLF for EOL, CRLF in the file is
turned into LF for the code in a transparent way.  And so I'd expect that
Bash considers the file that it is reading to execute to be text, and
Bash's command parser wouldn't see CRs if it was running on a system that
uses CRLF on disk for EOL.

And conversely, if you use "echo" to write a line, it goes to stdout,
which presumably has been opened in text mode.

Generally, when a command has redirection, Bash doesn't have to think
about this, since Bash only opens an FD; it's the command that is going
to execute an fdopen() that wraps a Posix FILE* around the open FD, and
to do that, will specify the I/O mode as text or binary.

So far, everybody is happy -- things automatically work as intuition
expects.

The trouble happens when a Bash built-in command reads or writes an FD.
Then Bash needs to implicitly or explicitly handle the text/binary
decision, in parallel to when a C command starts up and the C startup
code does an fdopen() on FD 0 to create the FILE* "stdin".

Looking at the code of Bash 5.2 -- and I am no expert, and I didn't
study it deeply -- it looks like "readarray/mapfile"
(builtins/mapfile.def) uses "zgetline" (lib/sh/zgetline.c) to read input
rather than the underlying Posix implementation's fdopen().  And that
function's comment says:

/* Derived from GNU libc's getline.
   The behavior is almost the same as getline. See man getline.
   The differences are
        (1) using file descriptor instead of FILE *;
        (2) the order of arguments: the file descriptor comes first;
        (3) the addition of a fourth argument, DELIM; sets the delimiter to
            be something other than newline if desired.  If setting DELIM,
            the next argument should be 1; and
        (4) the addition of a fifth argument, UNBUFFERED_READ; this argument
            controls whether get_line uses buffering or not to get a byte data
            from FD. get_line uses zreadc if UNBUFFERED_READ is zero; and
            uses zread if UNBUFFERED_READ is non-zero.

   Returns number of bytes read or -1 on error. */

And zgetline() doesn't have a "mode" argument for setting the
text/binary mode.  (getline() doesn't have such an argument either, but
it takes a FILE*, not an FD.)
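
The practical effect of reading bytes without any text-mode translation can be
seen from the shell itself. The following is only a minimal sketch, assuming
Bash 4 or later; the sample data and the variable names (sample, raw, clean)
are made up for illustration. It shows that mapfile -t drops the LF but keeps
the CR, and one way to strip the CR afterwards with parameter expansion.

```shell
#!/usr/bin/env bash
# Build a small CRLF-terminated sample file (hypothetical data).
sample=$(mktemp)
printf 'alpha\r\nbeta\r\n' > "$sample"

# mapfile -t removes the trailing LF of each line but leaves the CR in place.
mapfile -t raw < "$sample"
printf '%q\n' "${raw[0]}"      # prints $'alpha\r'

# Strip one trailing CR, if present, from every array element.
clean=("${raw[@]%$'\r'}")
printf '%q\n' "${clean[0]}"    # prints alpha

rm -f "$sample"
```

This changes the script rather than the input, so it is not the "minimal
changes" route discussed earlier in the thread; it only illustrates where the
CR ends up.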

Dale



Re: Handling files with CRLF line ending

2022-12-06 Thread Chris Elvidge

On 06/12/2022 16:00, Dale R. Worley wrote:

> It seems to me that there's more going on than first meets the eye.


Yes. Yair is trying to process text files written on a Windows system 
(line ending \r\n) on a Linux system (line ending \n). That Python wrote 
them is neither here nor there.


Windows text files have to be converted to Linux format before
processing - either inline (tr -d '\r') or en masse (dos2unix).
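
As a rough sketch of the inline variant, the CRs can be filtered out before the
consumer reads its input. The show_lines function below is a hypothetical
stand-in for an existing script that assumes LF-only input, and the sample data
is made up.

```shell
#!/usr/bin/env bash
# Hypothetical consumer that assumes LF-terminated input:
# it prints each line it reads inside square brackets.
show_lines() {
  local line
  while IFS= read -r line; do
    printf '[%s]\n' "$line"
  done
}

# Made-up CRLF input standing in for a file written on Windows.
crlf_input=$'one\r\ntwo\r\n'

# Remove the CRs in the pipeline, so the consumer runs unmodified.
filtered=$(printf '%s' "$crlf_input" | tr -d '\r' | show_lines)
printf '%s\n' "$filtered"
```

Note that tr -d '\r' deletes every CR in the stream, not only those at the end
of a line, which matters only if the data can legitimately contain bare CRs.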


Expecting bash to cope is a non-starter.

Yair, how about using the Python installed in the WSL instance?

--
Chris Elvidge
England




Re: Handling files with CRLF line ending

2022-12-06 Thread L A Walsh

On 2022/12/06 10:57, Chris Elvidge wrote:

> Yair, how about using the Python installed in the WSL instance?
  

---
   Oh, I wondered why Python used CRLF, but nothing else did.

   What version of Python are you using?  The Python for WSL,
the Python for Cygwin, or the Python for Windows?  If you are
using Python for Windows, I'd *sorta* expect it to use CRLF, but
would expect WSL or Cygwin versions to use just 'LF'.  Similarly w/bash --
I haven't tested it, but I'd expect bash compiled for Windows
(using mingw toolchain) to use CRLF, but LF for WSL or Cygwin.

Are you using both tools for the same OS and subsys and having
them conflict?
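
One way to narrow down which tool is producing CRLF is to check the files
directly. This is only a sketch, assuming Bash and a grep that accepts a
literal CR in the pattern; has_crlf is a hypothetical helper name and the two
files are made-up samples.

```shell
#!/usr/bin/env bash
# Succeed if at least one line of the given file ends in CR LF.
has_crlf() {
  grep -q $'\r$' -- "$1"
}

# Two made-up sample files: one with CRLF endings, one with LF only.
tmpdir=$(mktemp -d)
printf 'a\r\nb\r\n' > "$tmpdir/crlf.txt"
printf 'a\nb\n'     > "$tmpdir/lf.txt"

for f in "$tmpdir/crlf.txt" "$tmpdir/lf.txt"; do
  if has_crlf "$f"; then
    echo "${f##*/}: CRLF"
  else
    echo "${f##*/}: LF only"
  fi
done

rm -rf "$tmpdir"
```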





Re: Handling files with CRLF line ending

2022-12-06 Thread Koichi Murase
On Wed, Dec 7, 2022 at 8:40, L A Walsh wrote:
> [...]  Similarly w/bash --
> I haven't tested it, but I'd expect bash compiled for windows
> (using mingw toolchain) to use CRLF, but LF for WSL or Cygwin.

I think there is actually no Bash compiled for Windows (i.e., the pure
Windows API on the Windows subsystem). The Bash that comes with the
MinGW toolchain is linked with msys-2.0.dll (in the case of MSYS2),
which means that the POSIX layer Bash relies on is provided by MSYS,
which is a minimized fork of Cygwin. The MSYS Bash treats LF, but not
CRLF, as the newline.

> Are you using both tools for the same OS and subsys and having
> them conflict?

I think so. I think this means that the reported configuration is
wrong or, at least, very unusual. I don't think we should add an option
to Bash that is only meaningful on a specific non-Unix-like
operating system for a heterogeneous amalgam of programs from
different subsystems. That option would be practically useless on all of
the major Unix-like systems.

If something were to be changed on the Bash side, maybe there is a
chance that the Bash of the Cygwin/MSYS packages could be patched, along
the lines of `shopt -s completion_strip_exe'. But even in that case, the
question is why filtering with `tr' is not an option. The answer seemed to
be that the program should work unmodified, but I don't think we should
expect that a combination of programs from different subsystems will work
unmodified in general.

--
Koichi