Hi all,

On Windows I'm seeing the following:

> tf <- tempfile()
> writeBin(charToRaw("\r\r\n"), tf)
> readLines(file(tf, "r"))
[1] ""
> readLines(file(tf, "rb"))
[1] "" "" ""

The former matches Guido's observation of a disappearing line (also on Windows);
the latter seems to correspond to Duncan's observation of an extra line on
macOS.

As mentioned before, according to the docs in ?readLines it seems both should 
result in c("", ""):

> Whatever mode the connection is opened in, any of LF, CRLF or CR will be 
> accepted as the EOL marker for a line.

Inspecting the source [1], Rconn_fgetc endeavours to convert CR and CRLF to LF,
but on encountering the sequence \r\r it returns \n and stores a second \n in
the saved character buffer. The remaining \n of the CRLF is then read on its
own, so the input CR, CRLF is interpreted as LF, LF, LF.
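
As a rough illustration of that logic, here is a minimal sketch in R. It is a simulation of the behaviour as described above, not a transcription of the C source: the function name simulate_eol_mapping and its structure are hypothetical, and saving a non-CR lookahead character unchanged is an assumption about how the buffer is used.

```r
# Hypothetical simulation of the CR/CRLF-to-LF mapping described above.
# This is NOT the C source; names and structure are illustrative only.
simulate_eol_mapping <- function(bytes) {
  CR <- 13L
  LF <- 10L
  input <- as.integer(bytes)
  n <- length(input)
  out <- integer(0)
  saved <- NA_integer_   # one-character "saved" buffer
  i <- 1L
  while (!is.na(saved) || i <= n) {
    if (!is.na(saved)) {
      # A previously saved character is returned before reading more input
      out <- c(out, saved)
      saved <- NA_integer_
      next
    }
    ch <- input[i]
    i <- i + 1L
    if (ch != CR) {
      out <- c(out, ch)
      next
    }
    # After a CR, consume the following character (if any) as lookahead
    nxt <- if (i <= n) input[i] else NA_integer_
    i <- i + 1L
    out <- c(out, LF)
    if (!is.na(nxt) && nxt != LF) {
      # Not CRLF: the lookahead is saved, but a CR lookahead is saved as LF
      saved <- if (nxt == CR) LF else nxt
    }
  }
  out
}

mapped <- simulate_eol_mapping(charToRaw("\r\r\n"))
print(mapped)
split_lines <- strsplit(rawToChar(as.raw(mapped)), "\n", fixed = TRUE)[[1]]
print(split_lines)
```

For the input \r\r\n the simulated stream contains three LF bytes (10 10 10), which is consistent with the three empty strings seen in binary mode.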

I believe that accounts for the extra line on macOS (and Linux) and in
binary mode on Windows, but it still leaves the disappearing line in text mode
on Windows a bit of a mystery.
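
For anyone who needs to read such files in the meantime, one possible workaround (a sketch, not a fix for readLines) is to read the raw bytes and split on the documented EOL markers directly. The helper name read_lines_any_eol is hypothetical, and the sketch assumes the file fits in memory, contains no embedded NUL bytes, and is in the native encoding.

```r
# Hypothetical helper (not part of base R): read a file as raw bytes and
# treat any of CRLF, CR or LF as a line terminator.
read_lines_any_eol <- function(path) {
  raw_bytes <- readBin(path, what = "raw", n = file.size(path))
  txt <- rawToChar(raw_bytes)   # assumes no embedded NULs
  # Normalise CRLF first, then any remaining lone CR, to LF
  txt <- gsub("\r\n", "\n", txt, fixed = TRUE)
  txt <- gsub("\r", "\n", txt, fixed = TRUE)
  # strsplit() does not produce an extra element for a trailing terminator
  strsplit(txt, "\n", fixed = TRUE)[[1]]
}

tf <- tempfile()
writeBin(charToRaw("Line 1\r\r\nLine 3\r\n"), tf)
print(read_lines_any_eol(tf))
```

With Guido's test string this should keep the empty second line, because the bytes are split in R and the connection's own EOL handling is bypassed.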

Best,

Mikko

[1]: 
https://github.com/wch/r-source/blob/ff4be744953e2702f4194f01f669b7c93b44933e/src/main/connections.c#L4315C1-L4323C11

-----Original Message-----
From: R-help <r-help-boun...@r-project.org> On Behalf Of Duncan Murdoch
Sent: Wednesday, 25 June 2025 15:03
To: Heuvel, E.G. van den (Guido) <g.vandenheu...@cbs.nl>; 
'r-help@R-project.org' <r-help@R-project.org>
Subject: Re: [R] Potential bug in readLines when reading empty lines

On 2025-06-25 2:59 a.m., Heuvel, E.G. van den (Guido) via R-help wrote:
> Hi all,
>
> I encountered some weird behaviour with readLines() recently, and I am 
> wondering if this might be a bug, or, if it is not, how to resolve it. The 
> issue is as follows:
>
> If I have a text file where a line ends with just a carriage return (\r, CR) 
> while the next line is empty and ends in a carriage return / linefeed (\r\n, 
> CR LF), then the empty line is skipped when reading the file with readLines. 
> The following code contains a test case:
>
> ---
> print(R.version)
> # platform       x86_64-w64-mingw32
> # arch           x86_64
> # os             mingw32
> # crt            ucrt
> # system         x86_64, mingw32
> # status
> # major          4
> # minor          4.0
> # year           2024
> # month          04
> # day            24
> # svn rev        86474
> # language       R
> # version.string R version 4.4.0 (2024-04-24 ucrt)
> # nickname       Puppy Cup
>
> txt_original <- paste0("Line 1\r", "\r\n", "Line 3\r\n")

Doesn't that produce the same thing as "Line 1\r\r\nLine 3\r\n" when you write 
it with writeBin?  If I read the ?readLines page correctly, that string 
contains 3 lines:

    Line 1 CR
    CR LF
    Line 3 CR LF

On the other hand, when I use your construction or mine, I get 4 lines read by 
my Mac:

  readLines("test.txt")
[1] "Line 1" ""       ""       "Line 3"

I'd guess it is processing it as


    Line 1 CR
    CR
    LF
    Line 3 CR LF

So I think there are definitely bugs or bad docs here.

Duncan Murdoch

>
> # Write txt_original as binary to avoid unwanted conversion of end of
> # line markers
> writeBin(charToRaw(txt_original), "test.txt")
>
> txt_actual <- readLines("test.txt")
> print(txt_actual)
> # [1] "Line 1" "Line 3"
>   ---
>
> I included the output of this script on my machine in the comments. I would 
> expect txt_actual to be equal to c("Line 1", "", "Line 3"), but the empty 
> line is skipped.
>
> Is this a bug? And if not, how should I read test.txt in such a way that the 
> empty 2nd line is left intact?
>
> Best regards,
>
> Guido van den Heuvel
> Statistics Netherlands
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> https://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.