The end of the file is a problem. In my case, the data files can end in one 
of several ways, and a line can end with \r, \n or \r\n.
1) No line feed at the end of the last row of data.
2) One line feed at the end of the last row of data.
3) Multiple line feeds at the end of the last row of data.
4) Any of the above, but with carriage returns instead of line feeds.
5) A file could end with a line feed and a carriage return.
Some of this is "self-inflicted." People can open the data files in some other 
program and "accidentally" add a line feed or several. They then save and close 
the file before sending it to me.

My approach:
1) Place all files in one folder with nothing else in the folder.
2) In R, get the folder from the user. I used choose.dir() (Windows only).
3) Get a list of all files using list.files().
4) Loop through all of the files.
        a) Read the file in binary using readBin().
        b) Identify whether the file uses \r\n, \n or \r.
                # This code does the first step: counting the number of \r\n.
                # One then removes \r\n from the content (if present) and
                # counts \r and then \n. Here content is the file as a single
                # string, e.g. from rawToChar().
                m <- gregexpr("\r\n", content, fixed = TRUE)[[1]]
                num_crlf <- sum(m > 0)  # gregexpr() returns -1 on no match
        c) Remove all \n and \r at the end of the file.
        d) Add one \n, \r or \r\n to the end of the file, as identified in 4b.
        e) Save the file.
        f) End loop.
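
Steps 4a to 4e could be sketched roughly as follows. This is not the code I actually ran; it is a byte-level variant that works on the raw vector from readBin() instead of using gregexpr() on a string, which sidesteps encoding issues. The function name normalize_eof, the default_eol argument, and the rule "use the most common line ending in the file" are all assumptions made for illustration.

```r
# Hypothetical helper: name, arguments and tie-breaking are assumptions.
normalize_eof <- function(path, default_eol = as.raw(0x0a)) {
  # 4a) Read the whole file as raw bytes.
  bytes <- readBin(path, what = "raw", n = file.size(path))
  cr <- bytes == as.raw(0x0d)
  lf <- bytes == as.raw(0x0a)
  n <- length(bytes)

  # 4b) A CRLF is a \r immediately followed by a \n; the remaining
  #     \r and \n bytes are counted as lone line endings.
  n_crlf <- if (n > 1) sum(cr[-n] & lf[-1]) else 0
  counts <- c(crlf = n_crlf, cr = sum(cr) - n_crlf, lf = sum(lf) - n_crlf)

  # Assumption: use the most common style; fall back to default_eol
  # when the file contains no line ending at all.
  eol <- if (all(counts == 0)) {
    default_eol
  } else {
    switch(names(which.max(counts)),
           crlf = as.raw(c(0x0d, 0x0a)),
           cr   = as.raw(0x0d),
           lf   = as.raw(0x0a))
  }

  # 4c) Find the last byte that is not \r or \n.
  keep <- which(!(cr | lf))
  if (length(keep) == 0) return(invisible(counts))  # empty or only line endings

  # 4d, 4e) Append exactly one line ending and save in binary mode.
  writeBin(c(bytes[seq_len(max(keep))], eol), path)
  invisible(counts)
}
```

The function overwrites the file in place, so it is worth trying it on copies first. It returns the three counts invisibly, which makes it easy to spot files with mixed line endings.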

The exact code will depend on what sort of files you are dealing with. 
Unexpected files, such as an empty file or a file that has been edited by 
multiple users, can generate errors unless they are trapped.
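
One way to trap such cases is to wrap the per-file work in tryCatch() and record what happened to each file instead of stopping at the first failure. The sketch below is only an illustration: process_files and its fix_file argument are hypothetical names, and fix_file stands for whatever function you apply to a single file path.

```r
# Hypothetical wrapper: apply fix_file() to each path and collect a status
# per file rather than aborting the whole loop on the first error.
process_files <- function(paths, fix_file) {
  status <- vapply(paths, function(p) {
    if (!file.exists(p)) return("missing")
    if (file.size(p) == 0) return("skipped: empty file")
    tryCatch({
      fix_file(p)
      "ok"
    }, error = function(e) paste("error:", conditionMessage(e)))
  }, character(1))
  data.frame(file = basename(paths), status = unname(status))
}
```

Called with the result of list.files(folder, full.names = TRUE), this gives a small table you can scan for anything other than "ok" before sending the files on.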

Tim

-----Original Message-----
From: R-help <r-help-boun...@r-project.org> On Behalf Of Heuvel, E.G. van den 
(Guido) via R-help
Sent: Wednesday, June 25, 2025 3:00 AM
To: 'r-help@R-project.org' <r-help@R-project.org>
Subject: [R] Potential bug in readLines when reading empty lines

Hi all,

I encountered some weird behaviour with readLines() recently, and I am 
wondering if this might be a bug, or, if it is not, how to resolve it. The 
issue is as follows:

If I have a text file where a line ends with just a carriage return (\r, CR) 
while the next line is empty and ends in a carriage return / linefeed (\r\n, CR 
LF), then the empty line is skipped when reading the file with readLines. The 
following code contains a test case:

---
print(R.version)
# platform       x86_64-w64-mingw32
# arch           x86_64
# os             mingw32
# crt            ucrt
# system         x86_64, mingw32
# status
# major          4
# minor          4.0
# year           2024
# month          04
# day            24
# svn rev        86474
# language       R
# version.string R version 4.4.0 (2024-04-24 ucrt)
# nickname       Puppy Cup

txt_original <- paste0("Line 1\r", "\r\n", "Line 3\r\n")

# Write txt_original as binary to avoid unwanted conversion of end-of-line
# markers
writeBin(charToRaw(txt_original), "test.txt")

txt_actual <- readLines("test.txt")
print(txt_actual)
# [1] "Line 1" "Line 3"
---

I included the output of this script on my machine in the comments. I would 
expect txt_actual to be equal to c("Line 1", "", "Line 3"), but the empty line 
is skipped.

Is this a bug? And if not, how should I read test.txt in such a way that the 
empty 2nd line is left intact?

Best regards,

Guido van den Heuvel
Statistics Netherlands

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide https://www.r-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
