Re: [Rd] read.csv

2024-04-27 Thread Kevin Coombes
I was horrified when I saw John Weinstein's article about Excel turning gene names into dates. Mainly because I had been complaining about that phenomenon for years, and it never remotely occurred to me that you could get a publication out of it. I eventually rectified the situation by publishing

Re: [Rd] read.csv

2024-04-16 Thread Reed A. Cartwright
Gene names being misinterpreted by spreadsheet software (read.csv is no different) is a classic issue in bioinformatics. It seems like every practitioner ends up encountering this issue in due time. E.g. https://pubmed.ncbi.nlm.nih.gov/15214961/ https://genomebiology.biomedcentral.com/articles/10

Re: [Rd] read.csv

2024-04-16 Thread Ben Bolker
Tangentially, your code will be more efficient if you add the data files to a *list* one by one and then apply bind_rows or do.call(rbind,...) after you have accumulated all of the information (see chapter 2 of the _R Inferno_). This may or may not be practically important in your particular
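A minimal base-R sketch of the list-then-bind pattern described above. The in-memory CSV strings and the names `csv_texts`, `pieces` and `combined` are illustrative assumptions, not taken from the original poster's code.

```r
# Hypothetical inputs: three small CSV "files" held as strings.
csv_texts <- c(
  "id,value\n1,10\n2,20",
  "id,value\n3,30",
  "id,value\n4,40\n5,50"
)

# Pre-allocate a list and store one data frame per input,
# instead of growing a data frame with rbind() inside the loop.
pieces <- vector("list", length(csv_texts))
for (i in seq_along(csv_texts)) {
  pieces[[i]] <- read.csv(text = csv_texts[i])
}

# Bind everything once, after all pieces have been read.
combined <- do.call(rbind, pieces)
print(combined)
```

With real files, the loop body would presumably call read.csv() on each path; the single do.call(rbind, ...) at the end is the part that avoids repeated copying.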

Re: [Rd] read.csv

2024-04-16 Thread Dirk Eddelbuettel
As an aside, the odd format does not seem to bother data.table::fread(), which also happens to be my personally preferred workhorse for these tasks:
> fname <- "/tmp/r/filename.csv"
> read.csv(fname)
   Gene             SNP prot log10p
1 YWHAE 13:62129097_C_T 1433   7.35
2 YWHAE 4:72617557_T_TA 1

Re: [Rd] read.csv

2024-04-16 Thread Duncan Murdoch
On 16/04/2024 7:36 a.m., Rui Barradas wrote:
At 11:46 on 16/04/2024, jing hua zhao wrote:
Dear R-developers, I came across a somewhat unexpected behaviour of read.csv() which is trivial but worthwhile to note -- my data involves a protein named "1433E" but to save space I drop the quote so it

Re: [Rd] read.csv

2024-04-16 Thread peter dalgaard
Hum... This boils down to
> as.numeric("1.23e")
[1] 1.23
> as.numeric("1.23e-")
[1] 1.23
> as.numeric("1.23e+")
[1] 1.23
which in turn comes from this code in src/main/util.c (function R_strtod)
    if (*p == 'e' || *p == 'E') {
        int expsign = 1;
        switch(*++p) {
        case '-':
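Since the result of as.numeric() on a string with a dangling exponent may depend on the R version in use, one way to be explicit is to screen fields before conversion. The sketch below is an assumption-laden illustration: the helper `is_complete_number` and its regular expression are hypothetical, cover only plain decimal notation (no hex, Inf, NaN or NA), and are not part of R itself.

```r
# Hypothetical helper: TRUE only if an exponent marker, when present,
# is followed by at least one digit.
is_complete_number <- function(x) {
  pattern <- "^[+-]?([0-9]+\\.?[0-9]*|\\.[0-9]+)([eE][+-]?[0-9]+)?$"
  grepl(pattern, trimws(x))
}

fields <- c("1433E", "1.23e", "1.23e-", "1.23e-5", "7.35")
checked <- data.frame(field = fields, complete = is_complete_number(fields))
print(checked)
```

Fields flagged as incomplete could then be kept as character rather than converted.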

Re: [Rd] read.csv

2024-04-16 Thread Rui Barradas
At 11:46 on 16/04/2024, jing hua zhao wrote:
Dear R-developers, I came across a somewhat unexpected behaviour of read.csv() which is trivial but worthwhile to note -- my data involves a protein named "1433E" but to save space I drop the quote so it becomes,
Gene,SNP,prot,log10p
YWHAE,13:621290

Re: [Rd] read.csv

2024-04-16 Thread Dirk Eddelbuettel
On 16 April 2024 at 10:46, jing hua zhao wrote:
| Dear R-developers,
|
| I came across a somewhat unexpected behaviour of read.csv() which is trivial but worthwhile to note -- my data involves a protein named "1433E" but to save space I drop the quote so it becomes,
|
| Gene,SNP,prot,log10p
| YWH

[Rd] read.csv

2024-04-16 Thread jing hua zhao
Dear R-developers,
I came across a somewhat unexpected behaviour of read.csv() which is trivial but worthwhile to note -- my data involves a protein named "1433E" but to save space I drop the quote so it becomes,
Gene,SNP,prot,log10p
YWHAE,13:62129097_C_T,1433E,7.35
YWHAE,4:72617557_T_TA,1433E,7.73
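One possible workaround, sketched here with the rows quoted in the message: declare the `prot` column as character through a named `colClasses` vector, so that read.csv() does not attempt numeric conversion on it. The variable names `csv_text` and `df` are illustrative.

```r
# The unquoted data from the message, held in memory for the example.
csv_text <- "Gene,SNP,prot,log10p
YWHAE,13:62129097_C_T,1433E,7.35
YWHAE,4:72617557_T_TA,1433E,7.73"

# A named colClasses vector fixes the type of 'prot' only;
# the remaining columns are still type-converted automatically.
df <- read.csv(text = csv_text, colClasses = c(prot = "character"))
str(df)
```

The same argument can be passed when reading from a file path instead of `text =`.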

[Rd] read.csv quadratic time in number of columns

2023-03-29 Thread Toby Hocking
Dear R-devel, A number of people have observed anecdotally that read.csv is slow for large number of columns, for example: https://stackoverflow.com/questions/7327851/read-csv-is-extremely-slow-in-reading-csv-files-with-large-numbers-of-columns I did a systematic comparison of read.csv with similar
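A rough timing sketch for exploring how read.csv() scales with the number of columns. This is not the systematic comparison mentioned in the message; the helper `time_read`, the column counts and the row count are assumptions chosen to keep the run short, and the timings will vary by machine and R version.

```r
# Hypothetical helper: write a numeric CSV with n_cols columns to a
# temporary file, then time how long read.csv() takes to read it back.
time_read <- function(n_cols, n_rows = 10L) {
  path <- tempfile(fileext = ".csv")
  on.exit(unlink(path))
  m <- matrix(1, nrow = n_rows, ncol = n_cols)
  write.csv(as.data.frame(m), path, row.names = FALSE)
  elapsed <- system.time(df <- read.csv(path))[["elapsed"]]
  data.frame(n_cols = n_cols, seconds = elapsed, cols_read = ncol(df))
}

timings <- do.call(rbind, lapply(c(100L, 200L, 400L), time_read))
print(timings)
```

Doubling `n_cols` and watching whether `seconds` roughly doubles or quadruples gives a first impression of linear versus quadratic growth; larger column counts are needed for a reliable signal.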

Re: [Rd] read.csv, worrying behaviour?

2021-02-25 Thread Kevin R. Coombes
I believe this is documented behavior. The 'read.csv' function is a front-end to 'read.table' with different default values. In this particular case, read.csv sets fill = TRUE, which means that it is supposed to fill incomplete lines with NAs. It also sets header = TRUE, which is presumably what
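The differing defaults can be inspected directly from the function signatures. A small sketch, assuming the utils package is attached as in a standard R session; the variable names are illustrative.

```r
# Defaults of read.csv: both are literal TRUE.
csv_header <- formals(read.csv)$header
csv_fill   <- formals(read.csv)$fill

# Defaults of read.table: header is FALSE, and fill is an unevaluated
# expression that depends on another argument.
tab_header <- formals(read.table)$header
tab_fill   <- formals(read.table)$fill

print(list(csv_header = csv_header, csv_fill = csv_fill,
           tab_header = tab_header, tab_fill = tab_fill))
```

Passing fill = FALSE explicitly to read.csv() restores the stricter read.table behaviour for incomplete lines.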

Re: [Rd] read.csv, worrying behaviour?

2021-02-25 Thread TAYLOR, Benjamin (BLACKPOOL TEACHING HOSPITALS NHS FOUNDATION TRUST) via R-devel
Dear all, I've been using R for around 16 years now and I've only just become aware of a behaviour of read.csv that I find worrying, which is why I'm contacting this list. A simplified example of the behaviour is as follows. I created a "test.csv" file containing the following lines: a,b,c,d,e,f,

Re: [Rd] read.csv reads more rows than indicated by wc -l

2012-12-20 Thread Matthew Dowle
Ben,
Somewhere on my wish/TO DO list is for someone to rewrite read.table for better robustness *and* efficiency ...
Wish granted. New in data.table 1.8.7:
New function fread(), a fast and friendly file reader.
* header, skip, nrows, sep and colClasses are all auto detected.
* inte

Re: [Rd] read.csv reads more rows than indicated by wc -l

2012-12-19 Thread Ben Bolker
G See writes:
> When I have a csv file that is more than 6 lines long, not including
> the header, and one of the fields is blank for the last few lines, and
> there is an extra comma on one of the lines with the blank field,
> read.csv() creates an extra line.
>
> I attached an

[Rd] read.csv reads more rows than indicated by wc -l

2012-12-19 Thread G See
When I have a csv file that is more than 6 lines long, not including the header, and one of the fields is blank for the last few lines, and there is an extra comma on one of the lines with the blank field, read.csv() creates an extra line. I attached an example file; I'll also paste the contents

Re: [Rd] read.csv behaviour

2011-09-28 Thread Ben Bolker
Mehmet Suzen writes:
> This might be obvious but I was wondering if anyone knows a quick and easy
> way of writing out a CSV file with varying row lengths, ideally an
> initial data read from a CSV file which has the same format. See example
> below.
>
> writeLines(c("A,B,C,D"

[Rd] read.csv behaviour

2011-09-27 Thread Mehmet Suzen
This might be obvious but I was wondering if anyone knows a quick and easy way of writing out a CSV file with varying row lengths, ideally an initial data read from a CSV file which has the same format. See example below. I found it quite strange that R cannot write it in one go, so one must appen
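One way to write rows of varying length in a single call is to hold them in a list and collapse each row to a string before handing everything to writeLines(). The `rows` contents below are made up for illustration, and this sketch does no quoting or escaping of fields that themselves contain commas.

```r
# Hypothetical ragged data: each list element is one row.
rows <- list(
  c("A", "B", "C", "D"),
  c("1", "a"),
  c("2", "b", "c"),
  c("3")
)

path <- tempfile(fileext = ".csv")

# Collapse each row into one comma-separated line, then write all lines at once.
lines_out <- vapply(rows, paste, character(1), collapse = ",")
writeLines(lines_out, path)

written <- readLines(path)
print(written)
```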

Re: [Rd] read.csv and FileEncoding in Windows version of R 2.13.0

2011-06-06 Thread Alexander Peterhansl
leEncoding="UTF-8",header=FALSE) (As you'll see, the file does have a byte order mark.) Regards, Alex -Original Message- From: Duncan Murdoch [mailto:murdoch.dun...@gmail.com] Sent: Wednesday, June 01, 2011 7:35 PM To: Alexander Peterhansl Cc: R-devel@r-project.org Subje

Re: [Rd] read.csv and FileEncoding in Windows version of R 2.13.0

2011-06-01 Thread Duncan Murdoch
On 01/06/2011 6:00 PM, Alexander Peterhansl wrote:
Dear R-devel List: read.csv() seems to have changed in R version 2.13.0 as compared to version 2.12.2 when reading in simple CSV files. Suppose I read in a 2-column CSV file ("test.csv"), say
1, a
2, b
If the file is encoded as UTF-8 (on Windows

[Rd] read.csv and FileEncoding in Windows version of R 2.13.0

2011-06-01 Thread Alexander Peterhansl
Dear R-devel List: read.csv() seems to have changed in R version 2.13.0 as compared to version 2.12.2 when reading in simple CSV files. Suppose I read in a 2-column CSV file ("test.csv"), say
1, a
2, b
If the file is encoded as UTF-8 (on Windows 7), then under R 2.13.0 read.csv("test.csv",fileEnco
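For readers on later R versions: the documentation of file() describes an encoding name "UTF-8-BOM" (accepted for reading from R 3.0.0) that strips a leading byte order mark. It was not available in the 2.13.0 release discussed in this thread. The sketch below assumes such a later version and builds its own small test file with a BOM; the file contents and variable names are illustrative.

```r
# Create a small UTF-8 file that starts with a byte order mark (EF BB BF).
path <- tempfile(fileext = ".csv")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
writeBin(c(bom, charToRaw("1,a\n2,b\n")), path)

# Ask the connection layer to drop the BOM while reading.
df <- read.csv(path, header = FALSE, fileEncoding = "UTF-8-BOM")
print(df)
```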

Re: [Rd] read.csv trap

2011-03-03 Thread Ben Bolker
Ben Bolker writes:
> On 02/11/2011 03:37 PM, Laurent Gatto wrote:
> > On 11 February 2011 19:39, Ben Bolker wrote:
> >>
> > [snip]
> >> Bump. Is there any opinion about this from R-core?? Will I be scolded if I submit this as a bug ... ??
> >> What is dangerous/confu

Re: [Rd] read.csv trap

2011-02-11 Thread Ben Bolker
On 02/11/2011 03:37 PM, Laurent Gatto wrote:
> On 11 February 2011 19:39, Ben Bolker wrote:
>>
> [snip]
>>
>> What is dangerous/confusing is that R silently **wraps** longer lines if
>> fill=TRUE (which is the default for read.csv). I encountered thi

Re: [Rd] read.csv trap

2011-02-11 Thread Laurent Gatto
On 11 February 2011 19:39, Ben Bolker wrote: > [snip] > > What is dangerous/confusing is that R silently **wraps** longer lines if > fill=TRUE (which is the default for read.csv).  I encountered this when > working with a colleague on a long, messy CSV file that had some phantom > extra fields in

Re: [Rd] read.csv trap

2011-02-11 Thread Ken.Williams
On 2/11/11 1:39 PM, "Ben Bolker" wrote: >[snip] > Original Message >Subject: read.csv trap >Date: Fri, 04 Feb 2011 11:16:36 -0500 >From: Ben Bolker >To: r-de...@stat.math.ethz.ch , David Earn > > >[snip] >What is dangerous/confusing is that R silently **wraps** longer lines i

Re: [Rd] read.csv trap

2011-02-11 Thread Ben Bolker
Bump. It's been a week since I posted this to r-devel. Any thoughts/discussion? Would R-core be irritated if I submitted a bug report? cheers Ben Original Message Subject: read.csv trap Date: Fri, 04 Feb 2011 11:16:36 -0500 From: Ben Bolker To: r-de...@stat.math.

[Rd] read.csv trap

2011-02-04 Thread Ben Bolker
This is not specifically a bug, but an (implicitly/obscurely) documented behavior of read.csv (or read.table with fill=TRUE) that can be quite dangerous/confusing for users. I would love to hear some discussion from other users and/or R-core about this ... As always, I apologize if I have misse
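A defensive check that can be run before read.csv(): count.fields() reports the number of fields on each line, so lines that are longer or shorter than the header can be flagged instead of being silently filled or wrapped. The sample lines and variable names below are invented for illustration.

```r
# Hypothetical messy input: the fourth line has two phantom extra fields.
txt <- c("a,b,c", "1,2,3", "4,5,6", "7,8,9,10,11", "12,13,14")

con <- textConnection(txt)
n_fields <- count.fields(con, sep = ",")
close(con)

# Treat the header's field count as the expected width.
expected <- n_fields[1]
bad_lines <- which(n_fields != expected)

print(n_fields)
print(bad_lines)
```

With a real file, the path can be passed to count.fields() directly; stopping when `bad_lines` is non-empty turns the silent behaviour into an explicit error.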

[Rd] read.csv('/dev/stdin') fails (PR#14218)

2010-02-20 Thread egoldlust
Full_Name: Eric Goldlust
Version: 2.10.1 (2009-12-14) x86_64-unknown-linux-gnu
OS: Linux 2.6.9-67.0.1.ELsmp x86_64
Submission from: (NULL) (64.22.160.1)
After upgrading from 2.9.1 to 2.10.1, I get unexpected results when calling read.csv('/dev/stdin'). These problems go away when I call read

Re: [Rd] read.csv confused by newline characters in header (PR#14103)

2009-12-02 Thread Peter Dalgaard
g.russ...@eos-solutions.com wrote: > Full_Name: George Russell > Version: 2.10.0 > OS: Microsoft Windows XP Service Pack 2 > Submission from: (NULL) (217.111.3.131) > > > The following code (typed into R --vanilla) > > testString <- '"B1\nB2"\n1\n' > con <- textConnection(testString) > tab <- re

[Rd] read.csv confused by newline characters in header (PR#14103)

2009-12-02 Thread g . russell
Full_Name: George Russell
Version: 2.10.0
OS: Microsoft Windows XP Service Pack 2
Submission from: (NULL) (217.111.3.131)
The following code (typed into R --vanilla)
testString <- '"B1\nB2"\n1\n'
con <- textConnection(testString)
tab <- read.csv(con,stringsAsFactors = FALSE)
produces a data fra

Re: [Rd] read.csv

2009-06-25 Thread Petr Savicky
I am sorry for not including the attachment mentioned in my previous email. Attached now. Petr.
--- R-devel/src/library/utils/R/readtable.R 2009-05-18 17:53:08.0 +0200
+++ R-devel-readtable/src/library/utils/R/readtable.R 2009-06-25 10:20:06.0 +0200
@@ -143,9 +143,6 @@

Re: [Rd] read.csv

2009-06-25 Thread Petr Savicky
On Sun, Jun 14, 2009 at 02:56:01PM -0400, Gabor Grothendieck wrote: > If read.csv's colClasses= argument is NOT used then read.csv accepts > double quoted numerics: > > 1: > read.csv(stdin()) > 0: A,B > 1: "1",1 > 2: "2",2 > 3: > A B > 1 1 1 > 2 2 2 > > However, if colClasses is used then it se

Re: [Rd] read.csv

2009-06-16 Thread Petr Savicky
On Sun, Jun 14, 2009 at 09:21:24PM +0100, Ted Harding wrote: > On 14-Jun-09 18:56:01, Gabor Grothendieck wrote: > > If read.csv's colClasses= argument is NOT used then read.csv accepts > > double quoted numerics: > > > > 1: > read.csv(stdin()) > > 0: A,B > > 1: "1",1 > > 2: "2",2 > > 3: > > A B

Re: [Rd] read.csv

2009-06-14 Thread Gabor Grothendieck
On Sun, Jun 14, 2009 at 4:21 PM, Ted Harding wrote: > Or am I missing something?!! The point of this is that the current behavior is not desirable since you can't have quoted numeric fields if you specify colClasses = "numeric" yet you can if you don't. The concepts are not orthogonal but should

Re: [Rd] read.csv

2009-06-14 Thread Ted Harding
On 14-Jun-09 18:56:01, Gabor Grothendieck wrote: > If read.csv's colClasses= argument is NOT used then read.csv accepts > double quoted numerics: > > 1: > read.csv(stdin()) > 0: A,B > 1: "1",1 > 2: "2",2 > 3: > A B > 1 1 1 > 2 2 2 > > However, if colClasses is used then it seems that it does no

[Rd] read.csv

2009-06-14 Thread Gabor Grothendieck
If read.csv's colClasses= argument is NOT used then read.csv accepts double quoted numerics:
1: > read.csv(stdin())
0: A,B
1: "1",1
2: "2",2
3:
  A B
1 1 1
2 2 2
However, if colClasses is used then it seems that it does not:
> read.csv(stdin(), colClasses = "numeric")
0: A,B
1: "1",1
2: "2",2
3:
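A version-independent way to sidestep the question of whether colClasses = "numeric" accepts quoted fields is to read every column as character and convert afterwards. This is a workaround sketch, not the fix proposed in the thread; `csv_text`, `raw_df` and `converted` are illustrative names.

```r
# The same small table as in the message, with column A quoted.
csv_text <- 'A,B\n"1",1\n"2",2'

# Step 1: read everything as character, which accepts quoted fields.
raw_df <- read.csv(text = csv_text, colClasses = "character")

# Step 2: convert all columns to numeric explicitly, keeping the data frame shape.
converted <- raw_df
converted[] <- lapply(raw_df, as.numeric)

str(converted)
```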