Re: [Rd] read.csv

2024-04-27 Thread Kevin Coombes
I was horrified when I saw John Weinstein's article about Excel turning gene names into dates. Mainly because I had been complaining about that phenomenon for years, and it never remotely occurred to me that you could get a publication out of it. I eventually rectified the situation by publishing

Re: [Rd] read.csv

2024-04-16 Thread Reed A. Cartwright
Gene names being misinterpreted by spreadsheet software (read.csv is no different) is a classic issue in bioinformatics. It seems like every practitioner ends up encountering this issue in due time. E.g. https://pubmed.ncbi.nlm.nih.gov/15214961/ https://genomebiology.biomedcentral.com/articles/10

Re: [Rd] read.csv

2024-04-16 Thread Ben Bolker
Tangentially, your code will be more efficient if you add the data files to a *list* one by one and then apply bind_rows or do.call(rbind,...) after you have accumulated all of the information (see chapter 2 of the _R Inferno_). This may or may not be practically important in your particular
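A minimal sketch of the list-then-bind pattern described above. The two in-memory strings stand in for the poster's data files, and the names `csv_texts`, `pieces`, and `combined` as well as the values are made up for illustration:

```r
# Hypothetical in-memory "files"; in practice these would be paths on disk.
csv_texts <- c(
  "Gene,log10p\nGENE_A,1.5",
  "Gene,log10p\nGENE_B,2.5"
)

# Read every piece into a list first ...
pieces <- lapply(csv_texts, function(txt) read.csv(text = txt))

# ... then bind once at the end, rather than growing a data frame
# inside the loop with repeated rbind() calls.
combined <- do.call(rbind, pieces)
```

With real files, `lapply(file_paths, read.csv)` would take the place of the anonymous function; `dplyr::bind_rows(pieces)` is the alternative the message mentions.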

Re: [Rd] read.csv

2024-04-16 Thread Dirk Eddelbuettel
As an aside, the odd format does not seem to bother data.table::fread() which also happens to be my personally preferred workhorse for these tasks: > fname <- "/tmp/r/filename.csv" > read.csv(fname) Gene SNP prot log10p 1 YWHAE 13:62129097_C_T 1433 7.35 2 YWHAE 4:72617557_T_TA 1

Re: [Rd] read.csv

2024-04-16 Thread Duncan Murdoch
On 16/04/2024 7:36 a.m., Rui Barradas wrote: At 11:46 on 16/04/2024, jing hua zhao wrote: Dear R-developers, I came to a somewhat unexpected behaviour of read.csv() which is trivial but worthwhile to note -- my data involves a protein named "1433E" but to save space I drop the quote so it

Re: [Rd] read.csv

2024-04-16 Thread peter dalgaard
Hum... This boils down to > as.numeric("1.23e") [1] 1.23 > as.numeric("1.23e-") [1] 1.23 > as.numeric("1.23e+") [1] 1.23 which in turn comes from this code in src/main/util.c (function R_strtod) if (*p == 'e' || *p == 'E') { int expsign = 1; switch(*++p) { case '-':
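Whatever a given R version does with a trailing exponent marker, one way to sidestep the numeric conversion is to declare the column as character up front. The sketch below is an illustration only: the single data row is pieced together from the snippets in this thread, and `csv_text` and `dat` are hypothetical names.

```r
# One row assembled from the thread's example; the protein name is unquoted.
csv_text <- "Gene,SNP,prot,log10p\nYWHAE,13:62129097_C_T,1433E,7.35"

# A named colClasses entry affects only that column; the remaining
# columns are still type-converted as usual.
dat <- read.csv(text = csv_text, colClasses = c(prot = "character"))

str(dat)
```

Declaring the class explicitly keeps the result independent of how the number parser treats strings such as "1433E".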

Re: [Rd] read.csv

2024-04-16 Thread Rui Barradas
At 11:46 on 16/04/2024, jing hua zhao wrote: Dear R-developers, I came to a somewhat unexpected behaviour of read.csv() which is trivial but worthwhile to note -- my data involves a protein named "1433E" but to save space I drop the quote so it becomes, Gene,SNP,prot,log10p YWHAE,13:621290

Re: [Rd] read.csv

2024-04-16 Thread Dirk Eddelbuettel
On 16 April 2024 at 10:46, jing hua zhao wrote: | Dear R-developers, | | I came to a somewhat unexpected behaviour of read.csv() which is trivial but worthwhile to note -- my data involves a protein named "1433E" but to save space I drop the quote so it becomes, | | Gene,SNP,prot,log10p | YWH

Re: [Rd] read.csv, worrying behaviour?

2021-02-25 Thread Kevin R. Coombes
I believe this is documented behavior. The 'read.csv' function is a front-end to 'read.table' with different default values. In this particular case, read.csv sets fill = TRUE, which means that it is supposed to fill incomplete lines with NA's. It also sets header=TRUE, which is presumably what
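Because fill = TRUE accepts ragged rows silently, it can help to check the number of fields per line before reading. A minimal sketch using utils::count.fields() follows; the three-line file is made up for illustration, and `tf`, `n_fields`, and `bad_lines` are hypothetical names.

```r
# Hypothetical file: a header, one regular row, and one over-long row.
tf <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "1,2,3", "4,5,6,7"), tf)

# Count the comma-separated fields on every line of the file.
n_fields <- count.fields(tf, sep = ",")

# Lines whose field count differs from the header's deserve a look
# before the file is handed to read.csv().
bad_lines <- which(n_fields != n_fields[1])

unlink(tf)
```

Calling read.csv() with fill = FALSE is another option for files that are expected to be rectangular, since short rows then raise an error instead of being padded.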

Re: [Rd] read.csv, worrying behaviour?

2021-02-25 Thread TAYLOR, Benjamin (BLACKPOOL TEACHING HOSPITALS NHS FOUNDATION TRUST) via R-devel
Dear all I've been using R for around 16 years now and I've only just become aware of a behaviour of read.csv that I find worrying which is why I'm contacting this list. A simplified example of the behaviour is as follows I created a "test.csv" file containing the following lines: a,b,c,d,e,f,

Re: [Rd] read.csv reads more rows than indicated by wc -l

2012-12-20 Thread Matthew Dowle
Ben, Somewhere on my wish/TO DO list is for someone to rewrite read.table for better robustness *and* efficiency ... Wish granted. New in data.table 1.8.7 : = New function fread(), a fast and friendly file reader. * header, skip, nrows, sep and colClasses are all auto detected. * inte

Re: [Rd] read.csv reads more rows than indicated by wc -l

2012-12-19 Thread Ben Bolker
G See writes: > > When I have a csv file that is more than 6 lines long, not including > the header, and one of the fields is blank for the last few lines, and > there is an extra comma on one of the lines with the blank field, > read.csv() creates an extra line. > > I attached an

Re: [Rd] read.csv behaviour

2011-09-28 Thread Ben Bolker
Mehmet Suzen writes: > This might be obvious but I was wondering if anyone knows a quick and easy > way of writing out a CSV file with varying row lengths, ideally an > initial data read from a CSV file which has the same format. See example > below. > > writeLines(c("A,B,C,D"

Re: [Rd] read.csv and FileEncoding in Windows version of R 2.13.0

2011-06-06 Thread Alexander Peterhansl
leEncoding="UTF-8",header=FALSE) (As you'll see, the file does have a byte order mark.) Regards, Alex -Original Message- From: Duncan Murdoch [mailto:murdoch.dun...@gmail.com] Sent: Wednesday, June 01, 2011 7:35 PM To: Alexander Peterhansl Cc: R-devel@r-project.org Subje

Re: [Rd] read.csv and FileEncoding in Windows version of R 2.13.0

2011-06-01 Thread Duncan Murdoch
On 01/06/2011 6:00 PM, Alexander Peterhansl wrote: Dear R-devel List: read.csv() seems to have changed in R version 2.13.0 as compared to version 2.12.2 when reading in simple CSV files. Suppose I read in a 2-column CSV file ("test.csv"), say 1, a 2, b If file is encoded as UTF-8 (on Windows

Re: [Rd] read.csv trap

2011-03-03 Thread Ben Bolker
Ben Bolker writes: > On 02/11/2011 03:37 PM, Laurent Gatto wrote: > > On 11 February 2011 19:39, Ben Bolker wrote: > >> > > [snip] > >> Bump. Is there any opinion about this from R-core?? Will I be scolded if I submit this as a bug ... ?? > >> What is dangerous/confu

Re: [Rd] read.csv trap

2011-02-11 Thread Ben Bolker
On 02/11/2011 03:37 PM, Laurent Gatto wrote: > On 11 February 2011 19:39, Ben Bolker wrote: >> > [snip] >> >> What is dangerous/confusing is that R silently **wraps** longer lines if >> fill=TRUE (which is the default for read.csv). I encountered thi

Re: [Rd] read.csv trap

2011-02-11 Thread Laurent Gatto
On 11 February 2011 19:39, Ben Bolker wrote: > [snip] > > What is dangerous/confusing is that R silently **wraps** longer lines if > fill=TRUE (which is the default for read.csv).  I encountered this when > working with a colleague on a long, messy CSV file that had some phantom > extra fields in

Re: [Rd] read.csv trap

2011-02-11 Thread Ken.Williams
On 2/11/11 1:39 PM, "Ben Bolker" wrote: >[snip] > Original Message >Subject: read.csv trap >Date: Fri, 04 Feb 2011 11:16:36 -0500 >From: Ben Bolker >To: r-de...@stat.math.ethz.ch , David Earn > > >[snip] >What is dangerous/confusing is that R silently **wraps** longer lines i

Re: [Rd] read.csv trap

2011-02-11 Thread Ben Bolker
Bump. It's been a week since I posted this to r-devel. Any thoughts/discussion? Would R-core be irritated if I submitted a bug report? cheers Ben Original Message Subject: read.csv trap Date: Fri, 04 Feb 2011 11:16:36 -0500 From: Ben Bolker To: r-de...@stat.math.

Re: [Rd] read.csv confused by newline characters in header (PR#14103)

2009-12-02 Thread Peter Dalgaard
g.russ...@eos-solutions.com wrote: > Full_Name: George Russell > Version: 2.10.0 > OS: Microsoft Windows XP Service Pack 2 > Submission from: (NULL) (217.111.3.131) > > > The following code (typed into R --vanilla) > > testString <- '"B1\nB2"\n1\n' > con <- textConnection(testString) > tab <- re

Re: [Rd] read.csv

2009-06-25 Thread Petr Savicky
I am sorry for not including the attachment mentioned in my previous email. Attached now. Petr. --- R-devel/src/library/utils/R/readtable.R 2009-05-18 17:53:08.0 +0200 +++ R-devel-readtable/src/library/utils/R/readtable.R 2009-06-25 10:20:06.0 +0200 @@ -143,9 +143,6 @@

Re: [Rd] read.csv

2009-06-25 Thread Petr Savicky
On Sun, Jun 14, 2009 at 02:56:01PM -0400, Gabor Grothendieck wrote: > If read.csv's colClasses= argument is NOT used then read.csv accepts > double quoted numerics: > > 1: > read.csv(stdin()) > 0: A,B > 1: "1",1 > 2: "2",2 > 3: > A B > 1 1 1 > 2 2 2 > > However, if colClasses is used then it se

Re: [Rd] read.csv

2009-06-16 Thread Petr Savicky
On Sun, Jun 14, 2009 at 09:21:24PM +0100, Ted Harding wrote: > On 14-Jun-09 18:56:01, Gabor Grothendieck wrote: > > If read.csv's colClasses= argument is NOT used then read.csv accepts > > double quoted numerics: > > > > 1: > read.csv(stdin()) > > 0: A,B > > 1: "1",1 > > 2: "2",2 > > 3: > > A B

Re: [Rd] read.csv

2009-06-14 Thread Gabor Grothendieck
On Sun, Jun 14, 2009 at 4:21 PM, Ted Harding wrote: > Or am I missing something?!! The point of this is that the current behavior is not desirable since you can't have quoted numeric fields if you specify colClasses = "numeric" yet you can if you don't. The concepts are not orthogonal but should

Re: [Rd] read.csv

2009-06-14 Thread Ted Harding
On 14-Jun-09 18:56:01, Gabor Grothendieck wrote: > If read.csv's colClasses= argument is NOT used then read.csv accepts > double quoted numerics: > > 1: > read.csv(stdin()) > 0: A,B > 1: "1",1 > 2: "2",2 > 3: > A B > 1 1 1 > 2 2 2 > > However, if colClasses is used then it seems that it does no