I was horrified when I saw John Weinstein's article about Excel turning
gene names into dates. Mainly because I had been complaining about that
phenomenon for years, and it never remotely occurred to me that you could
get a publication out of it.
I eventually rectified the situation by publishing
Gene names being misinterpreted by spreadsheet software (read.csv is
no different) is a classic issue in bioinformatics. It seems like
every practitioner ends up encountering this issue in due time. E.g.
https://pubmed.ncbi.nlm.nih.gov/15214961/
https://genomebiology.biomedcentral.com/articles/10
Tangentially, your code will be more efficient if you add the data
files to a *list* one by one and then apply bind_rows or
do.call(rbind,...) after you have accumulated all of the information
(see chapter 2 of the _R Inferno_). This may or may not be practically
important in your particular
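The pattern suggested above can be sketched like this (the file names and contents are invented for illustration, and written to a temporary directory so the sketch actually runs):

```r
## Growing a data frame with rbind() inside the loop copies the
## accumulated rows on every iteration; collecting pieces in a list
## and binding once at the end avoids that.
files <- c("batch1.csv", "batch2.csv")            # hypothetical names
dir <- tempdir()
for (f in files)
  write.csv(data.frame(Gene = "YWHAE", log10p = 7.35),
            file.path(dir, f), row.names = FALSE)

pieces <- lapply(file.path(dir, files), read.csv) # one list element per file
combined <- do.call(rbind, pieces)                # single bind at the end
```

dplyr::bind_rows(pieces) does the same job and also copes with pieces whose column sets differ.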
As an aside, the odd format does not seem to bother data.table::fread() which
also happens to be my personally preferred workhorse for these tasks:
> fname <- "/tmp/r/filename.csv"
> read.csv(fname)
   Gene             SNP prot log10p
1 YWHAE 13:62129097_C_T 1433   7.35
2 YWHAE 4:72617557_T_TA 1433   7.73
On 16/04/2024 7:36 a.m., Rui Barradas wrote:
At 11:46 on 16/04/2024, jing hua zhao wrote:
Dear R-developers,
I came to a somewhat unexpected behaviour of read.csv() which is trivial but worthwhile
to note -- my data involves a protein named "1433E" but to save space I drop
the quote so it
Hum...
This boils down to
> as.numeric("1.23e")
[1] 1.23
> as.numeric("1.23e-")
[1] 1.23
> as.numeric("1.23e+")
[1] 1.23
which in turn comes from this code in src/main/util.c (function R_strtod)
if (*p == 'e' || *p == 'E') {
    int expsign = 1;
    switch(*++p) {
    case '-':
At 11:46 on 16/04/2024, jing hua zhao wrote:
Dear R-developers,
I came to a somewhat unexpected behaviour of read.csv() which is trivial but worthwhile
to note -- my data involves a protein named "1433E" but to save space I drop
the quote so it becomes,
Gene,SNP,prot,log10p
YWHAE,13:621290
On 16 April 2024 at 10:46, jing hua zhao wrote:
| Dear R-developers,
|
| I came to a somewhat unexpected behaviour of read.csv() which is trivial but
worthwhile to note -- my data involves a protein named "1433E" but to save
space I drop the quote so it becomes,
|
| Gene,SNP,prot,log10p
| YWH
Dear R-developers,
I came to a somewhat unexpected behaviour of read.csv() which is trivial but
worthwhile to note -- my data involves a protein named "1433E" but to save
space I drop the quote so it becomes,
Gene,SNP,prot,log10p
YWHAE,13:62129097_C_T,1433E,7.35
YWHAE,4:72617557_T_TA,1433E,7.73
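A minimal reproduction of the report, with a colClasses workaround; read.csv(text = ...) stands in for the file:

```r
txt <- "Gene,SNP,prot,log10p
YWHAE,13:62129097_C_T,1433E,7.35
YWHAE,4:72617557_T_TA,1433E,7.73"

d1 <- read.csv(text = txt)
## In R versions affected by the quirk, "1433E" parses as 1433e0,
## so d1$prot silently becomes the number 1433.

## Pinning the column class sidesteps the conversion entirely:
d2 <- read.csv(text = txt, colClasses = c(prot = "character"))
d2$prot   # "1433E" "1433E"
```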
Dear R-devel,
A number of people have observed anecdotally that read.csv is slow for
files with a large number of columns, for example:
https://stackoverflow.com/questions/7327851/read-csv-is-extremely-slow-in-reading-csv-files-with-large-numbers-of-columns
I did a systematic comparison of read.csv with similar
I believe this is documented behavior. The 'read.csv' function is a
front-end to 'read.table' with different default values. In this
particular case, read.csv sets fill = TRUE, which means that it is
supposed to fill incomplete lines with NA's. It also sets header=TRUE,
which is presumably what
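That default-swapping can be checked directly; a small sketch of fill = TRUE versus fill = FALSE on a short row (data invented):

```r
## read.csv() is read.table() with CSV-flavoured defaults:
formals(read.csv)[c("header", "sep", "fill")]

txt <- "a,b,c\n1,2,3\n4,5"              # second data row is one field short
read.csv(text = txt)                    # fill = TRUE pads column c with NA
try(read.table(text = txt, sep = ",", header = TRUE, fill = FALSE))
## fill = FALSE turns the same ragged row into an error
```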
Dear all
I've been using R for around 16 years now and I've only just become aware of a
behaviour of read.csv that I find worrying which is why I'm contacting this
list. A simplified example of the behaviour is as follows
I created a "test.csv" file containing the following lines:
a,b,c,d,e,f,
Ben,
Somewhere on my wish/TO DO list is for someone to rewrite read.table
for
better robustness *and* efficiency ...
Wish granted. New in data.table 1.8.7 :
New function fread(), a fast and friendly file reader.
* header, skip, nrows, sep and colClasses are all auto detected.
* inte
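A quick look at the auto-detection being announced (assumes the data.table package is installed; file contents invented):

```r
library(data.table)
tf <- tempfile(fileext = ".csv")
writeLines(c("Gene,SNP,prot",
             "YWHAE,13:62129097_C_T,1433E"), tf)
dt <- fread(tf)     # sep, header and column classes detected automatically
dt$prot             # stays character: "1433E" is not mistaken for a number
```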
G See gmail.com> writes:
>
> When I have a csv file that is more than 6 lines long, not including
> the header, and one of the fields is blank for the last few lines, and
> there is an extra comma on one of the lines with the blank field,
> read.csv() creates an extra line.
>
> I attached an
When I have a csv file that is more than 6 lines long, not including
the header, and one of the fields is blank for the last few lines, and
there is an extra comma on one of the lines with the blank field,
read.csv() creates an extra line.
I attached an example file; I'll also paste the contents
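A useful diagnostic for this kind of report: count.fields() shows whether every line of a file has the same number of separated fields, which is the usual cause of the phantom extra rows (file contents invented):

```r
tf <- tempfile(fileext = ".csv")
writeLines(c("a,b,c",
             "1,2,3",
             "4,5,6,"), tf)     # note the stray trailing comma
count.fields(tf, sep = ",")     # 3 3 4 -- the last line has a surplus field
```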
Mehmet Suzen mango-solutions.com> writes:
> This might be obvious but I was wondering if anyone knows quick and easy
> way of writing out a CSV file with varying row lengths, ideally an
> initial data read from a CSV file which has the same format. See example
> below.
>
> writeLines(c("A,B,C,D"
This might be obvious but I was wondering if anyone knows quick and easy
way of writing out a CSV file with varying row lengths, ideally an
initial data read from a CSV file which has the same format. See example
below.
I found it quite strange that R cannot write it in one go, so one must
append
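One way to write a ragged file in a single call, sketched with invented rows: paste each row yourself and hand the result to writeLines():

```r
rows <- list(c("A", "B", "C", "D"),
             c("1", "2"),
             c("3", "4", "5"))
tf <- tempfile(fileext = ".csv")
writeLines(vapply(rows, paste, character(1), collapse = ","), tf)
readLines(tf)   # "A,B,C,D" "1,2" "3,4,5" -- row lengths preserved
```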
fileEncoding="UTF-8", header=FALSE)
(As you'll see, the file does have a byte order mark.)
Regards,
Alex
-Original Message-
From: Duncan Murdoch [mailto:murdoch.dun...@gmail.com]
Sent: Wednesday, June 01, 2011 7:35 PM
To: Alexander Peterhansl
Cc: R-devel@r-project.org
Subje
On 01/06/2011 6:00 PM, Alexander Peterhansl wrote:
Dear R-devel List:
read.csv() seems to have changed in R version 2.13.0 as compared to version
2.12.2 when reading in simple CSV files.
Suppose I read in a 2-column CSV file ("test.csv"), say
1, a
2, b
If file is encoded as UTF-8 (on Windows
Dear R-devel List:
read.csv() seems to have changed in R version 2.13.0 as compared to version
2.12.2 when reading in simple CSV files.
Suppose I read in a 2-column CSV file ("test.csv"), say
1, a
2, b
If file is encoded as UTF-8 (on Windows 7), then under R 2.13.0
read.csv("test.csv",fileEnco
Ben Bolker gmail.com> writes:
> On 02/11/2011 03:37 PM, Laurent Gatto wrote:
> > On 11 February 2011 19:39, Ben Bolker gmail.com> wrote:
> >>
> > [snip]
> >>
Bump. Is there any opinion about this from R-core??
Will I be scolded if I submit this as a bug ... ??
> >> What is dangerous/confu
On 02/11/2011 03:37 PM, Laurent Gatto wrote:
> On 11 February 2011 19:39, Ben Bolker wrote:
>>
> [snip]
>>
>> What is dangerous/confusing is that R silently **wraps** longer lines if
>> fill=TRUE (which is the default for read.csv). I encountered thi
On 11 February 2011 19:39, Ben Bolker wrote:
>
[snip]
>
> What is dangerous/confusing is that R silently **wraps** longer lines if
> fill=TRUE (which is the default for read.csv). I encountered this when
> working with a colleague on a long, messy CSV file that had some phantom
> extra fields in
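The wrapping can be reproduced in a couple of lines; one over-long record silently becomes two rows (data invented):

```r
txt <- "a,b,c\n1,2,3,99\n4,5,6"   # first data line has a surplus field
d <- read.csv(text = txt)         # fill = TRUE: the 99 wraps to a new row
d                                 # three rows from two data lines
try(read.csv(text = txt, fill = FALSE))  # fill = FALSE fails loudly instead
```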
On 2/11/11 1:39 PM, "Ben Bolker" wrote:
>[snip]
> Original Message
>Subject: read.csv trap
>Date: Fri, 04 Feb 2011 11:16:36 -0500
>From: Ben Bolker
>To: r-de...@stat.math.ethz.ch , David Earn
>
>
>[snip]
>What is dangerous/confusing is that R silently **wraps** longer lines i
Bump.
It's been a week since I posted this to r-devel. Any
thoughts/discussion? Would R-core be irritated if I submitted a bug report?
cheers
Ben
Original Message
Subject: read.csv trap
Date: Fri, 04 Feb 2011 11:16:36 -0500
From: Ben Bolker
To: r-de...@stat.math.
This is not specifically a bug, but an (implicitly/obscurely)
documented behavior of read.csv (or read.table with fill=TRUE) that can
be quite dangerous/confusing for users. I would love to hear some
discussion from other users and/or R-core about this ... As always, I
apologize if I have missed
Full_Name: Eric Goldlust
Version: 2.10.1 (2009-12-14) x86_64-unknown-linux-gnu
OS: Linux 2.6.9-67.0.1.ELsmp x86_64
Submission from: (NULL) (64.22.160.1)
After upgrading from 2.9.1 to 2.10.1, I get unexpected results when calling
read.csv('/dev/stdin'). These problems go away when I call read
g.russ...@eos-solutions.com wrote:
> Full_Name: George Russell
> Version: 2.10.0
> OS: Microsoft Windows XP Service Pack 2
> Submission from: (NULL) (217.111.3.131)
>
>
> The following code (typed into R --vanilla)
>
> testString <- '"B1\nB2"\n1\n'
> con <- textConnection(testString)
> tab <- read.csv(con,stringsAsFactors = FALSE)
Full_Name: George Russell
Version: 2.10.0
OS: Microsoft Windows XP Service Pack 2
Submission from: (NULL) (217.111.3.131)
The following code (typed into R --vanilla)
testString <- '"B1\nB2"\n1\n'
con <- textConnection(testString)
tab <- read.csv(con,stringsAsFactors = FALSE)
produces a data frame
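Related behaviour that is easy to check: a quoted field may legitimately contain an embedded newline, and read.csv() keeps it as one field. The report above concerns such a newline inside a header field; this sketch uses a data field with invented contents:

```r
txt <- 'name,note\nYWHAE,"line one\nline two"\n'
d <- read.csv(text = txt)
nrow(d)    # 1 -- the quoted newline does not start a new record
d$note     # a single string containing the embedded newline
```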
I am sorry for not including the attachment mentioned in my
previous email. Attached now. Petr.
--- R-devel/src/library/utils/R/readtable.R 2009-05-18 17:53:08.0 +0200
+++ R-devel-readtable/src/library/utils/R/readtable.R 2009-06-25 10:20:06.0 +0200
@@ -143,9 +143,6 @@
On Sun, Jun 14, 2009 at 02:56:01PM -0400, Gabor Grothendieck wrote:
> If read.csv's colClasses= argument is NOT used then read.csv accepts
> double quoted numerics:
>
> 1: > read.csv(stdin())
> 0: A,B
> 1: "1",1
> 2: "2",2
> 3:
> A B
> 1 1 1
> 2 2 2
>
> However, if colClasses is used then it seems that it does not:
On Sun, Jun 14, 2009 at 09:21:24PM +0100, Ted Harding wrote:
> On 14-Jun-09 18:56:01, Gabor Grothendieck wrote:
> > If read.csv's colClasses= argument is NOT used then read.csv accepts
> > double quoted numerics:
> >
> > 1: > read.csv(stdin())
> > 0: A,B
> > 1: "1",1
> > 2: "2",2
> > 3:
> > A B
On Sun, Jun 14, 2009 at 4:21 PM, Ted
Harding wrote:
> Or am I missing something?!!
The point of this is that the current behavior is not desirable since you can't
have quoted numeric fields if you specify colClasses = "numeric" yet you
can if you don't. The concepts are not orthogonal but should
On 14-Jun-09 18:56:01, Gabor Grothendieck wrote:
> If read.csv's colClasses= argument is NOT used then read.csv accepts
> double quoted numerics:
>
> 1: > read.csv(stdin())
> 0: A,B
> 1: "1",1
> 2: "2",2
> 3:
> A B
> 1 1 1
> 2 2 2
>
> However, if colClasses is used then it seems that it does not
If read.csv's colClasses= argument is NOT used then read.csv accepts
double quoted numerics:
1: > read.csv(stdin())
0: A,B
1: "1",1
2: "2",2
3:
A B
1 1 1
2 2 2
However, if colClasses is used then it seems that it does not:
> read.csv(stdin(), colClasses = "numeric")
0: A,B
1: "1",1
2: "2",2
3:
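A workaround consistent with the observation above: read everything as character (quote stripping still happens at that stage) and convert by hand afterwards:

```r
txt <- 'A,B\n"1",1\n"2",2\n'
d <- read.csv(text = txt, colClasses = "character")  # quotes are stripped
d$A <- as.numeric(d$A)                               # convert manually
d$B <- as.numeric(d$B)
str(d)
```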