I was horrified when I saw John Weinstein's article about Excel turning
gene names into dates. Mainly because I had been complaining about that
phenomenon for years, and it never remotely occurred to me that you could
get a publication out of it.
I eventually rectified the situation by publishing
Gene names being misinterpreted by spreadsheet software (read.csv is
no different) is a classic issue in bioinformatics. It seems like
every practitioner ends up encountering this issue in due time. E.g.
https://pubmed.ncbi.nlm.nih.gov/15214961/
https://genomebiology.biomedcentral.com/articles/10
Tangentially, your code will be more efficient if you add the data
files to a *list* one by one and then apply bind_rows or
do.call(rbind,...) after you have accumulated all of the information
(see chapter 2 of the _R Inferno_). This may or may not be practically
important in your particular
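
A minimal sketch of the accumulate-then-bind pattern described above. The helper read_one() is a hypothetical stand-in for whatever reads a single data file (for example a read.csv() call); it is not part of the original post.

```r
# Hypothetical stand-in for reading one data file; in practice this
# would be something like read.csv(path).
read_one <- function(i) data.frame(file_id = i, value = i * 10)

# Collect each piece in a pre-allocated list instead of growing a
# data frame with rbind() inside the loop.
pieces <- vector("list", 3)
for (i in seq_along(pieces)) {
  pieces[[i]] <- read_one(i)
}

# Bind everything once, after all pieces have been collected.
combined <- do.call(rbind, pieces)
```

Binding once at the end avoids re-copying the accumulated rows on every iteration, which is the cost that growing an object inside the loop incurs.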
As an aside, the odd format does not seem to bother data.table::fread() which
also happens to be my personally preferred workhorse for these tasks:
> fname <- "/tmp/r/filename.csv"
> read.csv(fname)
Gene SNP prot log10p
1 YWHAE 13:62129097_C_T 1433 7.35
2 YWHAE 4:72617557_T_TA 1
On 16/04/2024 7:36 a.m., Rui Barradas wrote:
At 11:46 on 16/04/2024, jing hua zhao wrote:
Dear R-developers,
I came to a somewhat unexpected behaviour of read.csv() which is trivial but worthwhile
to note -- my data involves a protein named "1433E" but to save space I drop
the quote so it
Hum...
This boils down to
> as.numeric("1.23e")
[1] 1.23
> as.numeric("1.23e-")
[1] 1.23
> as.numeric("1.23e+")
[1] 1.23
which in turn comes from this code in src/main/util.c (function R_strtod)
if (*p == 'e' || *p == 'E') {
int expsign = 1;
switch(*++p) {
case '-':
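
One way to keep an identifier such as "1433E" from going through numeric conversion at all is to declare that column as character when reading. This is a sketch, not something proposed in the thread; the data row is an assumption reconstructed from the read.csv() output quoted in the other replies.

```r
# Assumed sample input: the header is from the original post, the data
# row is reconstructed from the printed output and may not match the
# poster's file exactly.
csv_text <- "Gene,SNP,prot,log10p
YWHAE,13:62129097_C_T,1433E,7.35"

# A named colClasses vector pins only the 'prot' column to character;
# the remaining columns are still type-converted as usual.
df <- read.csv(text = csv_text, colClasses = c(prot = "character"))
df$prot
```

Because the column type is fixed up front, the result does not depend on how the running R version parses a trailing exponent marker.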
At 11:46 on 16/04/2024, jing hua zhao wrote:
Dear R-developers,
I came to a somewhat unexpected behaviour of read.csv() which is trivial but worthwhile
to note -- my data involves a protein named "1433E" but to save space I drop
the quote so it becomes,
Gene,SNP,prot,log10p
YWHAE,13:621290
On 16 April 2024 at 10:46, jing hua zhao wrote:
| Dear R-developers,
|
| I came to a somewhat unexpected behaviour of read.csv() which is trivial but
| worthwhile to note -- my data involves a protein named "1433E" but to save
| space I drop the quote so it becomes,
|
| Gene,SNP,prot,log10p
| YWH
I believe this is documented behavior. The 'read.csv' function is a
front-end to 'read.table' with different default values. In this
particular case, read.csv sets fill = TRUE, which means that it is
supposed to fill incomplete lines with NA's. It also sets header=TRUE,
which is presumably what
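
To illustrate the effect of the fill default mentioned above, here is a small sketch with made-up data (the three-column input is an assumption, not the poster's file). With the default, a short row is padded with NA; with fill = FALSE the same row is reported as an error.

```r
# Made-up input: the third line has only two of the three fields.
csv_text <- "a,b,c
1,2,3
4,5
6,7,8"

# Default read.csv (fill = TRUE): the short row is padded with NA.
padded <- read.csv(text = csv_text)

# fill = FALSE: the short row raises an error, captured here so the
# script keeps running.
strict <- tryCatch(
  read.csv(text = csv_text, fill = FALSE),
  error = function(e) e
)
```

Passing fill = FALSE is one way to make incomplete lines fail loudly instead of being padded silently.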
Dear all
I've been using R for around 16 years now and I've only just become aware of a
behaviour of read.csv that I find worrying which is why I'm contacting this
list. A simplified example of the behaviour is as follows
I created a "test.csv" file containing the following lines:
a,b,c,d,e,f,
Ben,
Somewhere on my wish/TO DO list is for someone to rewrite read.table for
better robustness *and* efficiency ...
Wish granted. New in data.table 1.8.7 :
=
New function fread(), a fast and friendly file reader.
* header, skip, nrows, sep and colClasses are all auto detected.
* inte
G See gmail.com> writes:
>
> When I have a csv file that is more than 6 lines long, not including
> the header, and one of the fields is blank for the last few lines, and
> there is an extra comma on one of the lines with the blank field,
> read.csv() creates an extra line.
>
>
> I attached an
Mehmet Suzen mango-solutions.com> writes:
> This might be obvious but I was wondering if anyone knows quick and easy
> way of writing out a CSV file with varying row lengths, ideally an
> initial data read from a CSV file which has the same format. See example
> below.
>
> writeLines(c("A,B,C,D"
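
A possible base-R approach to the question above, sketched under the assumption that the rows are held as a list of character vectors of differing lengths (the values below are made up). Each row is collapsed to one comma-separated string and the strings are written with writeLines().

```r
# Made-up rows of varying length.
rows <- list(c("A", "B", "C", "D"), c("1", "2"), c("3", "4", "5"))

# Collapse each row into a single comma-separated line.
lines <- vapply(rows, paste, character(1), collapse = ",")

# Write the lines to a temporary file.
out_file <- tempfile(fileext = ".csv")
writeLines(lines, out_file)
```

This sketch does no quoting, so fields that themselves contain commas or double quotes would need to be escaped first.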
leEncoding="UTF-8",header=FALSE)
(As you'll see, the file does have a byte order mark.)
Regards,
Alex
-Original Message-
From: Duncan Murdoch [mailto:murdoch.dun...@gmail.com]
Sent: Wednesday, June 01, 2011 7:35 PM
To: Alexander Peterhansl
Cc: R-devel@r-project.org
Subje
On 01/06/2011 6:00 PM, Alexander Peterhansl wrote:
Dear R-devel List:
read.csv() seems to have changed in R version 2.13.0 as compared to version
2.12.2 when reading in simple CSV files.
Suppose I read in a 2-column CSV file ("test.csv"), say
1, a
2, b
If file is encoded as UTF-8 (on Windows
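
For files that start with a byte order mark, one option is the "UTF-8-BOM" encoding name documented in ?file, which removes the mark on reading. The sketch below builds a small test file with a BOM by hand; the file contents mirror the two-row example above, and whether this addresses the poster's exact situation is an assumption.

```r
# Write a two-row CSV preceded by the UTF-8 byte order mark (EF BB BF).
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
bom_file <- tempfile(fileext = ".csv")
writeBin(c(bom, charToRaw("1, a\n2, b\n")), bom_file)

# "UTF-8-BOM" asks the connection to strip the mark if it is present.
df <- read.csv(bom_file, header = FALSE, fileEncoding = "UTF-8-BOM")
```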
Ben Bolker gmail.com> writes:
> On 02/11/2011 03:37 PM, Laurent Gatto wrote:
> > On 11 February 2011 19:39, Ben Bolker gmail.com> wrote:
> >>
> > [snip]
> >>
Bump. Is there any opinion about this from R-core??
Will I be scolded if I submit this as a bug ... ??
> >> What is dangerous/confu
On 02/11/2011 03:37 PM, Laurent Gatto wrote:
> On 11 February 2011 19:39, Ben Bolker wrote:
>>
> [snip]
>>
>> What is dangerous/confusing is that R silently **wraps** longer lines if
>> fill=TRUE (which is the default for read.csv). I encountered thi
On 11 February 2011 19:39, Ben Bolker wrote:
>
[snip]
>
> What is dangerous/confusing is that R silently **wraps** longer lines if
> fill=TRUE (which is the default for read.csv). I encountered this when
> working with a colleague on a long, messy CSV file that had some phantom
> extra fields in
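
A quick check for the kind of stray extra fields described above is to count the fields on every line before reading, using count.fields() from the utils package. The file contents here are made up for illustration.

```r
# Made-up file: the third line carries two extra fields.
csv_file <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "1,2,3", "4,5,6,7,8", "9,10,11"), csv_file)

# Count fields per line, using the same quote and comment settings
# that read.csv uses by default.
n_fields <- count.fields(csv_file, sep = ",", quote = "\"",
                         comment.char = "")

# Lines whose field count differs from the header line.
bad_lines <- which(n_fields != n_fields[1])
```

Any line numbers reported in bad_lines can then be inspected by hand before trusting the result of read.csv().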
On 2/11/11 1:39 PM, "Ben Bolker" wrote:
>[snip]
> Original Message
>Subject: read.csv trap
>Date: Fri, 04 Feb 2011 11:16:36 -0500
>From: Ben Bolker
>To: r-de...@stat.math.ethz.ch , David Earn
>
>
>[snip]
>What is dangerous/confusing is that R silently **wraps** longer lines i
Bump.
It's been a week since I posted this to r-devel. Any
thoughts/discussion? Would R-core be irritated if I submitted a bug report?
cheers
Ben
Original Message
Subject: read.csv trap
Date: Fri, 04 Feb 2011 11:16:36 -0500
From: Ben Bolker
To: r-de...@stat.math.
g.russ...@eos-solutions.com wrote:
> Full_Name: George Russell
> Version: 2.10.0
> OS: Microsoft Windows XP Service Pack 2
> Submission from: (NULL) (217.111.3.131)
>
>
> The following code (typed into R --vanilla)
>
> testString <- '"B1\nB2"\n1\n'
> con <- textConnection(testString)
> tab <- re
I am sorry for not including the attachment mentioned in my
previous email. Attached now. Petr.
--- R-devel/src/library/utils/R/readtable.R 2009-05-18 17:53:08.0 +0200
+++ R-devel-readtable/src/library/utils/R/readtable.R 2009-06-25 10:20:06.0 +0200
@@ -143,9 +143,6 @@
On Sun, Jun 14, 2009 at 02:56:01PM -0400, Gabor Grothendieck wrote:
> If read.csv's colClasses= argument is NOT used then read.csv accepts
> double quoted numerics:
>
> 1: > read.csv(stdin())
> 0: A,B
> 1: "1",1
> 2: "2",2
> 3:
> A B
> 1 1 1
> 2 2 2
>
> However, if colClasses is used then it se
On Sun, Jun 14, 2009 at 09:21:24PM +0100, Ted Harding wrote:
> On 14-Jun-09 18:56:01, Gabor Grothendieck wrote:
> > If read.csv's colClasses= argument is NOT used then read.csv accepts
> > double quoted numerics:
> >
> > 1: > read.csv(stdin())
> > 0: A,B
> > 1: "1",1
> > 2: "2",2
> > 3:
> > A B
On Sun, Jun 14, 2009 at 4:21 PM, Ted Harding wrote:
> Or am I missing something?!!
The point of this is that the current behavior is not desirable since you can't
have quoted numeric fields if you specify colClasses = "numeric" yet you
can if you don't. The concepts are not orthogonal but should
On 14-Jun-09 18:56:01, Gabor Grothendieck wrote:
> If read.csv's colClasses= argument is NOT used then read.csv accepts
> double quoted numerics:
>
> 1: > read.csv(stdin())
> 0: A,B
> 1: "1",1
> 2: "2",2
> 3:
> A B
> 1 1 1
> 2 2 2
>
> However, if colClasses is used then it seems that it does no
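
A workaround sketch that does not depend on how a given R version treats quoted fields under colClasses = "numeric": read every column as character, then convert explicitly. The input reuses the A,B example quoted above; the conversion step is an addition, not something proposed in the thread.

```r
# The quoted example from the thread, supplied as text.
csv_text <- 'A,B
"1",1
"2",2'

# Read all columns as character so quoting has no effect on typing.
df <- read.csv(text = csv_text, colClasses = "character")

# Convert every column to numeric while keeping the data frame shape.
df[] <- lapply(df, as.numeric)
```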