Re: [Rd] read.csv

2024-04-16 Thread Reed A. Cartwright
Gene names being misinterpreted by spreadsheet software (read.csv is no different) is a classic issue in bioinformatics. It seems like every practitioner ends up encountering this issue in due time. E.g. https://pubmed.ncbi.nlm.nih.gov/15214961/ https://genomebiology.biomedcentral.com/articles/10

Re: [Rd] read.csv

2024-04-16 Thread Ben Bolker
Tangentially, your code will be more efficient if you add the data files to a *list* one by one and then apply bind_rows or do.call(rbind,...) after you have accumulated all of the information (see chapter 2 of the _R Inferno_). This may or may not be practically important in your particular
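
A minimal sketch of the list-then-bind pattern described above. The inline CSV strings in `chunks` are hypothetical stand-ins for the data files, and the `colClasses` setting is an assumption added so that `prot` stays a character column; neither comes from the original code.

```r
# Hypothetical stand-ins for the data files: each element is one small CSV.
chunks <- c(
  "Gene,prot,log10p\nYWHAE,1433E,7.35",
  "Gene,prot,log10p\nYWHAE,1433E,7.73"
)

# Preallocate a list and fill it one piece at a time,
# instead of growing a data frame with rbind() inside the loop.
pieces <- vector("list", length(chunks))
for (i in seq_along(chunks)) {
  pieces[[i]] <- read.csv(text = chunks[i],
                          colClasses = c(prot = "character"))
}

# Bind once, after everything has been accumulated.
combined <- do.call(rbind, pieces)
```

With real files, `read.csv(text = chunks[i], ...)` would become a call on each file name; the single `do.call(rbind, pieces)` at the end is the part that matters.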

Re: [Rd] read.csv

2024-04-16 Thread Dirk Eddelbuettel
As an aside, the odd format does not seem to bother data.table::fread() which also happens to be my personally preferred workhorse for these tasks:
    > fname <- "/tmp/r/filename.csv"
    > read.csv(fname)
       Gene             SNP prot log10p
    1 YWHAE 13:62129097_C_T 1433   7.35
    2 YWHAE 4:72617557_T_TA 1

Re: [Rd] read.csv

2024-04-16 Thread Duncan Murdoch
On 16/04/2024 7:36 a.m., Rui Barradas wrote:
At 11:46 on 16/04/2024, jing hua zhao wrote:
Dear R-developers,
I came across a somewhat unexpected behaviour of read.csv() which is trivial but worthwhile to note -- my data involves a protein named "1433E" but to save space I drop the quote so it

Re: [Rd] read.csv

2024-04-16 Thread peter dalgaard
Hum...
This boils down to
    > as.numeric("1.23e")
    [1] 1.23
    > as.numeric("1.23e-")
    [1] 1.23
    > as.numeric("1.23e+")
    [1] 1.23
which in turn comes from this code in src/main/util.c (function R_strtod)
    if (*p == 'e' || *p == 'E') {
        int expsign = 1;
        switch(*++p) {
        case '-':
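
One way to spot fields like these before conversion is to test them against a stricter pattern that requires at least one digit after the exponent marker. The sketch below assumes a hypothetical helper, `is_complete_number()`, which is not part of R; it is deliberately narrow and covers plain decimal notation only (no hexadecimal, `Inf`, `NaN` or `NA`).

```r
# Hypothetical helper (not part of R): TRUE only when the string is a
# complete decimal number, i.e. any exponent marker is followed by digits.
is_complete_number <- function(x) {
  grepl("^[+-]?([0-9]+\\.?[0-9]*|\\.[0-9]+)([eE][+-]?[0-9]+)?$", x)
}

candidates <- c("1.23", "1.23e", "1.23e-", "1.23e+", "1.23e-4", "1433E")
flags <- is_complete_number(candidates)
flags
# TRUE FALSE FALSE FALSE TRUE FALSE
```

Fields that fail the check could then be kept as character rather than passed to as.numeric().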

Re: [Rd] read.csv

2024-04-16 Thread Rui Barradas
At 11:46 on 16/04/2024, jing hua zhao wrote:
Dear R-developers,
I came across a somewhat unexpected behaviour of read.csv() which is trivial but worthwhile to note -- my data involves a protein named "1433E" but to save space I drop the quote so it becomes,
    Gene,SNP,prot,log10p
    YWHAE,13:621290

Re: [Rd] read.csv

2024-04-16 Thread Dirk Eddelbuettel
On 16 April 2024 at 10:46, jing hua zhao wrote:
| Dear R-developers,
|
| I came to a somewhat unexpected behaviour of read.csv() which is trivial but worthwhile to note -- my data involves a protein named "1433E" but to save space I drop the quote so it becomes,
|
| Gene,SNP,prot,log10p
| YWH

[Rd] read.csv

2024-04-16 Thread jing hua zhao
Dear R-developers,
I came across a somewhat unexpected behaviour of read.csv() which is trivial but worthwhile to note -- my data involves a protein named "1433E" but to save space I drop the quote so it becomes,
    Gene,SNP,prot,log10p
    YWHAE,13:62129097_C_T,1433E,7.35
    YWHAE,4:72617557_T_TA,1433E,7.73
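
A minimal sketch of one way to keep such a column as text, using the two rows quoted above and the documented `colClasses` argument of read.csv(). The variable names `csv_text`, `guessed` and `typed` are illustrative only.

```r
csv_text <- "Gene,SNP,prot,log10p
YWHAE,13:62129097_C_T,1433E,7.35
YWHAE,4:72617557_T_TA,1433E,7.73"

# Default type guessing: the class chosen for `prot` is printed here
# rather than assumed, since it depends on how "1433E" is parsed.
guessed <- read.csv(text = csv_text)
class(guessed$prot)

# Naming the column in colClasses skips type guessing for `prot` only;
# the other columns are still converted as usual.
typed <- read.csv(text = csv_text, colClasses = c(prot = "character"))
typed$prot
```

Passing `colClasses = "character"` (unnamed) would instead read every column as text, which may be preferable when several identifiers look numeric.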