Spreadsheet software misinterpreting gene names (read.csv is no
different) is a classic issue in bioinformatics. It seems every
practitioner runs into it sooner or later. See, e.g.:
https://pubmed.ncbi.nlm.nih.gov/15214961/
https://genomebiology.biomedcentral.com/articles/10
Tangentially, your code will be more efficient if you add the data
files to a *list* one by one and then apply bind_rows or
do.call(rbind,...) after you have accumulated all of the information
(see chapter 2 of the _R Inferno_). This may or may not be practically
important in your particular
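
A minimal sketch of the list-then-bind pattern mentioned above, in base R. The file names and contents are made up for illustration, and dplyr::bind_rows(pieces) should be able to stand in for the do.call(rbind, ...) call if dplyr is available.

```r
# Create two small example files (hypothetical data) in a temporary directory.
tmp <- tempfile("demo")
dir.create(tmp)
for (i in 1:2) {
  write.csv(data.frame(id = i, value = i * 10),
            file.path(tmp, sprintf("part%d.csv", i)),
            row.names = FALSE)
}

files <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# Read each file into its own list element instead of growing a
# data frame inside a loop.
pieces <- lapply(files, read.csv)

# Combine all pieces once, at the end.
combined <- do.call(rbind, pieces)
print(combined)
```

Binding once at the end avoids copying the accumulated data frame on every iteration, which is the cost that growing an object in a loop tends to incur.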
As an aside, the odd format does not seem to bother data.table::fread() which
also happens to be my personally preferred workhorse for these tasks:
> fname <- "/tmp/r/filename.csv"
> read.csv(fname)
   Gene             SNP prot log10p
1 YWHAE 13:62129097_C_T 1433   7.35
2 YWHAE 4:72617557_T_TA 1
On 16/04/2024 7:36 a.m., Rui Barradas wrote:
At 11:46 on 16/04/2024, jing hua zhao wrote:
Dear R-developers,
I came across a somewhat unexpected behaviour of read.csv() which is trivial but worthwhile
to note -- my data involves a protein named "1433E" but to save space I dropped
the quotes so it
Hum...
This boils down to
> as.numeric("1.23e")
[1] 1.23
> as.numeric("1.23e-")
[1] 1.23
> as.numeric("1.23e+")
[1] 1.23
which in turn comes from this code in src/main/util.c (function R_strtod)
    if (*p == 'e' || *p == 'E') {
        int expsign = 1;
        switch(*++p) {
        case '-':
On 16 April 2024 at 10:46, jing hua zhao wrote:
Dear R-developers,
I came across a somewhat unexpected behaviour of read.csv() which is trivial but
worthwhile to note -- my data involves a protein named "1433E" but to save
space I dropped the quotes so it becomes,
Gene,SNP,prot,log10p
YWHAE,13:62129097_C_T,1433E,7.35
YWHAE,4:72617557_T_TA,1433E,7.73
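
One possible workaround, sketched on the assumption that the prot column should stay a string: declare its class through the colClasses argument of read.csv(), so that no numeric conversion is attempted for that column. The inline text= input simply repeats the three lines above. Columns not named in colClasses are still converted automatically.

```r
# The three lines from the example, supplied inline via text=.
csv_text <- "Gene,SNP,prot,log10p
YWHAE,13:62129097_C_T,1433E,7.35
YWHAE,4:72617557_T_TA,1433E,7.73"

# A named colClasses vector fixes the class of 'prot' only; the
# remaining columns go through the usual automatic conversion.
dat <- read.csv(text = csv_text, colClasses = c(prot = "character"))

str(dat)
```

Passing colClasses = "character" for every column and converting the numeric ones by hand afterwards would be a more defensive variant of the same idea.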