Gene names being misinterpreted by spreadsheet software (and read.csv is no different) is a classic issue in bioinformatics. It seems like every practitioner runs into it sooner or later. For example:
https://pubmed.ncbi.nlm.nih.gov/15214961/
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-1044-7
https://www.nature.com/articles/d41586-021-02211-4
https://www.theverge.com/2020/8/6/21355674/human-genes-rename-microsoft-excel-misreading-dates

On Tue, Apr 16, 2024 at 3:46 AM jing hua zhao <jinghuaz...@hotmail.com> wrote:
>
> Dear R-developers,
>
> I came to a somewhat unexpected behaviour of read.csv() which is trivial but
> worthwhile to note -- my data involves a protein named "1433E" but to save
> space I drop the quote so it becomes,
>
> Gene,SNP,prot,log10p
> YWHAE,13:62129097_C_T,1433E,7.35
> YWHAE,4:72617557_T_TA,1433E,7.73
>
> Both read.csv() and readr::read_csv() consider prot(ein) name as (possibly
> confused by scientific notation) numeric 1433 which only alerts me when I
> tried to combine data,
>
> all_data <- data.frame()
> for (protein in proteins[1:7])
> {
>   cat(protein,":\n")
>   f <- paste0(protein,".csv")
>   if(file.exists(f))
>   {
>     p <- read.csv(f)
>     print(p)
>     if(nrow(p)>0) all_data <- bind_rows(all_data,p)
>   }
> }
>
> proteins[1:7]
> [1] "1433B" "1433E" "1433F" "1433G" "1433S" "1433T" "1433Z"
>
> dplyr::bind_rows() failed to work due to incompatible types nevertheless
> rbind() went ahead without warnings.
>
> Best wishes,
>
>
> Jing Hua
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://urldefense.com/v3/__https://stat.ethz.ch/mailman/listinfo/r-devel__;!!IKRxdwAv5BmarQ!YJzURlAK1O3rlvXvq9xl99aUaYL5iKm9gnN5RBi-WJtWa5IEtodN3vaN9pCvRTZA23dZyfrVD7X8nlYUk7S1AK893A$
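
For what it's worth, one common way to sidestep the type guessing is to declare the column type up front through the colClasses argument of read.csv(). The following is only a minimal sketch, assuming the column is named prot as in the quoted data; the temporary file and the names p_guess and p_chr are made up for illustration.

```r
# Hypothetical temporary file reproducing the rows quoted above.
f <- tempfile(fileext = ".csv")
writeLines(c("Gene,SNP,prot,log10p",
             "YWHAE,13:62129097_C_T,1433E,7.35",
             "YWHAE,4:72617557_T_TA,1433E,7.73"), f)

# Default behaviour: read.csv() guesses the type of every column.
# Inspect what class 'prot' ends up with on your R version.
p_guess <- read.csv(f)
str(p_guess$prot)

# Declare 'prot' as character; columns not named here are still guessed.
p_chr <- read.csv(f, colClasses = c(prot = "character"))
str(p_chr$prot)

unlink(f)
```

With the type fixed at read time, every per-protein data frame should carry a character prot column, so combining them no longer depends on how each file happened to be guessed. readr::read_csv() has an analogous col_types argument for the same purpose.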