Re: [Rd] read.csv

Reed A. Cartwright Tue, 16 Apr 2024 13:51:33 -0700

Spreadsheet software misinterpreting gene names (and read.csv is no
different) is a classic issue in bioinformatics. It seems like every
practitioner runs into it sooner or later. E.g.


https://pubmed.ncbi.nlm.nih.gov/15214961/

https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-1044-7

https://www.nature.com/articles/d41586-021-02211-4

https://www.theverge.com/2020/8/6/21355674/human-genes-rename-microsoft-excel-misreading-dates
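
A minimal base-R sketch of one way to sidestep the guessing in the case
quoted below: declare the identifier column's type up front with
read.csv()'s colClasses argument instead of relying on automatic type
detection. The inline data copy the two rows from the quoted message.
Whether the default call still turns "1433E" into 1433 may depend on the
R version, so the sketch only prints the guessed class rather than
assuming it.

```r
# Two rows copied from the quoted example; "1433E" is a protein name, not a number.
csv <- "Gene,SNP,prot,log10p
YWHAE,13:62129097_C_T,1433E,7.35
YWHAE,4:72617557_T_TA,1433E,7.73"

# Default: column types are guessed (via type.convert()), so an identifier
# such as "1433E" may be read as numeric, depending on the R version.
guessed <- read.csv(text = csv)
print(class(guessed$prot))

# Safer: declare the identifier column as character. The named entry only
# affects `prot`; the remaining columns are still guessed as usual.
safe <- read.csv(text = csv, colClasses = c(prot = "character"))
print(class(safe$prot))
print(class(safe$log10p))
```

Naming the column in colClasses leaves the other columns to the usual
type guessing; an unnamed colClasses = "character" is recycled and would
keep every column as text instead.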


On Tue, Apr 16, 2024 at 3:46 AM jing hua zhao <[email protected]> wrote:
>
> Dear R-developers,
>
> I came across a somewhat unexpected behaviour of read.csv() which is trivial but 
> worth noting: my data involve a protein named "1433E", but to save 
> space I dropped the quotes, so the file becomes:
>
> Gene,SNP,prot,log10p
> YWHAE,13:62129097_C_T,1433E,7.35
> YWHAE,4:72617557_T_TA,1433E,7.73
>
> Both read.csv() and readr::read_csv() read the prot(ein) name as the number 
> 1433 (possibly confused by scientific notation), which only alerted me when I 
> tried to combine the data:
>
> library(dplyr)  # provides bind_rows()
> all_data <- data.frame()
> for (protein in proteins[1:7])
> {
>    cat(protein,":\n")
>    f <- paste0(protein,".csv")
>    if(file.exists(f))
>    {
>      p <- read.csv(f)
>      print(p)
>      if(nrow(p)>0) all_data  <- bind_rows(all_data,p)
>    }
> }
>
> proteins[1:7]
> [1] "1433B" "1433E" "1433F" "1433G" "1433S" "1433T" "1433Z"
>
> dplyr::bind_rows() failed because of incompatible column types, whereas 
> rbind() went ahead without any warning.
>
> Best wishes,
>
>
> Jing Hua
>
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
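
A minimal base-R sketch of the rbind() behaviour described in the quoted
message, using two toy one-row data frames (hypothetical values: `a`
stands in for a file whose prot column was read as the number 1433, `b`
for one where it stayed text). The helper read_all() is likewise
hypothetical: it mirrors the quoted loop but reads every file with the
same declared type for prot before binding, so the pieces cannot
disagree on that column.

```r
# Toy data (hypothetical): `a` mimics a file where "1433E" was misread
# as the number 1433; `b` mimics a file where prot stayed character.
a <- data.frame(Gene = "YWHAE", prot = 1433, log10p = 7.35)
b <- data.frame(Gene = "YWHAB", prot = "1433B", log10p = 7.73)

# Base rbind() coerces the mixed column to character without a warning,
# so the misread value survives as "1433" rather than "1433E".
combined <- rbind(a, b)
print(combined$prot)

# Hypothetical helper: skip missing files, read each remaining file with
# prot declared as character, then bind. Returns NULL if no file exists.
read_all <- function(files) {
  files <- files[file.exists(files)]
  parts <- lapply(files, read.csv, colClasses = c(prot = "character"))
  do.call(rbind, parts)
}

# Usage, with the per-protein file names from the quoted loop:
# all_data <- read_all(paste0(proteins[1:7], ".csv"))
```

Because every piece is read with prot as character, rbind() and
dplyr::bind_rows() should then see matching types for that column.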
