Dear all, Sarah,

OK, here are some details.
I use version 13 of R on OS X (downloaded and installed less than a year ago).
I have pasted the output of the str() command below.
The data frame looks perfect to me.
I generated the data frame using the following commands:

rog = read.table("/Users/andreaf/biodata/roger/data/roger_uuk_offt_seqs.tsv",
                 header=TRUE, sep="\t")
rog2 = add_median_column(rog, "Gene_ID", "Score")
rog3 = add_count_column(rog2, "Gene_ID", "siRNAcount")
rog4 = add_mean_column(rog3, "Gene_ID", "CellNumber", "MeanCellNumber")
gn = read.table("/Users/andreaf/biodata/data/gene_names_desc.tsv",
                header=TRUE, sep="\t")
rog5 = merge(rog4, gn, by.x="Gene_ID", by.y="Gene_ID", all.x=TRUE, all.y=FALSE)
write.table(rog5,
            file="/Users/andreaf/biodata/roger/data/roger_uuk_july2011.tsv",
            quote=FALSE, sep="\t", row.names=FALSE)


str(rog5)
'data.frame':   61869 obs. of  14 variables:
 $ Gene_ID       : int  1 2 2 2 2 9 9 9 9 10 ...
 $ Score         : num  1.63 0.892 0.473 1.274 1.585 ...
 $ CellNumber    : int  1085 4031 882 1705 3876 3932 3309 4461 2906 2705 ...
 $ siRNAid       : Factor w/ 61869 levels "SI00000007","SI00000035",..: 57157 52189 52188 1 58783 36041 36040 36038 36039 36044 ...
 $ offt          : int  553 107 712 516 93 1245 1240 1080 673 711 ...
 $ RSArank       : int  18334 14751 14751 14751 14751 7209 7209 7209 7209 3723 ...
 $ sequence      : Factor w/ 61869 levels "AAACAACACAACCAUAUCGAG",..: 20497 30102 9841 11752 20305 5537 58794 16241 19223 44336 ...
 $ seed6         : Factor w/ 3129 levels "AAACAA","AAACAC",..: 976 2267 241 446 947 37 3032 702 878 2711 ...
 $ seed7         : Factor w/ 8947 levels "AAACAAA","AAACAAC",..: 3071 5933 819 1369 2998 134 8641 2196 2768 7486 ...
 $ Median        : num  1.63 1.08 1.08 1.08 1.08 ...
 $ siRNAcount    : num  1 4 4 4 4 4 4 4 4 4 ...
 $ MeanCellNumber: num  1085 2624 2624 2624 2624 ...
 $ GeneName      : Factor w/ 25068 levels "1/2-SBSRNA4",..: 3 5 5 5 5 17324 17324 17324 17324 17326 ...
 $ GeneDesc      : Factor w/ 23388 levels "1-acylglycerol-3-phosphate O-acyltransferase 1 (lysophosphatidic acid acyltransferase, alpha)",..: 622 627 627 627 627 14313 14313 14313 14313 14315 ...
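
Since str() prints only the first few factor levels, it cannot show whether any value contains a line break or a tab. Below is a minimal sketch of a check for that, run on a made-up two-row stand-in for rog5. The demo data frame, its values and the helper has_break() are assumptions for illustration only.

```r
# Hypothetical stand-in for rog5: the second description hides a line break
# and two tabs inside a single value.
demo <- data.frame(
  Gene_ID  = c(3187L, 3188L),
  GeneName = c("HNRNPH1", "HNRNPH2"),
  GeneDesc = c(
    "heterogeneous nuclear ribonucleoprotein H1 (H)",
    paste0("heterogeneous nuclear ribonucleoprotein H2 (H)\n",
           "3189\tHNRNPH3\theterogeneous nuclear ribonucleoprotein H3 (2H9)")
  ),
  stringsAsFactors = TRUE
)

# TRUE for every value that contains a newline or a tab.
has_break <- function(x) grepl("[\n\t]", as.character(x))

# Number of affected values in each column.
bad_counts <- vapply(demo, function(col) sum(has_break(col)), integer(1))
print(bad_counts)

# Gene IDs of the rows that write.table() would spread over several lines.
bad_rows <- which(has_break(demo$GeneDesc))
print(demo$Gene_ID[bad_rows])
```

On the real data the same vapply() call would be run on rog5 instead of demo.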






The problem arises when I try to write that data frame to a file
(i.e. the last command above, write.table).
When I open the text file that it generates, I find the following lines
(pasted below).
The first lines are OK (i.e. 14 columns, like the data frame), but at
a certain point I get lines with only 3 columns!
The bad lines that contain only 3 columns hold the ID, the name and the
description of a gene (i.e. the content of the file that I merged
with).
Besides, these strange lines also get repeated (see the bottom).
Why do I get these strange lines in the file, and apparently
NOT in the data frame, which looks perfect?
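
One possible explanation, only a guess at this point: by default read.table() treats both " and ' as quote characters. An unbalanced quote in a gene description could therefore pull several lines of gene_names_desc.tsv into a single GeneDesc value, and write.table() with quote=FALSE would then print that value across many lines. Below is a minimal sketch on a made-up miniature of the file. Its contents, the stray quotes and all variable names are hypothetical.

```r
# Hypothetical miniature of gene_names_desc.tsv: four clean made-up rows,
# then a stray double quote that opens in the HNRNPH2 description and is
# only closed two lines later.
gn_file <- tempfile(fileext = ".tsv")
writeLines(c(
  "Gene_ID\tGeneName\tGeneDesc",
  sprintf("%d\tGENE%d\tdescription of gene %d", 1:4, 1:4, 1:4),
  "3188\tHNRNPH2\theterogeneous \"nuclear ribonucleoprotein H2 (H)",
  "3189\tHNRNPH3\theterogeneous nuclear ribonucleoprotein H3 (2H9)",
  "3190\tHNRNPK\theterogeneous nuclear\" ribonucleoprotein K",
  "3191\tHNRNPL\theterogeneous nuclear ribonucleoprotein L"
), gn_file)

# Default quoting: the quoted stretch, line breaks included, becomes one field.
gn_default <- read.table(gn_file, header = TRUE, sep = "\t")

# Quoting disabled: every physical line is read as one record.
gn_noquote <- read.table(gn_file, header = TRUE, sep = "\t",
                         quote = "", comment.char = "")

# Write the default-quoting version the same way as rog5, then compare the
# number of lines in the file with the number of rows plus the header line.
out_file <- tempfile(fileext = ".tsv")
write.table(gn_default, file = out_file, quote = FALSE, sep = "\t",
            row.names = FALSE)
n_lines <- length(readLines(out_file))

cat("rows with default quoting:", nrow(gn_default), "\n")
cat("rows with quote = \"\":", nrow(gn_noquote), "\n")
cat("lines in written file:", n_lines, "\n")
```

If the written file has more lines than rows plus one on the real data as well, re-reading the gene file with quote="" before the merge may be enough. Running count.fields() on the output file with sep="\t" and quote="" would show which lines have fewer than 14 fields.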




990.33333333333 HNRNPF  heterogeneous nuclear ribonucleoprotein F
3185    0.844752        601     SI02651824      816     12598   
UUGAACACCUCAAUGUACCGG   UGAACA  UGAACAC 0.844752        1990.33333333333        
HNRNPF  heterogeneous
nuclear ribonucleoprotein F
3185    1.3576397       1508    SI02651838      586     12598   
UUUACUCAUUAUCACAUGCUA   UUACUC  UUACUCA 0.844752        1990.33333333333        
HNRNPF  heterogeneous
nuclear ribonucleoprotein F
3187    0.7738942       2001    SI00439831      1368    12620   
UUUAACAUAAUUCAACUGCUU   UUAACA  UUAACAU 0.7738942       1522.33333333333        
HNRNPH1 heterogeneous
nuclear ribonucleoprotein H1 (H)
3187    1.179532        790     SI00439845      913     12620   
UUAAGUUUAACAGUUAUAGUU   UAAGUU  UAAGUUU 0.7738942       1522.33333333333        
HNRNPH1 heterogeneous
nuclear ribonucleoprotein H1 (H)
3187    0.6908423       1776    SI02654799      1244    12620   
UUAAAGAUUUCAAUAUACCUG   UAAAGA  UAAAGAU 0.7738942       1522.33333333333        
HNRNPH1 heterogeneous
nuclear ribonucleoprotein H1 (H)
3188    0.6514164       915     SI00439866      1444    3319    
UUAACAAACAUGCCAAAUGUU   UAACAA  UAACAAA 0.69809605      1541    HNRNPH2 
heterogeneous
nuclear ribonucleoprotein H2 (H)
3189    HNRNPH3 heterogeneous nuclear ribonucleoprotein H3 (2H9)
3190    HNRNPK  heterogeneous nuclear ribonucleoprotein K
3191    HNRNPL  heterogeneous nuclear ribonucleoprotein L
3192    HNRNPU  heterogeneous nuclear ribonucleoprotein U (scaffold
attachment factor A)
3193    HOAC    hypoacusis 2 (autosomal recessive)
3195    TLX1    T-cell leukemia homeobox 1
3196    TLX2    T-cell leukemia homeobox 2
3197    HOXA@   homeobox A cluster
3198    HOXA1   homeobox A1
3199    HOXA2   homeobox A2
3200    HOXA3   homeobox A3
3201    HOXA4   homeobox A4
3202    HOXA5   homeobox A5
3203    HOXA6   homeobox A6
3204    HOXA7   homeobox A7
3205    HOXA9   homeobox A9
3206    HOXA10  homeobox A10
3207    HOXA11  homeobox A11
3208    HPCA    hippocalcin
3209    HOXA13  homeobox A13
3210    HOXB@   homeobox B cluster
3211    HOXB1   homeobox B1
3212    HOXB2   homeobox B2
3213    HOXB3   homeobox B3
3214    HOXB4   homeobox B4
3215    HOXB5   homeobox B5
3216    HOXB6   homeobox B6
3217    HOXB7   homeobox B7
3218    HOXB8   homeobox B8
3219    HOXB9   homeobox B9
3220    HOXC@   homeobox C cluster
3221    HOXC4   homeobox C4
3222    HOXC5   homeobox C5
3223    HOXC6   homeobox C6
3224    HOXC8   homeobox C8
3225    HOXC9   homeobox C9
3226    HOXC10  homeobox C10
3227    HOXC11  homeobox C11
3228    HOXC12  homeobox C12
3229    HOXC13  homeobox C13
3230    HOXD@   homeobox D cluster
3231    HOXD1   homeobox D1
3232    HOXD3   homeobox D3
3233    HOXD4   homeobox D4
3234    HOXD8   homeobox D8
3235    HOXD9   homeobox D9
3236    HOXD10  homeobox D10
3237    HOXD11  homeobox D11
3238    HOXD12  homeobox D12
3239    HOXD13  homeobox D13
3240    HP      haptoglobin
3241    HPCAL1  hippocalcin-like 1
3242    HPD     4-hydroxyphenylpyruvate dioxygenase
3244    HPE1    holoprosencephaly 1, alobar
3247    HPFH2   hereditary persistence of fetal hemoglobin, heterocellular,
Indian type
3248    HPGD    hydroxyprostaglandin dehydrogenase 15-(NAD)
3249    HPN     hepsin
3250    HPR     haptoglobin-related protein
3251    HPRT1   hypoxanthine phosphoribosyltransferase 1
3254    HPRTP2  hypoxanthine phosphoribosyltransferase pseudogene 2
3255    HPRTP3  hypoxanthine phosphoribosyltransferase pseudogene 3
3257    HPS1    Hermansky-Pudlak syndrome 1
3258    HPT     hypoparathyroidism
3259    HPV6AI1 human papillomavirus (type 6a) integration site 1
3260    HPV18I1 human papilloma virus (type 18) integration site 1
3261    HPV18I2 human papillomavirus (type 18) integration site 2
3262    HPVC1   human papillomavirus (type 18) E5 central sequence-like 1
3263    HPX     hemopexin
3265    HRAS    v-Ha-ras Harvey rat sarcoma viral oncogene homolog
3266    ERAS    ES cell expressed Ras
3267    AGFG1   ArfGAP with FG repeats 1
3268    AGFG2   ArfGAP with FG repeats 2
3269    HRH1    histamine receptor H1
3270    HRC     histidine rich calcium binding protein
3272    HRES1   HTLV-1 related endogenous sequence
3273    HRG     histidine-rich glycoprotein
3274    HRH2    histamine receptor H2
3275    PRMT2   protein arginine methyltransferase 2
3276    PRMT1   protein arginine methyltransferase 1
3278    HRPT1   hyperparathyroidism 1
3280    HES1    hairy and enhancer of split 1, (Drosophila)
3281    HSBP1   heat shock factor binding protein 1
3283    HSD3B1  hydroxy-delta-5-steroid dehydrogenase, 3 beta- and steroid
delta-isomerase 1
3284    HSD3B2  hydroxy-delta-5-steroid dehydrogenase, 3 beta- and steroid
delta-isomerase 2
3290    HSD11B1 hydroxysteroid (11-beta) dehydrogenase 1
3291    HSD11B2 hydroxysteroid (11-beta) dehydrogenase 2
3292    HSD17B1 hydroxysteroid (17-beta) dehydrogenase 1
3293    HSD17B3 hydroxysteroid (17-beta) dehydrogenase 3
3294    HSD17B2 hydroxysteroid (17-beta) dehydrogenase 2
3295    HSD17B4 hydroxysteroid (17-beta) dehydrogenase 4
3297    HSF1    heat shock transcription factor 1
3298    HSF2    heat shock transcription factor 2
3299    HSF4    heat shock transcription factor 4
3300    DNAJB2  DnaJ (Hsp40) homolog, subfamily B, member 2
3301    DNAJA1  DnaJ (Hsp40) homolog, subfamily A, member 1
3303    HSPA1A  heat shock 70kDa protein 1A
3304    HSPA1B  heat shock 70kDa protein 1B
3305    HSPA1L  heat shock 70kDa protein 1-like
3306    HSPA2   heat shock 70kDa protein 2
3308    HSPA4   heat shock 70kDa protein 4
3309    HSPA5   heat shock 70kDa protein 5 (glucose-regulated protein, 78kDa)
3310    HSPA6   heat shock 70kDa protein 6 (HSP70B)
3188    0.7339296       1242    SI00439852      425     3319    
UGUAGCUCUAACGAUACCGGG   GUAGCU  GUAGCUC 0.69809605      1541    HNRNPH2 
heterogeneous
nuclear ribonucleoprotein H2 (H)
3189    HNRNPH3 heterogeneous nuclear ribonucleoprotein H3 (2H9)
3190    HNRNPK  heterogeneous nuclear ribonucleoprotein K
3191    HNRNPL  heterogeneous nuclear ribonucleoprotein L
3192    HNRNPU  heterogeneous nuclear ribonucleoprotein U (scaffold
attachment factor A)
3193    HOAC    hypoacusis 2 (autosomal recessive)
3195    TLX1    T-cell leukemia homeobox 1
3196    TLX2    T-cell leukemia homeobox 2
3197    HOXA@   homeobox A cluster
3198    HOXA1   homeobox A1
3199    HOXA2   homeobox A2
3200    HOXA3   homeobox A3
3201    HOXA4   homeobox A4


Thank you very much,
Best Regards,
Andrea






On Mon, Jul 18, 2011 at 2:45 PM, Sarah Goslee <sarah.gos...@gmail.com> wrote:
> Hi Andrea,
>
> On Mon, Jul 18, 2011 at 6:07 AM, Andrea Franceschini <ata...@gmail.com> wrote:
>> Dear all,
>>
>> I merged 2 data frames using the merge command and the resulting data frame
>> looks perfect in R.
>>
>> However, I have serious problems when I try to write this new data frame
>> to a file using the write.table command.
>>
>> Basically, parts of the second file that I merged end up in the output file.
>>
>> What could the problem be?
>
> I have no idea. What did you do? What does your new data frame look like?
> What commands did you issue? What result did you get? What result did
> you expect? Can you provide a small reproducible example, or at least
> a great deal more information? str() is a good start.
>
> Sarah
>>
>> Thank you very much,
>> Best Regards,
>> Andrea
>>
>> --
>
> --
> Sarah Goslee
> http://www.functionaldiversity.org
>

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
