Hi,

I have been trying unsuccessfully to plot data using different colors based on a variable within a subset of an imported file. The file I am reading is about 20000 lines long and has a column (in the example called FILE) that contains approximately 100 unique entries. I would like to plot a subset of the data from the file and key the color from the FILE column.

This is what my file looks like:

CHR  SNP         BP        NMISS  BETA      SE     R2         T         P       REGION  FILE     RANDOM
1    rs17035189  10519610  135    0.3518    1.928  0.0002501  0.1824    0.8555  TCTX    4730341  0.284627081
6    rs3763311   32484154  109    -2.05     1.624  0.01467    -1.262    0.2096  TCTX    670603   0.083147673
6    rs3892710   32790839  106    0.5695    4.743  0.0001386  0.1201    0.9047  TCTX    7150403  0.549192815
6    rs3864300   32379785  102    9.208     6.416  0.02018    1.435     0.1544  TCTX    7210017  0.837265988
6    rs6912002   32873245  13     -1.295    5.043  0.005963   -0.2569   0.802   TCTX    2710441  0.170566699
5    rs4024109   35955374  9      26.19     31.01  0.09245    0.8444    0.4263  TCTX    2650653  0.298573497
6    rs3129719   32769757  16     10.35     7.44   0.1215     1.391     0.1859  TCTX    2900504  0.378538235
6    rs476885    32402690  109    -0.09378  1.552  3.411e-05  -0.06041  0.9519  TCTX    670603   0.017970964
10   rs12570766  5602540   139    0.6182    6.66   6.289e-05  0.09283   0.9262  TCTX    4560767  0.004973939
etc
And this is the code that I have:

    assoc_data <- read.table("master.out", header = TRUE)
    par(fig = c(0, 10, 0, 10) / 10, mar = c(10, 8, 2, 8), xpd = NA, cex.axis = 2)
    attach(assoc_data)
    # these criteria change based on input from another file
    curr_assoc <- assoc_data[CHR == 1 & BP > 500000 & BP < 1000000, ]
    # count the number of transcripts
    transcripts <- length(unique(curr_assoc$FILE))
    # generate that number of unique "FILE" entries in my subset of data
    my_colors <- rainbow(transcripts)
    plot(curr_assoc$BP, log10(curr_assoc$P) * -1, pch = 20,
         col = c(my_colors)[curr_assoc$FILE], ylim = c(-15, 15), xaxs = "i",
         xlab = NA, cex = 0.7, cex.lab = 2)
    detach(assoc_data)

The problem is that when I plot this I only see (for example) 2 colors instead of the expected 10. I believe the problem is that the FILE column is being recoded when I read the table (as a factor?) and that only factors within the range of my colors are being plotted (so if I have 10 colors but there are 100 unique entries in my FILE column, and the variables recoded 2, 7, 12, 34, 60, 64, 65, 70 and 71 are used, only 2 and 7 will be plotted).

Many thanks for any suggestions/pointers. I have dug around in the help archives for a couple of hours but no joy.

-----------------------
Andrew Singleton
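If the diagnosis above is right, one possible workaround is to re-code FILE inside the subset so that its codes run from 1 to the number of unique entries, and to index the color vector with those codes. The following is only a minimal sketch: the small data frame is made-up toy data standing in for the real subset, and `file_codes` and `point_cols` are hypothetical names that do not appear in the original code. It relies on `factor()` keeping only the values actually present in its input, which holds whether FILE was read as a factor or as plain numbers.

```r
# Toy stand-in for the subset; these values are invented for illustration
curr_assoc <- data.frame(
  BP   = c(510000, 620000, 730000, 840000, 950000),
  P    = c(0.85, 0.21, 0.90, 0.15, 0.80),
  FILE = c(4730341, 670603, 7150403, 670603, 4730341)
)

# Re-code FILE within the subset: factor() builds its levels from the
# values present here, so the integer codes run from 1 to the number of
# unique entries instead of reflecting the coding of the full file
file_codes <- as.integer(factor(curr_assoc$FILE))

# One color per unique FILE entry in the subset
my_colors <- rainbow(length(unique(curr_assoc$FILE)))

# Every code is now a valid index into my_colors, so no color is NA
point_cols <- my_colors[file_codes]

plot(curr_assoc$BP, -log10(curr_assoc$P), pch = 20, col = point_cols)
```

Indexing a vector with a position beyond its length returns NA in R, and points whose color is NA are not drawn, which would be consistent with only some of the expected colors showing up in the original plot.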
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.