[R] Coloring leaves in Dendrogram according to gene names
Hello,

I am a new R user and have a question about dendrogram coloring. I would like to color each leaf of the dendrogram (dhc) according to a specific criterion, which in my case is the gene name. For this, I created a data.frame with two variables: the gene name and the corresponding color.

With the following function, adapted from the example in "dendrapply {stats}", all leaves still get the same color. With the standard version (using my_colors[i]), the leaves do get different colors, even though "my_colors[i]" has the same structure as "color".

Do you have any idea what is going wrong?

Many thanks for your help,

  targetGenes <- length(unique(bm$Target))   # number of unique target genes
  my_colors <- rainbow(targetGenes)          # one color per gene
  gene2color <- data.frame(gene = unique(bm$Target), color = my_colors)   # gene-to-color lookup
  dhc <- as.dendrogram(fit.x)                # dendrogram built from the hclust object (fit.x)

  local({
    colLab <<- function(n) {
      if (is.leaf(n)) {
        a <- attributes(n)
        crm <- as.numeric(a$label)
        gene <- as.character(bm[which(bm$CRM_ID == crm), 5])
        color <- as.character(gene2color[which(gene2color$genes == gene), 2])
        i <<- i + 1
        attr(n, "nodePar") <-
          # c(a$nodePar, list(lab.col = my_colors[i], lab.font = i %% 3))
          c(a$nodePar, list(lab.col = color))
      }
      n
    }
    my_colors <- grDevices::rainbow(attr(dhc, "members"))
    i <- 0
  })

  dL <- dendrapply(dhc, colLab)
  op <- par(mfrow = 2:1)
  plot(dhc)
  plot(dL)

pierre.khoue...@embl.de

--
View this message in context: http://n4.nabble.com/Coloring-leaves-in-Dendrogram-according-to-gene-names-tp1838329p1838329.html
Sent from the R help mailing list archive at Nabble.com.
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
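One likely cause, judging only from the code as posted: the lookup table is created with a column named "gene", but the function indexes gene2color$genes. That expression returns NULL, so "color" ends up empty and no label color is set. The sketch below shows the same idea on made-up data, using a named vector as the gene-to-color lookup. The objects x, sample2gene and color_leaf are hypothetical stand-ins, not the poster's bm or fit.x.

```r
# Hypothetical toy data: six samples, each assigned to one of three genes.
set.seed(1)
x <- matrix(rnorm(6 * 4), nrow = 6,
            dimnames = list(paste0("s", 1:6), NULL))
sample2gene <- c(s1 = "geneA", s2 = "geneA", s3 = "geneB",
                 s4 = "geneB", s5 = "geneC", s6 = "geneC")

# One color per unique gene, stored as a named vector for direct lookup.
genes <- unique(sample2gene)
gene2color <- setNames(rainbow(length(genes)), genes)

dhc <- as.dendrogram(hclust(dist(x)))

# Set the label color of each leaf from the gene its label maps to.
color_leaf <- function(n) {
  if (is.leaf(n)) {
    gene <- sample2gene[[attr(n, "label")]]
    attr(n, "nodePar") <- c(attr(n, "nodePar"),
                            list(lab.col = gene2color[[gene]],
                                 pch = NA))  # pch = NA: no symbol at the leaf
  }
  n
}

dL <- dendrapply(dhc, color_leaf)
```

Calling plot(dL) afterwards should draw the labels in the gene colors. Looking the color up by name avoids the counter i entirely, so the result does not depend on the order in which leaves are visited.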
[R] comparing real set vs sampled sets
Dear R helpers,

I have a statistics question. I have a vector of 500 values for which I need to assess the statistical significance of occurrence:

  real.dist <- realValues

For that, I sampled 1000 other vectors of 500 values each from my large data pool. I then ran ks.test with my real vector against each of the sampled vectors:

  ks.res <- unlist(lapply(l.sampled, function(x) {
    ks <- ks.test(real.dist, x$dist)
    as.numeric(ks[["statistic"]])
  }))

I now have 1000 "D" values with their corresponding p-values. How can I obtain a single overall p-value stating that my real data differ significantly from the sampled ones?

Any suggestions?

Many thanks,

--
View this message in context: http://r.789695.n4.nabble.com/comparing-real-set-vs-sampled-sets-tp4672709.html
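One possible way to get a single p-value, offered as a sketch rather than the recommended analysis, is a Monte Carlo (empirical) p-value: compute the KS statistic of the real vector against the pool, compute the same statistic for each sampled vector against the pool, and ask how often a sampled vector looks at least as extreme as the real one. Everything below is simulated; pool, the sizes, and the helper ks_stat are assumptions made for illustration, and the sampled sets are drawn from the same distribution as the pool rather than from the pool itself.

```r
set.seed(42)
n_values  <- 100    # hypothetical; the post uses 500
n_samples <- 200    # hypothetical; the post uses 1000

pool      <- rnorm(5000)                  # stand-in for the large data pool
real.dist <- rnorm(n_values, mean = 1)    # stand-in for the real values
l.sampled <- lapply(seq_len(n_samples),   # stand-ins for the sampled sets
                    function(i) list(dist = rnorm(n_values)))

# Two-sample KS statistic D as a plain number.
ks_stat <- function(a, b) unname(ks.test(a, b)$statistic)

# Observed statistic: real data versus the pool.
d_real <- ks_stat(real.dist, pool)

# Null distribution: each sampled set versus the pool.
d_null <- vapply(l.sampled, function(x) ks_stat(x$dist, pool), numeric(1))

# Empirical p-value with the usual +1 correction, so it can never be exactly 0.
p_emp <- (1 + sum(d_null >= d_real)) / (1 + length(d_null))
p_emp
```

The smallest p-value this can report is 1 / (number of sampled sets + 1), so the resolution is set by how many sets were drawn.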
[R] subselecting on Data frame
Dear all,

I have a data frame of features (example pasted below) from which I would like to count, say, how many triplets of features (corresponding to rows):

- have the same Scaff and the same "Cat",
- have a score > 0.6, and
- fall within a distance of max 1 (distance defined as Start of row[i+1] - End of row[i]).

I've been trying to do this using selectors and combn in R, but it is becoming complicated. Is there an intuitive and elegant way to achieve it?

Many thanks,
Best,

  Scaff      Start    End      Score  Cat
  scaff_234  767099   767299   0.93   cat1
  scaff_234  790221   790421   0.924  cat1
  scaff_234  1341263  1341463  0.845  cat2
  scaff_234  1543343  1543543  0.715  cat2
  scaff_234  1551844  1552044  0.967  cat1
  scaff_234  1560829  1561029  0.825  cat2
  scaff_234  1580868  1581068  0.929  cat3
  scaff_234  1589612  1589812  0.744  cat3
  scaff_234  1597306  1597885  0.864  cat2
  scaff_234  1598617  1599091  0.908  cat2
  scaff_234  1613500  1613700  0.705  cat2
  scaff_234  1614297  1614643  0.748  cat1
  scaff_234  1623852  1624052  0.799  cat2
  scaff_234  1669873  1670073  0.691  cat2
  scaff_234  1670210  1670515  0.904  cat1
  scaff_234  1822690  1822890  0.918  cat2
  scaff_234  1824905  1825105  0.854  cat2
  scaff_234  1826092  1826292  0.95   cat2
  scaff_234  1855240  1855457  0.962  cat2
  scaff_234  1872803  1873106  0.97   cat2
  scaff_234  1894767  1894967  0.945  cat1
  scaff_234  1903338  1903538  0.854  cat3
  scaff_234  1920157  1920509  0.739  cat1
  scaff_234  1944032  1944232  0.871  cat2
  scaff_234  1976753  1976953  0.847  cat2
  scaff_234  1992677  1992877  0.694  cat2
  scaff_234  2007772  2007972  0.916  cat2
  scaff_234  2009638  2010167  0.945  cat2

--
View this message in context: http://r.789695.n4.nabble.com/subselecting-on-Data-frame-tp4672992.html
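A minimal base-R sketch of one reading of this request: keep rows above the score cutoff, group them by Scaff and Cat, sort each group by Start, and count runs of three neighbouring features whose two gaps are both within the limit. The interpretation of "triplet" as three consecutive features within a group is an assumption, as are the function count_triplets, its arguments min_score and max_gap, and the toy data; the unit of the distance limit is not stated in the post.

```r
# Count triplets of neighbouring features per (Scaff, Cat) group.
count_triplets <- function(df, min_score = 0.6, max_gap = 1) {
  keep <- df[df$Score > min_score, ]
  groups <- split(keep, list(keep$Scaff, keep$Cat), drop = TRUE)
  sum(vapply(groups, function(g) {
    if (nrow(g) < 3) return(0L)
    g <- g[order(g$Start), ]
    # Gap between each feature and the next: Start[i + 1] - End[i].
    gap <- g$Start[-1] - g$End[-nrow(g)]
    ok <- gap <= max_gap
    # A triplet needs two consecutive acceptable gaps.
    sum(ok[-length(ok)] & ok[-1])
  }, integer(1)))
}

# Hypothetical toy data, not the posted table.
features <- data.frame(
  Scaff = "scaff_1",
  Start = c(100, 301, 502, 900, 1101),
  End   = c(300, 501, 702, 1100, 1301),
  Score = c(0.9, 0.8, 0.7, 0.95, 0.5),
  Cat   = "cat1"
)

count_triplets(features)                 # gaps of 1, 1, 198 after filtering
count_triplets(features, max_gap = 200)  # a looser limit admits more triplets
```

Working on sorted neighbours avoids enumerating every combination with combn, which grows quickly with the number of rows per group.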
[R] obtain triplets from Data Frame columns
Hi guys,

I have a list of elements in two columns of a data frame. I want first to subselect on V1, and then to form and count all possible unique triplets of V1 with the corresponding elements in V2, excluding triplets for which a pair (V1 V2) does not exist.

Example input:

  V1 V2
  A  B
  A  C
  D  E
  D  F
  D  G
  E  F
  E  G
  F  G

Example output:

  DEF
  DEG
  DFG
  EFG

(ABC is eliminated because the pair B C does not exist in the data frame.)

Total: 4 triplets

Here is what I tried, without success:

  uniq.V1 <- unique(df$V1)
  original.pairs <- do.call(paste, c(df[c("V1", "V2")], sep = ":"))
  nbElements <- 3
  l.res <- lapply(uniq.V1, function(x) {
    set <- c(x, unlist(subset(df, V1 == x, select = c(V2))))
    if (length(set) >= nbElements) {
      tmp.combn <- combn(set, nbElements, simplify = FALSE)
      ## I tried here to create all possible combinations of pairs, test them
      ## against the original pairs, and return only the successful ones, but
      ## it became a very complicated structure
    }
  })

Any help or suggestion is appreciated,
Best

--
View this message in context: http://r.789695.n4.nabble.com/obtain-triplets-from-Data-Frame-columns-tp4673091.html
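A sketch of one way to get the example output: for each V1 value, take its V2 partners two at a time and keep the triplet only when that partner pair is itself listed in the data frame. It assumes every pair is stored once and in a consistent order (the "smaller" element in V1), as in the example; the names pairs, pair_keys and triplets are made up for illustration.

```r
# The example input from the post.
pairs <- data.frame(
  V1 = c("A", "A", "D", "D", "D", "E", "E", "F"),
  V2 = c("B", "C", "E", "F", "G", "F", "G", "G"),
  stringsAsFactors = FALSE
)

# Every existing pair as a "V1:V2" key for fast membership tests.
pair_keys <- paste(pairs$V1, pairs$V2, sep = ":")

triplets <- unlist(lapply(unique(pairs$V1), function(x) {
  partners <- pairs$V2[pairs$V1 == x]
  if (length(partners) < 2) return(character(0))
  cand <- combn(partners, 2)               # 2 x k matrix of partner pairs
  linked <- paste(cand[1, ], cand[2, ], sep = ":") %in% pair_keys
  if (!any(linked)) return(character(0))   # e.g. A: the pair B C is missing
  paste0(x, cand[1, linked], cand[2, linked])
}))

triplets          # "DEF" "DEG" "DFG" "EFG"
length(triplets)  # 4
```

The x-to-partner pairs exist by construction, so only the partner-to-partner pair has to be checked, which keeps the structure flat.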
Re: [R] obtain triplets from Data Frame columns
Hello Jean,

Thanks for the reply. However, your solution doesn't reproduce the output that I want. I updated my post with my own solution, which is full of loops. If there is a more elegant way, I'll take it.

Best,

--
View this message in context: http://r.789695.n4.nabble.com/obtain-triplets-from-Data-Frame-columns-tp4673091p4673164.html
[R] dependent column(s) in data frame
Dear all,

I have a data frame with a status column and several condition columns (a dput of part of it is listed below). I would like to know:

1) Is a "status" of "1" more likely when more than one condition has the value "1"?
2) Does the "status" column depend on any one of the condition columns, or on a combination of them? For example, do I get a status of "1" whenever conditions 2 and 3 (or only condition 2) are met?

Do you know what type of analysis one can use for this?

Thanks in advance,
P

dput(df)
structure(list(status = c(0L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 0L, 0L,
0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L,
0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L), cond.1 = c(0L, 0L, 0L, 1L, 0L,
0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 1L,
0L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 0L,
0L, 0L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L), cond.2 = c(1L,
0L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 0L, 0L, 0L, 1L,
1L, 1L, 0L, 1L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L,
0L, 0L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L,
1L), cond.3 = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 1L,
0L, 0L, 1L, 0L, 0L, 0L), cond.4 = c(0L, 0L, 0L, 1L, 0L, 1L, 0L,
1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 1L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L), cond.5 = c(0L, 0L,
0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L,
0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L,
0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L
)), .Names = c("status", "cond.1", "cond.2", "cond.3", "cond.4",
"cond.5"), row.names = c(NA, -50L), class = "data.frame")

--
View this message in context: http://r.789695.n4.nabble.com/dependent-column-s-in-data-frame-tp4685561.html
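As a first descriptive look at question 1, one could count how many conditions are met in each row and compare the share of status 1 across those counts. The sketch below uses only the first six rows and the first three condition columns of the posted data to stay short; the names toy, n_met and rate are made up, and a table like this describes the data without testing anything.

```r
# First six rows of the posted data, restricted to cond.1 to cond.3.
toy <- data.frame(
  status = c(0, 0, 1, 1, 1, 0),
  cond.1 = c(0, 0, 0, 1, 0, 0),
  cond.2 = c(1, 0, 0, 1, 0, 1),
  cond.3 = c(0, 0, 0, 1, 0, 0)
)

# Number of conditions met per row.
cond_cols <- grep("^cond\\.", names(toy), value = TRUE)
n_met <- rowSums(toy[cond_cols])

# Cross-tabulation of status against the number of conditions met.
table(n_met = n_met, status = toy$status)

# Share of rows with status 1 for each count of conditions met.
rate <- tapply(toy$status, n_met, mean)
rate
```

With the full 50 rows, the same three lines give a quick picture of whether status 1 becomes more frequent as more conditions are met.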
Re: [R] dependent column(s) in data frame
Hi,

Thanks for the reply. I will wait a couple of days and then post elsewhere, unless I find the solution myself.

Best.

--
View this message in context: http://r.789695.n4.nabble.com/dependent-column-s-in-data-frame-tp4685561p4685633.html
Re: [R] dependent column(s) in data frame
Many thanks David,

I will look into logistic regression for my case. Do you know of a good example of logistic regression? I was also thinking of using Multiple Factor Analysis (MFA, as in FactoMineR), but I am not sure how successful that is going to be.

Best,
P.

--
View this message in context: http://r.789695.n4.nabble.com/dependent-column-s-in-data-frame-tp4685561p4685684.html
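For readers looking for a starting point, here is a minimal logistic regression sketch with glm() from base R. The data are simulated, with a built-in assumption that only cond.2 drives status, so that the fit has something to recover; the data frame sim and its coefficients are hypothetical and not taken from the posted data.

```r
set.seed(123)
n <- 200

# Simulated binary conditions (hypothetical frequencies).
sim <- data.frame(
  cond.1 = rbinom(n, 1, 0.4),
  cond.2 = rbinom(n, 1, 0.4),
  cond.3 = rbinom(n, 1, 0.2)
)

# Assumed truth for illustration: only cond.2 raises the chance of status 1.
sim$status <- rbinom(n, 1, plogis(-2 + 2.5 * sim$cond.2))

# Logistic regression of status on the conditions.
fit <- glm(status ~ cond.1 + cond.2 + cond.3, data = sim, family = binomial)

# Estimates, standard errors and p-values on the log-odds scale.
summary(fit)$coefficients

# Odds ratios: how the odds of status 1 change when a condition is met.
odds_ratios <- exp(coef(fit))
odds_ratios
```

To ask whether a combination matters beyond its parts, an interaction term such as status ~ cond.2 * cond.3 can be added to the formula, at the cost of needing enough rows in each combination.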
Re: [R] dependent column(s) in data frame
Well, I first thought of using MFA by treating my data as categorical, but correspondence analysis methods are indeed better suited to my data structure.

Thanks again,
P

--
View this message in context: http://r.789695.n4.nabble.com/dependent-column-s-in-data-frame-tp4685561p4685734.html
[R] Apply function over elements of a list
Hello,

I have a list "ll" (see below) on which I would like to apply a function that accesses every pair of elements in the list. For instance, I want to apply the "sum" function to "6635 + 6636" and return the sum, then to "6635 + 6637", and so on.

Any hint on how to do that using apply / mapply / rapply?

Thanks,

> ll
[[1]]
[1] "6635" "6636"

[[2]]
[1] "6635" "6637"

[[3]]
[1] "6636" "6637"

--
View this message in context: http://r.789695.n4.nabble.com/Apply-function-over-elemetns-of-a-list-tp2952538p2952538.html
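A short sketch of one way to do this: since each list element holds one pair, iterating over the list with vapply (or sapply) visits every pair once. The values are stored as character strings, so they are converted with as.numeric before summing. The variable names pair_sums, ids and pair_sums2 are made up for illustration.

```r
# The list from the post: each element is one pair of character IDs.
ll <- list(c("6635", "6636"), c("6635", "6637"), c("6636", "6637"))

# Apply the function to each pair; vapply checks that one number comes back.
pair_sums <- vapply(ll, function(p) sum(as.numeric(p)), numeric(1))
pair_sums   # 13271 13272 13273

# If the pairs were generated with combn(), the function can be applied
# directly while the pairs are being built.
ids <- c("6635", "6636", "6637")
pair_sums2 <- combn(ids, 2, FUN = function(p) sum(as.numeric(p)))
```

Any other function of a pair can be swapped in for sum, as long as it returns a single value per pair; for functions returning longer results, lapply keeps the output as a list.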