Dear all,

I have a data frame of features (example pasted below) from which I would like to determine, say:
how many triplets of features (one feature per row) have the same Scaff, the same Cat, a Score > 0.6, and lie within a maximum distance of 10000 (distance defined as Start of row[i+1] - End of row[i]).

I have been trying to do this with selectors and combn in R, but it is becoming complicated. Is there an intuitive, elegant way to achieve this?

Many thanks,
Best,

Scaff      Start    End      Score  Cat
scaff_234   767099   767299  0.93   cat1
scaff_234   790221   790421  0.924  cat1
scaff_234  1341263  1341463  0.845  cat2
scaff_234  1543343  1543543  0.715  cat2
scaff_234  1551844  1552044  0.967  cat1
scaff_234  1560829  1561029  0.825  cat2
scaff_234  1580868  1581068  0.929  cat3
scaff_234  1589612  1589812  0.744  cat3
scaff_234  1597306  1597885  0.864  cat2
scaff_234  1598617  1599091  0.908  cat2
scaff_234  1613500  1613700  0.705  cat2
scaff_234  1614297  1614643  0.748  cat1
scaff_234  1623852  1624052  0.799  cat2
scaff_234  1669873  1670073  0.691  cat2
scaff_234  1670210  1670515  0.904  cat1
scaff_234  1822690  1822890  0.918  cat2
scaff_234  1824905  1825105  0.854  cat2
scaff_234  1826092  1826292  0.95   cat2
scaff_234  1855240  1855457  0.962  cat2
scaff_234  1872803  1873106  0.97   cat2
scaff_234  1894767  1894967  0.945  cat1
scaff_234  1903338  1903538  0.854  cat3
scaff_234  1920157  1920509  0.739  cat1
scaff_234  1944032  1944232  0.871  cat2
scaff_234  1976753  1976953  0.847  cat2
scaff_234  1992677  1992877  0.694  cat2
scaff_234  2007772  2007972  0.916  cat2
scaff_234  2009638  2010167  0.945  cat2
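
To make the criteria concrete, here is a minimal base R sketch of one possible reading of them. It is not a definitive answer. It assumes that a "triplet" means three features that are consecutive within the same Scaff/Cat group once rows with Score <= 0.6 have been dropped and the rest sorted by Start, and that both gaps inside the triplet must be at most 10000. The function name count_triplets and its arguments min_score and max_gap are hypothetical, and feat holds only a few of the rows from the example.

```r
# A few rows copied from the example data.
feat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Scaff     Start   End     Score Cat
scaff_234 1597306 1597885 0.864 cat2
scaff_234 1598617 1599091 0.908 cat2
scaff_234 1613500 1613700 0.705 cat2
scaff_234 1614297 1614643 0.748 cat1
scaff_234 1822690 1822890 0.918 cat2
scaff_234 1824905 1825105 0.854 cat2
scaff_234 1826092 1826292 0.95  cat2
scaff_234 1855240 1855457 0.962 cat2
")

# Hypothetical helper: count runs of three consecutive features per
# Scaff/Cat group (assumed reading of "triplet", see the text above).
count_triplets <- function(df, min_score = 0.6, max_gap = 10000) {
  df <- df[df$Score > min_score, ]                 # keep Score > threshold
  df <- df[order(df$Scaff, df$Cat, df$Start), ]    # sort within groups
  groups <- split(df, list(df$Scaff, df$Cat), drop = TRUE)
  sum(vapply(groups, function(g) {
    n <- nrow(g)
    if (n < 3) return(0L)
    gap <- g$Start[-1] - g$End[-n]                 # Start[i+1] - End[i]
    ok <- gap <= max_gap
    sum(ok[-length(ok)] & ok[-1])                  # two close gaps in a row
  }, integer(1)))
}

count_triplets(feat)   # 1 for these rows
```

If a triplet should instead be any three rows of a group, not necessarily neighbours, the inner function would have to enumerate combinations (for example with combn(n, 3)), which grows quickly with group size.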