Hi all, Thanks for the responses. Herve's example is a good small size example of what I wanted.
> y <- c(16, -3, -2, 15, 15, 0, 8, 15, -2) > someCoolFunc(-2, y) [1] 3 9 > someCoolFunc(15, y) [1] 4 5 8 The requirement is that I want someCoolFunc() to run in O(number of matches) time, instead of O(size of y). This is because y is big. And I don't know all the queries I want to do up-front. And the results of some queries might change the queries I want to do in the future. @David: I hope the above description is more clear. @Enrico, Herve: I want both the functionality provided by one function. - On repeated calls, fmatch() does give O(1) performance, but it does not give all matches. - findMatches() gives all matches, but I need to know the entire vector x beforehand. I don't have that luxury. I do have something that works now, using split and fmatch (package fastmatch). So just posting that in case anyone in the future has the same problem. > y.unique <- unique(y) > > # create a map from the unique elements of y to the locations of all > occurrences of the element > y.map <- split(1:length(y), match(y, y.unique)) > > # write a wrapper function that does a look-up on the unique list. and then > returns all matches using the map. > someCoolFunc <- function(x) { y.map[[ fmatch(x, y.unique) ]] } On Tue, 7 Apr 2015 at 13:21 Hervé Pagès <hpa...@fredhutch.org> wrote: > > Hi Keshav, > > findMatches() in the S4Vectors/IRanges packages (Bioconductor) I think > does what you want: > > library(IRanges) > y <- c(16L, -3L, -2L, 15L, 15L, 0L, 8L, 15L, -2L) > x <- c(unique(y), 999L) > hits <- findMatches(x, y) > > Then: > > > hits > Hits object with 9 hits and 0 metadata columns: > queryHits subjectHits > <integer> <integer> > [1] 1 1 > [2] 2 2 > [3] 3 3 > [4] 3 9 > [5] 4 4 > [6] 4 5 > [7] 4 8 > [8] 5 6 > [9] 6 7 > ------- > queryLength: 7 > subjectLength: 9 > > The Hits object can be turned into a list with: > > > as.list(hits) > [[1]] > [1] 1 > > [[2]] > [1] 2 > > [[3]] > [1] 3 9 > > [[4]] > [1] 4 5 8 > > [[5]] > [1] 6 > > [[6]] > [1] 7 > > [[7]] > integer(0) > > H. > > > sessionInfo() > R version 3.2.0 beta (2015-04-05 r68151) > Platform: x86_64-unknown-linux-gnu (64-bit) > Running under: Ubuntu 14.04.2 LTS > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] parallel stats4 stats graphics grDevices utils datasets > [8] methods base > > other attached packages: > [1] IRanges_2.1.43 S4Vectors_0.5.22 BiocGenerics_0.13.11 > > loaded via a namespace (and not attached): > [1] tools_3.2.0 > > On 04/06/2015 01:56 PM, Keshav Dhandhania wrote: > > Hi, > > > > I know that one can find all occurrences of x in a vector v by doing > >> which(x == v). > > > > However, if I need to do this again and again, where v is remaining the > > same, then this is quite inefficient. In my particular case, I need to do > > this millions of times, and length(v) = 100 million. > > > > Does anyone have suggestion on how to go about it? > > I know of a package called fmatch that does the above for the match > > function. But they don't handle multiple matches. > > > > Thanks > > > > [[alternative HTML version deleted]] > > > > ______________________________________________ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > > -- > Hervé Pagès > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M1-B514 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: hpa...@fredhutch.org > Phone: (206) 667-5791 > Fax: (206) 667-1319 ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.