Thanks for the info, Jim. - GV
On Tue, Jan 13, 2009 at 12:27 PM, jim holtman <jholt...@gmail.com> wrote:
> Is this fast enough for you? Matching 2000 tags against 2M tags takes
> 0.2 seconds:
>
>> str(x)
>  chr [1:2000] "EAEDC" "DACCD" "BEAAD" "CDDDA" "ABDCA" "ACACC" "DADAA"
> "ABCAD" ...
>> str(z)
>  chr [1:2000000] "EAEDC" "DACCD" "BEAAD" "CDDDA" "ABDCA" "ACACC"
> "DADAA" "ABCAD" ...
>> system.time(y <- match(x,z))
>    user  system elapsed
>     0.2     0.0     0.2
>> str(y)
>  int [1:2000] 1 2 3 4 5 6 7 8 9 10 ...
>
> On Mon, Jan 12, 2009 at 10:17 PM, Gundala Viswanath <gunda...@gmail.com>
> wrote:
>> Yes Jim, exactly.
>>
>> BTW, I found this in ?match:
>>
>> " Matching for lists is potentially very slow and best avoided
>> except in simple cases."
>>
>> Since I am doing this for millions of tags, is there a faster alternative?
>>
>> - Gundala Viswanath
>> Jakarta - Indonesia
>>
>> On Tue, Jan 13, 2009 at 12:14 PM, jim holtman <jholt...@gmail.com> wrote:
>>> Is this what you want:
>>>
>>>> repo <- c("AAA", "AAT", "AAC", "AAG", "ATA","ATT")
>>>> qr <- c("AAC", "ATT", "ATT","AAC", "ATT", "ATT", "AAT", "ATT", "ATT")
>>>> match(qr, repo)
>>> [1] 3 6 6 3 6 6 2 6 6
>>>
>>> On Mon, Jan 12, 2009 at 9:22 PM, Gundala Viswanath <gunda...@gmail.com>
>>> wrote:
>>>> Hi Jorge and all,
>>>>
>>>> How can I modify your code when the query can be bigger than the
>>>> repository, meaning that it can contain repeats?
>>>>
>>>> e.g. qr <- c("AAC", "ATT", "ATT","AAC", "ATT", "ATT", "AAT", "ATT", "ATT")
>>>>
>>>> Sorry, I should have mentioned this earlier.
>>>>
>>>> - Gundala Viswanath
>>>> Jakarta - Indonesia
>>>>
>>>> On Tue, Jan 13, 2009 at 11:11 AM, Jorge Ivan Velez
>>>> <jorgeivanve...@gmail.com> wrote:
>>>>>
>>>>> Perhaps
>>>>> which(repo%in%qr)
>>>>> ?
>>>>> HTH,
>>>>>
>>>>> Jorge
>>>>>
>>>>> On Mon, Jan 12, 2009 at 9:07 PM, Gundala Viswanath <gunda...@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> Dear all,
>>>>>>
>>>>>> Suppose I have the following vector as repository:
>>>>>>
>>>>>> > repo <- c("AAA", "AAT", "AAC", "AAG", "ATA","ATT")
>>>>>>
>>>>>> Given another query vector
>>>>>>
>>>>>> > qr <- c("AAC", "ATT")
>>>>>>
>>>>>> is there a fast way to find the index of each query element in the
>>>>>> repository?
>>>>>>
>>>>>> Giving:
>>>>>>
>>>>>> [1] 3 6
>>>>>>
>>>>>> Typically repo has around 12 million elements, and the
>>>>>> query around 1 million elements.
>>>>>>
>>>>>> - Gundala Viswanath
>>>>>> Jakarta - Indonesia
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-help@r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide
>>>>>> http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>> --
>>> Jim Holtman
>>> Cincinnati, OH
>>> +1 513 646 9390
>>>
>>> What is the problem that you are trying to solve?
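
For reference, here is a minimal sketch in R of how the two suggestions in this thread differ once the query contains repeats. It reuses repo and qr from the messages above. The names idx_match, idx_in and idx_missing are only illustrative, and "GGG" is a made-up tag standing in for a query that is absent from the repository.

```r
# Repository and query vectors taken from the thread
repo <- c("AAA", "AAT", "AAC", "AAG", "ATA", "ATT")
qr   <- c("AAC", "ATT", "ATT", "AAC", "ATT", "ATT", "AAT", "ATT", "ATT")

# match() returns one repository index per query element, in query order,
# so repeated queries give repeated indices
idx_match <- match(qr, repo)
print(idx_match)    # [1] 3 6 6 3 6 6 2 6 6

# which(repo %in% qr) reports each matching repository position once,
# in repository order, so repeats in the query are not reflected
idx_in <- which(repo %in% qr)
print(idx_in)       # [1] 2 3 6

# A query element that is not in the repository yields NA
# ("GGG" is a hypothetical tag used only for this illustration)
idx_missing <- match(c("AAC", "GGG"), repo)
print(idx_missing)  # [1]  3 NA
```

match() keeps the length and order of the query, which is what the repeated-query case needs. which(repo %in% qr) only tells you which repository entries are hit at all. Note also that the ?match remark quoted above is about lists, whereas repo and qr here are character vectors, as in Jim's timing.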