One place to start is to use Rprof to see where time is being spent. I used the sample you sent and this is what I got:
0 16.7 root 1. 16.2 system.time 2. . 16.1 testfoo 3. . . 16.1 setdiff 4. . . . 8.2 as.vector 5. . . . . 8.2 findSubsets 6. . . . . . 6.4 increment 7. . . . . . . 4.2 as.vector 8. . . . . . . . 3.6 outer 9. . . . . . . . . 0.3 rep.int 7. . . . . . . 1.6 c 7. . . . . . . 0.2 max 4. . . . 7.9 unique 5. . . . . 7.3 match 5. . . . . 0.3 unique.default 1. 0.5 sort 2. . 0.5 standardGeneric 3. . . 0.3 sample 3. . . 0.2 sort 4. . . . 0.2 sort.default 5. . . . . 0.2 sort.int Of the 16.7 seconds to execute the code, 16.1 was taken up in 'setdiff'. Maybe there is some other way you can determine the difference. So if you continue to use 'setdiff', it does not look like there is much that can be done. On Wed, Jun 27, 2012 at 10:36 AM, Adrian Duşa <dusa.adr...@gmail.com> wrote: > Dear R-help list, > > Part of a program I wrote seem to take a significant amount of time, > therefore I am looking for an alternative approach. > In order to explain what is does: > > - the input is a sorted vector of integer numbers > - some higher numbers may be derived (using a mathematical formula) > from lower numbers, therefore they should be eliminated > - at the end, the vector should contain only uniquely defined numbers > > Pet hypothetical example, input vector: > - 2 3 4 5 6 7 8 9 10 > - number 2 generates 4, 7, 10 > - 2 3 5 6 8 9 (surviving vector) > - number 3 generates 5 and 9 > - 2 3 6 8 (surviving vector) > - number 6 generates 8 > - final surviving vector 2 3 6 > > Function foo(x, ...) generates the numbers, my current approach being: > #### > index <- 0 > while ((index <- index + 1) < length(numbers)) { > numbers <- setdiff(numbers, foo(numbers[index])) > } > #### > > This seem to take quite some time (but I don't know any other way of > doing it), hence my question(s): > - would there be another (quicker) implementation in R? > - alternatively, should I go for a C implementation? > > (actually, I did create a C implementation, but it doesn't bring any > more speed... it is actually a bit slower). > > A real-life pet example, using the function findSubsets() from the QCA > package (our foo function above): > > #### > library(QCA) > testfoo <- function(x, y) { > index <- 0 > while((index <- index + 1) < length(x)) { > x <- setdiff(x, findSubsets(y, x[index], max(x))) > } > return(x) > } > > nofl <- rep(3, 14) > set.seed(12345) > numbers <- sort(sample(seq(prod(nofl)), 1000000)) > > system.time(result <- testfoo(numbers, nofl)) > #### > user system elapsed > 8.168 2.049 10.148 > > Any hint will be highly appreciated, thanks in advance, > Adrian > > -- > Adrian Dusa > Romanian Social Data Archive > 1, Schitu Magureanu Bd. > 050025 Bucharest sector 5 > Romania > Tel.:+40 21 3126618 \ > +40 21 3120210 / int.101 > Fax: +40 21 3158391 > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it. ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.