Re: [R] a problem of approach

jim holtman Wed, 27 Jun 2012 09:30:06 -0700

One place to start is to use Rprof to see where time is being spent.
I used the sample you sent and this is what I got:



  0  16.7 root
  1.   16.2 system.time
  2. .   16.1 testfoo
  3. . .   16.1 setdiff
  4. . . .    8.2 as.vector
  5. . . . .    8.2 findSubsets
  6. . . . . .    6.4 increment
  7. . . . . . .    4.2 as.vector
  8. . . . . . . .    3.6 outer
  9. . . . . . . . .    0.3 rep.int
  7. . . . . . .    1.6 c
  7. . . . . . .    0.2 max
  4. . . .    7.9 unique
  5. . . . .    7.3 match
  5. . . . .    0.3 unique.default
  1.    0.5 sort
  2. .    0.5 standardGeneric
  3. . .    0.3 sample
  3. . .    0.2 sort
  4. . . .    0.2 sort.default
  5. . . . .    0.2 sort.int
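This kind of measurement can be reproduced with base R's Rprof() and summaryRprof() from package utils. The sketch below profiles a hypothetical stand-in loop, slow_loop(), rather than the QCA code. Note that summaryRprof() reports flat per-function tables (by.self, by.total), not an indented call tree like the one above.

```r
# Sketch: profile a slow loop with base R's sampling profiler.
# 'slow_loop' is a hypothetical stand-in for testfoo(), not the QCA code.
slow_loop <- function(x, n_iter = 20) {
    for (i in seq_len(n_iter)) {
        # remove a block of values on each pass, as the original loop does
        x <- setdiff(x, seq(i, by = 7L, length.out = 1e5))
    }
    x
}

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)   # start sampling every 10 ms
set.seed(12345)
res <- slow_loop(sort(sample(5e6, 1e6)))
Rprof(NULL)                         # stop profiling

prof <- summaryRprof(prof_file)
head(prof$by.total)                 # time per function, including callees
unlink(prof_file)
```

In by.total, setdiff should appear near the top, with match and unique among the functions it calls.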

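Of the 16.7 seconds the code took to execute, 16.1 were spent inside
'setdiff'. Because arguments are evaluated lazily, that figure includes
the call to findSubsets (the 8.2 seconds under as.vector); the other
7.9 seconds went to unique/match. Maybe there is some other way you can
determine the difference; if you continue to use 'setdiff', it does not
look like there is much that can be done.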
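One alternative avoids rebuilding the vector with setdiff() on every pass. The input is sorted, and each number only generates higher numbers, so a logical vector indexed by value can record which numbers have been eliminated. This is a minimal sketch under stated assumptions:

* The inputs are positive integers, small enough to index a vector of length max(x).
* gen(n) returns only values greater than n.
* toy_foo(), eliminate_setdiff() and eliminate_mask() are hypothetical names. toy_foo() simply hard-codes the toy example from the quoted message (2 generates 4, 7, 10; 3 generates 5, 9; 6 generates 8).

```r
# Hypothetical generator reproducing the toy example from the quoted post;
# in the real problem this would be foo() / QCA's findSubsets().
toy_foo <- function(n) {
    switch(as.character(n),
           "2" = c(4, 7, 10),
           "3" = c(5, 9),
           "6" = 8,
           numeric(0))
}

# Original approach: shrink the vector with setdiff() on every pass.
eliminate_setdiff <- function(numbers, gen) {
    index <- 0
    while ((index <- index + 1) < length(numbers)) {
        numbers <- setdiff(numbers, gen(numbers[index]))
    }
    numbers
}

# Sketch of an alternative: mark eliminated values in a logical vector
# indexed by value, so the data vector is never copied inside the loop.
# Assumes positive integer inputs and that gen(n) returns only values > n.
eliminate_mask <- function(numbers, gen) {
    top <- max(numbers)
    present <- logical(top)
    present[numbers] <- TRUE           # TRUE = still in the set
    for (n in numbers) {
        if (!present[n]) next          # already derived from a smaller number
        derived <- gen(n)
        derived <- derived[derived > n & derived <= top]
        present[derived] <- FALSE      # eliminate everything n generates
    }
    which(present)
}

x <- 2:10
eliminate_setdiff(x, toy_foo)
eliminate_mask(x, toy_foo)
```

The mask needs one logical per possible value. In the QCA example, max(x) is at most prod(nofl) = 3^14 = 4782969, so the mask takes about 19 MB. The mask only removes the repeated copying done by setdiff(). The generator is still called once per surviving number, and according to the profile that call accounts for roughly half of the time.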


On Wed, Jun 27, 2012 at 10:36 AM, Adrian Duşa <dusa.adr...@gmail.com> wrote:
> Dear R-help list,
>
> Part of a program I wrote seems to take a significant amount of time,
> therefore I am looking for an alternative approach.
> To explain what it does:
>
> - the input is a sorted vector of integer numbers
> - some higher numbers may be derived (using a mathematical formula)
> from lower numbers, therefore they should be eliminated
> - at the end, the vector should contain only uniquely defined numbers
>
> A hypothetical toy example, input vector:
> - 2 3 4 5 6 7 8 9 10
> - number 2 generates 4, 7, 10
> - 2 3 5 6 8 9 (surviving vector)
> - number 3 generates 5 and 9
> - 2 3 6 8 (surviving vector)
> - number 6 generates 8
> - final surviving vector 2 3 6
>
> The function foo(x, ...) generates the numbers; my current approach is:
> ####
> index <- 0
> while ((index <- index + 1) < length(numbers)) {
>    numbers <- setdiff(numbers, foo(numbers[index]))
> }
> ####
>
> This seems to take quite some time (but I don't know any other way of
> doing it), hence my question(s):
> - would there be another (quicker) implementation in R?
> - alternatively, should I go for a C implementation?
>
> (Actually, I did write a C implementation, but it brings no speed
> gain; it is actually a bit slower.)
>
> A real-life example, using the function findSubsets() from the QCA
> package (our foo function above):
>
> ####
> library(QCA)
> testfoo <- function(x, y) {
>    index <- 0
>    while((index <- index + 1) < length(x)) {
>        x <- setdiff(x, findSubsets(y, x[index], max(x)))
>    }
>    return(x)
> }
>
> nofl <- rep(3, 14)
> set.seed(12345)
> numbers <- sort(sample(seq(prod(nofl)), 1000000))
>
> system.time(result <- testfoo(numbers, nofl))
> ####
>   user  system elapsed
>  8.168   2.049  10.148
>
> Any hints would be highly appreciated; thanks in advance,
> Adrian
>
> --
> Adrian Dusa
> Romanian Social Data Archive
> 1, Schitu Magureanu Bd.
> 050025 Bucharest sector 5
> Romania
> Tel.:+40 21 3126618 \
>        +40 21 3120210 / int.101
> Fax: +40 21 3158391
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
