Thanks Charles. That must be it. (Berwin also noticed this.) When convolve hit the wall, I switched over to FFTW in C. That is actually pretty nice code which runs in n log(n) even for prime n and takes special account of factors of n up to about 19 or so. So if the R team ever wants to put in a new FFT, that looks like a good one.
But I think easier fix for me, or others in this boat, would just be to write a new convolve(x,y) that pads x and y with zeros out to length 2*max( length(x), length(y) ). Then if x and y have very composite lengths, especially powers of 2, the fft should go quickly. -Art Quoting "Charles C. Berry" <[EMAIL PROTECTED]>: > On Tue, 18 Dec 2007, Art Owen wrote: > >> Dear R-ophiles, >> >> I've found something very odd when I apply convolve >> to ever larger vectors. Here is an example below >> with vectors ranging from 2^11 to 2^17. There is >> a funny bump up at 2^12. Then it gets very slow at 2^16. > > The time is consumed by fft, which is called with vectors of length > 2*2^i-1 when type = 'o' > >> system.time( fft(seq_len(2^13-1)) ) > user system elapsed > 0.31 0.00 0.32 >> system.time( fft(seq_len(2^14-1)) ) > user system elapsed > 0 0 0 >> > > > There are no factors of 2^13-1 or 2^17-1 or 2^18-1 > >> for (i in 11:20 ) print( c(index=i, nfact=length(which( 0 == > (2^(i+1)-1)%%(2:trunc(sqrt(2^(i+1)-1)) ))))) > index nfact > 11 11 > index nfact > 12 0 > index nfact > 13 3 > index nfact > 14 3 > index nfact > 15 7 > index nfact > 16 0 > index nfact > 17 15 > index nfact > 18 0 > index nfact > 19 23 > index nfact > 20 5 >> > > It looks like the code in fft.c tries to find factors of the series > length and works from there. > > So, the size of the problem evidently depends on its factors. > > HTH, > > Chuck > > >> >> >>> for( i in 11:20 )print( system.time(convolve(1:2^i,1:2^i,type="o"))) >> user system elapsed >> 0.002 0.000 0.002 >> user system elapsed >> 0.373 0.002 0.375 >> user system elapsed >> 0.014 0.001 0.016 >> user system elapsed >> 0.031 0.002 0.034 >> user system elapsed >> 0.126 0.004 0.130 >> user system elapsed >> 194.095 0.013 194.185 >> user system elapsed >> 0.345 0.011 0.356 >> >> This example is run on a fedora machine with 64 bits. I hit the same >> wall at 2^16 on a Macbook (Intel processor I think). The fedora machine >> is running R 2.5.0. That's a bit old (April 07) but I saw no mention of >> this speed >> problem in some web searching, and it's not mentioned in the 2.6 >> what's new notes. >> >> I've rerun it and found the same bump at 12 and wall at 16. >> The timing at 2^16 can change appreciably. In one >> other case it was about 270 user, 271 elapsed. >> The 2^18 case ran for hours without ever finishing. >> >> At first I thought that this was a memory latency issue. Maybe it >> is. But that makes it hard to explain why 2^17 works better than >> 2^16. I've seen that three times now, so I'm almost ready to call it >> reproducible. >> Also, one of the machines I'm using has lots of memory. Maybe it's >> a cache issue ... but that still does not explain why 2^12 is slower >> than 2^13 or 2^16 is slower than 2^17. >> >> I've checked by running convolve without type="o" and I don't >> see the wall. Similarly fft does not have that problem. >> >> Here's an example without type="open" >>> for( k in 11:20)print(system.time( convolve( 1:2^k,1:2^k))) >> user system elapsed >> 0.001 0.000 0.000 >> user system elapsed >> 0.001 0.000 0.001 >> user system elapsed >> 0.002 0.000 0.002 >> user system elapsed >> 0.004 0.000 0.004 >> user system elapsed >> 0.009 0.001 0.010 >> user system elapsed >> 0.017 0.001 0.018 >> user system elapsed >> 0.138 0.005 0.143 >> user system elapsed >> 0.368 0.012 0.389 >> user system elapsed >> 1.010 0.032 1.051 >> user system elapsed >> 1.945 0.069 2.015 >> >> This is more what I expected. Something like N or N log(N) , with >> the difference hard to discern in granularity and noise. >> >> The convolve function is not very big (see below). When type is >> not specified, it defaults to "circular". So my guess is that something >> mysterious might be happening inside the first else clause below, >> at least on some architectures. >> >> -Art Owen >> >> >> >>> convolve >> function (x, y, conj = TRUE, type = c("circular", "open", "filter")) >> { >> type <- match.arg(type) >> n <- length(x) >> ny <- length(y) >> Real <- is.numeric(x) && is.numeric(y) >> if (type == "circular") { >> if (ny != n) >> stop("length mismatch in convolution") >> } >> else { >> n1 <- ny - 1 >> x <- c(rep.int(0, n1), x) >> n <- length(y <- c(y, rep.int(0, n - 1))) >> } >> x <- fft(fft(x) * (if (conj) >> Conj(fft(y)) >> else fft(y)), inv = TRUE) >> if (type == "filter") >> (if (Real) >> Re(x) >> else x)[-c(1:n1, (n - n1 + 1):n)]/n >> else (if (Real) >> Re(x) >> else x)/n >> } >> >> ______________________________________________ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > Charles C. Berry (858) 534-2098 > Dept of > Family/Preventive Medicine > E mailto:[EMAIL PROTECTED] UC San Diego > http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.