Hi Dan,

This is fantastic. I've just run your code with the same data as before and the results are:

BEFORE:
   user  system elapsed
8166.07    2.98 8194.43

AFTER (with Dan's code):
   user  system elapsed
  18.53    0.03   18.59

So with my "real" data this code is over 440 times faster.

Thank you so much!

Monica

> Date: Tue, 16 Sep 2008 14:10:34 -0400
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> CC: r-help@r-project.org
> Subject: Re: [R] Spatial join ? optimizing code
>
> Hi Monica,
>
> I think the key to speeding this up is, for every point in 'track', to
> compute the distance to all points in 'classif' 'simultaneously',
> using vectorized calculations. Here's my function. On my laptop it's
> about 160 times faster than the original for the case I looked at
> (10,000 observations in track and 500 in classif). I get around 18
> seconds for the 30,000 and 4,000 example (2 GHz processor running
> linux).
>
> Dan
>
> dist.merge2 <- function(x, y, xeast, xnorth, yeast, ynorth) {
>     ## construct data frame d in which d[i,] contains information
>     ## associated with the closest point in y to x[i,]
>     xpos <- as.matrix(x[,c(xeast, xnorth)])
>     xposl <- lapply(seq.int(nrow(x)), function(i) xpos[i,])
>     ypos <- t(as.matrix(y[,c(yeast, ynorth)]))
>     yinfo <- y[,! colnames(y) %in% c(yeast,ynorth)]
>
>     get.match.and.dist <- function(point) {
>         sqdists <- colSums((point - ypos)^2)
>         ind <- which.min(sqdists)
>         c(ind, sqrt(sqdists[ind]))
>     }
>     match <- sapply(xposl, get.match.and.dist)
>     cbind(xpos, mindist=match[2,], yinfo[match[1,],])
> }
>
> It's marginally faster to convert xpos to a list followed by sapply as
> I do here, than to leave it as a matrix and use apply to get the
> matches.
>
> On Tue, Sep 16, 2008 at 04:23:33PM +0000, Monica Pisica wrote:
>>
>> Hi,
>>
>> A few days ago I asked about a spatial join on the minimum distance between
>> 2 sets of points with coordinates and attributes in 2 different data frames.
>>
>> Simon Knapp sent code to do it when calculating distance on a sphere using
>> lat, long coordinates and I've changed his code to use Euclidean distances
>> since my data had UTM coordinates.
>>
>> Typically one data frame has around 30 000 points and the classification
>> data frame has around 4000 points, and the aim is to add to each point from
>> the first data frame all the attributes from the second data frame of the
>> point that is closest to it.
>>
>> On my PC (Dell, OptiPlex GX620, x86-based PC, 4 GB RAM, 3192 MHz processor)
>> it took quite a long time to do the join:
>>
>>    user  system elapsed
>> 8166.07    2.98 8194.43
>>
>> Sys.info()
>> sysname release
>> "Windows" "XP"
>> version nodename
>> "build 2600, Service Pack 2"
>> machine
>> "x86"
>> I am running R 2.7.1 patched.
>> I wonder if any of you can suggest or help (or have time) in optimizing this
>> code to make it run faster. My programming skills are not high enough to do
>> it.
>>
>> Thanks,
>>
>> Monica
>>
>> #### code follows:
>> #### x a data frame with over 30000 points with coord in UTM, xeast, xnorth
>> #### y a data frame with over 4000 points with UTM coord (yeast, ynorth) and
>> ##### classification
>> ### calculating Euclidean distance
>>
>> dist <- function(xeast, xnorth, yeast, ynorth) {
>>     ((xeast-yeast)^2 + (xnorth-ynorth)^2)^0.5
>> }
>>
>> ### doing the merge by location with minimum distance
>>
>> dist.merge <- function(x, y, xeast, xnorth, yeast, ynorth){
>>     tmp <- t(apply(x[,c(xeast, xnorth)], 1, function(x, y){
>>         dists <- apply(y, 1, function(x, y) dist(x[2],
>>                        x[1], y[2], y[1]), x)
>>         cbind(1:nrow(y), dists)[dists == min(dists),,drop=F][1,]
>>     }
>>     , y[,c(yeast, ynorth)]))
>>     tmp <- cbind(x, min.dist=tmp[,2], y[tmp[,1],-match(c(yeast,
>>                  ynorth), names(y))])
>>     row.names(tmp) <- NULL
>>     tmp
>> }
>>
>> #### code end
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> --
> http://www.stats.ox.ac.uk/~davison
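
For readers following the thread, here is a small sketch of the vectorized step inside get.match.and.dist from Dan's dist.merge2, run on made-up toy data. The names classif.xy and nearest and the coordinates are assumptions for illustration only and are not part of either poster's code. It shows that subtracting a length-2 point from a 2-row matrix of coordinates yields all squared distances in one call, because R recycles the point down each column.

```r
## Toy illustration of the recycling trick used in get.match.and.dist.
## The data and the names classif.xy and nearest are hypothetical.

## Three classification points stored as a 2 x 3 matrix:
## row 1 holds eastings, row 2 holds northings, one column per point.
classif.xy <- rbind(east  = c(0, 10, 3),
                    north = c(0,  0, 4))

nearest <- function(point, ypos) {
    ## 'point' has length 2 and 'ypos' has 2 rows, so R recycles 'point'
    ## down each column: point[1] is subtracted from every easting and
    ## point[2] from every northing.
    sqdists <- colSums((point - ypos)^2)
    ind <- which.min(sqdists)          # first index of the smallest distance
    c(index = ind, dist = sqrt(sqdists[ind]))
}

## Query point at easting 3, northing 1.
res <- nearest(c(3, 1), classif.xy)

## Cross-check against an explicit loop over the columns.
loop.dists <- sapply(seq_len(ncol(classif.xy)), function(j)
    sqrt(sum((c(3, 1) - classif.xy[, j])^2)))

print(res)
print(loop.dists)
```

The loop at the end is only a cross-check. With a single non-coordinate column left in y, the yinfo subsetting in dist.merge2 would probably need drop=FALSE to stay a data frame, but that case was not tested in this thread.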