Re: [R] Spatial join - optimizing code

Monica Pisica Tue, 16 Sep 2008 12:02:15 -0700

Hi Dan,

This is fantastic. I've just run your code with the same data as before and the
results are:


BEFORE:

   user  system  elapsed
8166.07    2.98  8194.43

AFTER (with Dan's code):

   user  system elapsed 
  18.53    0.03   18.59 

So with my "real" data this code is over 440 times faster .....
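
As a quick check of that figure, the ratio can be computed from the two elapsed times reported above. This is only a sketch; the variable names are mine and not part of either function.

```r
# Elapsed wall-clock seconds copied from the two timings above
before <- 8194.43   # original dist.merge
after  <- 18.59     # Dan's dist.merge2

speedup <- before / after
round(speedup, 1)   # a little over 440
```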

Thank you so much!

Monica




> Date: Tue, 16 Sep 2008 14:10:34 -0400
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> CC: r-help@r-project.org
> Subject: Re: [R] Spatial join - optimizing code
>
> Hi Monica,
>
> I think the key to speeding this up is, for every point in 'track', to
> compute the distance to all points in 'classif' 'simultaneously',
> using vectorized calculations. Here's my function. On my laptop it's
> about 160 times faster than the original for the case I looked at
> (10,000 observations in track and 500 in classif). I get around 18
> seconds for the 30,000 and 4,000 example (2 GHz processor running
> linux).
>
> Dan
>
> dist.merge2 <- function(x, y, xeast, xnorth, yeast, ynorth) {
>     ## construct data frame d in which d[i,] contains information
>     ## associated with the closest point in y to x[i,]
>     xpos <- as.matrix(x[,c(xeast, xnorth)])
>     xposl <- lapply(seq.int(nrow(x)), function(i) xpos[i,])
>     ## transpose so that each column of ypos is one point in y
>     ypos <- t(as.matrix(y[,c(yeast, ynorth)]))
>     ## drop=FALSE keeps yinfo a data frame even if only one column remains
>     yinfo <- y[,! colnames(y) %in% c(yeast,ynorth), drop=FALSE]
>
>     get.match.and.dist <- function(point) {
>         ## point (east, north) is recycled down each column of ypos
>         sqdists <- colSums((point - ypos)^2)
>         ind <- which.min(sqdists)
>         c(ind, sqrt(sqdists[ind]))
>     }
>     ## match is a 2 x nrow(x) matrix: row 1 = index into y, row 2 = distance
>     match <- sapply(xposl, get.match.and.dist)
>     cbind(xpos, mindist=match[2,], yinfo[match[1,],,drop=FALSE])
> }
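
To make the vectorized step concrete, here is a minimal sketch of what happens inside get.match.and.dist for a single point. The toy coordinates are made up for illustration; only the recycling of a length-2 vector against a 2-row matrix is the point here.

```r
# Hypothetical toy data: three candidate points stored as columns
# (row 1 = easting, row 2 = northing), mirroring the transposed ypos
ypos  <- rbind(east = c(0, 3, 10), north = c(0, 4, 10))
point <- c(east = 3, north = 3)

# R recycles 'point' down each column, so column j holds point - ypos[, j]
diffs   <- point - ypos
sqdists <- colSums(diffs^2)     # squared distance to every candidate at once
ind     <- which.min(sqdists)   # index of the closest candidate
c(ind, sqrt(sqdists[ind]))      # closest is column 2, at distance 1
```

The recycling only lines up this way because ypos has one point per column, which is why the function transposes y's coordinates first.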
>
> It's marginally faster to convert xpos to a list followed by sapply as
> I do here, than to leave it as a matrix and use apply to get the
> matches.
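
For anyone who wants to compare the two variants on their own machine, a rough sketch follows. The data are synthetic and the sizes are arbitrary assumptions; timings will differ between machines, so no expected numbers are given.

```r
set.seed(1)  # synthetic coordinates; sizes chosen arbitrarily for illustration
xpos <- matrix(runif(2000 * 2), ncol = 2)     # one point per row
ypos <- t(matrix(runif(300 * 2), ncol = 2))   # one point per column

nearest <- function(point) {
  sqdists <- colSums((point - ypos)^2)
  ind <- which.min(sqdists)
  c(ind, sqrt(sqdists[ind]))
}

# Variant 1: convert the matrix to a list of rows, then sapply
xposl  <- lapply(seq_len(nrow(xpos)), function(i) xpos[i, ])
t.list <- system.time(m1 <- sapply(xposl, nearest))

# Variant 2: leave xpos as a matrix and apply over its rows
t.apply <- system.time(m2 <- apply(xpos, 1, nearest))

all.equal(m1, m2, check.attributes = FALSE)   # both give the same matches
c(list.sapply = t.list[["elapsed"]], apply = t.apply[["elapsed"]])
```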
>
> On Tue, Sep 16, 2008 at 04:23:33PM +0000, Monica Pisica wrote:
>>
>> Hi,
>>
>> A few days ago I asked about a spatial join on the minimum distance between
>> 2 sets of points with coordinates and attributes in 2 different data frames.
>>
>> Simon Knapp sent code to do it when calculating distance on a sphere using
>> lat, long coordinates, and I've changed his code to use Euclidean distances
>> since my data have UTM coordinates.
>>
>> Typically one data frame has around 30 000 points and the classification 
>> data frame has around 4000 points, and the aim is to add to each point from 
>> the first data frame all the attributes from the second data frame of the 
>> point that is closest to it.
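
To illustrate the kind of join described here, a tiny sketch with made-up data follows. The data frame names, column names and class labels are hypothetical and only serve to show the intended result.

```r
# Hypothetical miniature version of the two data frames
track   <- data.frame(east = c(1, 9), north = c(1, 9))
classif <- data.frame(east = c(0, 10), north = c(0, 10),
                      class = c("sand", "reef"))

# For each track point, find the row of classif at minimum squared distance
nearest <- sapply(seq_len(nrow(track)), function(i) {
  which.min((track$east[i] - classif$east)^2 +
            (track$north[i] - classif$north)^2)
})

# Attach the attribute of the closest classif point to each track point
joined <- cbind(track, class = classif$class[nearest])
joined
```

Here the first track point picks up "sand" and the second "reef", since those are the classif points nearest to each.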
>>
>> On my PC (Dell OptiPlex GX620, x86-based PC, 4 GB RAM, 3192 MHz processor)
>> it took quite a long time to do the join:
>>
>>    user  system  elapsed
>> 8166.07    2.98  8194.43
>>
>> Sys.info()
>> sysname release
>> "Windows" "XP"
>> version nodename
>> "build 2600, Service Pack 2"
>> machine
>> "x86"
>>
>> I am running R 2.7.1 patched.
>> I wonder if any of you could suggest how to optimize this code, or have
>> time to help with it, so that it runs faster. My programming skills are
>> not good enough to do it myself.
>>
>> Thanks,
>>
>> Monica
>>
>> #### code follows:
>> #### x: a data frame with over 30000 points with UTM coord (xeast, xnorth)
>> #### y: a data frame with over 4000 points with UTM coord (yeast, ynorth)
>> ####    and classification
>> ### calculating Euclidean distance
>>
>> dist <- function(xeast, xnorth, yeast, ynorth) {
>>   ((xeast-yeast)^2 + (xnorth-ynorth)^2)^0.5
>> }
>>
>> ### doing the merge by location with minimum distance
>>
>> dist.merge <- function(x, y, xeast, xnorth, yeast, ynorth){
>>   tmp <- t(apply(x[,c(xeast, xnorth)], 1, function(x, y){
>>     dists <- apply(y, 1, function(x, y) dist(x[2], x[1], y[2], y[1]), x)
>>     cbind(1:nrow(y), dists)[dists == min(dists),,drop=FALSE][1,]
>>   }, y[,c(yeast, ynorth)]))
>>   tmp <- cbind(x, min.dist=tmp[,2],
>>                y[tmp[,1],-match(c(yeast, ynorth), names(y))])
>>   row.names(tmp) <- NULL
>>   tmp
>> }
>>
>> #### code end
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> --
> http://www.stats.ox.ac.uk/~davison

