Re: [R] finding euclidean proximate points in two datasets

David Winsemius Thu, 20 May 2010 09:22:42 -0700


On May 20, 2010, at 11:12 AM, Alexander Shenkin wrote:

On 5/20/2010 9:18 AM, David Winsemius wrote:
On May 20, 2010, at 10:02 AM, Alexander Shenkin wrote:
Hello all,
I've been pouring through the various spatial packages, buthaven't come
across the right thing yet.
There is a SIG for such questions.
thanks - just joined it.
Given a set of points in 2-d space X, i'm trying to find thesubset ofpoints in Y proximate to each point in X. Furthermore, theproximitythreshold of each point in X differs (X$threshold). I'veconstructed
this myself already, but it's horrificly slow with a dataset of 40k+
points in one set, and a 700 in the other.

A very inefficient example of what I'm looking for:
Not really a reproducible example. If euclidean_dist is a function ,
then it is not one in any of the packages I have installed.
it's not reproducible - i'll make a better effort to include
reproducible code in the future. and that function is just one iwouldhave written, but didn't want to clog the email with code. Anyway,here
is a reproducible example:

X = data.frame(x=c(1,2,3), y=c(2,3,1), threshold=c(1,2,4))
Y = data.frame(x=c(5,2,3,4,2,5,2,3), y=c(5,2,2,4,1,2,3,1))
proximate=list()
i=1
for (pt in 1:length(X$x)) {
   proximate[[i]] <- sqrt((X[pt,]$x - Y$x)^2 + (X[pt,]$y - Y$y)^2) >
X[pt,]$threshold
   i=i+1
}
proximate

This might be a more R-ish way of approaching the problem. It is notalways the case that apply functions will be faster than looping, butthey may be more compact and they get you ready for thinking aboutindexed solutions which would be generally faster.

> apply(X, 1, function(.x) { (.x[1] - Y$x )^2+(.x[2] - Y$y)^2<.x[3]^2 } )


      [,1]  [,2]  [,3]
[1,] FALSE FALSE FALSE
[2,] FALSE  TRUE  TRUE
[3,] FALSE  TRUE  TRUE
[4,] FALSE FALSE  TRUE
[5,] FALSE FALSE  TRUE
[6,] FALSE FALSE  TRUE
[7,] FALSE  TRUE  TRUE
[8,] FALSE FALSE  TRUE

Transposing lets you check that the results are the same as yoursolution modulo yours and mine being slightly different classes:

> t( apply(X, 1, function(.x) { (.x[1] - Y$x )^2+(.x[2] - Y$y)^2<.x[3]^2 } ) )

      [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]
[1,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[2,] FALSE  TRUE  TRUE FALSE FALSE FALSE  TRUE FALSE
[3,] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE

> proximate
[[1]]
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

[[2]]
[1] FALSE  TRUE  TRUE FALSE FALSE FALSE  TRUE FALSE

[[3]]
[1] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE


  for (pt in X$idx) {
      proximity[i] = euclidian_dist(X[pt]$x, X[pt]$y, Y$x, Y$y) <
X$threshold
   i = i+1
  }

Have you considered first creating a subset of candidate pointsthat are

within "threshold" of each reference point on both coordinates. That
might sidestep a lot of calculations on points that are easily
eliminated on a single comparison. Then you could calculate distances

within that surviving subset of points. On average that should giveyou

an over 50% "hit rate":

(4/3)*pi*0.5^3

[1] 0.5235988


That's a nice idea.  I'll still be waiting quite a while while my

machine cranks, but not as long. Still - I suspect there would bemuch

bigger gains if there were tailored packages.  I'll re-post over on
sig-geo about that.  thanks.

Perhaps crossdist() in spatstat is what I should use, and thencode a
comparison with X$threshold after the cross-distances are computed.
However, I was wondering if there was another tool I should be
considering. Any and all thoughts are very welcome. Thanks inadvance.
Thanks,
Allie
--
Alexander Shenkin
PhD Candidate
School of Natural Resources and Environment
University of Florida


David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] finding euclidean proximate points in two datasets

Reply via email to