On May 20, 2010, at 11:12 AM, Alexander Shenkin wrote:

On 5/20/2010 9:18 AM, David Winsemius wrote:

On May 20, 2010, at 10:02 AM, Alexander Shenkin wrote:

Hello all,

I've been pouring through the various spatial packages, but haven't come
across the right thing yet.

There is a SIG for such questions.

thanks - just joined it.

Given a set of points in 2-d space X, i'm trying to find the subset of points in Y proximate to each point in X. Furthermore, the proximity threshold of each point in X differs (X$threshold). I've constructed
this myself already, but it's horrificly slow with a dataset of 40k+
points in one set, and a 700 in the other.

A very inefficient example of what I'm looking for:

Not really a reproducible example. If euclidean_dist is a function ,
then it is not one in any of the packages I have installed.

it's not reproducible - i'll make a better effort to include
reproducible code in the future. and that function is just one i would have written, but didn't want to clog the email with code. Anyway, here
is a reproducible example:

X = data.frame(x=c(1,2,3), y=c(2,3,1), threshold=c(1,2,4))
Y = data.frame(x=c(5,2,3,4,2,5,2,3), y=c(5,2,2,4,1,2,3,1))
proximate=list()
i=1
for (pt in 1:length(X$x)) {
   proximate[[i]] <- sqrt((X[pt,]$x - Y$x)^2 + (X[pt,]$y - Y$y)^2) >
X[pt,]$threshold
   i=i+1
}
proximate

This might be a more R-ish way of approaching the problem. It is not always the case that apply functions will be faster than looping, but they may be more compact and they get you ready for thinking about indexed solutions which would be generally faster.

> apply(X, 1, function(.x) { (.x[1] - Y$x )^2+(.x[2] - Y$y)^2 <.x[3]^2 } )

      [,1]  [,2]  [,3]
[1,] FALSE FALSE FALSE
[2,] FALSE  TRUE  TRUE
[3,] FALSE  TRUE  TRUE
[4,] FALSE FALSE  TRUE
[5,] FALSE FALSE  TRUE
[6,] FALSE FALSE  TRUE
[7,] FALSE  TRUE  TRUE
[8,] FALSE FALSE  TRUE

Transposing lets you check that the results are the same as your solution modulo yours and mine being slightly different classes:

> t( apply(X, 1, function(.x) { (.x[1] - Y$x )^2+(.x[2] - Y$y)^2 <.x[3]^2 } ) )
      [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]
[1,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[2,] FALSE  TRUE  TRUE FALSE FALSE FALSE  TRUE FALSE
[3,] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE

> proximate
[[1]]
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

[[2]]
[1] FALSE  TRUE  TRUE FALSE FALSE FALSE  TRUE FALSE

[[3]]
[1] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE



  for (pt in X$idx) {
      proximity[i] = euclidian_dist(X[pt]$x, X[pt]$y, Y$x, Y$y) <
X$threshold
   i = i+1
  }


Have you considered first creating a subset of candidate points that are
within "threshold" of each reference point on both coordinates. That
might sidestep a lot of calculations on points that are easily
eliminated on a single comparison. Then you could calculate distances
within that surviving subset of points. On average that should give you
an over 50% "hit rate":

(4/3)*pi*0.5^3
[1] 0.5235988

That's a nice idea.  I'll still be waiting quite a while while my
machine cranks, but not as long. Still - I suspect there would be much
bigger gains if there were tailored packages.  I'll re-post over on
sig-geo about that.  thanks.

Perhaps crossdist() in spatstat is what I should use, and then code a
comparison with X$threshold after the cross-distances are computed.
However, I was wondering if there was another tool I should be
considering. Any and all thoughts are very welcome. Thanks in advance.

Thanks,
Allie
--
Alexander Shenkin
PhD Candidate
School of Natural Resources and Environment
University of Florida

David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to