On May 20, 2010, at 11:12 AM, Alexander Shenkin wrote:
On 5/20/2010 9:18 AM, David Winsemius wrote:
On May 20, 2010, at 10:02 AM, Alexander Shenkin wrote:
Hello all,
I've been pouring through the various spatial packages, but
haven't come
across the right thing yet.
There is a SIG for such questions.
thanks - just joined it.
Given a set of points in 2-d space X, i'm trying to find the
subset of
points in Y proximate to each point in X. Furthermore, the
proximity
threshold of each point in X differs (X$threshold). I've
constructed
this myself already, but it's horrificly slow with a dataset of 40k+
points in one set, and a 700 in the other.
A very inefficient example of what I'm looking for:
Not really a reproducible example. If euclidean_dist is a function ,
then it is not one in any of the packages I have installed.
it's not reproducible - i'll make a better effort to include
reproducible code in the future. and that function is just one i
would
have written, but didn't want to clog the email with code. Anyway,
here
is a reproducible example:
X = data.frame(x=c(1,2,3), y=c(2,3,1), threshold=c(1,2,4))
Y = data.frame(x=c(5,2,3,4,2,5,2,3), y=c(5,2,2,4,1,2,3,1))
proximate=list()
i=1
for (pt in 1:length(X$x)) {
proximate[[i]] <- sqrt((X[pt,]$x - Y$x)^2 + (X[pt,]$y - Y$y)^2) >
X[pt,]$threshold
i=i+1
}
proximate
This might be a more R-ish way of approaching the problem. It is not
always the case that apply functions will be faster than looping, but
they may be more compact and they get you ready for thinking about
indexed solutions which would be generally faster.
> apply(X, 1, function(.x) { (.x[1] - Y$x )^2+(.x[2] - Y$y)^2
<.x[3]^2 } )
[,1] [,2] [,3]
[1,] FALSE FALSE FALSE
[2,] FALSE TRUE TRUE
[3,] FALSE TRUE TRUE
[4,] FALSE FALSE TRUE
[5,] FALSE FALSE TRUE
[6,] FALSE FALSE TRUE
[7,] FALSE TRUE TRUE
[8,] FALSE FALSE TRUE
Transposing lets you check that the results are the same as your
solution modulo yours and mine being slightly different classes:
> t( apply(X, 1, function(.x) { (.x[1] - Y$x )^2+(.x[2] - Y$y)^2
<.x[3]^2 } ) )
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[2,] FALSE TRUE TRUE FALSE FALSE FALSE TRUE FALSE
[3,] FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> proximate
[[1]]
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[[2]]
[1] FALSE TRUE TRUE FALSE FALSE FALSE TRUE FALSE
[[3]]
[1] FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
for (pt in X$idx) {
proximity[i] = euclidian_dist(X[pt]$x, X[pt]$y, Y$x, Y$y) <
X$threshold
i = i+1
}
Have you considered first creating a subset of candidate points
that are
within "threshold" of each reference point on both coordinates. That
might sidestep a lot of calculations on points that are easily
eliminated on a single comparison. Then you could calculate distances
within that surviving subset of points. On average that should give
you
an over 50% "hit rate":
(4/3)*pi*0.5^3
[1] 0.5235988
That's a nice idea. I'll still be waiting quite a while while my
machine cranks, but not as long. Still - I suspect there would be
much
bigger gains if there were tailored packages. I'll re-post over on
sig-geo about that. thanks.
Perhaps crossdist() in spatstat is what I should use, and then
code a
comparison with X$threshold after the cross-distances are computed.
However, I was wondering if there was another tool I should be
considering. Any and all thoughts are very welcome. Thanks in
advance.
Thanks,
Allie
--
Alexander Shenkin
PhD Candidate
School of Natural Resources and Environment
University of Florida
David Winsemius, MD
West Hartford, CT
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.