Re: [R] How to do a probability density based filtering in 2D?

David Winsemius Fri, 19 Nov 2010 20:25:35 -0800


On Nov 19, 2010, at 9:40 PM, Emmanuel Levy wrote:

Hello David,

I thought about this at first as well, e.g.,

 x1.lim = quantile(x1,prob=c(0.05,0.95))
 y2.lim = quantile(y2,prob=c(0.05,0.95))

 x1.sub = x1[ x1 > x1.lim[1] & x1 < x1.lim[2] & y2 > y2.lim[1] & y2 <
y2.lim[2]]
 y2.sub = y2[ x1 > x1.lim[1] & x1 < x1.lim[2] & y2 > y2.lim[1] & y2 <
y2.lim[2]]

But this is actually does not do what I want because it is not based
on the density of the data. What I would like is to keep only the
points within an area where the density of points is over x.

I now see what you were asking. My initial efforts with the examplein the kde2d help page did not turn out the way I naively expected.And the multimodal nature of the geyser example makes it even lessamenable to the approach you outlined above. The points in the kde2dgrid are not associated with any particular point in the data,although I suppose one could find the closest point on the 50 x 50kde2d grid to each of of the data points and take the median orwhatever percentile were desired of that set of measures. Sounds toomessy....

But I think I came up with a potential solution that will drawestimated boundaries that will contain the highest density areas andwith 50% of the points:


require(Mass)

require(coda) # I was worried that it might not run without BRugs butit didn't complain

require(emdbook)  # (lots of dependencies)
HPDregionplot(mcmc(geyser, prob=0.5))
points(geyser)

Looking at this result made me wonder whether I had seen this before,and I thought of the locfit package. See page 86 of Loader's text:


require(locfit)
fit.trim<- locfit( ~ lp(waiting, duration, nn=0.1,scale=0), data=geyser)

plot(fit.trim, xlim=c(min(geyser$waiting),min(geyser$duration),max(geyser$waiting), max(geyser$duration) ) )geyser$dens <-(fitted(fit.trim)) ; med.dens <- median(geyser$dens);high <- geyser$dens > med.dens

# These densities are associated with points.

points(geyser[!high, "waiting"], geyser[!high, "duration"],col="blue", cex=0.5 )points(geyser[high, "waiting"], geyser[high, "duration"], col="red",cex=0.5 )

(I could not get the contour lines at 95% and 50% coverage using thecode in the book so this was my best shot. I would be interested inknowing how to do it with locfit, which I why I am copying Andy Liaw,although he probably won't see it until Monday.)

--
David.


In other words, if you imagine a contour plot, I'd like to keep all
the points within a given contour line. So the data has to be
interpolated or smoothed at some point. I am using the kde2d function
of the MASS package to plot contour lines but I don't know how to
retrieve subsets of what I plot.

Also I wouldn't be surprised if there is a trick with quantile that
escapes my mind.

Thanks for your help,

Emmanuel

On 19 November 2010 21:25, David Winsemius <dwinsem...@comcast.net>wrote:


On Nov 19, 2010, at 8:44 PM, Emmanuel Levy wrote:

Hello,
This sounds like a problem to which many solutions should exist,but I
did not manage to find one.

Basically, given a list of datapoints, I'd like to keep those within
the X% percentile highest density.
That would be equivalent to retain only points within a given lineof
a contour plot.

Thanks to anybody who could let me know which function I could use!


What's wrong with quantile?


David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How to do a probability density based filtering in 2D?

Reply via email to