On Nov 19, 2010, at 9:40 PM, Emmanuel Levy wrote:
Hello David,
I thought about this at first as well, e.g.,
x1.lim = quantile(x1,prob=c(0.05,0.95))
y2.lim = quantile(y2,prob=c(0.05,0.95))
x1.sub = x1[ x1 > x1.lim[1] & x1 < x1.lim[2] & y2 > y2.lim[1] & y2 <
y2.lim[2]]
y2.sub = y2[ x1 > x1.lim[1] & x1 < x1.lim[2] & y2 > y2.lim[1] & y2 <
y2.lim[2]]
But this is actually does not do what I want because it is not based
on the density of the data. What I would like is to keep only the
points within an area where the density of points is over x.
I now see what you were asking. My initial efforts with the example
in the kde2d help page did not turn out the way I naively expected.
And the multimodal nature of the geyser example makes it even less
amenable to the approach you outlined above. The points in the kde2d
grid are not associated with any particular point in the data,
although I suppose one could find the closest point on the 50 x 50
kde2d grid to each of of the data points and take the median or
whatever percentile were desired of that set of measures. Sounds too
messy....
But I think I came up with a potential solution that will draw
estimated boundaries that will contain the highest density areas and
with 50% of the points:
require(Mass)
require(coda) # I was worried that it might not run without BRugs but
it didn't complain
require(emdbook) # (lots of dependencies)
HPDregionplot(mcmc(geyser, prob=0.5))
points(geyser)
Looking at this result made me wonder whether I had seen this before,
and I thought of the locfit package. See page 86 of Loader's text:
require(locfit)
fit.trim<- locfit( ~ lp(waiting, duration, nn=0.1,scale=0), data=geyser)
plot(fit.trim, xlim=c(min(geyser$waiting),min(geyser
$duration),max(geyser$waiting), max(geyser$duration) ) )
geyser$dens <-(fitted(fit.trim)) ; med.dens <- median(geyser$dens);
high <- geyser$dens > med.dens
# These densities are associated with points.
points(geyser[!high, "waiting"], geyser[!high, "duration"],
col="blue", cex=0.5 )
points(geyser[high, "waiting"], geyser[high, "duration"], col="red",
cex=0.5 )
(I could not get the contour lines at 95% and 50% coverage using the
code in the book so this was my best shot. I would be interested in
knowing how to do it with locfit, which I why I am copying Andy Liaw,
although he probably won't see it until Monday.)
--
David.
In other words, if you imagine a contour plot, I'd like to keep all
the points within a given contour line. So the data has to be
interpolated or smoothed at some point. I am using the kde2d function
of the MASS package to plot contour lines but I don't know how to
retrieve subsets of what I plot.
Also I wouldn't be surprised if there is a trick with quantile that
escapes my mind.
Thanks for your help,
Emmanuel
On 19 November 2010 21:25, David Winsemius <dwinsem...@comcast.net>
wrote:
On Nov 19, 2010, at 8:44 PM, Emmanuel Levy wrote:
Hello,
This sounds like a problem to which many solutions should exist,
but I
did not manage to find one.
Basically, given a list of datapoints, I'd like to keep those within
the X% percentile highest density.
That would be equivalent to retain only points within a given line
of
a contour plot.
Thanks to anybody who could let me know which function I could use!
What's wrong with quantile?
David Winsemius, MD
West Hartford, CT
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.