David, Great idea! It is simple yet very effective. I didn’t know that sample can be used with probability/weight. I feel that I learned a lot. Thanks a lot for the help! Heyi
--- On Fri, 3/23/12, David Winsemius <dwinsem...@comcast.net> wrote: > From: David Winsemius <dwinsem...@comcast.net> > Subject: Re: [R] Nonparametric bivariate distribution estimation and sampling > To: "heyi xiao" <xiaohey...@yahoo.com> > Cc: "Sarah Goslee" <sarah.gos...@gmail.com>, r-help@r-project.org > Date: Friday, March 23, 2012, 5:29 PM > > On Mar 23, 2012, at 3:55 PM, heyi xiao wrote: > > > David, > > Thanks a lot for the specific suggestions. That’s > very helpful. My question 1 is fully answered now. I guess I > am not clear enough for my question 2. I would like to > generate a random sample using the estimated probability > density (as a result of my question 1) as the reference > distribution. > > > Say, I get a matrix of the estimated density (at some > grid points) using MASS::kde2d. How can I use that result as > a reference distribution to sample data from? I know it is a > trivial issue for parametric distributions like bivariate > normal, but what about such a nonparametric bivariate > reference distribution? Any particular procedures or > functions I can use? > > See if this works: > > data(geyser, package="MASS") > x <- cbind(geyser$duration, geyser$waiting) > est <- bkde2D(x, bandwidth=c(0.7, 7)) > > # Heh, I realized after I did this that I started with > KernSmooth::bkde2D > # and checked the results with MASS::kde2d > # only difference appears to be name of density matrix > # Construct a dataframe with X.Y information and the data > from the bivariate density. > # The output of bkde2D with n=50 is: > > #List of 3 > #$ x1 : num [1:51] -0.2167 -0.0823 0.052 0.1863 > 0.3207 ... > # $ x2 : num [1:51] 32.5 34.2 35.9 37.7 39.4 ... > # $ fhat: num [1:51, 1:51] 3.05e-19 2.17e-19 3.25e-19 > 2.17e-19 0.00 ... > > # The index X.Y could be X + 51*Y and there would be a 1:1 > mapping from (X,Y) to X.Y > # and the fhat values would be properly arranged > > dfrm <- expand.grid(X=1:51, Y=1:51) > dfrm$fhat <- c(est$fhat) > > #Sample randomly from X.Y with length=51*51 using the fhat > values for prob. > > # The X.Y "index" never actually gets computed > # but is implicit in the order of the data.frame > sampfrm <- dfrm[sample(51*51, 300, prob=est$fhat) , ] > f2 <- with(sampfrm, MASS::kde2d(X, Y, n = 50, lims > = c(0, 51, 0, 51)) ) > persp(f2) > > # Looks reasonable to my eye anyway. > > --David. > > > The reason I don’t want to use sampling (with > replacement, I can sample more data than I have without > replacement), as this will generate lots of duplicate data > points, if I want to generated bigger dataset yet my raw > data do not have a big sample size. The scatter plot of the > sampled data doesn’t look good this way. > > Heyi > > > > > > --- On Fri, 3/23/12, David Winsemius <dwinsem...@comcast.net> > wrote: > > > >> From: David Winsemius <dwinsem...@comcast.net> > >> Subject: Re: [R] Nonparametric bivariate > distribution estimation and sampling > >> To: "heyi xiao" <xiaohey...@yahoo.com> > >> Cc: "Sarah Goslee" <sarah.gos...@gmail.com>, > r-help@r-project.org > >> Date: Friday, March 23, 2012, 2:20 PM > >> > >> On Mar 23, 2012, at 1:53 PM, heyi xiao wrote: > >> > >>> Sarah, > >>> Thanks for the response. I actually have > several years > >> of working experience with R and statistics, > although may > >> not be as good as you. that’s why I am here ;) I > dug > >> deeper into R documentations and previous R-help > posts, and > >> couldn’t found anything particular help. Again, I > want to > >> do two things: (1) estimate the probability density > of this > >> bivariate distribution using some nonparametric > method > >> (kernel, spline etc); > >> > >> ?MASS::kde2d > >> ?KernSmooth::bkde2D > >> ?ade4::s.kde2d > >> help(package=locfit) > >> > >>> (2) sample a big dataset from this bivariate > >> distribution for a simulation study. > >> > >> What is wrong with `sample`? > >> > >> # to get sample of size n without replacement > >> set.seed(42) > >> dfrm[ sample(1:NROW(dfrm), n) , ] > >> > >> --David. > >>> If my questions are not clear enough show my > how I can > >> improve, or which part is not clear enough. If you > have any > >> particular suggestions/comments, you are more than > welcome. > >> Thanks! > >>> Heyi > >>> > >>> > >>> --- On Fri, 3/23/12, Sarah Goslee <sarah.gos...@gmail.com> > >> wrote: > >>> > >>>> From: Sarah Goslee <sarah.gos...@gmail.com> > >>>> Subject: Re: [R] Nonparametric bivariate > >> distribution estimation and sampling > >>>> To: "heyi xiao" <xiaohey...@yahoo.com> > >>>> Cc: r-help@r-project.org > >>>> Date: Friday, March 23, 2012, 12:26 PM > >>>> R can do all of that and more. > >>>> > >>>> But you'll need to put some work in reading > about > >> how to use > >>>> R, about > >>>> the statistical methods involved, and about > how to > >> use them > >>>> to best > >>>> effect. You might want, for instance, > generalized > >> additive > >>>> models. Or > >>>> not. If your question isn't more > fully-formed than > >> this, > >>>> your best bet > >>>> is almost certainly to talk to a local > >> statistician, spend > >>>> some time > >>>> working with R, and then come back to the > list > >> with > >>>> specific > >>>> questions. > >>>> > >>>> Sarah > >>>> > >>>> On Fri, Mar 23, 2012 at 12:17 PM, heyi xiao > <xiaohey...@yahoo.com> > >>>> wrote: > >>>>> Dear all, > >>>>> I have a bivariate dataset from a > preliminary > >> study. I > >>>> want to do two things: (1) estimate the > probability > >> density > >>>> of this bivariate distribution using some > >> nonparametric > >>>> method (kernel, spline etc); (2) sample a > big > >> dataset from > >>>> this bivariate distribution for a > simulation > >> study. > >>>>> Is there any good method or package I > can use > >> in R for > >>>> my work? I don’t want parametric models > like > >> bivariate > >>>> normal distribution etc, as I would like > to > >> accurate model > >>>> my data. I don’t want to use the > bootstrapping > >> approach, > >>>> i.e. sampling with replacement, as this > will > >> generate lots > >>>> of duplicate data points. Any thoughts or > input > >> will be > >>>> highly appreciated! > >>>>> Heyi > >>>>> > >>>>> > >>>> > >>>> --Sarah Goslee > >>>> http://www.functionaldiversity.org > >>>> > >>> > >>> ______________________________________________ > >>> R-help@r-project.org > >> mailing list > >>> https://stat.ethz.ch/mailman/listinfo/r-help > >>> PLEASE do read the posting guide > >>> http://www.R-project.org/posting-guide.html > >>> and provide commented, minimal, > self-contained, > >> reproducible code. > >> > >> David Winsemius, MD > >> West Hartford, CT > >> > >> > > David Winsemius, MD > West Hartford, CT > > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.