Re: [R] options in 'rnorm' to set the lower bound of normal dist

Prof Brian Ripley Thu, 27 Mar 2008 08:14:37 -0700

On Thu, 27 Mar 2008, [EMAIL PROTECTED] wrote:

Brian Ripley's suggestions of truncated normal and
log-normal are of course resources for ensuring that
you get positive simulated results.


I was answering the question asked!

Wanting to simulate from the fitted mean and sd does *not* mean you wantto emulate the data -- indeed, it is often done to see if peculiar aspectsof data have a noticeable effect on the rest of the analysis.

I've done this with analytical chemist colleagues several times, althoughI did point out it was better to use robust summaries.

However, I think youre real problem (having looked at
the numbers you quote) is that you should not be thinking
of using a normal distribution, or anything similar, at all.

Your variables dat$x and dat$y have:
  mean(dat$x)
## [1] 0.3126667
  sd(dat$x)
## [1] 0.3372137

  mean(dat$y)
## [1] 0.4223137
  sd(dat$y)
## [1] 0.5120841

so in both cases the SD is about the same as the mean, and
getting negative values from simulated normal distributions
with these means and SDs is inevitable.

But now look at the two series of 15 numbers in dat$x and dat$y:
The first 12 are of the order of 0.15 and 0.20 respectively,
while the final 3 in each case are of the order of 1.0 and 1.5
respectively. And this is where your large SD is coming from.
Neither series is from a simple normal distribution.

  mean(dat$x[1:12])
## [1] 0.1519167
  sd(dat$x[1:12])
## [1] 0.02447432
and it is impossible for the last 3 ( 1.032 0.803 1.032)
to have come from a normal distribution giving the first 12.

  mean(dat$y[1:12])
## [1] 0.1807932
  sd(dat$y[1:12])
## [1] 0.03380083
with a similar conclusion; and likewise for the last three
1.551043 1.063100 1.551043

Note that, for the first 12 in each case, the SD less that
1/5 of the mean:

  mean(dat$x[1:12])/sd(dat$x[1:12])
## [1] 6.207186
  mean(dat$y[1:12])/sd(dat$y[1:12])
## [1] 5.348779

so, where the first 12 of x and y are concerned, if you sampled
from normal distributions with the same means and SDs you would
get a negative number with probability less than

  pnorm(-5)
## [1] 2.866516e-07

What you in fact have here is that the numbers are in two groups,
each with a small SD relative to its mean:

            dat$x             dat$y
         Mean     SD       Mean     SD
------------------------+-------------------
1:12    0.152   0.024  |  0.181   0.034
                       |
13:15    0.956   0.132  |  1.39    0.282

Note that for dat$x Mean/SD approx = 6 for each sub-series,
and for data$y Mean/SD approx = 5 for each subseries, so
you could be looking at results which display a nearly
constant coefficient of variation. Now, this is indeed a
property of the log-normal distribution (as well as of
others), so that could indeed be worth considering.
However, you still have to account for the apparent split
noted above into distinct groups.

So you are really facing a modelling question: why did
the numbers come out as they did, and what is a good way
to represent that mechanism as a distribution?

With best wishes,
Ted.

On 27-Mar-08 12:27:55, Tom Cohen wrote:


Dear list,
  I have a dataset containing values obtained from two different
instruments (x and y).
I want to generate 5 samples from normal distribution for each
instrument based on
their means and standard deviations. The problem is values from both
instruments are
non-negative, so if using rnorm I would get some negative values. Is
there any options
to determine the lower bound of normal distribution to be 0 or can I
simulate the
samples in different ways to avoid the negative values?


 > dat
    id     x         y
75 101 0.134 0.1911315
79 102 0.170 0.1610306
76 103 0.134 0.1911315
84 104 0.170 0.1610306
74 105 0.134 0.1911315
80 106 0.170 0.1610306
77 107 0.134 0.1911315
81 108 0.170 0.1610306
82 109 0.170 0.1610306
78 111 0.170 0.1610306
83 112 0.170 0.1610306
85 113 0.097 0.2777778
2  201 1.032 1.5510434
1  202 0.803 1.0631001
5  203 1.032 1.5510434

mu<-apply(dat[,-1],2,mean)
sigma<-apply(dat[,-1],2,sd)
  len<-5
n<-20
s1<-vector("list",len)
  set.seed(7)
for(i in 1:len){
    s1[[i]]<-cbind.data.frame(x=rnorm(n*i,mean=mu[1],sd=sigma[1]),
                             y=rnorm(n*i,mean=mu[2],sd=sigma[2]))
               }

Thanks for any help,
Tom


---------------------------------
Sök efter kärleken!

      [[alternative HTML version deleted]]


--------------------------------------------------------------------
E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 27-Mar-08                                       Time: 14:00:44
------------------------------ XFMail ------------------------------

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] options in 'rnorm' to set the lower bound of normal dist

Reply via email to