My local R-devel version now has (in ?lm):

    Non-‘NULL’ ‘weights’ can be used to indicate that different
    observations have different variances (with the values in ‘weights’
    being inversely proportional to the variances); or equivalently,
    when the elements of ‘weights’ are positive integers w_i, that each
    response y_i is the mean of w_i unit-weight observations (including
    the case that there are w_i observations equal to y_i and the data
    have been summarized). However, in the latter case, notice that
    within-group variation is not used. Therefore, the sigma estimate
    and residual degrees of freedom may be suboptimal; in the case of
    replication weights, even wrong. Hence, standard errors and analysis
    of variance tables should be treated with care.

OK?

-pd

> On 12 Oct 2017, at 13:48, Arie ten Cate <arietenc...@gmail.com> wrote:
>
> OK. We have now three suggestions to repair the text:
> - remove the text
> - add "not" at the beginning of the text
> - add at the end of the text a warning; something like:
>
> "Note that in this case the standard estimates of the parameters are
> in general not correct, and hence also the t values and the p value.
> Also the number of degrees of freedom is not correct. (The parameter
> values are correct.)"
>
> A remark about the glm example: the Reference manual says: "For a
> binomial GLM prior weights are used to give the number of trials when
> the response is the proportion of successes ....". Hence in the
> binomial case the weights are frequencies.
> With y <- 0.51 and w <- 100 you get the same result.
>
> Arie
>
> On Mon, Oct 9, 2017 at 5:22 PM, peter dalgaard <pda...@gmail.com> wrote:
>> AFAIR, it is a little more subtle than that.
>>
>> If you have replication weights, then the estimates are right, it is "just"
>> that the SE from summary.lm() are wrong. Somehow, the text should reflect
>> this.
>>
>> It is of some importance when you put glm() into the mix, because you can in
>> fact get correct results from things like
>>
>> y <- c(0,1)
>> w <- c(49,51)
>> glm(y~1, weights=w, family=binomial)
>>
>> -pd
>>
>>> On 9 Oct 2017, at 07:58, Arie ten Cate <arietenc...@gmail.com> wrote:
>>>
>>> Yes. Thank you; I should have quoted it.
>>> I suggest to remove this text or to add the word "not" at the beginning.
>>>
>>> Arie
>>>
>>> On Sun, Oct 8, 2017 at 4:38 PM, Viechtbauer Wolfgang (SP)
>>> <wolfgang.viechtba...@maastrichtuniversity.nl> wrote:
>>>> Ah, I think you are referring to this part from ?lm:
>>>>
>>>> "(including the case that there are w_i observations equal to y_i and the
>>>> data have been summarized)"
>>>>
>>>> I see; indeed, I don't think this is what 'weights' should be used for
>>>> (the other part before that is correct).
>>>> Sorry, I misunderstood the point you were trying to make.
>>>>
>>>> Best,
>>>> Wolfgang
>>>>
>>>> -----Original Message-----
>>>> From: R-devel [mailto:r-devel-boun...@r-project.org] On Behalf Of Arie ten Cate
>>>> Sent: Sunday, 08 October, 2017 14:55
>>>> To: r-devel@r-project.org
>>>> Subject: [Rd] Discourage the weights= option of lm with summarized data
>>>>
>>>> Indeed: Using 'weights' is not meant to indicate that the same
>>>> observation is repeated 'n' times. As I showed, this gives erroneous
>>>> results. Hence I suggested that it is discouraged rather than
>>>> encouraged in the Details section of lm in the Reference manual.
>>>>
>>>> Arie
>>>>
>>>> -----Original Message-----
>>>> On Sat, 7 Oct 2017, wolfgang.viechtba...@maastrichtuniversity.nl wrote:
>>>>
>>>> Using 'weights' is not meant to indicate that the same observation is
>>>> repeated 'n' times. It is meant to indicate different variances (or to
>>>> be precise, that the variance of the last observation in 'x' is
>>>> sigma^2 / n, while the first three observations have variance
>>>> sigma^2).
>>>>
>>>> Best,
>>>> Wolfgang
>>>>
>>>> -----Original Message-----
>>>> From: R-devel [mailto:r-devel-boun...@r-project.org] On Behalf Of Arie ten Cate
>>>> Sent: Saturday, 07 October, 2017 9:36
>>>> To: r-devel@r-project.org
>>>> Subject: [Rd] Discourage the weights= option of lm with summarized data
>>>>
>>>> In the Details section of lm (linear models) in the Reference manual,
>>>> it is suggested to use the weights= option for summarized data. This
>>>> must be discouraged rather than encouraged. The motivation for this is
>>>> as follows.
>>>>
>>>> With summarized data the standard errors get smaller with increasing
>>>> numbers of observations. However, the standard errors in lm do not get
>>>> smaller when for instance all weights are multiplied with the same
>>>> constant larger than one, since the inverse weights are merely
>>>> proportional to the error variances.
>>>>
>>>> Here is an example of the estimated standard errors being too large
>>>> with the weights= option. The p value and the number of degrees of
>>>> freedom are also wrong. The parameter estimates are correct.
>>>>
>>>> n <- 10
>>>> x <- c(1,2,3,4)
>>>> y <- c(1,2,5,4)
>>>> w <- c(1,1,1,n)
>>>> xb <- c(x,rep(x[4],n-1)) # restore the original data
>>>> yb <- c(y,rep(y[4],n-1))
>>>> print(summary(lm(yb ~ xb)))
>>>> print(summary(lm(y ~ x, weights=w)))
>>>>
>>>> Compare with PROC REG in SAS, with a WEIGHT statement (like R) and a
>>>> FREQ statement (for summarized data).
>>>>
>>>> Arie
>>>>
>>>> ______________________________________________
>>>> R-devel@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>> --
>> Peter Dalgaard, Professor,
>> Center for Statistics, Copenhagen Business School
>> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
>> Phone: (+45)38153501
>> Office: A 4.23
>> Email: pd....@cbs.dk Priv: pda...@gmail.com
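
The point made in the thread (with replication weights the coefficients are right, but sigma and the residual degrees of freedom are not) can be checked numerically on the example data from the original post. What follows is a minimal sketch and not part of the original messages. It assumes the weights are pure frequency counts, i.e. each summarized row stands for w_i identical observations, so that the weighted residual sum of squares equals the full-data one and only the residual degrees of freedom differ. The names fit_w, fit_full, df_full and se_rescaled are illustrative.

```r
# Example data from the original post: the last row stands for n = 10
# identical observations.
n <- 10
x <- c(1, 2, 3, 4)
y <- c(1, 2, 5, 4)
w <- c(1, 1, 1, n)

# Fit on the summarized data, treating w as weights.
fit_w <- lm(y ~ x, weights = w)

# Fit on the expanded ("restored") data, one row per observation.
xb <- rep(x, times = w)
yb <- rep(y, times = w)
fit_full <- lm(yb ~ xb)

# The coefficient estimates agree.
print(all.equal(unname(coef(fit_w)), unname(coef(fit_full))))

# The standard errors do not: the weighted fit divides the residual sum
# of squares by (number of rows - p), the expanded fit by (sum(w) - p).
se_w    <- coef(summary(fit_w))[, "Std. Error"]
se_full <- coef(summary(fit_full))[, "Std. Error"]
df_w    <- df.residual(fit_w)              # rows minus parameters
df_full <- sum(w) - length(coef(fit_w))    # observations minus parameters

# Assumption: exact replicates, so rescaling sigma by the df ratio
# recovers the full-data standard errors.
se_rescaled <- se_w * sqrt(df_w / df_full)
print(cbind(weighted = se_w, rescaled = se_rescaled, expanded = se_full))
```

This rescaling only applies to exact replicates. When each y_i is the mean of w_i differing observations, the within-group variation is missing from the weighted residual sum of squares, and the full-data standard errors cannot be recovered from the summarized fit alone.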