It's on my todo list (for R-devel, it is not _that_ important), other things just keep taking priority...
-pd > On 28 Nov 2017, at 09:29 , Arie ten Cate <arietenc...@gmail.com> wrote: > > Since the three posters agree (only) that there is a bug, I propose to > file it as a bug, which is the least we can do now. > > There is more to it: the only other case of a change in the Reference > Manual which I know of, is also about the weights option! This is in > coxph. The Reference Manual version 3.0.0 (2013) says about coxph: > > " ... If weights is a vector of integers, then the estimated > coefficients are equivalent to estimating the model from data with the > individual cases replicated as many times as indicated by weights." > > This is not true, as can be seen from the following code, which uses > the data from the first example in the Reference Manual of coxph: > > library(survival) > print(df1 <- as.data.frame(list( > time=c(4,3,1,1,2,2,3), > status=c(1,1,1,0,1,1,0), > x=c(0,2,1,1,1,0,0), > sex=c(0,0,0,0,1,1,1) > ))) > print(w <- rep(2,7)) > print(coxph(Surv(time,status) ~ x + strata(sex),data=df1,weights=w)) > # manually doubling the data: > print(df2 <- rbind(df1,df1)) > print(coxph(Surv(time,status) ~ x + strata(sex), data=df2)) > > This should not come as a surprise, since with coxph the computation > of the likelihood (given the parameters) for a single observation uses > also the other observations. > > This bug has been repaired. The present Reference Manual of coxph says > that the weights option specifies a vector of case weights, to which > is added only: "For a thorough discussion of these see the book by > Therneau and Grambsch." > > Let us repair the other bug also. > > Arie > > On Thu, Oct 12, 2017 at 1:48 PM, Arie ten Cate <arietenc...@gmail.com> wrote: >> OK. We have now three suggestions to repair the text: >> - remove the text >> - add "not" at the beginning of the text >> - add at the end of the text a warning; something like: >> >> "Note that in this case the standard estimates of the parameters are >> in general not correct, and hence also the t values and the p value. >> Also the number of degrees of freedom is not correct. (The parameter >> values are correct.)" >> >> A remark about the glm example: the Reference manual says: "For a >> binomial GLM prior weights are used to give the number of trials when >> the response is the proportion of successes ....". Hence in the >> binomial case the weights are frequencies. >> With y <- 0.51 and w <- 100 you get the same result. >> >> Arie >> >> On Mon, Oct 9, 2017 at 5:22 PM, peter dalgaard <pda...@gmail.com> wrote: >>> AFAIR, it is a little more subtle than that. >>> >>> If you have replication weights, then the estimates are right, it is "just" >>> that the SE from summary.lm() are wrong. Somehow, the text should reflect >>> this. >>> >>> It is of some importance when you put glm() into the mix, because you can >>> in fact get correct results from things like >>> >>> y <- c(0,1) >>> w <- c(49,51) >>> glm(y~1, weights=w, family=binomial) >>> >>> -pd >>> >>>> On 9 Oct 2017, at 07:58 , Arie ten Cate <arietenc...@gmail.com> wrote: >>>> >>>> Yes. Thank you; I should have quoted it. >>>> I suggest to remove this text or to add the word "not" at the beginning. >>>> >>>> Arie >>>> >>>> On Sun, Oct 8, 2017 at 4:38 PM, Viechtbauer Wolfgang (SP) >>>> <wolfgang.viechtba...@maastrichtuniversity.nl> wrote: >>>>> Ah, I think you are referring to this part from ?lm: >>>>> >>>>> "(including the case that there are w_i observations equal to y_i and the >>>>> data have been summarized)" >>>>> >>>>> I see; indeed, I don't think this is what 'weights' should be used for >>>>> (the other part before that is correct). Sorry, I misunderstood the point >>>>> you were trying to make. >>>>> >>>>> Best, >>>>> Wolfgang >>>>> >>>>> -----Original Message----- >>>>> From: R-devel [mailto:r-devel-boun...@r-project.org] On Behalf Of Arie >>>>> ten Cate >>>>> Sent: Sunday, 08 October, 2017 14:55 >>>>> To: r-devel@r-project.org >>>>> Subject: [Rd] Discourage the weights= option of lm with summarized data >>>>> >>>>> Indeed: Using 'weights' is not meant to indicate that the same >>>>> observation is repeated 'n' times. As I showed, this gives erroneous >>>>> results. Hence I suggested that it is discouraged rather than >>>>> encouraged in the Details section of lm in the Reference manual. >>>>> >>>>> Arie >>>>> >>>>> ---Original Message----- >>>>> On Sat, 7 Oct 2017, wolfgang.viechtba...@maastrichtuniversity.nl wrote: >>>>> >>>>> Using 'weights' is not meant to indicate that the same observation is >>>>> repeated 'n' times. It is meant to indicate different variances (or to >>>>> be precise, that the variance of the last observation in 'x' is >>>>> sigma^2 / n, while the first three observations have variance >>>>> sigma^2). >>>>> >>>>> Best, >>>>> Wolfgang >>>>> >>>>> -----Original Message----- >>>>> From: R-devel [mailto:r-devel-boun...@r-project.org] On Behalf Of Arie >>>>> ten Cate >>>>> Sent: Saturday, 07 October, 2017 9:36 >>>>> To: r-devel@r-project.org >>>>> Subject: [Rd] Discourage the weights= option of lm with summarized data >>>>> >>>>> In the Details section of lm (linear models) in the Reference manual, >>>>> it is suggested to use the weights= option for summarized data. This >>>>> must be discouraged rather than encouraged. The motivation for this is >>>>> as follows. >>>>> >>>>> With summarized data the standard errors get smaller with increasing >>>>> numbers of observations. However, the standard errors in lm do not get >>>>> smaller when for instance all weights are multiplied with the same >>>>> constant larger than one, since the inverse weights are merely >>>>> proportional to the error variances. >>>>> >>>>> Here is an example of the estimated standard errors being too large >>>>> with the weights= option. The p value and the number of degrees of >>>>> freedom are also wrong. The parameter estimates are correct. >>>>> >>>>> n <- 10 >>>>> x <- c(1,2,3,4) >>>>> y <- c(1,2,5,4) >>>>> w <- c(1,1,1,n) >>>>> xb <- c(x,rep(x[4],n-1)) # restore the original data >>>>> yb <- c(y,rep(y[4],n-1)) >>>>> print(summary(lm(yb ~ xb))) >>>>> print(summary(lm(y ~ x, weights=w))) >>>>> >>>>> Compare with PROC REG in SAS, with a WEIGHT statement (like R) and a >>>>> FREQ statement (for summarized data). >>>>> >>>>> Arie >>>>> >>>>> ______________________________________________ >>>>> R-devel@r-project.org mailing list >>>>> https://stat.ethz.ch/mailman/listinfo/r-devel >>>> >>>> ______________________________________________ >>>> R-devel@r-project.org mailing list >>>> https://stat.ethz.ch/mailman/listinfo/r-devel >>> >>> -- >>> Peter Dalgaard, Professor, >>> Center for Statistics, Copenhagen Business School >>> Solbjerg Plads 3, 2000 Frederiksberg, Denmark >>> Phone: (+45)38153501 >>> Office: A 4.23 >>> Email: pd....@cbs.dk Priv: pda...@gmail.com >>> >>> > > ______________________________________________ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Office: A 4.23 Email: pd....@cbs.dk Priv: pda...@gmail.com ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel