[R] testing parallelism of dose-response curves using nls()
Hi, I am trying to use an F test or chi-square test to test whether two 5-parameter (A, B, xmid, scal and H) logistic curves are parallel, based on residual sums of squares. What's usually done is to first fit the two curves with a constrained (or global) model in which all parameters are kept the same except "xmid", and then fit two independent curves with unconstrained models in which all 5 parameters are allowed to differ. The extra residual sum of squares can then be used to test parallelism with either a chi-square or an F test.

Fitting two separate curves is straightforward with nls(), but how do I fit a global model that allows only "xmid" to differ between the two curves?

BTW, is there a selfStart function for the 5-parameter logistic, like SSfpl for the 4-parameter logistic?

Thanks very much,
John
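One way to set up the constrained fit in nls(), sketched here rather than taken from the thread: assume both curves are stacked in a single data frame "dat" with columns y, x and a two-level factor grp (these names, the particular 5PL parameterization and the starting values are all illustrative assumptions), and let only xmid be a length-2 parameter vector indexed by the group:

## one common 5PL form: y = A + (B - A) / (1 + exp((xmid - x)/scal))^H
dat$g <- as.numeric(dat$grp)   ## 1 or 2, used to index xmid

## global (constrained) model: A, B, scal, H shared; xmid differs by curve
fit.global <- nls(y ~ A + (B - A) / (1 + exp((xmid[g] - x)/scal))^H,
                  data  = dat,
                  start = list(A = 0, B = 100, xmid = c(5, 6), scal = 1, H = 1))

## unconstrained model: every parameter indexed by the group
fit.sep <- nls(y ~ A[g] + (B[g] - A[g]) / (1 + exp((xmid[g] - x)/scal[g]))^H[g],
               data  = dat,
               start = list(A = c(0, 0), B = c(100, 100), xmid = c(5, 6),
                            scal = c(1, 1), H = c(1, 1)))

## extra-sum-of-squares F test for parallelism
anova(fit.global, fit.sep)

Fitting both unconstrained curves in one nls() call is equivalent to fitting them separately (the residual sum of squares is the sum of the two separate fits), but it keeps the degrees-of-freedom bookkeeping for the F test in one place.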
Re: [R] testing parallelism of dose-response curves using package "drc"
Hi Eik, Thanks for pointing out this great package "drc". Can you or anyone please help me specify a constrained 5PL model where only the transition parameter ("e" in L.5()) is allowed to differ? BTW, what does the argument "pmodels=" do in the model specification? And what about the arguments "lower=" and "upper="? I feel they might be related to specifying constrained models.

Thanks
John

--- On Sat, 3/13/10, Eik Vettorazzi wrote:
> Hi John,
> you may have a look at the drc-package (or look at the maintainer's web
> site www.bioassay.dk). I think it does exactly what you want.
> HTH
> Eik
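A sketch of how such a constrained fit might look with drc's drm(); everything here is an assumption to check against the package documentation, in particular that the data are stacked in a data frame with columns resp, dose and a factor curve, and that pmodels takes one formula per L.5() parameter in the order (b, c, d, e, f):

library(drc)

## constrained model: only the transition parameter e depends on the curve
m.con <- drm(resp ~ dose, curveid = curve, data = dat, fct = L.5(),
             pmodels = list(~1, ~1, ~1, ~factor(curve), ~1))

## unconstrained model: all five parameters estimated separately per curve
m.full <- drm(resp ~ dose, curveid = curve, data = dat, fct = L.5())

## F test based on the extra residual sum of squares
anova(m.con, m.full)

In this reading, pmodels says how each parameter depends on the curve identifier (~1 = shared, ~factor(curve) = one value per curve), while lower= and upper= appear to be numeric box constraints on the parameter values themselves, which is a different kind of restriction.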
[R] datadist() in Design library
Hi, I got an error message using datadist() from the Design package:

> library(Design, T)
> dd <- datadist(beta.final)
> options(datadist = "dd")
> lrm(Disease ~ gsct + apcct + rarct, x = TRUE, y = TRUE)
Error in eval(expr, envir, enclos) : object "Disease" not found

All variables, including the response variable "Disease", are in the data frame "beta.final"; why then can "Disease" not be found? I thought that with datadist(), all variables are presented to the model-fitting functions. Maybe I am wrong?

thanks
John
[R] validate() in Design library
Hi, another question about validate() in the Design library. The argument "B" of this function is the number of repetitions for the bootstrap method, which is easy to understand; but for method="crossvalidation", B is the number of groups of omitted observations, which I find confusing. Say we do 5-fold cross-validation: all samples are divided into 5 groups of equal size, 4 groups are used for training, and the model developed there is tested on the 1 group left out; the process then cycles through all 5 groups. What does the "B" argument mean in this example? B = 5? Or B = 1, because 1 group of samples is omitted from model development at a time?

Thanks
Yi
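For what it's worth, a minimal sketch of how B is usually supplied, reusing the lrm() fit from the datadist question above; reading the help page, B = 5 would correspond to 5-fold cross-validation (5 groups, each omitted once), but treat that reading as an assumption to verify:

library(Design)   ## validate() also exists in the newer rms package
f <- lrm(Disease ~ gsct + apcct + rarct, data = beta.final, x = TRUE, y = TRUE)

validate(f, method = "boot", B = 200)             ## 200 bootstrap repetitions
validate(f, method = "crossvalidation", B = 5)    ## 5 groups of omitted observations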
Re: [R] datadist() in Design library
Dear Jorge, Yes, with "data=beta.final" in the lrm() call, it worked. But I thought one of the reasons for datadist() was to make it unnecessary to specify the data frame in lrm(). Maybe I am completely wrong here.

Thanks
John

--- On Thu, 7/9/09, Jorge Ivan Velez wrote:
> Dear John,
> Have you tried specifying the 'data' argument as suggested in the lrm help?
> Try this:
> lrm(Disease ~ gsct + apcct + rarct, data = beta.final, x = TRUE, y = TRUE)
>
> HTH,
> Jorge
[R] prevalence in logistic regression lrm()
Hi, I am wondering whether there is a way to specify the prevalence of events in logistic regression using lrm() from the Design package. Linear discriminant analysis with lda() from the MASS library has an argument "prior=" that can be used to specify the prevalence of events when the dataset being analyzed does not have a representative prevalence. How can we incorporate this information in lrm()? My concern is that if my dataset does not have a representative prevalence, the probabilities generated by lrm() will not be meaningful.

Thanks
John
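lrm() has no prior/prevalence argument comparable to lda()'s. One workaround that is commonly suggested for case-control-like sampling, sketched here rather than taken from the thread (pi.pop = 0.05 is a purely hypothetical target prevalence, and Disease is assumed to be coded 0/1), is to shift the linear predictor by the difference in log-odds between the target and sample prevalences:

library(Design)
fit <- lrm(Disease ~ gsct + apcct + rarct, data = beta.final)

pi.samp <- mean(beta.final$Disease)    ## prevalence in the analyzed sample (assumes 0/1 coding)
pi.pop  <- 0.05                        ## hypothetical "true" prevalence

lp     <- predict(fit, type = "lp")                 ## linear predictor
lp.adj <- lp + qlogis(pi.pop) - qlogis(pi.samp)     ## intercept (prior) correction
p.adj  <- plogis(lp.adj)                            ## prevalence-adjusted probabilities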
[R] trying to understand lme() results
Hi, I have a very simple balanced randomized block design in which I have 48 observations in total of a weight measurement of a product; the product was manufactured at 4 sites, so each site has 12 observations. I want to use lme() from the nlme package to estimate the standard error of the product weight. The data look like:

     MW  site
1  54031     1
2  55286     1
3  54396     2
4  52327     2
5  55963     3
6  54893     3
7  57338     4
8  55597     4
:      :     :

The random-effects model is Y = mu + b + e, where b is the random block effect and e is the model error. So I fitted an lme model as:

obj <- lme(MW ~ 1, random = ~1 | site, data = dat)
summary(obj)

Linear mixed-effects model fit by REML
Random effects:
 Formula: ~1 | site
        (Intercept) Residual
StdDev:    2064.006 1117.567

Fixed effects: MW ~ 1
               Value Std.Error DF  t-value p-value
(Intercept) 55901.31  1044.534 44 53.51796       0
:
Number of Observations: 48
Number of Groups: 4

I also did:

anova(obj)
            numDF denDF  F-value p-value
(Intercept)     1    44 2864.173  <.0001

I believe my standard-error estimate is the "Residual" entry under the "Random effects" part of summary(), which is 1117.567. Now my question is about the t test under "Fixed effects". I think it is testing whether the overall mean weight is 0 or not, which is not interesting anyway. But how is the standard error of 1044.534 calculated? I thought it should be sqrt(MSE) = 1117.567 instead. Can anyone explain?

The same goes for the F test from anova(obj). The F statistic equals the square of the t statistic because the numerator has 1 df. But what are the numerator and denominator mean squares, and where do I find them? BTW, I think the denominator mean square (MSE) should be 1117.567^2, but this is not consistent with the standard error in the t test (1044.534).

Thanks a lot for any help
John
Re: [R] trying to understand lme() results
Thanks Dennis for the thorough explanation and the correction on the design.
John

--- On Thu, 4/1/10, Dennis Murphy wrote:

Hi:

> I have a very simple balanced randomized block design ... the product was
> manufactured at 4 sites, so each site has 12 observations. I want to use
> lme() from the nlme package to estimate the standard error of the product
> weight.

It's a balanced one-way design where site is assumed to be a random factor. If you want to call it a block, fine, but if this were a genuine RCBD, there would be treatments randomly assigned to 'units' within site, and that's not present here.

> I believe my standard error estimate is from "Residual" under the "Random
> effects" part of summary(), which is 1117.567.

Yes.

> Now my question is regarding the t test under "Fixed effects". ... how is
> the standard error of 1044.534 calculated? I thought it should be
> sqrt(MSE) = 1117.567 instead.

When the data are balanced, the population variance of \bar{y}.., the sample grand mean, is E(MSA)/N, where MSA is the between-site mean square and N is the total sample size (Searle, Casella and McCulloch, _Variance Components_, p. 54, formula (37) derived for the balanced-data case; the corresponding ANOVA table, with expected mean squares, is on p. 60). The plug-in estimate of E(MSA) is

MSA = n * s^2(Intercept) + s^2(error) = 12 * (2064.006)^2 + 1117.567^2,

where n = 12 is the number of observations per site. The standard error of \bar{y}.. is then sqrt(MSA/N). Doing these calculations in R:

xx <- 12 * (2064.006)^2 + (1117.567)^2
sqrt(xx/48)
[1] 1044.533

which, within rounding error, is what lme() gives you in the test for fixed effects.

> Same goes for the F test using anova(obj). ... what are the numerator and
> denominator mean squares, and where do I find them?

lme() fits by ML or REML, so it doesn't output a conventional ANOVA table as part of the output. If you want to see the sums of squares and mean squares, use aov(). In the balanced one-way model, the observed df, SS and MS are the same in both the fixed-effects and random-effects models, but the expected mean square for treatments differs between the two models.

HTH,
Dennis
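Following Dennis's last point, the conventional ANOVA table for this layout can be obtained with aov(); a minimal sketch, assuming the same data frame dat as in the original post and that site is coded 1 to 4:

## one-way table; under the random-effects model the expected mean squares are
##   E(MS_site) = sigma_e^2 + 12 * sigma_site^2   and   E(MSE) = sigma_e^2
summary(aov(MW ~ factor(site), data = dat))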
[R] can't read excel file with read.xls()
Hi, I encountered a problem of not being able to read in an Excel spreadsheet using read.xls() in the xlsReadWrite package. Can anyone help? Here is an example:

write.xls(matrix(1:9, nrow = 3), "ttt.xls")
read.xls("ttt.xls")
Error in read.xls("ttt.xls") :
  Unexpected error. Message: Can't find the file "ttt.xls"

The "ttt.xls" file was created successfully in the current working folder.

Thanks
John
Re: [R] can't read excel file with read.xls()
Thanks for the link, very useful. A simple fix to the code is to use normalizePath(), or the full path name:

read.xls(normalizePath("ttt.xls"))

--- On Fri, 4/2/10, Gabor Grothendieck wrote:
> Read about the bugs and workarounds on this page:
> http://rwiki.sciviews.org/doku.php?id=tips:data-io:ms_windows&s=excel
[R] 3-D response surface using wireframe()
I am working on plotting a response surface using wireframe(). The default style/orientation is:

          z
          |
   y      |
    \     |
     \    |
      \   |
       \  |
        \ |
         \|______ x
          0

Now the orientation of axes that I want is:

          z
          |
          |
          |
          |
         /0\
        /   \
       /     \
      /       \
     /         \
    y           z

My understanding is that screen=list(z=, y=, x=) controls the orientation of the axes, but even after reading the help page for the screen argument, I still don't understand how to use it.

screen: "A list determining the sequence of rotations to be applied to the data before being plotted. The initial position starts with the viewing point along the positive z-axis, and the x and y axes in the usual position. Each component of the list should be named one of "x", "y" or "z" (repetitions are allowed), with their values indicating the amount of rotation about that axis in degrees."

Can anyone explain how the screen argument works? And what values of (x, y, z) should I choose for the orientation that I want?

Another question: wireframe() draws all 8 edges of the cube by default; is there any way to control which edges are drawn and which are hidden?

thanks very much!
John
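As a point of reference (a sketch, not taken from the thread): the rotations listed in screen are applied one after another, each about the axes of the current view, and the default view of wireframe() is approximately screen = list(z = 40, x = -60). Experimenting with small changes from the default is a quick way to get a feel for it; the second call below uses the values John eventually settles on later in this thread:

library(lattice)
## roughly the default view
wireframe(volcano, scales = list(arrows = FALSE), screen = list(z = 40, x = -60))
## rotate about the vertical screen axis, then tilt
wireframe(volcano, scales = list(arrows = FALSE), screen = list(z = 210, x = -75, y = 5))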
Re: [R] 3-D response surface using wireframe()
Thank you David. The document is very helpful. I had a typo in my 2nd example plot: one of the "z" labels should be "x"; "z" is the vertical one.

Thanks
John

--- On Wed, 4/7/10, David Winsemius wrote:
> A search with the following strategy:
>
> RSiteSearch("lattice wireframe rotate axes")
>
> followed by adding requests to search earlier years' archives produced this
> link, which has a further link to a document that answers most of your
> questions, at least the ones that are comprehensible:
>
> http://tolstoy.newcastle.edu.au/R/e2/help/07/03/12534.html
>
> --David.
>
> > Now what I want the orientation of axes is: [the second diagram above,
> > with "z" labels at both the top and the lower right]
>
> Two z axes? How interesting!
Re: [R] 3-D response surface using wireframe()
With the help document, I finally found a set of values for x=, y= and z= in the "screen" argument that gives me the correct rotation of the plot. But now it plots the x and y axes (tick marks and labels) along the top of the plot. Is there a way to plot the x and y axes on the bottom of the plot?

Thanks
John
Re: [R] 3-D response surface using wireframe()
Scott, This is a good explanation and a good practice. Thank you, John

--- On Thu, 4/8/10, Waichler, Scott R wrote:
> Regarding the screen argument in wireframe(), here is what I understand
> about how it works after much trial and error:
>
> After each rotation, new axes are in effect defined for the next rotation,
> just as at the start: x is to the right of the 2D view, y is towards the
> top, and z is positive out of the page towards you. Understanding this
> "reset" of the coordinate system after each rotation is key to
> understanding how the sequence of rotations will be done. Rotations follow
> the right-hand rule: positive angles follow the curved fingers of the
> right hand, with the thumb pointing in the positive direction of the
> associated axis.
>
> I labeled a wooden block with axes and turned it in my hand to help me
> make the initial guess at the sequence of rotations I would want for a
> given view.
>
> Scott Waichler
> Pacific Northwest National Laboratory
Re: [R] 3-D response surface using wireframe()
David, That does the job! Thanks a lot. Now I am very, very close to what I want. I still have a couple of small adjustments to make:

1. I use drape=TRUE to draw the grid and color on the surface; is there a parameter to adjust the density of the grid?

2. Is there a way to add a grid to the axis surfaces? I mean the sides of the box: between x & y, between x & z, and between y & z. And can I choose which 3 sides of the box to add the grid to?

Thank you all for the help. It's fun to play with wireframe.

John

--- On Wed, 4/7/10, David Winsemius wrote:
> Look at the scpos argument to specify the scales location. (Still lacking
> an example and therefore doing this from memory.)
>
> -- David
Re: [R] 3-D response surface using wireframe()
David, Thanks again! Sarkar's Lattice book is an excellent source for lattice. Here is a link to all the figures and code used in the book; your example is figure 13.7:

http://lmdvr.r-forge.r-project.org/figures/figures.html

I got the first point! For the second point below, figure 13.7 is an excellent example of projecting the 3D dataset onto the bounding surface, but it's not what I meant; I think I didn't explain what I wanted clearly. What I really want is simple straight grid lines across the tick marks on the 3 bounding surfaces of the box, not a projection of the 3D dataset. I hope I have explained it clearly this time.

Many thanks
John

--- On Thu, 4/8/10, David Winsemius wrote:
>> 1. I use drape=TRUE to draw the grid and color on the surface; is there
>> a parameter to adjust the density of the grid?
>
> If you mean the spacing between points, then isn't that determined by the
> density of the gridded data arguments before they get to the wireframe
> function?
>
>> 2. Is there a way to add a grid to the axis surfaces? ...
>
> See Figure 13.7 of Sarkar's Lattice text for an example of a panel
> function that collapses the contourLines of the volcano dataset at the
> top bounding surface by using ltransform3dto3d with a z argument of
> zlim.scaled[2]. I would think that a grid could be 3dto3d transformed
> similarly.
>
> -- David.
Re: [R] 3-D response surface using wireframe()
Hi David and Felix,

Thank you very much for your suggestions. To be honest, this has gone beyond my understanding of lattice plots now. I am relatively new to lattice, so I have no idea how a function within a function works (for example, how panel.3dpolygon() within panel.3d.wireframe() within wireframe() works; I totally have no clue).

Felix, your example code with panel.3dpolygon() for the volcano plot does what I want, but again, I don't know how to tweak your example to suit my case.

I attached an example dataset, and an example of the plot that I want to make (especially those grid lines on the 3 bounding surfaces of the box, and, if possible, removing the front edges of the box to make it look open).

dat <- read.table("dat.txt", sep = '\t', header = TRUE, row.names = 1)

library(lattice)
wireframe(z ~ x*y, data = dat,
  scales = list(arrows = FALSE, cex = 0.9, col = "black", font = 3, tick.number = 6,
                z = list(tick.number = 10, tck = 0.8, distance = 0.8),
                x = list(tck = 0.8, distance = 0.6),
                y = list(tck = 0.7, distance = 0.6)),
  zlim = seq(-14, 4, by = 2),
  zlab = list(label = "Z", rot = 90, cex = 0.9),
  xlab = list(label = "X", rot = 15.5),
  ylab = list(label = "Y", rot = -33),
  drape = TRUE,
  at = seq(min(dat$z), max(dat$z), length = 50),
  col.regions = rgb(colorRamp(c("white", "red"))(seq(0, 1, length = 50)), max = 255),
  colorkey = FALSE,
  par.box = list(lwd = 2),                 ## line width of box
  screen = list(z = 210, x = -75, y = 5),
  scpos = list(x = 9, y = 5, z = 2)        ## where axes are drawn
)

Thank you all very much for the help. It's fun to learn.

John

--- On Thu, 4/8/10, Felix Andrews wrote (replying to David Winsemius):
> > [David:] You have not convinced me that I misunderstood what you wanted.
> > I figured that you would use something other than transforming the
> > data-driven contour lines. But if you want to use a lattice function
> > there is a panel.grid, but I still suspect it will need to be 3dto3d
> > transformed onto one of the "lim" extremes.
>
> [Felix:] Might be a little easier to use panel.3dpolygon from latticeExtra.
> (or not) e.g. something like
>
> wireframe(volcano, drape = TRUE, scales = list(arrows = FALSE),
>   panel.3d.wireframe = function(x, y, z, ...) {
>     panel.3dwire(x, y, z, ...)
>     panel.3dpolygon(x = rep(pretty(x), each = 3),
>                     y = min(y), z = c(range(z), NA),
>                     ..., border = "grey", lwd = 2)
>   })
Re: [R] 3-D response surface using wireframe()
Sorry, the example plot didn't go through last time; here it is:

Thanks
John
Re: [R] 3-D response surface using wireframe()
David, Thanks for the 2 previous posts from Sarkar. Actually, I am now one step closer: I am able to remove the 3 outer lines of the bounding box using the par.box argument, even though Sarkar said in his 2008 post that par.box does not control the individual boundaries, so maybe that has since been fixed. Replacing "par.box=list(lwd=2)" in my original code with

par.box = list(lwd = 2, col = c(1, 1, 1, NA, 1, 1, NA, NA, 1))

now removes the 3 outer lines of the bounding box. The only thing missing is that the 3 inner lines of the box (behind the plot) are dashed, not solid, and the par.box argument only controls those 9 visible lines of the bounding box.

As for how to draw grid lines onto the 3 surfaces, I still have no clue. But as you pointed out, Sarkar indicated in his 2007 post that it might be possible.

Thanks
John

--- On Fri, 4/9/10, David Winsemius wrote:
> I do not think the mail server accepts .jpg formats, which was the format
> in which I got your attachment the first time (because you copied me
> directly). I don't see much need to send a pdf, because the code you
> offered does work and the data made it through (.txt and .pdf are types
> that the mailserver accepts).
>
> Back in 2007 Sarkar suggested that it would be possible to project grids
> on the walls of the bounding box, but since the original poster did not
> reply, it appears Sarkar did not deliver a worked solution.
>
> http://finzi.psych.upenn.edu/R/Rhelp02/archive/95759.html
>
> And then in 2008 he referred the questioner to the section of the Lattice
> examples I cited earlier:
>
> http://finzi.psych.upenn.edu/Rhelp10/2008-October/176466.html
>
> -- David.
[R] smooth.spline() function
Hi all, I found that the smooth.spline() function produces different results in R and S-PLUS. I tried playing with different parameters of the function without any success. The function calls compiled Fortran code, so it seems impossible to simply port the code from S-PLUS to R (or I may be wrong). Can anyone suggest anything so that I can produce the same result in R as in S-PLUS?

Thanks
John
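No answer appeared in this thread, but here is a sketch of the knobs one would typically try to pin down when comparing the two implementations; whether R's defaults for knot selection and for the smoothing criterion match those of S-PLUS is an assumption that needs checking against both help pages:

## force a knot at every distinct x value and fix the smoothing explicitly,
## instead of relying on the defaults (the spar and df values are illustrative)
fit1 <- smooth.spline(x, y, all.knots = TRUE, spar = 0.5)
fit2 <- smooth.spline(x, y, all.knots = TRUE, df = 6)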
[R] sparse PCA using nsprcomp package
Hi all, I am using nsprcomp() from the nsprcomp package to run sparse PCA. The output is very much like that of regular PCA by prcomp(), in that it provides "sdev" for the standard deviations of the principal components (PCs).

For regular PCA by prcomp(), we can easily calculate the percent of total variance explained by the first k PCs using cumsum(obj$sdev^2), because these PCs are independent of each other, so you can simply add up their variances. For sparse PCA, as far as I understand, the generated PCs are no longer independent of each other, so you cannot simply add up variances to calculate the percentage of variance explained by the first k PCs. For example, in the elasticnet package, where spca() also performs sparse PCA, one of the outputs from spca() is "pev", the percent explained variation, which is based on a so-called "adjusted" variance that accounts for the fact that the variances of the PCs are no longer independent.

My question is: for nsprcomp, how can I calculate the percent explained variation using "sdev" when I know these PCs are not independent of each other?

Thanks!
John
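For reference, the regular-PCA calculation referred to above, as a minimal sketch (X here is a generic data matrix, not from the post):

obj <- prcomp(X, center = TRUE, scale. = TRUE)
pve <- cumsum(obj$sdev^2) / sum(obj$sdev^2)   ## cumulative proportion of variance explained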
Re: [R] sparse PCA using nsprcomp package
Hi Christian,

Thanks so much for the detailed explanation! I look forward to the new release of the nsprcomp package! In the meantime, I will use the function below to calculate the "adjusted" standard deviations. I have 2 more questions; I hope you can shed some light on them:

1). Assuming I can now calculate these "adjusted" standard deviations from sparse PCA, should the percent variation explained by each sparse PC be calculated using the sum of all these "adjusted" variances (i.e. the squares of the "adjusted" standard deviations) as the denominator (then the percentages will always add up to 1 if all sparse PCs are counted), or using the sum of the PC variances estimated by REGULAR PCA as the denominator (then adding up all PCs may not equal 1)?

2). How do you choose the 2 important parameters in nsprcomp(), ncomp and k? If, for example, my regular PCA showed that I need 20 PCs to account for 80% of the variation in my dataset, does that mean I should set ncomp=20? And are there any rules for setting the value of "k"?

3). Would you recommend nscumcomp() or nsprcomp() in general?

Thanks so much again,
John

From: Christian Sigg
Sent: Thursday, September 5, 2013 2:43 PM
Subject: Re: [R] sparse PCA using nsprcomp package

Hi John

I am currently traveling and have sporadic net access, so I can only answer briefly. It's also quite late; I hope what follows still makes sense...

> For regular PCA by prcomp(), we can easily calculate the percent of total
> variance explained by the first k PCs by using cumsum(obj$sdev^2) ...
> For sparse PCA, as far as I understand, the generated PCs are not
> independent of each other anymore, so you cannot simply add up variances ...

You are correct that measuring explained variance is more involved with sparse (or non-negative) PCA, because the principal axes no longer correspond to eigenvectors of the covariance matrix, and are usually not even orthogonal.

The next update of the nsprcomp package is almost done, and one of the changes will concern the reported standard deviations. In the current version (0.3), the standard deviations are computed from the scores matrix X*W, where X is the data matrix and W is the (pseudo-)rotation matrix consisting of the sparse loadings. Computing variance this way has the advantage that 'sdev' is consistent with the scores matrix, but it has the disadvantage that some of the explained variance is counted more than once because of the non-orthogonality of the principal axes. One symptom of this double counting is that the variance of a later component can actually exceed the variance of an earlier component, which is not possible in regular PCA.

In the new version of the package, 'sdev' will report the _additional_ standard deviation of each component, i.e. the variance not explained by the previous components. Given a basis of the space spanned by the previous PAs, the variance of the PC is computed after projecting the current PA onto the ortho-complement of that basis.
This procedure reverts back to standard PCA if no sparsity or non-negativity constraints are enforced on the PAs.

> My question is for nsprcomp, how can I calculate percent explained
> variation by using "sdev" when I know these PCs are not independent of
> each other?

The new version of the package will do it for you. Until then, you can use something like the following function

asdev <- function(X, W) {
  nc   <- ncol(W)
  sdev <- numeric(nc)
  Q    <- qr.Q(qr(W))
  Xp   <- X
  for (cc in seq_len(nc)) {
    sdev[cc] <- sd(Xp %*% W[, cc])
    Xp <- Xp - Xp %*% Q[, cc] %*% t(Q[, cc])
  }
  return(sdev)
}

to compute the additional variances for given X and W.

The package documentation will explain the above in some more detail, and I will also have a small blog post which compares 'nsprcomp' with the 'spca' routine from the 'elasticnet' package on the 'marty' data from the EMA package.

Best regards
Christian
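A possible way to use asdev() together with nsprcomp(), sketched under assumptions: X is a generic data matrix, the ncomp and k values are arbitrary illustrations, and nsprcomp() is assumed to return a prcomp-like object with a rotation component:

library(nsprcomp)

Xc  <- scale(X, center = TRUE, scale = FALSE)   ## zero-mean data, as asdev() expects
nsp <- nsprcomp(Xc, ncomp = 10, k = 20)         ## illustrative settings
sdev.adj <- asdev(Xc, nsp$rotation)             ## additional ("adjusted") standard deviations

## proportion of total variance explained, with the total variance of the data
## as denominator (see Christian's follow-up message below)
pve <- cumsum(sdev.adj^2) / (sum(Xc^2) / (nrow(Xc) - 1))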
Re: [R] sparse PCA using nsprcomp package
Hi Christian,

Thank you so much for sharing your thoughts; it was a real pleasure to read and learn! Approximately when do you expect the new release of the package?

Best,
John

From: Christian Sigg
Sent: Monday, September 9, 2013 8:06 AM
Subject: Re: [R] sparse PCA using nsprcomp package

Hi John

> 1). ... should the percent variation explained by each sparse PC be
> calculated using the sum of all these "adjusted" variances as the
> denominator, or using the sum of the PC variances estimated by REGULAR
> PCA as the denominator?

It depends on what you want to do with this percentage, but to me the second would be more meaningful. A sparse PCA will usually be truncated (fewer than all possible components are computed), and due to the additional constraints on the principal axes you will usually explain less variance than with standard PCA. I would want to know what I lose in a sparse PCA w.r.t. a standard PCA.

Note that you don't actually have to compute the standard PCA if you are only interested in the total variance of the data, i.e. the sum of all variances. The total variance

1/(n-1) * sum(diag(t(X) %*% X))

for the zero-mean data matrix X is invariant to a rotation of the coordinate system and therefore identical to

Z <- X %*% W
1/(n-1) * sum(diag(t(Z) %*% Z))

so you can skip computing the PCA rotation matrix W. The fastest way to compute the total variance is probably

1/(n-1) * sum(X^2)

because all of these expressions compute the squared Frobenius norm of X (up to the 1/(n-1) factor).

If you want to compare variances of individual components, then compute a regular PCA.

I also had a look at how the spca function computes the "percentage explained variation". I don't yet entirely understand what is going on, but the results differ from those of the "asdev" function I mentioned in my previous reply. Keep that in mind if you want to compare nsprcomp to spca.

> 2). How do you choose the 2 important parameters in nsprcomp(), ncomp and
> k? If, for example, my regular PCA showed that I need 20 PCs to account
> for 80% of the variation in my dataset, does that mean I should set
> ncomp=20? And are there any rules for setting the value of "k"?

I don't have any hard answers for this question. There are a number of heuristics for choosing the number of components in regular PCA (e.g. the PCA book by Jolliffe presents several), and some of them should translate to sparse PCA. If you think that 20 PCs or 80% explained variance works well for regular PCA, I suggest also using 20 components in sparse PCA, then measuring the explained variance and increasing the number of components (if necessary) to again achieve 80% explained variance.

The same goes for setting the cardinality parameter k. You could use a criterion such as BIC to optimize the trade-off between model fidelity and complexity, but I don't have any experience with how well this works in practice. What I did so far was to check for loadings with small magnitudes, and set k such that all loadings have "substantial" magnitudes.

In the end what matters is what follows after running the algorithm. Are you directly interpreting the sparse PCA result, or is this an intermediate step in a complete data processing pipeline with a measurable goal?
If the latter, choose the algorithm parameters that give you the best results at the end of the pipeline.

> 3). Would you recommend nscumcomp() or nsprcomp() in general?

It depends on whether you want to perform a sequential or cumulative analysis of your data. If you want maximum variance in the first (second, third, ...) PC, and want to specify the precise cardinality of each principal axis, use nsprcomp. If instead you only want to specify the total cardinality of all loadings and leave the distribution of the non-zero loadings to the algorithm, use nscumcomp. There will be substantial improvements to nscumcomp in the next release; if you want to use it, I suggest you wait until then.

Regards
Christian
[R] sort matrix based on a specific order
Hi, I have a character matrix with 2 columns, A and B. I want to sort the matrix based on column B, but according to a specific order of characters:

mat <- cbind(c('w','x','y','z'), c('a','b','c','d'))
ind <- c('c','b','d','a')

I want "mat" to be sorted by the sequence in "ind":

     [,1] [,2]
[1,] "y"  "c"
[2,] "x"  "b"
[3,] "z"  "d"
[4,] "w"  "a"

Is there a simple function that can do this?

Thanks
John
Re: [R] sort matrix based on a specific order
Thank you all for the suggestions, fantastic!

From: Rui Barradas
Sent: Thursday, January 10, 2013 10:32 AM
Subject: Re: [R] sort matrix based on a specific order

Hello,

Try the following. order() gives you a permutation of the vector 'ind', and applying order() to that permutation gives its inverse.

mat <- cbind(c('w','x','y','z'), c('a','b','c','d'))
ind <- c('c','b','d','a')

ord <- order(ind)
mat[order(ord), ]

Hope this helps,
Rui Barradas
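Not from the thread, but an equivalent one-liner for the same example uses match() to look up where each element of ind sits in column 2 of mat:

mat[match(ind, mat[, 2]), ]   ## rows reordered so that column 2 follows ind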
Re: [R] weird merge()
Hi James, I am trying to combine 11 data frames by column, not by row. My original message has 11 data text files attached, did they go through so you can try my codes? Thanks John From: J Toll Cc: "r-help@r-project.org" Sent: Friday, January 11, 2013 1:35 PM Subject: Re: [R] weird merge() Hi, > >I have some protein array data, each array in a separate text file. So I read >them in and try to combine them into a single data frame by using merge(). see >code below (If you download the attached data files into a specific folder, >the code below should work): > > >fls<-list.files("C:\\folder_of_download",full.names=T) ## get file names >prot<-list() ## a list to contain individual files >ind<-1 >for (i in fls[c(1:11)]) { > cat(ind, " ") > > tmp<-read.delim(i,header=T,row.names=NULL,na.string='null') > colnames(tmp)[4]<-as.character(tmp$barcode[1]) > prot[[ind]]<-tmp[,-(1:2)] > ind<-ind+1 >} > > ## try to merge them together > ## not do this in a loop so I can see where the problem occurs >pro<-merge(prot[[1]],prot[[2]],by.x=1,by.y=1,all=T) >pro<-merge(pro,prot[[3]],by.x=1,by.y=1,all=T) >pro<-merge(pro,prot[[4]],by.x=1,by.y=1,all=T) >pro<-merge(pro,prot[[5]],by.x=1,by.y=1,all=T) >pro<-merge(pro,prot[[6]],by.x=1,by.y=1,all=T) >pro<-merge(pro,prot[[7]],by.x=1,by.y=1,all=T) >pro<-merge(pro,prot[[8]],by.x=1,by.y=1,all=T) >pro<-merge(pro,prot[[9]],by.x=1,by.y=1,all=T) >pro<-merge(pro,prot[[10]],by.x=1,by.y=1,all=T) >pro<-merge(pro,prot[[11]],by.x=1,by.y=1,all=T) > > >I noticed that starting file #8, the merge become more and more slower that >when it's file #11, the computer was stuck! Originally I thought something >wrong with the later files, but when I change the order of merging, the >slow-down still happens at the 8th files to be merged. > >Can anyone suggest what's going on with merging? > > I'm not sure exactly what you're trying to do with all that code, but if you're just trying to get all eleven files into a data.frame, you could do this: allFilesAsList <- lapply(1:11, function(i) read.delim(paste("p", i, ".txt", sep = ""))) oneBigDataFrame <- do.call(rbind, allFilesAsList) You may need to fix the column names. Is that anything like what you were trying to do? James [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] weird merge()
Hi Dennis, Actually, I am trying to combine them by COLUMN, so that's why I am using merge(). The first loop was to simply read these protein data into R as 11 data frames, each data frame is 165 x 2. Then I use merge() to combine these data frames into 1 big data frame by column with these individual merge() statements. I didn't do it in a loop because I want to see at what point the merge() will start to fail. And it turns out the merge of the first 7 data frames is working fine. Starting from the 8th column, it becomes more and more slow and at the 11th data frame it appeared stuck on my computer. Thanks John From: Dennis Murphy Sent: Friday, January 11, 2013 1:25 PM Subject: Re: [R] weird merge() Hi John: This doesn't look right. What are you trying to do? [BTW, the variable names in the attachments have spaces, so most of R's read functions should choke on them. At the very least, replace the spaces with underscores.] If all you are trying to do is row concatenate them (since the two or three I looked at appear to have the same structure), then it's as simple as pro <- do.call(rbind, prot) If this is what you want along with an indicator for each data file, then the ldply() function in the plyr package might be useful as an alternative to do.call. It should return an additional variable .id whose value corresponds to the number (or name) of the list component. library(plyr) pro2 <- ldply(prot, rbind) If you want something different, then be more explicit about what you want, because your merge() code doesn't make a lot of sense to me. Dennis PS: Just a little hint: if you're using (almost) the same code repeatedly, there's probably a more efficient way to do it in R. CS types call this the DRY principle: Don't Repeat Yourself. I know you know this, but a little reminder doesn't hurt :) > Hi, > > I have some protein array data, each array in a separate text file. So I read > them in and try to combine them into a single data frame by using merge(). > see code below (If you download the attached data files into a specific > folder, the code below should work): > > > fls<-list.files("C:\\folder_of_download",full.names=T) ## get file names > prot<-list() ## a list to contain individual files > ind<-1 > for (i in fls[c(1:11)]) { > cat(ind, " ") > > tmp<-read.delim(i,header=T,row.names=NULL,na.string='null') > colnames(tmp)[4]<-as.character(tmp$barcode[1]) > prot[[ind]]<-tmp[,-(1:2)] > ind<-ind+1 > } > > ## try to merge them together > ## not do this in a loop so I can see where the problem occurs > pro<-merge(prot[[1]],prot[[2]],by.x=1,by.y=1,all=T) > pro<-merge(pro,prot[[3]],by.x=1,by.y=1,all=T) > pro<-merge(pro,prot[[4]],by.x=1,by.y=1,all=T) > pro<-merge(pro,prot[[5]],by.x=1,by.y=1,all=T) > pro<-merge(pro,prot[[6]],by.x=1,by.y=1,all=T) > pro<-merge(pro,prot[[7]],by.x=1,by.y=1,all=T) > pro<-merge(pro,prot[[8]],by.x=1,by.y=1,all=T) > pro<-merge(pro,prot[[9]],by.x=1,by.y=1,all=T) > pro<-merge(pro,prot[[10]],by.x=1,by.y=1,all=T) > pro<-merge(pro,prot[[11]],by.x=1,by.y=1,all=T) > > > I noticed that starting file #8, the merge become more and more slower that > when it's file #11, the computer was stuck! Originally I thought something > wrong with the later files, but when I change the order of merging, the > slow-down still happens at the 8th files to be merged. > > Can anyone suggest what's going on with merging? 
> > Thanks > > John > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] weird merge()
I just figured out the reason was the column (the 1st column in each data frame "gene.name") by which to merge each data frame has no unique values, some values were repeated, so when merging, the data frame gets bigger and bigger exponentially. Sorry to bother all. John From: J Toll Cc: "r-help@r-project.org" Sent: Friday, January 11, 2013 1:35 PM Subject: Re: [R] weird merge() Hi, > >I have some protein array data, each array in a separate text file. So I read >them in and try to combine them into a single data frame by using merge(). see >code below (If you download the attached data files into a specific folder, >the code below should work): > > >fls<-list.files("C:\\folder_of_download",full.names=T) ## get file names >prot<-list() ## a list to contain individual files >ind<-1 >for (i in fls[c(1:11)]) { > cat(ind, " ") > > tmp<-read.delim(i,header=T,row.names=NULL,na.string='null') > colnames(tmp)[4]<-as.character(tmp$barcode[1]) > prot[[ind]]<-tmp[,-(1:2)] > ind<-ind+1 >} > > ## try to merge them together > ## not do this in a loop so I can see where the problem occurs >pro<-merge(prot[[1]],prot[[2]],by.x=1,by.y=1,all=T) >pro<-merge(pro,prot[[3]],by.x=1,by.y=1,all=T) >pro<-merge(pro,prot[[4]],by.x=1,by.y=1,all=T) >pro<-merge(pro,prot[[5]],by.x=1,by.y=1,all=T) >pro<-merge(pro,prot[[6]],by.x=1,by.y=1,all=T) >pro<-merge(pro,prot[[7]],by.x=1,by.y=1,all=T) >pro<-merge(pro,prot[[8]],by.x=1,by.y=1,all=T) >pro<-merge(pro,prot[[9]],by.x=1,by.y=1,all=T) >pro<-merge(pro,prot[[10]],by.x=1,by.y=1,all=T) >pro<-merge(pro,prot[[11]],by.x=1,by.y=1,all=T) > > >I noticed that starting file #8, the merge become more and more slower that >when it's file #11, the computer was stuck! Originally I thought something >wrong with the later files, but when I change the order of merging, the >slow-down still happens at the 8th files to be merged. > >Can anyone suggest what's going on with merging? > > I'm not sure exactly what you're trying to do with all that code, but if you're just trying to get all eleven files into a data.frame, you could do this: allFilesAsList <- lapply(1:11, function(i) read.delim(paste("p", i, ".txt", sep = ""))) oneBigDataFrame <- do.call(rbind, allFilesAsList) You may need to fix the column names. Is that anything like what you were trying to do? James [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
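John's diagnosis can be seen with a tiny made-up example: if a key value appears k times in each of two data frames, merge() keeps every pairing, so k rows become k*k rows, and the result grows multiplicatively with each additional merge.

a <- data.frame(gene.name = c("g1", "g1"), v1 = 1:2)
b <- data.frame(gene.name = c("g1", "g1"), v2 = 3:4)
nrow(merge(a, b, by = "gene.name", all = TRUE))   # 4 rows, not 2

Once the keys are made unique, the eleven separate merge() statements from the thread could also be collapsed with Reduce(), e.g. pro <- Reduce(function(x, y) merge(x, y, by.x = 1, by.y = 1, all = TRUE), prot), where prot is the list built in the original loop.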
[R] error installing KEGGSOAP
Hi, I am new to bioconductor, trying to install KEGGSOAP package, but got warnings() when installing and error message when trying to load the package, can anyone suggest what went wrong? many thanks John > source("http://bioconductor.org/biocLite.R";) Bioconductor version 2.11 (BiocInstaller 1.8.3), ?biocLite for help > biocLite("KEGGSOAP") BioC_mirror: http://bioconductor.org Using Bioconductor version 2.11 (BiocInstaller 1.8.3), R version 2.15. Installing package(s) 'KEGGSOAP' trying URL 'http://bioconductor.org/packages/2.11/bioc/bin/windows/contrib/2.15/KEGGSOAP_1.32.0.zip' Content type 'application/zip' length 69037 bytes (67 Kb) opened URL downloaded 67 Kb package âKEGGSOAPâ successfully unpacked and MD5 sums checked The downloaded binary packages are in        C:\Users\yzhang\AppData\Local\Temp\RtmpawAjwx\downloaded_packages Warning message: installed directory not writable, cannot update packages 'acepack', 'actuar', 'ada', 'ade4', 'ade4TkGUI',  'agricolae', 'akima', 'ape', 'aplpack', 'arules', 'bitops', 'boot', 'Cairo', 'car', 'caTools', 'cba',  'chron', 'class', 'clue', 'cluster', 'coda', 'colorspace', 'CompQuadForm', 'corpcor', 'DAAG', 'date',  'deldir', 'descr', 'deSolve', 'devtools', 'digest', 'diptest', 'doBy', 'DoE.wrapper', 'e1071', 'effects',  'ENmisc', 'epiR', 'eRm', 'evaluate', 'evd', 'FactoMineR', 'fArma', 'fAssets', 'fBasics', 'fdrtool',  'fExoticOptions', 'fExtremes', 'fGarch', 'fields', 'flexclust', 'fMultivar', 'fNonlinear', 'fOptions',  'forecast', 'foreign', 'fpc', 'fracdiff', 'fRegression', 'FrF2', 'FrF2.catlg128', 'fTrading',  'fUnitRoots', 'gamlss', 'gamlss.data', 'gamlss.dist', 'gclus', 'gdata', 'geoR', 'GGally', 'ggm',  'ggplot2', 'gmodels', 'gridBase', 'gWidgets', 'gWidgetstcltk', 'HH', 'Hmisc', 'httr', 'igraph',  'igraph0', 'inline', 'ipred', 'isa2', 'JavaGD', 'JGR', 'kernlab', 'KernSmoot [... truncated] > library(KEGGSOAP) Loading required package: BiocGenerics Attaching package: âBiocGenericsâ The following object(s) are masked from âpackage:statsâ:    xtabs The following object(s) are masked from âpackage:baseâ:    anyDuplicated, cbind, colnames, duplicated, eval, Filter, Find, get, intersect, lapply, Map, mapply, mget, order, paste,    pmax, pmax.int, pmin, pmin.int, Position, rbind, Reduce, rep.int, rownames, sapply, setdiff, table, tapply, union, unique failed to load HTTP resource Error : .onLoad failed in loadNamespace() for 'KEGGSOAP', details:  call: NULL  error: 1: failed to load HTTP resource Error: package/namespace load failed for âKEGGSOAPâ > sessionInfo() R version 2.15.1 (2012-06-22) Platform: x86_64-pc-mingw32/x64 (64-bit) locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252  [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                         [5] LC_TIME=English_United States.1252   attached base packages: [1] stats    graphics grDevices datasets utils    methods  base    other attached packages: [1] BiocGenerics_0.4.0 BiocInstaller_1.8.3 loaded via a namespace (and not attached): [1] codetools_0.2-8 RCurl_1.91-1.1 SSOAP_0.8-0    tools_2.15.1   XML_3.9-4.1    XMLSchema_0.7-2 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] confidence intervals for survfit()
Hi, I am wondering how the confidence interval for the Kaplan-Meier estimator is calculated by survfit(). For example, > summary(survfit(Surv(time,status)~1,data),times=10) Call: survfit(formula = Surv(rtime10, rstat10) ~ 1, data = mgi) time n.risk n.event survival std.err lower 95% CI upper 95% CI 10 168 55 0.761 0.0282 0.707 0.818 I am trying to reproduce the upper and lower CI by using the standard error. As far as I understand, the default method for survfit() to calculate the confidence interval is on the log survival scale, so: upper CI = exp(log(0.761)+qnorm(0.975)*0.0282) = 0.804 lower CI = exp(log(0.761)-qnorm(0.975)*0.0282) = 0.720 These are not the same as the output from survfit(). Am I missing something? Thanks John [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
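One way to reconcile the numbers in the post, assuming (as the help pages suggest) that the printed std.err is on the survival scale while the default "log" interval is built on the log-survival scale, so the relevant standard error is se(S)/S:

S  <- 0.761           # survival estimate from the post
se <- 0.0282          # printed std.err, on the survival scale
z  <- qnorm(0.975)
S * exp(c(-1, 1) * z * se / S)   # about 0.707 and 0.818, matching the printed CI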
[R] meta-analysis for sensitivity
Hi all, I am new to meta-analysis. Is there any special package that can calculate "summarized" sensitivity with 95% confidence interval for a diagnostic test, based on sensitivities from several individual studies? Thanks for any suggestions. John From: Jannis To: "r-help@r-project.org" Sent: Monday, April 8, 2013 11:12 AM Subject: [R] checkUsage from codetools shows errors when function uses functions from loaded packages Dear list members, I frequently program small scripts and wrap them into functions to be able to check them with checkUsage. In case these functions (loaded via source or copy pasted to the R console) use functions from other packages, I get this error: no visible global function definition for 'xxx' For example: test = function() { require(plotrix) color.legend() } library(codetools) checkUsage(test) Can I tell codetools somehow where to look for these functions without building a full blown package? Cheers Jannis __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
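For the quoted codetools question, one workaround (my suggestion, not from the thread) is to attach the package in the session before running checkUsage(), because the check is static and only sees functions visible on the search path at check time; require() inside the function body does not help:

library(codetools)
library(plotrix)      # attach first so color.legend() has a visible binding
test <- function() {
  require(plotrix)
  color.legend()      # arguments omitted, as in the original example
}
checkUsage(test)      # no 'no visible global function definition' note for color.legend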
[R] survival estimates for covariate values
Hi all, let's say we can fit a Cox model with a numeric variable "x" as the independent variable. Then we can calculate, say, 10-year survival for any given value of "x" (0 to 10 in increments of 0.1 in the example below): > fit <- coxph(Surv(time, event)~x,dat) > surv10yr<- summary(survfit(fit,newdata=data.frame(x=seq(0,10,0.1))),times=10) I am wondering if anyone can show me how to replicate this in SAS? Very much appreciated! John [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
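A self-contained version of the R side, so the quantities to be replicated are explicit; the lung data and the wt.loss covariate are stand-ins of my choosing, not from the original post:

library(survival)
fit <- coxph(Surv(time, status) ~ wt.loss, data = lung)
new <- data.frame(wt.loss = seq(0, 10, 0.1))
s1yr <- summary(survfit(fit, newdata = new), times = 365)  # survival at 1 year
head(s1yr$surv[1, ])   # one survival estimate per row of 'new'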
[R] prediction based on conditional logistic regression clogit
Hi, I am using clogit() from survival package to do conditional logistic regression. I also need to make prediction on an independent dataset to calculate predicted probability. Here is an example: > dat <- data.frame(set=rep(1:50,each=3), status=rep(c(1,0,0),50), > x1=rnorm(150,5,1), x2=rnorm(150,7,1.5)) > dat.test <- data.frame(set=rep(1:30,each=3), status=rep(c(1,0,0),30), > x1=rnorm(90,5,1), x2=rnorm(90,7,1.5)) > fit<-clogit(status~x1+x2+strata(set),dat) > predict(fit,newdata=dat.test,type='expected') Error in Surv(rep(1, 150L), status) : Time and status are different lengths Can anyone suggest what's wrong here? Thanks! John [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] prediction based on conditional logistic regression clogit
Thank you Peter. Any other suggestions are absolutely welcome!! John From: peter dalgaard Cc: "r-help@r-project.org" Sent: Monday, June 16, 2014 2:22 AM Subject: Re: [R] prediction based on conditional logistic regression clogit > Hi, I am using clogit() from survival package to do conditional logistic > regression. I also need to make prediction on an independent dataset to > calculate predicted probability. Here is an example: > > >> dat <- data.frame(set=rep(1:50,each=3), status=rep(c(1,0,0),50), >> x1=rnorm(150,5,1), x2=rnorm(150,7,1.5)) >> dat.test <- data.frame(set=rep(1:30,each=3), status=rep(c(1,0,0),30), >> x1=rnorm(90,5,1), x2=rnorm(90,7,1.5)) >> fit<-clogit(status~x1+x2+strata(set),dat) >> predict(fit,newdata=dat.test,type='expected') > Error in Surv(rep(1, 150L), status) : > Time and status are different lengths > > Can anyone suggest what's wrong here? > The direct cause is that clogit() works by using the fact that the likelihood is equivalent to a coxph() likelihood with stratification and all observation lengths set to 1. Therefore the analysis is formally on Surv(rep(1, 150L), status) and that goes belly-up if you apply the same formula to a data set of different length. However, notice that there is no such thing as predict.clogit(), so you are attempting predict.coxph() on a purely formal Cox model. It is unclear to what extent predicted values, in the sense of coxph() are compatible with predictions in conditional logit models. I'm rusty on this, but I think what you want is something like m <- model.matrix(~ x1 + x2 - 1, data=dat.test) pp <- exp(m %*% coef(fit)) pps <- ave(pp, dat.test$set, FUN=sum) pp/pps i.e. the conditional probability that an observation is a case given covariates and that there is on case in each set (in the data given, you have sets of three with one being a case, so all predicted probabilities are close to 0.33). For more general matched sets, I'm not really sure what one wants. Real experts are welcome to chime in. -pd > Thanks! > > John > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Peter Dalgaard, Professor Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
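Peter's suggestion can be collected into a small helper; the function name is made up, and as he notes it assumes one case per matched set:

clogit_set_prob <- function(fit, newdata, formula, set) {
  m   <- model.matrix(formula, data = newdata)   # e.g. ~ x1 + x2 - 1
  pp  <- drop(exp(m %*% coef(fit)))              # relative odds for each observation
  pps <- ave(pp, set, FUN = sum)                 # total within each matched set
  pp / pps                                       # conditional probability of being the case
}
## usage, following the thread's example:
## clogit_set_prob(fit, dat.test, ~ x1 + x2 - 1, dat.test$set)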
[R] standard error of survfit.coxph()
Hi, can anyone help me to understand the standard errors printed in the output of survfit.coxph()? time<-sample(1:15,100,replace=T) status<-as.numeric(runif(100,0,1)<0.2) x<-rnorm(100,10,2) fit<-coxph(Surv(time,status)~x) ### method 1 survfit(fit, newdata=data.frame(time=time,status=status,x=x)[1:5,], conf.type='log')$std.err [,1] [,2] [,3] [,4] [,5] [1,] 0.0 0.0 0.0 0.0 0. [2,] 0.008627644 0.008567253 0.008773699 0.009354788 0.01481819 [3,] 0.008627644 0.008567253 0.008773699 0.009354788 0.01481819 [4,] 0.013800603 0.013767977 0.013889971 0.014379928 0.02353371 [5,] 0.013800603 0.013767977 0.013889971 0.014379928 0.02353371 [6,] 0.013800603 0.013767977 0.013889971 0.014379928 0.02353371 [7,] 0.030226811 0.030423883 0.029806263 0.028918817 0.05191161 [8,] 0.030226811 0.030423883 0.029806263 0.028918817 0.05191161 [9,] 0.036852571 0.037159980 0.036186931 0.034645002 0.06485394 [10,] 0.044181716 0.044621159 0.043221145 0.040872939 0.07931028 [11,] 0.044181716 0.044621159 0.043221145 0.040872939 0.07931028 [12,] 0.055452631 0.056018832 0.054236881 0.051586391 0.10800413 [13,] 0.070665160 0.071363749 0.069208056 0.066655730 0.14976433 [14,] 0.124140400 0.125564637 0.121281571 0.118002021 0.30971860 [15,] 0.173132357 0.175309455 0.168821266 0.164860523 0.46393111 survfit(fit, newdata=data.frame(time=time,status=status,x=x)[1:5,], conf.type='log')$time [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ### method 2 summary(survfit(fit, newdata=data.frame(time=time,status=status,x=x)[1:5,], conf.type='log'),time=10)$std.err 1 2 3 4 5 [1,] 0.04061384 0.04106186 0.03963184 0.03715246 0.06867532 By reading the help of ?survfit.object and ?summary.survfit, the standard error provided in the output of method 1 (survfit()) was for cumulative hazard-log(survival), while the standard error provided in the output of method 2 (summary.survfit()) was for survival itself, regardless of how you choose the value for "conf.type" ('log', 'log-log' or 'plain'). This explains why the standard error output is different between method 1 (10th row) and method 2. My question is how do I get standard error estimates for log(-log(survival))? Thanks! John [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] standard error of survfit.coxph()
Thank you Terry for the explanation! John From: "Therneau, Terry M., Ph.D." Sent: Monday, June 30, 2014 6:04 AM Subject: Re: standard error of survfit.coxph() 1. The computations "behind the scenes" produce the variance of the cumulative hazard. This is true for both an ordinary Kaplan-Meier and a Cox model. Transformations to other scales are done using simple Taylor series. H = cumulative hazard = log(S); S=survival var(H) = var(log(S)) = the starting point S = exp(log(S)), so var(S) is approx [deriv of exp(x)]^2 * var(log(S)) = S^2 var(H) var(log(log(S)) is approx (1/S^2) var(H) 2. At the time it was written, summary.survfit was used only for printing out the survival curve at selected times, and the audience for the printout wanted std(S). True, that was 20 years ago, but I don't recall anyone ever asking for summary to do anything else. Your request is not a bad idea. Note however that the primary impact of using log(S) or S or log(log(S)) scale is is on the confidence intervals, and they do appear per request in the summary output. Terry T. On 06/28/2014 05:00 AM, r-help-requ...@r-project.org wrote: > Message: 9 > Date: Fri, 27 Jun 2014 12:39:29 -0700 > To:"r-help@r-project.org" > Subject: [R] standard error of survfit.coxph() > Message-ID: > <1403897969.91269.yahoomail...@web122906.mail.ne1.yahoo.com> > Content-Type: text/plain > > Hi, can anyone help me to understand the standard errors printed in the > output of survfit.coxph()? > > time<-sample(1:15,100,replace=T) > > status<-as.numeric(runif(100,0,1)<0.2) > x<-rnorm(100,10,2) > > fit<-coxph(Surv(time,status)~x) > ??? ### method 1 > > survfit(fit, newdata=data.frame(time=time,status=status,x=x)[1:5,], > conf.type='log')$std.err > > [,1]??? [,2]??? [,3]??? [,4]?? [,5] > ?[1,] 0.0 0.0 0.0 0.0 0. > ?[2,] 0.008627644 0.008567253 0.008773699 0.009354788 0.01481819 > ?[3,] 0.008627644 0.008567253 0.008773699 0.009354788 0.01481819 > ?[4,] 0.013800603 0.013767977 0.013889971 0.014379928 0.02353371 > ?[5,] 0.013800603 0.013767977 0.013889971 0.014379928 0.02353371 > ?[6,] 0.013800603 0.013767977 0.013889971 0.014379928 0.02353371 > ?[7,] 0.030226811 0.030423883 0.029806263 0.028918817 0.05191161 > ?[8,] 0.030226811 0.030423883 0.029806263 0.028918817 0.05191161 > ?[9,] 0.036852571 0.037159980 0.036186931 0.034645002 0.06485394 > [10,] 0.044181716 0.044621159 0.043221145 0.040872939 0.07931028 > [11,] 0.044181716 0.044621159 0.043221145 0.040872939 0.07931028 > [12,] 0.055452631 0.056018832 0.054236881 0.051586391 0.10800413 > [13,] 0.070665160 0.071363749 0.069208056 0.066655730 0.14976433 > [14,] 0.124140400 0.125564637 0.121281571 0.118002021 0.30971860 > [15,] 0.173132357 0.175309455 0.168821266 0.164860523 0.46393111 > > survfit(fit, newdata=data.frame(time=time,status=status,x=x)[1:5,], > conf.type='log')$time > ?[1]? 1? 2? 3? 4? 5? 6? 7? 8? 9 10 11 12 13 14 15 > > ??? ### method 2 > > summary(survfit(fit, newdata=data.frame(time=time,status=status,x=x)[1:5,], > conf.type='log'),time=10)$std.err > > ? 1? 2? 3? 4? 5 > [1,] 0.04061384 0.04106186 0.03963184 0.03715246 0.06867532 > > By reading the help of ?survfit.object and ?summary.survfit, the standard > error provided in the output of method 1 (survfit()) was for cumulative > hazard-log(survival), while the standard error provided in the output of > method 2 (summary.survfit()) was for survival itself, regardless of how you > choose the value for "conf.type" ('log', 'log-log' or 'plain'). 
This explains > why the standard error output is different between method 1 (10th row) and > method 2. > > My question is how do I get standard error estimates for log(-log(survival))? > > Thanks! > > John [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
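A small numerical check of Terry's Taylor-series relation var(S) ~ S^2 var(H), using the lung data (my choice of example, not from the thread): summary.survfit()'s std.err should be roughly S times survfit()'s std.err.

library(survival)
fit  <- coxph(Surv(time, status) ~ wt.loss, data = lung)
sf   <- survfit(fit, newdata = lung[2:5, ])
i    <- which(sf$time == 60)
S    <- sf$surv[i, ]                          # survival at time 60
se_H <- sf$std.err[i, ]                       # standard error of the cumulative hazard
se_S <- summary(sf, times = 60)$std.err[1, ]  # standard error of S itself
cbind(S * se_H, se_S)                         # the two columns should roughly agree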
Re: [R] standard error of survfit.coxph()
Dear Terry, I was trying to use your explanation of the standard error estimate from survfit.coxph() to verify the standard error estimates for the method of log(log(S)), but couldn't get the estimates correct. Here is an example using the lung dataset: > fit<-coxph(Surv(time,status)~wt.loss,lung) > surv<-survfit(fit,newdata=lung[2:5,])$surv > logstd<-survfit(fit,newdata=lung[2:5,])$std.err > logstd.time<-survfit(fit,newdata=lung[2:5,])$time ## std error of cumulative hazard at time=60 > logstd<-logstd[logstd.time==60,] > logstd [1] 0.01965858 0.01965858 0.01941871 0.01969248 ## survival S at months 60 > surv<-surv[logstd.time==60,] > surv 2 3 4 5 0.9249238 0.9249238 0.9253038 0.9263392 Since survfit()$std.err was for cumulative hazard H=log(S), thus based on your explanation below, the standard error for log-log method is: (1/S^2)var(H) > loglogstd<-sqrt(1/surv^2*(logstd^2)) > loglogstd 2 3 4 5 0.02125427 0.02125427 0.02098631 0.02125839 ## the upper confidence interval should be > exp(-exp(log(-log(surv))-qnorm(0.975)*loglogstd)) 2 3 4 5 0.9278737 0.9278737 0.9282031 0.9292363 But this is different from the output using summary.survfit: > summary(survfit(fit,newdata=lung[2:5,],conf.type='log-log'),times=60)$upper[1,] 2 3 4 5 0.9534814 0.9534814 0.9535646 0.9548478 Can you please suggest what I did wrong in the calculation? Thanks very much! John To: "Therneau, Terry M., Ph.D." ; "r-help@r-project.org" Sent: Monday, June 30, 2014 10:46 AM Subject: Re: standard error of survfit.coxph() [[elided Yahoo spam]] John From: "Therneau, Terry M., Ph.D." Sent: Monday, June 30, 2014 6:04 AM Subject: Re: standard error of survfit.coxph() 1. The computations "behind the scenes" produce the variance of the cumulative hazard. This is true for both an ordinary Kaplan-Meier and a Cox model. Transformations to other scales are done using simple Taylor series. H = cumulative hazard = log(S); S=survival var(H) = var(log(S)) = the starting point S = exp(log(S)), so var(S) is approx [deriv of exp(x)]^2 * var(log(S)) = S^2 var(H) var(log(log(S)) is approx (1/S^2) var(H) 2. At the time it was written, summary.survfit was used only for printing out the survival curve at selected times, and the audience for the printout wanted std(S). True, that was 20 years ago, but I don't recall anyone ever asking for summary to do anything else. Your request is not a bad idea. Note however that the primary impact of using log(S) or S or log(log(S)) scale is is on the confidence intervals, and they do appear per request in the summary output. Terry T. On 06/28/2014 05:00 AM, r-help-requ...@r-project.org wrote: > Message: 9 > Date: Fri, 27 Jun 2014 12:39:29 -0700 > To:"r-help@r-project.org" > Subject: [R] standard error of survfit.coxph() > Message-ID: > <1403897969.91269.yahoomail...@web122906.mail.ne1.yahoo.com> > Content-Type: text/plain > > Hi, can anyone help me to understand the standard errors printed in the > output of survfit.coxph()? > > time<-sample(1:15,100,replace=T) > > status<-as.numeric(runif(100,0,1)<0.2) > x<-rnorm(100,10,2) > > fit<-coxph(Surv(time,status)~x) > ??? ### method 1 > > survfit(fit, newdata=data.frame(time=time,status=status,x=x)[1:5,], > conf.type='log')$std.err > > [,1]??? [,2]??? [,3]??? [,4]?? [,5] > ?[1,] 0.0 0.0 0.0 0.0 0. 
> ?[2,] 0.008627644 0.008567253 0.008773699 0.009354788 0.01481819 > ?[3,] 0.008627644 0.008567253 0.008773699 0.009354788 0.01481819 > ?[4,] 0.013800603 0.013767977 0.013889971 0.014379928 0.02353371 > ?[5,] 0.013800603 0.013767977 0.013889971 0.014379928 0.02353371 > ?[6,] 0.013800603 0.013767977 0.013889971 0.014379928 0.02353371 > ?[7,] 0.030226811 0.030423883 0.029806263 0.028918817 0.05191161 > ?[8,] 0.030226811 0.030423883 0.029806263 0.028918817 0.05191161 > ?[9,] 0.036852571 0.037159980 0.036186931 0.034645002 0.06485394 > [10,] 0.044181716 0.044621159 0.043221145 0.040872939 0.07931028 > [11,] 0.044181716 0.044621159 0.043221145 0.040872939 0.07931028 > [12,] 0.055452631 0.056018832 0.054236881 0.051586391 0.10800413 > [13,] 0.070665160 0.071363749 0.069208056 0.066655730 0.14976433 > [14,] 0.124140400 0.125564637 0.121281571 0.118002021 0.30971860 > [15,] 0.173132357 0.175309455 0.168821266 0.164860523 0.46393111 > > survfit(fit, newdata=data.frame(time=time,status=status,x=x)[1:5,], > conf.type='log')$time > ?[1]? 1? 2? 3? 4? 5? 6? 7? 8? 9 10 11 12 13 14 15 > > ??? ### method 2 > > summary(survfit(fit, newdata=data.frame(time=time,status=status,x=x)[1:5,], > conf.type='log'),time=10)$std.err > > ? 1? 2? 3? 4? 5 > [1,] 0.04061384 0.04106186 0.03963184 0.03715246
Re: [R] standard error of survfit.coxph()
Terry, I figured out that variance of log(-log(S)) should be (1/H^2)var(H), not (1/S^2)var(H)! Thanks John e...@mayo.edu>; "r-help@r-project.org" Sent: Monday, July 21, 2014 11:41 AM Subject: Re: standard error of survfit.coxph() Dear Terry, I was trying to use your explanation of the standard error estimate from survfit.coxph() to verify the standard error estimates for the method of log(log(S)), but couldn't get the estimates correct. Here is an example using the lung dataset: > fit<-coxph(Surv(time,status)~wt.loss,lung) > surv<-survfit(fit,newdata=lung[2:5,])$surv > logstd<-survfit(fit,newdata=lung[2:5,])$std.err > logstd.time<-survfit(fit,newdata=lung[2:5,])$time ## std error of cumulative hazard at time=60 > logstd<-logstd[logstd.time==60,] > logstd [1] 0.01965858 0.01965858 0.01941871 0.01969248 ## survival S at months 60 > surv<-surv[logstd.time==60,] > surv 2 3 4 5 0.9249238 0.9249238 0.9253038 0.9263392 Since survfit()$std.err was for cumulative hazard H=log(S), thus based on your explanation below, the standard error for log-log method is: (1/S^2)var(H) > loglogstd<-sqrt(1/surv^2*(logstd^2)) > loglogstd 2 3 4 5 0.02125427 0.02125427 0.02098631 0.02125839 ## the upper confidence interval should be > exp(-exp(log(-log(surv))-qnorm(0.975)*loglogstd)) 2 3 4 5 0.9278737 0.9278737 0.9282031 0.9292363 But this is different from the output using summary.survfit: > summary(survfit(fit,newdata=lung[2:5,],conf.type='log-log'),times=60)$upper[1,] 2 3 4 5 0.9534814 0.9534814 0.9535646 0.9548478 Can you please suggest what I did wrong in the calculation? Thanks very much! John To: "Therneau, Terry M., Ph.D." ; "r-help@r-project.org" Sent: Monday, June 30, 2014 10:46 AM Subject: Re: standard error of survfit.coxph() [[elided Yahoo spam]] John From: "Therneau, Terry M., Ph.D." Sent: Monday, June 30, 2014 6:04 AM Subject: Re: standard error of survfit.coxph() 1. The computations "behind the scenes" produce the variance of the cumulative hazard. This is true for both an ordinary Kaplan-Meier and a Cox model. Transformations to other scales are done using simple Taylor series. H = cumulative hazard = log(S); S=survival var(H) = var(log(S)) = the starting point S = exp(log(S)), so var(S) is approx [deriv of exp(x)]^2 * var(log(S)) = S^2 var(H) var(log(log(S)) is approx (1/S^2) var(H) 2. At the time it was written, summary.survfit was used only for printing out the survival curve at selected times, and the audience for the printout wanted std(S). True, that was 20 years ago, but I don't recall anyone ever asking for summary to do anything else. Your request is not a bad idea. Note however that the primary impact of using log(S) or S or log(log(S)) scale is is on the confidence intervals, and they do appear per request in the summary output. Terry T. On 06/28/2014 05:00 AM, r-help-requ...@r-project.org wrote: > Message: 9 > Date: Fri, 27 Jun 2014 12:39:29 -0700 > To:"r-help@r-project.org" > Subject: [R] standard error of survfit.coxph() > Message-ID: > <1403897969.91269.yahoomail...@web122906.mail.ne1.yahoo.com> > Content-Type: text/plain > > Hi, can anyone help me to understand the standard errors printed in the > output of survfit.coxph()? > > time<-sample(1:15,100,replace=T) > > status<-as.numeric(runif(100,0,1)<0.2) > x<-rnorm(100,10,2) > > fit<-coxph(Surv(time,status)~x) > ??? ### method 1 > > survfit(fit, newdata=data.frame(time=time,status=status,x=x)[1:5,], > conf.type='log')$std.err > > [,1]??? [,2]??? [,3]??? [,4]?? [,5] > ?[1,] 0.0 0.0 0.0 0.0 0. 
> ?[2,] 0.008627644 0.008567253 0.008773699 0.009354788 0.01481819 > ?[3,] 0.008627644 0.008567253 0.008773699 0.009354788 0.01481819 > ?[4,] 0.013800603 0.013767977 0.013889971 0.014379928 0.02353371 > ?[5,] 0.013800603 0.013767977 0.013889971 0.014379928 0.02353371 > ?[6,] 0.013800603 0.013767977 0.013889971 0.014379928 0.02353371 > ?[7,] 0.030226811 0.030423883 0.029806263 0.028918817 0.05191161 > ?[8,] 0.030226811 0.030423883 0.029806263 0.028918817 0.05191161 > ?[9,] 0.036852571 0.037159980 0.036186931 0.034645002 0.06485394 > [10,] 0.044181716 0.044621159 0.043221145 0.040872939 0.07931028 > [11,] 0.044181716 0.044621159 0.043221145 0.040872939 0.07931028 > [12,] 0.055452631 0.056018832 0.054236881 0.051586391 0.10800413 > [13,] 0.070665160 0.071363749 0.069208056 0.066655730 0.14976433 > [14,] 0.124140400 0.125564637 0.121281571 0.118002021 0.30971860 > [15,] 0.173132357 0.175309455 0.168821266 0.164860523 0.46393111 > > survfit(fit, newdata=data.frame(time=time,status=status,x=x)[1:5,], > conf.type='log')$time > ?[1]? 1? 2? 3? 4? 5? 6?
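With John's correction, var(log(-log S)) ~ (1/H^2) var(H) where H = -log(S) is the cumulative hazard, his lung example reproduces the log-log limits from summary.survfit(); a sketch of the corrected calculation:

library(survival)
fit  <- coxph(Surv(time, status) ~ wt.loss, data = lung)
sf   <- survfit(fit, newdata = lung[2:5, ])
i    <- which(sf$time == 60)
S    <- sf$surv[i, ]
se_H <- sf$std.err[i, ]
H    <- -log(S)                                    # cumulative hazard at time 60
se_loglog <- se_H / H                              # corrected factor: 1/H, not 1/S
exp(-exp(log(H) - qnorm(0.975) * se_loglog))       # upper survival limit, about 0.953
summary(survfit(fit, newdata = lung[2:5, ], conf.type = 'log-log'), times = 60)$upper[1, ]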
Re: [R] standard error of survfit.coxph()
Dear Terry/All, I was trying to use your explanation of the standard error estimate from survfit.coxph() to verify the standard error estimates for the method of log(log(S)), but couldn't get the estimates correct. Here is an example using the lung dataset: > fit<-coxph(Surv(time,status)~wt.loss,lung) > surv<-survfit(fit,newdata=lung[2:5,])$surv > logstd<-survfit(fit,newdata=lung[2:5,])$std.err > logstd.time<-survfit(fit,newdata=lung[2:5,])$time ## std error of cumulative hazard at time=60 > logstd<-logstd[logstd.time==60,] > logstd [1] 0.01965858 0.01965858 0.01941871 0.01969248 ## survival S at months 60 > surv<-surv[logstd.time==60,] > surv 2 3 4 5 0.9249238 0.9249238 0.9253038 0.9263392 Since survfit()$std.err was for cumulative hazard H=log(S), thus based on your explanation below, the standard error for log-log method is: (1/S^2)var(H) > loglogstd<-sqrt(1/surv^2*(logstd^2)) > loglogstd 2 3 4 5 0.02125427 0.02125427 0.02098631 0.02125839 ## the upper confidence interval should be > exp(-exp(log(-log(surv))-qnorm(0.975)*loglogstd)) 2 3 4 5 0.9278737 0.9278737 0.9282031 0.9292363 But this is different from the output using summary.survfit: > summary(survfit(fit,newdata=lung[2:5,],conf.type='log-log'),times=60)$upper[1,] 2 3 4 5 0.9534814 0.9534814 0.9535646 0.9548478 Can you please suggest what I did wrong in the calculation? Thanks very much! John From: "Therneau, Terry M., Ph.D." Sent: Monday, June 30, 2014 6:04 AM Subject: Re: standard error of survfit.coxph() 1. The computations "behind the scenes" produce the variance of the cumulative hazard. This is true for both an ordinary Kaplan-Meier and a Cox model. Transformations to other scales are done using simple Taylor series. H = cumulative hazard = log(S); S=survival var(H) = var(log(S)) = the starting point S = exp(log(S)), so var(S) is approx [deriv of exp(x)]^2 * var(log(S)) = S^2 var(H) var(log(log(S)) is approx (1/S^2) var(H) 2. At the time it was written, summary.survfit was used only for printing out the survival curve at selected times, and the audience for the printout wanted std(S). True, that was 20 years ago, but I don't recall anyone ever asking for summary to do anything else. Your request is not a bad idea. Note however that the primary impact of using log(S) or S or log(log(S)) scale is is on the confidence intervals, and they do appear per request in the summary output. Terry T. On 06/28/2014 05:00 AM, r-help-requ...@r-project.org wrote: > Message: 9 > Date: Fri, 27 Jun 2014 12:39:29 -0700 > To:"r-help@r-project.org" > Subject: [R] standard error of survfit.coxph() > Message-ID: > <1403897969.91269.yahoomail...@web122906.mail.ne1.yahoo.com> > Content-Type: text/plain > > Hi, can anyone help me to understand the standard errors printed in the > output of survfit.coxph()? > > time<-sample(1:15,100,replace=T) > > status<-as.numeric(runif(100,0,1)<0.2) > x<-rnorm(100,10,2) > > fit<-coxph(Surv(time,status)~x) > ??? ### method 1 > > survfit(fit, newdata=data.frame(time=time,status=status,x=x)[1:5,], > conf.type='log')$std.err > > [,1]??? [,2]??? [,3]??? [,4]?? [,5] > ?[1,] 0.0 0.0 0.0 0.0 0. 
> ?[2,] 0.008627644 0.008567253 0.008773699 0.009354788 0.01481819 > ?[3,] 0.008627644 0.008567253 0.008773699 0.009354788 0.01481819 > ?[4,] 0.013800603 0.013767977 0.013889971 0.014379928 0.02353371 > ?[5,] 0.013800603 0.013767977 0.013889971 0.014379928 0.02353371 > ?[6,] 0.013800603 0.013767977 0.013889971 0.014379928 0.02353371 > ?[7,] 0.030226811 0.030423883 0.029806263 0.028918817 0.05191161 > ?[8,] 0.030226811 0.030423883 0.029806263 0.028918817 0.05191161 > ?[9,] 0.036852571 0.037159980 0.036186931 0.034645002 0.06485394 > [10,] 0.044181716 0.044621159 0.043221145 0.040872939 0.07931028 > [11,] 0.044181716 0.044621159 0.043221145 0.040872939 0.07931028 > [12,] 0.055452631 0.056018832 0.054236881 0.051586391 0.10800413 > [13,] 0.070665160 0.071363749 0.069208056 0.066655730 0.14976433 > [14,] 0.124140400 0.125564637 0.121281571 0.118002021 0.30971860 > [15,] 0.173132357 0.175309455 0.168821266 0.164860523 0.46393111 > > survfit(fit, newdata=data.frame(time=time,status=status,x=x)[1:5,], > conf.type='log')$time > ?[1]? 1? 2? 3? 4? 5? 6? 7? 8? 9 10 11 12 13 14 15 > > ??? ### method 2 > > summary(survfit(fit, newdata=data.frame(time=time,status=status,x=x)[1:5,], > conf.type='log'),time=10)$std.err > > ? 1? 2? 3? 4? 5 > [1,] 0.04061384 0.04106186 0.03963184 0.03715246 0.06867532 > > By reading the help of ?survfit.object and ?summary.survfit, the standard > error provided in the output of method 1 (survfit()) was for cumulative > hazard-log(survival), while the standard
[R] NRI or IDI for survival data - Hmisc package
Hi, I am trying to calculate net reclassification improvement (NRI) and Integrated Discrimination Improvement (IDI) for a survival dataset to compare 2 risk models. It seems that improveProb() in the Hmisc package does this only for a binary outcome, while rcorrp.cens() does take a survival object but doesn't output NRI or IDI. Can anyone suggest any other packages that can calculate NRI and IDI for survival data? The reference for extending NRI to survival data is: Pencina MJ, D'Agostino RB, Steyerberg EW (2011): Extensions of net reclassification improvement calculations to measure usefulness of new biomarkers. Stat in Med 30:11-21 Thanks! John [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] estimating survival function from Cox model
Hi, I have some questions on how to estimate the survival function from a Cox model. I know how to do this in R using survfit(). But let's say the model was fit in other software, and I was only given the estimate of the baseline cumulative hazard "A0(t=10)" at the specified time "t=10" (baseline cumulative hazard refers to when covariate X=0) and the beta estimate "b" for the covariate "X" used in the Cox model. So the survival function at time 10 for a given covariate value x can be calculated as: A(t=10|X=x) = exp(b*x)*A0(t=10) where A is the cumulative hazard when X=x S(t=10|X=x) = exp(-A(t=10|X=x)) where S is the survival function to be calculated Now I want to calculate a confidence interval for S(t=10|X=x). I think I need to calculate the CI for the cumulative hazard A(t=10|X=x) first and then exponentiate it to get the CI for S, based on the relationship S = exp(-A). To get the CI for A, I need an estimate of the standard error of A. I know the other software can give me the standard error of A0, the baseline cumulative hazard. Based on the relationship A = exp(b*x)*A0, I guess I'll also need the standard error for b. But how do I calculate the standard error for A based on the standard errors for A0 and b? Any insights would be greatly appreciated! John [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
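One rough route (an approximation I am assuming, not something established in this thread) is the delta method applied to A = exp(b*x)*A0; strictly it also needs the covariance between the estimates of b and A0, which the sketch below ignores, so the result is only approximate:

## made-up placeholder inputs
b     <- 0.3;  se_b  <- 0.05    # coefficient and its standard error
A0    <- 0.20; se_A0 <- 0.03    # baseline cumulative hazard at t = 10 and its standard error
x     <- 1.5                    # covariate value of interest
A     <- exp(b * x) * A0
## partial derivatives: dA/db = x * A,  dA/dA0 = exp(b * x)
se_A  <- sqrt((x * A)^2 * se_b^2 + exp(2 * b * x) * se_A0^2)   # covariance term omitted
ciA   <- A + c(-1, 1) * qnorm(0.975) * se_A    # confidence interval for the cumulative hazard
exp(-rev(ciA))                                  # corresponding interval for S = exp(-A)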
[R] Installation folder
Hi, I noticed that when I install/update packages, the installation folder is C:/User/My Document/R, not C:/Program Files/R. R itself is still in the Program Files folder. I don't know how this happened; it used to work OK. Any clues on how to correct the problem are appreciated! Thanks John [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] median survival
Hi, if the 50% survival probability horizontal line in a Kaplan-Meier survival curve overlaps one of the horizontal step segments between 2 time points t1 and t2, survfit() from the survival package estimates the median survival as t2 (the later of the two time points). But I saw some articles (page 23: http://www.amstat.org/chapters/northeasternillinois/pastevents/presentations/summer05_Ibrahim_J.pdf) recommend the smallest time t such that S(t)<=0.5. What is the convention for the definition of median survival? Thanks John [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] unique rows
Hi, I wanted to remove redundant rows (rows with the same entries in all columns) from a data frame. For example, with this data frame: > dat<-cbind(x=c('a','a','b','b','c','c'),y=c('x','x','d','s','g','g')) > dat x y [1,] "a" "x" [2,] "a" "x" [3,] "b" "d" [4,] "b" "s" [5,] "c" "g" [6,] "c" "g" after removing the redundancy, the end result should be x y [1,] "a" "x" [2,] "b" "d" [3,] "b" "s" [4,] "c" "g" What is the best way to do this? Thanks John [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] unique rows
sorry.. don't know unique().. such a great function From: Bert Gunter Cc: "r-help@r-project.org" Sent: Tuesday, January 28, 2014 2:21 PM Subject: Re: [R] unique rows Inline. -- Bert Bert Gunter Genentech Nonclinical Biostatistics (650) 467-7374 "Data is not information. Information is not knowledge. And knowledge is certainly not wisdom." H. Gilbert Welch > Hi, I wanted to remove redundant rows (with same entry in columns) in a data > frame. For example, with this data frame: > >> dat<-cbind(x=c('a','a','b','b','c','c'),y=c('x','x','d','s','g','g')) ## this is not a data frame. And would you kindly explain why you posted here instead of reading ?unique. -- Bert >> dat > x y > [1,] "a" "x" > [2,] "a" "x" > [3,] "b" "d" > [4,] "b" "s" > [5,] "c" "g" > [6,] "c" "g" > > after removing the redundancy, the end results should be > > x y > [1,] "a" "x" > [2,] "b" "d" > [3,] "b" "s" > [4,] "c" "g" > > what is the best way to do this? > > Thanks > > John > [[alternative HTML version deleted]] > > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
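For the record, unique() has a matrix method that drops duplicated rows directly, and duplicated() gives an equivalent route; with the example from the thread:

dat <- cbind(x = c('a','a','b','b','c','c'), y = c('x','x','d','s','g','g'))
unique(dat)               # keeps one copy of each distinct row
dat[!duplicated(dat), ]   # equivalent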
Re: [R] median survival
please ignore. actually the median survival from survfit() is the mean of the 2 time points. To: R help Sent: Tuesday, January 28, 2014 11:27 AM Subject: [R] median survival Hi, if 50% survival probability horizontal line in a Kaplan-Meier survival curve overlap one of the step line between 2 time points t1 and t2, the survfit() from survival package estimates median survival as t2 (the longest time point). But I saw some articles (page 23: http://www.amstat.org/chapters/northeasternillinois/pastevents/presentations/summer05_Ibrahim_J.pdf) recommend the smallest time t such time S(t)<=0.5. What is the convention for the definition of median survival? Thanks John [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
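John's observation can be checked with a tiny made-up example in which the Kaplan-Meier curve equals 0.5 exactly over an interval; in recent versions of the survival package (my understanding, worth verifying against your installed version) print.survfit() reports the average of the two endpoints:

library(survival)
## four subjects, all events: S(t) steps to 0.75, 0.50, 0.25, 0 at times 1, 2, 3, 4,
## so the curve sits exactly at 0.5 on the interval [2, 3)
fit <- survfit(Surv(c(1, 2, 3, 4), rep(1, 4)) ~ 1)
print(fit)   # median reported as 2.5, the mean of the two time points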
[R] power analysis is applicable or not
Hi, this is a statistical question rather than a pure R question. I have gotten much help from the R mailing list in the past, so I would like to try here and would appreciate any input: I conducted a Mantel-Haenszel test to show that the performance of a diagnostic test did not show heterogeneity among 4 study sites, i.e. the Mantel-Haenszel test p value was > 0.05, so that I could conduct a meta-analysis by combining data of all 4 study sites. Now one of the reviewers for the manuscript did a power analysis for the Mantel-Haenszel test showing that with the sample sizes I have, the power for the Mantel-Haenszel test was only 50%. So he argued that I did not have enough power for the Mantel-Haenszel test. My usage of Mantel-Haenszel was NOT to show a significant p value; instead, a non-significant p value was what I was looking for, because a non-significant p value indicates NO heterogeneity among study sites. Power analysis in general shows whether you have a large enough sample size to ensure a statistically significant difference can be seen with a certain likelihood. But this is not how I used the Mantel-Haenszel test. So I think in my scenario the power analysis is NOT applicable, because I am simply using the test to demonstrate a non-significant p value. Am I correct in this view? Thanks and appreciate any thoughts. John [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] power analysis is applicable or not
Hi Chris, Thanks for sharing your thoughts. The reviewer used the heterogeneity that I observed in my study for the power analysis. I understand what you have descried. And I agree that with the sample size I have, I do not have enough power to detect the heterogeneity that I observed with significance. But if let's say I have enough sample size as calculated by the power analysis, then I will have 80% power to detect the heterogeneity, would it be true that I will almost very unlikely to declare homogeneity among study sites, so that I would almost never be able to combine study sites? This goes to the general thinking that if you have a sample size large enough, you will always be able to make any difference statistically significant... On the the hand, making a statistical inference using any statistical test (including Mantel Haenszel test), I though, is always valid regardless of sample size. For the heterogeneity test, I am just doing that -- making a statistical inference with the p value from Mantel Haenszel test. I am not sure if it is correct that it is mandatory to perform a power analysis before attempting a statistical test. Please share your thoughts... Thanks John From: Christopher W. Ryan Sent: Tuesday, November 12, 2013 6:53 PM Subject: Re: [R] power analysis is applicable or not John-- Well, my simple-minded way of thinking about these issues goes something like this: You want to know if there is heterogeneity. You gather some data and do your MH analysis. You never know *for sure* whether there is *really* heterogeneity in your population; all you know is whether there is any in your sample--you concluded there was not. Your reviewer calculated that with the sample size you used, *even if there was heterogeneity in your population* (unknowable by you or anyone else) then your sample size only had a 50% probability of detecting it (a 50% probability of coming up with a p < 0.05). Meaning there *could have been* heterogeneity there, at a 0.05 signficance level, and you *would* have seen it, *if* your sample size was larger. It's when you come up with a "non-significant result" that the issue of power is most relevant. If you already have a "significant" result, then yes, your sample size was large enough to show a significant result. An important question is: what *magnitude* of heterogeneity did your reviewer assume he/she was looking for when he/she did the power calculation? And is that magnitude meaningful? All this being said, power calculations are best done before recruiting subjects or gathering data. --Chris Ryan SUNY Upstate Medical University Binghamton, NY array chip wrote: > Hi, this is a statistical question rather than a pure R question. I have got > many help from R mailing list in the past, so would like to try here and > appreciate any input: > > I conducted Mantel-Haenszel test to show that the performance of a diagnostic > test did not show heterogeneity among 4 study sites, i.e. Mantel Haenszel > test p value > 0.05, so that I could conduct a meta-analysis by combining > data of all 4 study sites. > > Now one of the reviewers for the manuscript did a powering analysis for > Mantel Haneszel test showing that with the sample sizes I have, the power for > Mantel Haeszel test was only 50%. So he argued that I did not have enough > power for Mantel Haenszel test. 
> > My usage of Mantel Haenszel was NOT to show a significant p value, instead a > non-sginificant p value was what I was looking for because non-significant p > value indicate NO heterogeneity among study sites. Powering analysis in > general is to show whether you have enough sample size to ensure a > statistical significant difference can be seen with certain likelihood. But > this is not how I used Mantel Haenszel test. So I think in my scenario, the > power analysis is NOT applicable because I am simply using the test to > demonstrate a non-significant p value. > > Am I correct on this view? > > Thanks and appreciate any thoughts. > > John > [[alternative HTML version deleted]] > > > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] logistic regression with 50 variables
Hi, this is not an R technical question per se. I know there are many excellent statisticians on this list, so here are my questions: I have a dataset with ~1800 observations and 50 independent variables, so there are about 35 samples per variable. Is it wise to try to build a stable multiple logistic model with 50 independent variables? Any problems with this approach? Thanks John __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] logistic regression with 50 variables
Thanks, Charles, for the reproducible code. I started this question because I was asked to take a look at such a dataset, but I have doubts about whether it's meaningful to do a LR with 50 variables. I haven't got the dataset yet, thus have not tried any code. But thanks again for sharing the simulation code. I have one question about your code. When you calculate common standard errors for coefficients using sd(coef(fit)), should you exclude the intercept by doing sd(coef(fit)[-1])? Actually, after removing the intercept, the standard error calculated this way is very similar to the one reported from colMeans(coef(summary(fit))), for both scenarios in your example (coef = 0.14 or 1.0). Another question: the 50 simulated variables have very low correlations (ranging from -0.1 to 0.08), which may contribute to the stability of the model. If some (not all) of the 50 variables had considerable correlation, like 0.7 or 0.8, how would the LR model behave? Thanks John - Original Message From: Charles C. Berry To: Joris Meys Cc: r-help@r-project.org; Marc Schwartz Sent: Mon, June 14, 2010 8:32:02 AM Subject: Re: [R] logistic regression with 50 variables On Mon, 14 Jun 2010, Joris Meys wrote: > Hi, > > Marc's explanation is valid to a certain extent, but I don't agree with > his conclusion. I'd like to point out "the curse of > dimensionality" (Hughes effect) which starts to play rather quickly. > Ahem! ... minimal, self-contained, reproducible code ... The OP's situation may not be an impossible one: > set.seed(54321) > dat <- as.data.frame(matrix(rnorm(1800*50),nc=50)) > dat$y <- rbinom(1800,1,plogis(rowSums(dat)/7)) > fit <- glm(y~., dat, family=binomial) > 1/7 # the true coef [1] 0.1428571 > sd(coef(fit)) # roughly, the common standard error [1] 0.05605597 > colMeans(coef(summary(fit))) # what glm() got Estimate Std. Error z value Pr(>|z|) 0.14590836 0.05380666 2.71067328 0.06354820 > # a trickier case: > set.seed(54321) > dat <- as.data.frame(matrix(rnorm(1800*50),nc=50)) > dat$y <- rbinom(1800,1,plogis(rowSums(dat))) # try again with coef==1 > fit <- glm(y~., dat, family=binomial) > colMeans(coef(summary(fit))) Estimate Std. Error z value Pr(>|z|) 0.982944012 0.119063631 8.222138491 0.008458002 > Finer examination of the latter fit will show some values that differ too far from 1.0 to agree with the asymptotic std err. > sd(coef(fit)) # rather bigger than 0.119 [1] 0.1827462 > range(coef(fit)) [1] -0.08128586 1.25797057 And near separability may be playing here: > cbind( + table( + cut( + plogis(abs(predict(fit))), + c(0, 0.9, 0.99, 0.999, 0.9999, 0.99999, 1) ) ) ) [,1] (0,0.9] 453 (0.9,0.99] 427 (0.99,0.999] 313 (0.999,0.9999] 251 (0.9999,0.99999] 173 (0.99999,1] 183 > Recall that the observed information contains a factor of plogis( predict(fit) ) * plogis( -predict(fit) ) > hist(plogis( predict(fit) ) * plogis( -predict(fit) )) So the effective sample size here was much reduced. But to the OP's question, whether what you get is reasonable depends on what the setup is. I wouldn't call the first of the above cases 'highly unstable'. Which is not to say that one cannot generate difficult cases (esp. with correlated covariates and/or one or more highly influential covariates) and that the OP's case is not one of them. HTH, Chuck > The curse of dimensionality is easily demonstrated looking at the > proximity between your datapoints. Say we scale the interval in one > dimension to be 1 unit. If you have 20 evenly-spaced observations, the > distance between the observations is 0.05 units. 
To have a proximity > like that in a 2-dimensional space, you need 20^2=400 observations. in > a 10 dimensional space this becomes 20^10 ~ 10^13 datapoints. The > distance between your observations is important, as a sparse dataset > will definitely make your model misbehave. > > Even with about 35 samples per variable, using 50 independent > variables will render a highly unstable model, as your dataspace is > about as sparse as it can get. On top of that, interpreting a model > with 50 variables is close to impossible, and then I didn't even start > on interactions. No point in trying I'd say. If you really need all > that information, you might want to take a look at some dimension > reduction methods first. > > Cheers > Joris > > On Mon, Jun 14, 2010 at 2:55 PM, Marc Schwartz wrote: >> On Jun 13, 2010, at 10:20 PM, array chip wrote: >> >>> Hi, this is not R technical question per se. I know there are many >>> excellent statisticians in this list, so here my questions: I have dataset >>> with ~1800 observations and 50 independent variables, so there are about 35 >>>
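As a rough way to probe the follow-up question about correlated predictors, Chuck's simulation can be repeated with a block of strongly correlated columns. This is only a sketch (the block size of 10 and the 0.8 within-block correlation are arbitrary illustration values), but it gives a feel for how the coefficient estimates and their spread react:

library(MASS)                        # for mvrnorm
set.seed(54321)
p <- 50
Sigma <- diag(p)
Sigma[1:10, 1:10] <- 0.8             # an arbitrary block of 10 predictors with pairwise correlation 0.8
diag(Sigma) <- 1
X <- mvrnorm(1800, mu = rep(0, p), Sigma = Sigma)
dat <- as.data.frame(X)
dat$y <- rbinom(1800, 1, plogis(rowSums(X)/7))
fit <- glm(y ~ ., data = dat, family = binomial)

sd(coef(fit)[-1])                        # spread of the 50 slope estimates, intercept excluded
colMeans(coef(summary(fit))[-1, ])       # average estimate and reported std. error
kappa(model.matrix(fit), exact = TRUE)   # condition number: a quick collinearity check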
[R] xyplot help
Hi, I am learning xyplot. I have an example dataset attached. plotdata<-read.table("plotdata.txt",sep='\t',header=T,row.names=1) head(plotdata,n=4) y x type 1 -4.309601 -0.7448405 A 2 -4.715421 0.7875994 A 3 -2.310638 0.5455310 A 4 -2.685803 10.4116868 A xyplot(y~x,groups=type,plotdata, auto.key=T) This shows different colors for different levels of "type". Now, I want to add a fitted line to the plot, the formula is -1.324+0.1117*x-0.0006357*x*x I tried the following code to do this: xyplot(y~x,groups=type,plotdata, auto.key=T , panel = function(x,y) { panel.xyplot(x,y, type='p') x<-sort(x) panel.lines(x,-1.324+0.1117*x-0.0006357*x*x) }) Now, it doesn't show different colors for different levels of "type". How can I restore that? Also, is there anyway to put the legend at bottom of the plot (instead at the top right now)? And is there anyway to print legend horizontally, instead of vertically as shown right now? Thanks John [attachment plotdata.txt: roughly 125 rows of y, x and type values; the numeric columns are run together in the archived copy and are omitted here]
Re: [R] xyplot help
Thank you for pointing out panel.superpose. I got it working with panel.xyplot as below: xyplot(y~x,groups=type,plotdata, auto.key=list(space='bottom',points=T) , panel = function(x,y,subscripts, groups) { panel.xyplot(x,y, col=groups[subscripts]) x<-sort(x) panel.lines(x,-1.324+0.1117*x-0.0006357*x*x) }) type='b' doesn't give me a smooth line, so it didn't produce what I want. Thanks John - Original Message ---- From: Bert Gunter To: array chip ; r-help@r-project.org Sent: Thu, July 8, 2010 4:25:51 PM Subject: RE: [R] xyplot help Read the "panel" part of ?xyplot carefully and note ?panel.superpose. Typically, when you have a groups argument your panel function must be of the form panel = function(x,y,...){ panel.superpose(x,y,...) etc. } However, I think it might all happen automagically if you do the call as: xyplot(y~x,groups=type,plotdata, type = "b", auto.key=T) This is also referenced in the "panel" section of the xyplot help, where you are referred to ?panel.xyplot for info on the type argument. So, in brief, read the docs more carefully. Bert Gunter Genentech Nonclinical Statistics > -Original Message- > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] > On Behalf Of array chip > Sent: Thursday, July 08, 2010 3:58 PM > To: r-help@r-project.org > Subject: [R] xyplot help > > Hi, I am learning xyplot. I have an example dataset attached. > > plotdata<-read.table("plotdata.txt",sep='\t',header=T,row.names=1) > > head(plotdata,n=4) > y x type > 1 -4.309601 -0.7448405A > 2 -4.715421 0.7875994A > 3 -2.310638 0.5455310A > 4 -2.685803 10.4116868A > > xyplot(y~x,groups=type,plotdata, auto.key=T) > > This shows different colors for different levels of "type". > > Now, I want to add a fitted line to the plot, the formula is > -1.324+0.1117*x-0.0006357*x*x > > I tried the following code to do this: > > xyplot(y~x,groups=type,plotdata, auto.key=T > , panel = function(x,y) { > panel.xyplot(x,y, type='p') > x<-sort(x) > panel.lines(x,-1.324+0.1117*x-0.0006357*x*x) > > }) > > Now, it doesn't show different colors for different levels of "type". How > can I > restore that? > > Also, is there anyway to put the legend at bottom of the plot (instead at > the > top right now)? And is there anyway to print legend horizontally, instead > of > vertically as shown right now? > > Thanks > > John > > > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
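For completeness, one way to write the panel function so that lattice handles the group colours itself, with the quadratic curve overlaid and the legend laid out horizontally below the plot, is sketched here. It assumes the plotdata object read in above; the coefficients are the ones quoted in the question, and 'columns' should match the number of type levels:

library(lattice)
xyplot(y ~ x, data = plotdata, groups = type,
       auto.key = list(space = "bottom", columns = 2),          # legend below the plot, horizontal
       panel = function(x, y, ...) {
         panel.superpose(x, y, ...)                             # points, keeping the per-group colours
         xs <- sort(x)
         panel.lines(xs, -1.324 + 0.1117*xs - 0.0006357*xs^2)   # fitted quadratic
       })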
[R] "squeeze" a data frame
Hi, suppose I have a data frame as below: dat<-cbind(expand.grid(id=c(1,2,3),time=c(0,3,6),mode=c('R','L'),rep=(1:3)),y=rnorm(54)) I kind of want to "squeeze" the data frame into a new one with averaged "y" over "rep" for the same id, time and mode. taking average is easy with tapply: tapply(dat$y, list(dat$id, dat$time, dat$mode), mean) But I want the result to be in the same format as "dat". Certainly, we always can transform the result (array) into the data frame using a lot of codes. But is there a simple way to do this? Thanks John __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] "squeeze" a data frame
Thank you David. I probably didn't make my question clear enough. In the end, I want a "shorter" version of the original data frame, basically with variable "rep" removed, so there is just one row per id, time and mode combination. The original data frame has 54 rows; I wish to have a data frame with only 27 rows after averaging. Using ave() would produce the following 3 rows, for example: id time mode rep y avg.y 1 1 0 R 1 -1.089540576 -0.21924078 19 1 0 R 2 -0.743374679 -0.21924078 37 1 0 R 3 1.175192908 -0.21924078 But I only need one row: id time mode avg.y 1 1 0 R -0.21924078 Thanks again, John - Original Message From: David Winsemius To: array chip Cc: r-help@r-project.org Sent: Thu, September 9, 2010 3:21:15 PM Subject: Re: [R] "squeeze" a data frame On Sep 9, 2010, at 5:47 PM, array chip wrote: > Hi, suppose I have a data frame as below: > > dat<-cbind(expand.grid(id=c(1,2,3),time=c(0,3,6),mode=c('R','L'),rep=(1:3)),y=rnorm(54)) > > > I kind of want to "squeeze" the data frame into a new one with averaged "y" > over "rep" for the same id, time and mode. taking average is easy with tapply: > > tapply(dat$y, list(dat$id, dat$time, dat$mode), mean) Try: dat$avg.y <- ave(dat$y, dat$id, dat$time, dat$mode, FUN=mean) ?ave The syntax of ave is different than "tapply" or "by". The grouping factors are not presented as a list and the FUN argument comes after the ..., so it needs to be named if different than the default of "mean". > > But I want the result to be in the same format as "dat". Certainly, we always > can transform the result (array) into the data frame using a lot of code. But > is there a simple way to do this? > > Thanks David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] "squeeze" a data frame
Thank you Peter, yes this is what I need! John - Original Message From: Peter Alspach To: array chip ; David Winsemius Cc: "r-help@r-project.org" Sent: Thu, September 9, 2010 4:26:53 PM Subject: RE: [R] "sequeeze" a data frame Tena koe John ?aggregate maybe? HTH Peter Alspach > -Original Message- > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r- > project.org] On Behalf Of array chip > Sent: Friday, 10 September 2010 11:13 a.m. > To: David Winsemius > Cc: r-help@r-project.org > Subject: Re: [R] "sequeeze" a data frame > > Thank you David. I probably didn't make my question clear enough. In > the end, I > want a "shorter" version of the original data frame, basically with > variable > "rep" removed, then just one row per id, time and mode combination. The > original > data frame has 54 rows, I wish to have a data frame with only 27 rows > after > average. > > Using ave() would produce the following 3 rows for example: > idtimemode repy avg.y > 1 1 0 R 1-1.089540576-0.21924078 > 19 1 0 R 2 -0.743374679 -0.21924078 > 37 1 0 R 3 1.175192908 -0.21924078 > > But I only need one row: > > id time modeavg.y > 11 0 R-0.21924078 > > Thanks again, > > John > > > > - Original Message > From: David Winsemius > To: array chip > Cc: r-help@r-project.org > Sent: Thu, September 9, 2010 3:21:15 PM > Subject: Re: [R] "sequeeze" a data frame > > > On Sep 9, 2010, at 5:47 PM, array chip wrote: > > > Hi, suppose I have a data frame as below: > > > >dat<- > cbind(expand.grid(id=c(1,2,3),time=c(0,3,6),mode=c('R','L'),rep=(1:3)), > y=rnorm(54)) > >) > > > > > > I kind of want to "squeeze" the data frame into a new one with > averaged "y" > >over > > "rep" for the same id, time and mode. taking average is easy with > tapply: > > > > tapply(dat$y, list(dat$id, dat$time, dat$mode), mean) > > Try: > > dat$avg.y <- ave(dat$y, dat$id, dat$time, dat$mode, FUN=mean) > > ?ave > > The syntax of ave is different than "tapply" or "by". The grouping > factors are > not presented as a list and the FUN argument comes after the ,..., so > it needs > to be named if different than the default of "mean". > > > > > But I want the result to be in the same format as "dat". Certainly, > we always > > can transform the result (array) into the data frame using a lot of > codes. But > > is there a simple way to do this? > > > > Thanks > > David Winsemius, MD > West Hartford, CT > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting- > guide.html > and provide commented, minimal, self-contained, reproducible code. The contents of this e-mail are confidential and may be ...{{dropped:17}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
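For the archive, the aggregate() call that produces the 27-row "squeezed" frame directly (reusing the dat construction from the first post) is roughly:

dat <- cbind(expand.grid(id = c(1, 2, 3), time = c(0, 3, 6), mode = c('R', 'L'), rep = 1:3),
             y = rnorm(54))
agg <- aggregate(y ~ id + time + mode, data = dat, FUN = mean)
nrow(agg)    # 27: one row per id/time/mode combination, with 'rep' averaged out
head(agg)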
Re: [R] lme, groupedData, random intercept and slope
from the output, I think it's both. - Original Message From: John Sorkin To: r-help@r-project.org Sent: Fri, September 10, 2010 5:25:44 AM Subject: [R] lme, groupedData, random intercept and slope Windows Vista R 2.10.1 Does the following use of groupedData and lme produce an analysis with both random intercept and slope, or only random slope? zz<-groupedData(y~time | Subject,data=data.frame(data), labels = list( x = "Time", y = "y" ), units = list( x = "(yr)", y = "(mm)") ) plot(zz) fit10<-lme(zz) summary(fit10) Linear mixed-effects model fit by REML Data: zz AIC BIC logLik -123.1942 -115.2010 67.5971 Random effects: Formula: ~time | Subject Structure: General positive-definite StdDev Corr (Intercept) 6.054897e+00 (Intr) time4.160662e-05 1 Residual9.775954e-04 Fixed effects: y ~ time Value Std.Error DF t-value p-value (Intercept) 15.000217 1.914727 19 7.834 0 time-1.51 0.000219 19 -4566.598 0 Correlation: (Intr) time 0.059 Standardized Within-Group Residuals: Min Q1 Med Q3 Max -1.73706837 -0.36289558 0.06892484 0.59777067 1.69095476 Number of Observations: 30 Number of Groups: 10 John David Sorkin M.D., Ph.D. Chief, Biostatistics and Informatics University of Maryland School of Medicine Division of Gerontology Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 (Phone) 410-605-7119 (Fax) 410-605-7913 (Please call phone number above prior to faxing) Confidentiality Statement: This email message, including any attachments, is for th...{{dropped:12}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
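The output above lists StdDev estimates for both (Intercept) and time under "Random effects: Formula: ~time | Subject", which is why the answer is "both": lme() on a groupedData object built as y ~ time | Subject uses random = ~ time | Subject, i.e. a correlated random intercept and random slope. A small self-contained sketch of the equivalent explicit call (toy data, not the poster's):

library(nlme)
set.seed(1)
toy <- data.frame(Subject = factor(rep(1:10, each = 4)), time = rep(0:3, 10))
b0 <- rnorm(10, 0, 2)      # subject-specific intercept deviations
b1 <- rnorm(10, 0, 0.3)    # subject-specific slope deviations
toy$y <- 15 + b0[toy$Subject] - (1 + b1[toy$Subject]) * toy$time + rnorm(40, sd = 0.3)

fit <- lme(y ~ time, random = ~ 1 + time | Subject, data = toy)   # what lme(zz) fits by default
VarCorr(fit)   # StdDev for (Intercept), time and the residual, as in the posted output
# random = ~ 0 + time | Subject would instead fit a random slope only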
Re: [R] lmer fixed effects, SE, t . . . and p
But as far as I know, profile() seems to be de-activated in the lme4 package. - Original Message From: Gavin Simpson To: John Sorkin Cc: r-help@r-project.org; Bert Gunter Sent: Fri, September 10, 2010 2:05:37 AM Subject: Re: [R] lmer fixed effects, SE, t . . . and p On Thu, 2010-09-09 at 23:40 -0400, John Sorkin wrote: > Bert, > I appreciate you comments, and I have read Doug Bates writing about p > values in mixed effects regression. It is precisely because I read > Doug's material that I asked "how are we to interpret the estimates" > rather than "how can we compute a p value". My question is a simple > question whose answer is undoubtedly complex, but one that needs an > answer. Without p values, or confidence intervals, I am not certain > what to make of the results of my analysis. Does my analysis suggest, > or does it not suggest that there is a relation between time and y? If > I can't answer this question after running the analysis, I don't have > any more information than I did before I ran the analysis, and a fair > question would be why did I run the analysis? I am asking for help not > in calculation a p value or a CI, but rather to know what I can and > can't say about the results of the analysis. If this basic question > can not be answered, I am at a loss to interpret my results. > Thank you, > John Doug talks quite a lot about profiling lmer fits using 'profile deviance' to investigate variability in fixed effects. For example, see section 1.5 in the draft of chapter 1 of Doug's book on mixed models: http://lme4.r-forge.r-project.org/book/ HTH G > John David Sorkin M.D., Ph.D. > Chief, Biostatistics and Informatics > University of Maryland School of Medicine Division of Gerontology > Baltimore VA Medical Center > 10 North Greene Street > GRECC (BT/18/GR) > Baltimore, MD 21201-1524 > (Phone) 410-605-7119 > (Fax) 410-605-7913 (Please call phone number above prior to faxing)>>> Bert >Gunter 9/9/2010 11:21 PM >>> > John: > > Search on this issue in the list archives. Doug Bates has addressed it > at length. Basically, he does not calculate CI's or p-values because > he does not know how to reliably do so. > > However, the key remark in your query was: > > > (2) lmer does not give p values or confidence intervals for the fixed >effects. How we are to interpret the estimates given that no p value or CI is >given for the estimates? > > Think about it. A statistical analysis -- ANY statistical analysis -- > treats the data in isolation: it is not informed by physics, > thermodynamics, biology, other similar data, prior experience, or, > indeed, any part of the body of relevant scientific knowledge. Do you > really think that any such analysis, especially when predicated upon > often tenuous or even (necessarily) unverifiable assumptions and > simplifications should be considered authoritative? Classical > statistical inference is just another piece of the puzzle, and not > even particularly useful when, as if typically the case, hypotheses > are formulated AFTER seeing the data (this invalidates the probability > calculations -- hypotheses must be formulated before seeing the data > to be meaningfully assessed). Leo Breiman called this statistics' > "quiet scandal" something like 20 years ago, and he was no dummy. > > It is comforting, perhaps, but illusory to believe that statistical > inference can be relied on to give sound, objective scientific > results. 
True, without such a framework, science seems rather > subjective, perhaps closer to religion and arbitrary cultural > archetypes than we care to admit. But see Thomas Kuhn and Paul > Feuerabend for why this is neither surprising nor necessarily a bad > thing. > > Cheers, > Bert Gunter > > > > > On Thu, Sep 9, 2010 at 8:00 PM, John Sorkin >wrote: > > windows Vista > > R 2.10.1 > > > > > > (1) How can I get the complete table of for the fixed effects from lmer. As >can be seen from the example below, fixef(fit2) only give the estimates and >not >the SE or t value > > > >> fit3<- lmer(y~time + (1|Subject) + (time|Subject),data=data.frame(data)) > >> summary(fit3) > > Linear mixed model fit by REML > > Formula: y ~ time + (1 | Subject) + (time | Subject) > > Data: data.frame(data) > >AICBIC logLik deviance REMLdev > > -126.2 -116.4 70.1 -152.5 -140.2 > > Random effects: > > Groups NameVariance Std.Dev. Corr > > Subject (Intercept) 2.9311e+01 5.41396385 > > Subject (Intercept) 0.e+00 0. > > time0.e+00 0. NaN > > Residual 8.1591e-07 0.00090328 > > Number of obs: 30, groups: Subject, 10 > > > > Fixed effects: > > Estimate Std. Error t value > > (Intercept) 14.998216 1.712046 9 > > time-0.999779 0.000202 -4950 > > > > Correlation of Fixed Effects: > > (Intr) > > time -0.001 > >> fixef(fit3) > > (Intercept)time > > 14.9982158 -0.9997793 > > > > (2) lmer does not give p values
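On the narrower question of what can be reported: profiling the deviance, as Gavin mentions, does yield likelihood-based confidence intervals for an lmer fit. Whether profile() is available depends on the lme4 version (it was disabled in some releases around this time, as discussed elsewhere in this thread); in versions where it works, the calls look roughly like this, illustrated on lme4's built-in sleepstudy data:

library(lme4)
fit <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
pr  <- profile(fit)              # profile the deviance over each parameter
confint(pr)                      # likelihood-profile confidence intervals
confint(fit, method = "Wald")    # quicker Wald intervals for the fixed effects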
[R] lme4a package loading error
Thanks for reminding this. So I found lme4a package from Doug's UserR!2010 presentation folder: http://lme4.r-forge.r-project.org/slides/2010-07-20-Gaithersburg/pkg/ However, after installation, I got the following error message when trying to load the library: library(Matrix) > library(Rcpp) > library(minqa) > library(lme4a) Error : classes "modelMatrix", "denseModelMatrix", "sparseModelMatrix", "ddenseModelMatrix", "dsparseModelMatrix", "predModule", "dPredModule", "sPredModule", "respModule", "glmRespMod", "nlsRespMod" are not exported by 'namespace:Matrix' Error: package/namespace load failed for 'lme4a' Here is my sessionInfo() > sessionInfo() R version 2.11.1 (2010-05-31) i386-pc-mingw32 locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] minqa_1.1.9Rcpp_0.8.6 Matrix_0.999375-43 lattice_0.18-8 loaded via a namespace (and not attached): [1] grid_2.11.1nlme_3.1-96splines_2.11.1 stats4_2.11.1 tools_2.11.1 Any suggestions would be appreciated. John - Original Message From: Gavin Simpson To: array chip Cc: John Sorkin ; r-help@r-project.org; Bert Gunter Sent: Fri, September 10, 2010 10:00:15 AM Subject: Re: [R] lmer fixed effects, SE, t . . . and p On Fri, 2010-09-10 at 09:51 -0700, array chip wrote: > But as far as I know, profile() seems to be de-activated in the lme4 package. It is beta software. The lme4a version of the lme4 "package" might have had profile re-enabled, IIRC. G > - Original Message > From: Gavin Simpson > To: John Sorkin > Cc: r-help@r-project.org; Bert Gunter > Sent: Fri, September 10, 2010 2:05:37 AM > Subject: Re: [R] lmer fixed effects, SE, t . . . and p > > On Thu, 2010-09-09 at 23:40 -0400, John Sorkin wrote: > > Bert, > > I appreciate you comments, and I have read Doug Bates writing about p > > values in mixed effects regression. It is precisely because I read > > Doug's material that I asked "how are we to interpret the estimates" > > rather than "how can we compute a p value". My question is a simple > > question whose answer is undoubtedly complex, but one that needs an > > answer. Without p values, or confidence intervals, I am not certain > > what to make of the results of my analysis. Does my analysis suggest, > > or does it not suggest that there is a relation between time and y? If > > I can't answer this question after running the analysis, I don't have > > any more information than I did before I ran the analysis, and a fair > > question would be why did I run the analysis? I am asking for help not > > in calculation a p value or a CI, but rather to know what I can and > > can't say about the results of the analysis. If this basic question > > can not be answered, I am at a loss to interpret my results. > > Thank you, > > John > > Doug talks quite a lot about profiling lmer fits using 'profile > deviance' to investigate variability in fixed effects. For example, see > section 1.5 in the draft of chapter 1 of Doug's book on mixed models: > > http://lme4.r-forge.r-project.org/book/ > > HTH > > G > > > John David Sorkin M.D., Ph.D. 
> > Chief, Biostatistics and Informatics > > University of Maryland School of Medicine Division of Gerontology > > Baltimore VA Medical Center > > 10 North Greene Street > > GRECC (BT/18/GR) > > Baltimore, MD 21201-1524 > > (Phone) 410-605-7119 > > (Fax) 410-605-7913 (Please call phone number above prior to faxing)>>> Bert > >Gunter 9/9/2010 11:21 PM >>> > > John: > > > > Search on this issue in the list archives. Doug Bates has addressed it > > at length. Basically, he does not calculate CI's or p-values because > > he does not know how to reliably do so. > > > > However, the key remark in your query was: > > > > > (2) lmer does not give p values or confidence intervals for the fixed > >effects. How we are to interpret the estimates given that no p value or CI > >is > >given for the estimates? > > > > Think about it. A statistical analysis -- ANY statistical analysis -- > > treats the data in isolation: it is not informed by physics, > > thermodynamics, biology, other similar data, prior experience, or,
Re: [R] lme4a package loading error
Thank you for pointing ou R-forge. I tried link from R-forge for lme4a, it doesn't work at the time I tried (Returned "PAGE NOT FOUND". However, the link for lme4b worked, and I installed lme4b package which can be loaded successfully. lme4b has lmer1() instead of lmer(). However, when trying to run lmer1(), it will tell you this warning message: Warning message: model.Matrix() has been moved from package 'Matrix' to the new package 'MatrixModels' which is now loaded (if installed). Its use from package 'Matrix' is deprecated. Do use it from 'MatrixModels' instead. I think this may be why I got error message when I try to load lme4a, which may have not been updated to look for MatrixModels package instead of Matrix package. The reason I posted to both R and R-mixed-models mailing list is that it seems that this question is more suitable to R-mixed-models, but response there is pretty slow,.. Thanks, John ----- Original Message From: Gavin Simpson To: array chip Cc: John Sorkin ; r-help@r-project.org; Bert Gunter Sent: Fri, September 10, 2010 10:46:16 AM Subject: Re: lme4a package loading error On Fri, 2010-09-10 at 10:23 -0700, array chip wrote: > Thanks for reminding this. So I found lme4a package from Doug's UserR!2010 > presentation folder: > http://lme4.r-forge.r-project.org/slides/2010-07-20-Gaithersburg/pkg/ What is wrong with the one on the packages tab of the lme4 project page: https://r-forge.r-project.org/R/?group_id=60 ? You might need to make sure you have the latest Matrix as well to run lme4a. Update Matrix via update.packages() or install the latest version from r-forge and see if that helps. Also, try not to cross-post to multiple lists. Stick with one, or move the thread onto the new list. HTH G > However, after installation, I got the following error message when trying to > load the library: > > library(Matrix) > > library(Rcpp) > > library(minqa) > > library(lme4a) > Error : classes "modelMatrix", "denseModelMatrix", "sparseModelMatrix", > "ddenseModelMatrix", "dsparseModelMatrix", "predModule", "dPredModule", > "sPredModule", "respModule", "glmRespMod", "nlsRespMod" are not exported by > 'namespace:Matrix' > Error: package/namespace load failed for 'lme4a' > > Here is my sessionInfo() > > sessionInfo() > R version 2.11.1 (2010-05-31) > i386-pc-mingw32 > > locale: > [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United > States.1252 > > > [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C > > > > > [5] LC_TIME=English_United States.1252 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] minqa_1.1.9 Rcpp_0.8.6 Matrix_0.999375-43 lattice_0.18-8 > > loaded via a namespace (and not attached): > [1] grid_2.11.1 nlme_3.1-96 splines_2.11.1 stats4_2.11.1 tools_2.11.1 > > Any suggestions would be appreciated. > > John > > > > > > - Original Message > From: Gavin Simpson > To: array chip > Cc: John Sorkin ; r-help@r-project.org; Bert >Gunter > > > Sent: Fri, September 10, 2010 10:00:15 AM > Subject: Re: [R] lmer fixed effects, SE, t . . . and p > > On Fri, 2010-09-10 at 09:51 -0700, array chip wrote: > > But as far as I know, profile() seems to be de-activated in the lme4 package. > > It is beta software. The lme4a version of the lme4 "package" might have > had profile re-enabled, IIRC. > > G > > > - Original Message > > From: Gavin Simpson > > To: John Sorkin > > Cc: r-help@r-project.org; Bert Gunter > > Sent: Fri, September 10, 2010 2:05:37 AM > > Subject: Re: [R] lmer fixed effects, SE, t . 
. . and p > > > > On Thu, 2010-09-09 at 23:40 -0400, John Sorkin wrote: > > > Bert, > > > I appreciate you comments, and I have read Doug Bates writing about p > > > values in mixed effects regression. It is precisely because I read > > > Doug's material that I asked "how are we to interpret the estimates" > > > rather than "how can we compute a p value". My question is a simple > > > question whose answer is undoubtedly complex, but one that needs an > > > answer. Without p values, or confidence intervals, I am not certain > > > what to make of the results of my analysis. Does my analysis suggest, > > > or does it not suggest that there is a relation between time and y? If &
[R] xyplot legends
Hi all, when I plot both lines and points using type=c('l', 'p') in xyplot(), and I want to include both of them in the legend using keys=list(lines=list(col=1:3), points=list(pch=1:3)), the lines and points are plotted side by side in the legend. Is there any way to plot the points in the middle of the lines instead? That is, the default plots like this: - o but I want something like this: ---o--- Thanks for any suggestions John __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] xyplot axis line width
Hi, another question: is there any argument that controls the line width of axis box of xyplot()? I tried lwd=2 or lwd.axis=2 in xyplot() or within scales=list() argument, without success. Thanks John __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] xyplot legends
Thanks David. It almost does what I wanted, except it's plotting the point characters 3 times for each line (left, middle and right): o---o---o I can live with that if there is no way to get rid of the point characters at the ends. Thanks very much! John - Original Message From: David Winsemius To: array chip Cc: r-help@r-project.org Sent: Mon, September 13, 2010 4:05:04 PM Subject: Re: [R] xyplot legends On Sep 13, 2010, at 6:25 PM, array chip wrote: > Hi all, > > When I plot both lines and points using type=c('l', 'p') in xyplot(), if I want > to include in legend both of them using keys=list(lines=list(col=1:3), > points=list(pch=1:3)), the lines and points are plotted side by side in legend. > Is there anyway to plot the points in the middle of the lines instead? It's "key", not "keys". See if this is closer to what you had in mind: key=list(type="o", lines=list(col=1:3, pch=1:3, size=4)) > > > that is the default is plotting like this: > - o > > but I want something like this: > ---o--- The best I could do was: -o--o--o- --David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
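If the extra symbols at the ends are the only objection, the key list also accepts a 'divide' component (if memory serves, it defaults to 3 for type "o" or "b"), and divide = 1 should leave a single symbol centred on each line. A small self-contained sketch with made-up data:

library(lattice)
d <- data.frame(x = rep(1:10, 3), y = rnorm(30), g = factor(rep(c("A", "B", "C"), each = 10)))
xyplot(y ~ x, data = d, groups = g, type = "b", col = 1:3, pch = 1:3,
       key = list(space = "right", type = "o", divide = 1,     # one symbol, mid-line: ---o---
                  lines = list(col = 1:3, pch = 1:3, size = 4),
                  text  = list(c("A", "B", "C"))))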
Re: [R] xyplot axis line width
Thanks for the hint. But I want to change the line thickness, and 'mgp' concerns the spacing. Also, I think par() doesn't have any effect on lattice plots. It seems like it should be an easy thing to make the axis lines thicker; does anyone have any suggestions? Thanks John - Original Message From: Daisy Englert Duursma To: array chip Cc: r-help@r-project.org Sent: Mon, September 13, 2010 8:05:53 PM Subject: Re: [R] xyplot axis line width check out ?par for all the details on plotting ‘mgp’ The margin line (in ‘mex’ units) for the axis title, axis labels and axis line. Note that ‘mgp[1]’ affects ‘title’ whereas ‘mgp[2:3]’ affect ‘axis’. The default is ‘c(3, 1, 0)’. On Tue, Sep 14, 2010 at 8:56 AM, array chip wrote: > Hi, another question: is there any argument that controls the line width of > axis box of xyplot()? I tried lwd=2 or lwd.axis=2 in xyplot() or within > scales=list() argument, without success. > > Thanks > > John > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] xyplot axis line width
Thank you Deepayan. This is exactly what I needed. John - Original Message From: Deepayan Sarkar To: array chip Cc: r-help@r-project.org Sent: Tue, September 14, 2010 4:52:29 AM Subject: Re: [R] xyplot axis line width On Tue, Sep 14, 2010 at 4:26 AM, array chip wrote: > Hi, another question: is there any argument that controls the line width of >axis > box of xyplot()? I tried lwd=2 or lwd.axis=2 in xyplot() or within >scales=list() > argument, without success. xyplot(1:10 ~ 1:10, par.settings = list(axis.line = list(lwd = 2))) If you do not want this to affect the tick marks as well, then you additionally need scales = list(lwd = 1) -Deepayan __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] lmer() vs. lme() gave different variance component estimates
Hi, I asked this on mixed model mailing list, but that list is not very active, so I'd like to try the general R mailing list. Sorry if anyone receives the double post. Hi, I have a dataset of animals receiving some eye treatments. There are 8 treatments, each animal's right and left eye was measured with some scores (ranging from 0 to 7) 4 times after treatment. So there are nesting groups eyes within animal. Dataset attached > dat<-read.table("dat.txt",sep='\t',header=T,row.names=1) > dat$id<-factor(dat$id) > str(dat) 'data.frame': 640 obs. of 5 variables: $ score: int 0 2 0 7 4 7 0 2 0 7 ... $ id : Factor w/ 80 levels "1","3","6","10",..: 7 48 66 54 18 26 38 52 39 63 ... $ rep : int 1 1 1 1 1 1 1 1 1 1 ... $ eye : Factor w/ 2 levels "L","R": 2 2 2 2 2 2 2 2 2 2 ... $ trt : Factor w/ 8 levels "A","B","C","Control",..: 1 1 1 1 1 1 1 1 1 1 ... I fit a mixed model using both lmer() from lme4 package and lme() from nlme package: > lmer(score~trt+(1|id/eye),dat) Linear mixed model fit by REML Formula: score ~ trt + (1 | id/eye) Data: dat AIC BIC logLik deviance REMLdev 446.7 495.8 -212.4430.9 424.7 Random effects: Groups NameVariance Std.Dev. eye:id (Intercept) 6.9208e+00 2.630742315798 id (Intercept) 1.4471e-16 0.00012030 Residual 1.8750e-02 0.136930641909 Number of obs: 640, groups: eye:id, 160; id, 80 > summary(lme(score~trt, random=(~1|id/eye), dat)) Linear mixed-effects model fit by REML Data: dat AIC BIClogLik 425.1569 474.0947 -201.5785 Random effects: Formula: ~1 | id (Intercept) StdDev:1.873576 Formula: ~1 | eye %in% id (Intercept) Residual StdDev:1.896126 0.1369306 As you can see, the variance components estimates of random effects are quite different between the 2 model fits. From the data, I know that the variance component for "id" can't be near 0, which is what lmer() fit produced, so I think the lme() fit is correct while lmer() fit is off. This can also be seen from AIC, BIC etc. lme() fit has better values than lmer() fit. I guess this might be due to lmer() didn't converge very well, is there anyway to adjust to make lmer() converge better to get similar results as lme()? Thanks John score id rep eye trt 1 0 15 1 R A 2 2 72 1 R A 3 0 102 1 R A 4 7 82 1 R A 5 4 28 1 R A 6 7 42 1 R A 7 0 60 1 R A 8 2 79 1 R A 9 0 61 1 R A 10 7 95 1 R A 11 0 15 2 R A 12 2 72 2 R A 13 0 102 2 R A 14 7 82 2 R A 15 4 28 2 R A 16 7 42 2 R A 17 0 60 2 R A 18 2 79 2 R A 19 0 61 2 R A 20 7 95 2 R A 21 0 15 3 R A 22 2 72 3 R A 23 0 102 3 R A 24 7 82 3 R A 25 4 28 3 R A 26 7 42 3 R A 27 0 60 3 R A 28 2 79 3 R A 29 0 61 3 R A 30 7 95 3 R A 31 0 15 4 R A 32 2 72 4 R A 33 0 102 4 R A 34 7 82 4 R A 35 4 28 4 R A 36 7 42 4 R A 37 0 60 4 R A 38 2 79 4 R A 39 0 61 4 R A 40 7 95 4 R A 41 3 15 1 L A 42 0 72 1 L A 43 0 102 1 L A 44 5 82 1 L A 45 0 28 1 L A 46 7 42 1 L A 47 0 60 1 L A 48 0 79 1 L A 49 0 61 1 L A 50 7 95 1 L A 51 3 15 2 L A 52 0 72 2 L A 53 0 102 2 L A 54 5 82 2 L A 55 0 28 2 L A 56 7 42 2 L A 57 0 60 2 L A 58 0 79 2 L A 59 0 61 2 L A 60 7 95 2 L A 61 3 15 3 L A 62 0 72 3 L A 63 0 102 3 L A 64 5 82 3
Re: [R] lmer() vs. lme() gave different variance component estimates
Thank you Peter. Actually 3 people from mixed model mailing list tried my code using lmer(). They got the same results as what I got from lme4(). So they couldn't replicate my lmer() results: Random effects: Groups NameVariance Std.Dev. eye:id (Intercept) 3.59531 1.89613 id (Intercept) 3.51025 1.87357 Residual 0.01875 0.13693 Number of obs: 640, groups: eye:id, 160; id, 80 The only difference they can think of is they are using Mac and FreeBSD while mine is PC. I can't imagine this can be the reason. I re-install lme4 package, but still got weird results with lmer(). Per your suggestion, here is the results for aov() summary(aov(score~trt+Error(id/eye), data=dat)) Error: id Df Sum Sq Mean Sq F valuePr(>F) trt7 1353.6 193.378 4.552 0.0002991 *** Residuals 72 3058.7 42.482 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Error: id:eye Df Sum Sq Mean Sq F value Pr(>F) Residuals 80 115214.4 Error: Within Df Sum Sq Mean Sq F value Pr(>F) Residuals 480 9 0.01875 As can be seen, thr within subject variance estimate (0.01875) is the same between aov, lmer and lme. But I am not sure how to relate results between aov and lmer/lme for the other 2 variance components (id and eye%in%id). Thanks John - Original Message From: Peter Dalgaard To: array chip Cc: r-help@r-project.org Sent: Fri, September 17, 2010 1:05:27 PM Subject: Re: [R] lmer() vs. lme() gave different variance component estimates On 09/17/2010 09:14 PM, array chip wrote: > Hi, I asked this on mixed model mailing list, but that list is not very > active, > > so I'd like to try the general R mailing list. Sorry if anyone receives the > double post. > > > Hi, I have a dataset of animals receiving some eye treatments. There are 8 > > treatments, each animal's right and left eye was measured with some scores > (ranging from 0 to 7) 4 times after treatment. So there are nesting groups > eyes > > within animal. Dataset attached > >> dat<-read.table("dat.txt",sep='\t',header=T,row.names=1) >> dat$id<-factor(dat$id) >> str(dat) > 'data.frame': 640 obs. of 5 variables: > $ score: int 0 2 0 7 4 7 0 2 0 7 ... > $ id : Factor w/ 80 levels "1","3","6","10",..: 7 48 66 54 18 26 38 52 39 > 63 > ... > $ rep : int 1 1 1 1 1 1 1 1 1 1 ... > $ eye : Factor w/ 2 levels "L","R": 2 2 2 2 2 2 2 2 2 2 ... > $ trt : Factor w/ 8 levels "A","B","C","Control",..: 1 1 1 1 1 1 1 1 1 1 ... > > I fit a mixed model using both lmer() from lme4 package and lme() from nlme > package: > >> lmer(score~trt+(1|id/eye),dat) > > Linear mixed model fit by REML > Formula: score ~ trt + (1 | id/eye) >Data: dat >AIC BIC logLik deviance REMLdev > 446.7 495.8 -212.4430.9 424.7 > Random effects: > Groups NameVariance Std.Dev. > eye:id (Intercept) 6.9208e+00 2.630742315798 > id (Intercept) 1.4471e-16 0.00012030 > Residual 1.8750e-02 0.136930641909 > Number of obs: 640, groups: eye:id, 160; id, 80 > >> summary(lme(score~trt, random=(~1|id/eye), dat)) > > Linear mixed-effects model fit by REML > Data: dat >AIC BIClogLik > 425.1569 474.0947 -201.5785 > > Random effects: > Formula: ~1 | id > (Intercept) > StdDev:1.873576 > > Formula: ~1 | eye %in% id > (Intercept) Residual > StdDev:1.896126 0.1369306 > > As you can see, the variance components estimates of random effects are quite > different between the 2 model fits. From the data, I know that the variance > component for "id" can't be near 0, which is what lmer() fit produced, so I > think the lme() fit is correct while lmer() fit is off. This can also be seen > from AIC, BIC etc. 
lme() fit has better values than lmer() fit. > > > I guess this might be due to lmer() didn't converge very well, is there > anyway > to adjust to make lmer() converge better to get similar results as lme()? That's your guess... I'd be more careful about jumping to conclusions. If this is a balanced data set, perhaps you could supply the result of summary(aov(score~trt+Error(id/eye), data=dat)) The correct estimates should be computable from the ANOVA table. -- Peter Dalgaard Center for Statistics, Copenhagen Business School Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] lmer() vs. lme() gave different variance component estimates
Thank you Peter for your explanation of relationship between aov and lme. It makes perfect sense. When you said "you might have computed the average of all 8 measurements on each animal and computed a 1-way ANOVA" for treatment effect, would this be the case for balanced design, or it is also true for unbalanced data? Another question is if 1-way ANOVA is equivalent to mixed model for testing treatment effect, what would be reason why mixed model is used? Just to estimate the variance components? If the interest is not in the estimation of variance components, then there is no need to run mixed models to test treatment effects? And my last question is I am glad to find that glht() from multcomp package works well with a lmer() fit for multiple comparisons. Given Professor Bates's view that denominator degree's of freedom is not well defined in mixed models, are the results from glht() reasonable/meaningful? If not, will the suggested 1-way ANOVA used together with glht() give us correct post-hoc multiple comparsion results? Thank you very much! John - Original Message From: Peter Dalgaard To: array chip Cc: r-help@r-project.org; r-sig-mixed-mod...@r-project.org Sent: Sat, September 18, 2010 1:35:45 AM Subject: Re: [R] lmer() vs. lme() gave different variance component estimates For a nested design, the relation is quite straightforward: The residual MS are the variances of sample means scaled to be comparable with the residuals (so that in the absense of random components, all MS are equal to within the F-ratio variability). So to get the id:eye variance component, subtract the Within MS from the id:eye MS and divide by the number of replicates (4 in this case since you have 640 observations on 160 eyes) (14.4 - 0.01875)/4 = 3.59, and similarly, the id variance is the MS for id minus that for id:eye scaled by 8: (42.482-14.4)/8 = 3.51. I.e. it is reproducing the lmer results above, but of course not those from your original post. (Notice, by the way, that if you are only interested in the treatment effect, you might as well have computed the average of all 8 measurements on each animal and computed a 1-way ANOVA). -- Peter Dalgaard Center for Statistics, Copenhagen Business School Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
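Peter's method-of-moments arithmetic can be written out as a few lines of R, using the mean squares from the aov() table quoted earlier in the thread (80 animals, 2 eyes per animal, 4 replicate scores per eye):

ms.id    <- 42.482      # Error: id stratum mean square
ms.eye   <- 14.4        # Error: id:eye stratum mean square
ms.resid <- 0.01875     # Within stratum mean square
(var.eye <- (ms.eye - ms.resid) / 4)   # eye-within-animal component, ~3.59
(var.id  <- (ms.id  - ms.eye)   / 8)   # animal component, ~3.51
ms.resid                               # residual variance, matching the lmer()/lme() output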
Re: [R] lmer() vs. lme() gave different variance component estimates
Thank you Peter and Ben for your comments. John - Original Message From: Peter Dalgaard To: array chip Cc: r-help@r-project.org; r-sig-mixed-mod...@r-project.org Sent: Mon, September 20, 2010 12:28:43 PM Subject: Re: [R] lmer() vs. lme() gave different variance component estimates On 09/20/2010 08:09 PM, array chip wrote: > Thank you Peter for your explanation of relationship between aov and lme. It > makes perfect sense. > > > When you said "you might have computed the average of all 8 > measurements on each animal and computed a 1-way ANOVA" for treatment effect, > would this be the case for balanced design, or it is also true for unbalanced > data? It is only exactly true for a balanced design, although it can be a practical expedient in nearly-balanced cases, especially if there is a clearly dominant animal variation. In strongly unbalanced data, you get reduced efficiency because animals with less data should be downweighted (not proportionally if there is substantial between-animal variation, though). And of course the whole thing relies on the fact that you have individuals nested in treatment (no animals had multiple treatments) > > Another question is if 1-way ANOVA is equivalent to mixed model for testing > treatment effect, what would be reason why mixed model is used? Just to >estimate > > the variance components? If the interest is not in the estimation of variance > components, then there is no need to run mixed models to test treatment >effects? Not too far off the mark. In more complex cases, there is the advantage that the mixed model helps figure out a sensible analysis for you. > And my last question is I am glad to find that glht() from multcomp package > works well with a lmer() fit for multiple comparisons. Given Professor > Bates's > view that denominator degree's of freedom is not well defined in mixed > models, > are the results from glht() reasonable/meaningful? If not, will the suggested > 1-way ANOVA used together with glht() give us correct post-hoc multiple > comparsion results? I think Doug's view is that DFs are not _reliably_estimated_ with any of the current procedures. In the balanced cases, they are very well defined (well, give or take the issues with "negative variances"), and I would expect glht() to give meaningful results. Do check the residuals for at least approximate normality, though. > > Thank you very much! > > John > > > > > > - Original Message > From: Peter Dalgaard > To: array chip > Cc: r-help@r-project.org; r-sig-mixed-mod...@r-project.org > Sent: Sat, September 18, 2010 1:35:45 AM > Subject: Re: [R] lmer() vs. lme() gave different variance component estimates > > > For a nested design, the relation is quite straightforward: The residual > MS are the variances of sample means scaled to be comparable with the > residuals (so that in the absense of random components, all > MS are equal to within the F-ratio variability). So to get the id:eye > variance component, subtract the Within MS from the id:eye MS and divide > by the number of replicates (4 in this case since you have 640 > observations on 160 eyes) (14.4 - 0.01875)/4 = 3.59, and similarly, the > id variance is the MS for id minus that for id:eye scaled by 8: > (42.482-14.4)/8 = 3.51. > > I.e. it is reproducing the lmer results above, but of course not those > from your original post. > > (Notice, by the way, that if you are only interested in the treatment > effect, you might as well have computed the average of all 8 > measurements on each animal and computed a 1-way ANOVA). 
> -- Peter Dalgaard Center for Statistics, Copenhagen Business School Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
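A sketch of the "average the 8 scores per animal, then one-way ANOVA" route that Peter describes, with Tukey comparisons from multcomp added for the post-hoc question. It assumes the 'dat' data frame from the original post (columns score, id and trt, with one treatment per animal):

library(multcomp)
means <- aggregate(score ~ id + trt, data = dat, FUN = mean)   # one row (mean of 8 scores) per animal
fit1  <- aov(score ~ trt, data = means)
summary(fit1)                                      # treatment F test on the animal means
summary(glht(fit1, linfct = mcp(trt = "Tukey")))   # pairwise treatment comparisons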
[R] how to make point character thicker in xyplot?
Is there anyway to make plotting point character being thicker in xyplot? I mean not larger which can achieved by "cex=2", but thicker. I tried lwd=2, but it didn't work. I know "lwd" works in regular plot() not only for lines, but also for points. For example plot(1:10, lwd=2) Thanks John __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to make point character thicker in xyplot?
Thank you Greg. I also got it work by using panel.points (lwd=2) instead of using panel.xyplot() - Original Message From: Greg Snow To: array chip ; "r-help@r-project.org" Sent: Thu, September 23, 2010 2:48:06 PM Subject: RE: [R] how to make point character thicker in xyplot? There is probably a simpler way, but if you want full customization, look at panel.my.symbols in the TeachingDemos package. -Original Message----- From: array chip Sent: Thursday, September 23, 2010 2:52 PM To: r-help@r-project.org Subject: [R] how to make point character thicker in xyplot? Is there anyway to make plotting point character being thicker in xyplot? I mean not larger which can achieved by "cex=2", but thicker. I tried lwd=2, but it didn't work. I know "lwd" works in regular plot() not only for lines, but also for points. For example plot(1:10, lwd=2) Thanks John __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] how to make point character thicker in the legend of xyplot?
Now I got point character thicker using panel.points(lwd=2), But I can't make it thicker in the legend of the plot. Here is an example: xyplot(1:10~1:10,groups=rep(1:5,2),col=1:2,pch=0:1,type='p',cex=2,panel=panel.points,lwd=3, key=list(x=0.7,y=0.2,corner=c(0,0),type='p', points=list(col=1:2,pch=0:1,cex=2,lwd=2), text=list(lab=c('A','B'),cex=1.5,font=2))) Any suggestions? Thanks John - Original Message From: array chip To: Greg Snow Cc: r-help@r-project.org Sent: Thu, September 23, 2010 4:03:00 PM Subject: Re: [R] how to make point character thicker in xyplot? Thank you Greg. I also got it work by using panel.points (lwd=2) instead of using panel.xyplot() - Original Message From: Greg Snow To: array chip ; "r-help@r-project.org" Sent: Thu, September 23, 2010 2:48:06 PM Subject: RE: [R] how to make point character thicker in xyplot? There is probably a simpler way, but if you want full customization, look at panel.my.symbols in the TeachingDemos package. -Original Message- From: array chip Sent: Thursday, September 23, 2010 2:52 PM To: r-help@r-project.org Subject: [R] how to make point character thicker in xyplot? Is there anyway to make plotting point character being thicker in xyplot? I mean not larger which can achieved by "cex=2", but thicker. I tried lwd=2, but it didn't work. I know "lwd" works in regular plot() not only for lines, but also for points. For example plot(1:10, lwd=2) Thanks John __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to make point character thicker in the legend of xyplot?
Yes, it does what I want. Thank you Peter! Just wondering what else grid.pars controls? not just the symbol in legend, right? John - Original Message From: Peter Ehlers To: array chip Cc: "r-help@r-project.org" Sent: Thu, September 23, 2010 4:34:44 PM Subject: Re: [R] how to make point character thicker in the legend of xyplot? On 2010-09-23 17:15, array chip wrote: > Now I got point character thicker using panel.points(lwd=2), But I can't make >it > thicker in the legend of the plot. Here is an example: > >xyplot(1:10~1:10,groups=rep(1:5,2),col=1:2,pch=0:1,type='p',cex=2,panel=panel.points,lwd=3, >, > > key=list(x=0.7,y=0.2,corner=c(0,0),type='p', > points=list(col=1:2,pch=0:1,cex=2,lwd=2), > text=list(lab=c('A','B'),cex=1.5,font=2))) > > Any suggestions? > You can add this line to your xyplot call: par.settings = list(grid.pars = list(lwd = 2)), -Peter Ehlers > Thanks > > John > > > > - Original Message > From: array chip > To: Greg Snow > Cc: r-help@r-project.org > Sent: Thu, September 23, 2010 4:03:00 PM > Subject: Re: [R] how to make point character thicker in xyplot? > > Thank you Greg. I also got it work by using panel.points (lwd=2) instead of > using panel.xyplot() > > > > > > > - Original Message > From: Greg Snow > To: array chip; "r-help@r-project.org" > > Sent: Thu, September 23, 2010 2:48:06 PM > Subject: RE: [R] how to make point character thicker in xyplot? > > There is probably a simpler way, but if you want full customization, look at > panel.my.symbols in the TeachingDemos package. > > -Original Message- > From: array chip > Sent: Thursday, September 23, 2010 2:52 PM > To: r-help@r-project.org > Subject: [R] how to make point character thicker in xyplot? > > > Is there anyway to make plotting point character being thicker in xyplot? I >mean > not larger which can achieved by "cex=2", but thicker. I tried lwd=2, but it > didn't work. I know "lwd" works in regular plot() not only for lines, but also > for points. For example > > plot(1:10, lwd=2) > > Thanks > > John > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
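Putting the two pieces together, a sketch of the whole call (the earlier example with Peter's par.settings added) looks like this. As far as I understand, grid.pars supplies default gpar() settings for everything lattice draws on the page (lines, text and symbols, including the key), not just the legend symbols:

library(lattice)
xyplot(1:10 ~ 1:10, groups = rep(1:5, 2), col = 1:2, pch = 0:1, cex = 2,
       panel = panel.points, lwd = 3,                    # thicker symbol strokes in the panel
       par.settings = list(grid.pars = list(lwd = 2)),   # thicker strokes in the key (and elsewhere)
       key = list(x = 0.7, y = 0.2, corner = c(0, 0),
                  points = list(col = 1:2, pch = 0:1, cex = 2),
                  text   = list(lab = c("A", "B"), cex = 1.5, font = 2)))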
[R] scientific vs. fixed notation in xyplot()
Hi, I am using xyplot() to plot on the log scale via the scales=list(log=T) argument. For example:

xyplot(1:10~1:10, scales=list(log=T))

But the axis labels are printed in scientific notation (10^0.0, etc.) instead of fixed notation. How can I change that to fixed notation? options(scipen=4) doesn't work on xyplot().

Thanks

John
Re: [R] scientific vs. fixed notation in xyplot()
Thanks for the suggestion. But my example is just an example; I would prefer a generalized solution, like what options(scipen=4) does in base graphics, which usually gives pretty axis labels as well. Any suggestions?

Thanks

John

----- Original Message -----
From: Henrique Dallazuanna
Cc: r-help@r-project.org
Sent: Mon, September 27, 2010 12:16:31 PM
Subject: Re: [R] scientific vs. fixed notation in xyplot()

Try this:

xyplot(1:10~1:10, scales=list(log = T, labels = round(log(1:10), 4)))

[original question snipped]

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
Re: [R] scientific vs. fixed notation in xyplot()
Hi, I found an old thread which referred to 2 examples (Fig 8.4 & Fig 8.5) from Dr. Sarkar's lattice book that address this problem using "xscale.components" and "yscale.components":

http://lmdvr.r-forge.r-project.org/figures/figures.html

I am able to change the labels of the x-axis, but still can't change the y-axis labels. I think it's straightforward, but I can't figure out what I am missing in the code. Can anyone suggest?

logTicks <- function (lim, loc = c(1, 5)) {
    ii <- floor(log10(range(lim))) + c(-1, 2)
    main <- 10^(ii[1]:ii[2])
    r <- as.numeric(outer(loc, main, "*"))
    r[lim[1] <= r & r <= lim[2]]
}

xscale.components.log10 <- function(lim, ...) {
    ans <- xscale.components.default(lim = lim, ...)
    tick.at <- logTicks(10^lim, loc = 1)
    ans$bottom$ticks$at <- log(tick.at, 10)
    ans$bottom$labels$at <- log(tick.at, 10)
    ans$bottom$labels$labels <- as.character(tick.at)
    ans
}

yscale.components.log10 <- function(lim, ...) {
    ans <- yscale.components.default(lim = lim, ...)
    tick.at <- logTicks(10^lim, loc = 1)
    ans$bottom$ticks$at <- log(tick.at, 10)
    ans$bottom$labels$at <- log(tick.at, 10)
    ans$bottom$labels$labels <- as.character(tick.at)
    ans
}

xyplot(1:1~1:1, scales=list(log=T),
       xscale.components = xscale.components.log10,
       yscale.components = yscale.components.log10)

Any suggestions?

Thanks

John

----- Original Message -----
From: Don McKenzie
To: Henrique Dallazuanna
Cc: R-help Forum
Sent: Mon, September 27, 2010 1:00:04 PM
Subject: Re: [R] scientific vs. fixed notation in xyplot()

This is quite elegant (thanks) and brings up a problem I could not solve a while back, although Dr. Sarkar did his best to help. How do I do the same thing in a panel plot? e.g., toy example:

temp.df <- data.frame(X=seq(1,100,by=1), Y=seq(1,50.5,by=.5), class=rep(c("A","B"), each=50))
xyplot(Y ~ X | class, data=temp.df,
       scales=list(x=round(log(1:100), 4), y=round(log(1:50.5), 4), log=T))

gives me the right points on the page but still gives axis labels in scientific notation.

If I try to specify "labels" as a list I get an error message:

> xyplot(Y ~ X | class, data=temp.df,
+        scales=list(log = T, labels = list(x=round(log(1:100), 4), y=round(log(1:50.5), 4))))
Error in construct.scales(log = TRUE, labels = list(x = c(0, 0.6931, 1.0986, :
  the at and labels components of scales may not be lists when relation = same

Syntax problem in this last command?

Thanks

[earlier messages snipped]
Re: [R] scientific vs. fixed notation in xyplot()
I found where the problem is with the y-axis: yscale.components.log10 should modify the "left" component, not "bottom":

yscale.components.log10 <- function(lim, ...) {
    ans <- yscale.components.default(lim = lim, ...)
    tick.at <- logTicks(10^lim, loc = 1)
    ans$left$ticks$at <- log(tick.at, 10)
    ans$left$labels$at <- log(tick.at, 10)
    ans$left$labels$labels <- as.character(tick.at)
    ans
}

----- Original Message -----
From: array chip
To: Don McKenzie; Henrique Dallazuanna
Cc: R-help Forum
Sent: Mon, September 27, 2010 3:09:20 PM
Subject: Re: [R] scientific vs. fixed notation in xyplot()

[previous message quoted in full; snipped]
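For completeness, a small sketch assembled from the code already in this thread (not a new API) showing both corrected components in use; with the "left" fix above, both axes should show fixed-notation labels:

xyplot(1:10 ~ 1:10, scales = list(log = TRUE),
       xscale.components = xscale.components.log10,   # bottom axis: fixed-notation tick labels
       yscale.components = yscale.components.log10)   # left axis: fixed-notation tick labels (after the fix)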
Re: [R] scientific vs. fixed notation in xyplot()
Thank you Dr. Sarkar. yscale.components.log10.3 is a pretty good choice.

John

----- Original Message -----
From: Deepayan Sarkar
To: array chip
Cc: Don McKenzie; Henrique Dallazuanna; R-help Forum
Sent: Mon, September 27, 2010 8:07:15 PM
Subject: Re: [R] scientific vs. fixed notation in xyplot()

On Mon, Sep 27, 2010 at 5:28 PM, array chip wrote:
> I found where the problem is with y-axis, the yscale.components.log10 should be
> [corrected function snipped]

A few such functions are now available in latticeExtra. E.g.,

library(latticeExtra)

xyplot(1:10~1:10, scales=list(log=T),
       yscale.components = yscale.components.log10.3)

xyplot(1:10~1:10, scales=list(log=T),
       yscale.components = yscale.components.log10ticks)

-Deepayan
[R] add a new column to data frame
Hi, I am wondering if anyone can propose a simple/best way to do the following. Let's say I have a data frame:

dat <- cbind(expand.grid(mode=c('right','left'), time=0:3, id=c('p1','p2','p3')),
             y=c(3,5,rep(4,6),6,2,rep(3,6),4,4,rep(2,6)))

dat
    mode time id y
1  right    0 p1 3
2   left    0 p1 5
3  right    1 p1 4
4   left    1 p1 4
5  right    2 p1 4
6   left    2 p1 4
7  right    0 p2 6
8   left    0 p2 2
9  right    1 p2 3
10  left    1 p2 3
11 right    2 p2 3
12  left    2 p2 3
13 right    0 p3 4
14  left    0 p3 4
15 right    1 p3 2
16  left    1 p3 2
17 right    2 p3 2
18  left    2 p3 2

Now I want to add a new column "type" to this data frame with a value of either "left" or "right" for each "id", based on the following logic:

For each "id", the value of "type" is the value of "mode" for which "y" is the maximum of "y" at time=0. For example, for id=p1 the value of "type" is "left" because, of the 2 "y" values at time=0 (3 & 5), mode=left has the bigger "y" value (5). But if the 2 "y" values are the same for "mode", then let type=right.

In the end the new data frame is:

    mode time id y  type
1  right    0 p1 3  left
2   left    0 p1 5  left
3  right    1 p1 4  left
4   left    1 p1 4  left
5  right    2 p1 4  left
6   left    2 p1 4  left
7  right    0 p2 6 right
8   left    0 p2 2 right
9  right    1 p2 3 right
10  left    1 p2 3 right
11 right    2 p2 3 right
12  left    2 p2 3 right
13 right    0 p3 4 right
14  left    0 p3 4 right
15 right    1 p3 2 right
16  left    1 p3 2 right
17 right    2 p3 2 right
18  left    2 p3 2 right

Any suggestions would be appreciated.

John
Re: [R] add a new column to data frame
Thank you Phil. The only problem with the code is that when the 2 y values are equal, it should default to "right"; your code defaults to the first maximum. But it's OK, I see the logic here and can figure it out. I wondered whether there is some one-line thing that can do this -- I guess not.

Thank you

John

From: Phil Spector
Cc: r-help@r-project.org
Sent: Fri, October 1, 2010 10:58:55 AM
Subject: Re: [R] add a new column to data frame

Here are two ways:

dat0 = subset(dat, time==0)

one = as.data.frame(as.table(by(dat0, dat0$id,
          function(x) as.character(x$mode)[which.max(x$y)])))
names(one) = c('id','type')
merge(dat, one)

two = sapply(split(dat0, dat0$id),
             function(x) as.character(x$mode)[which.max(x$y)])
merge(dat, data.frame(id=names(two), type=two))

    - Phil Spector
      Statistical Computing Facility
      Department of Statistics
      UC Berkeley
      spec...@stat.berkeley.edu

On Fri, 1 Oct 2010, array chip wrote:
> [original question snipped]
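A possible variation (not from the thread) that builds the tie-breaking toward "right" directly into the lookup, using the dat defined in the original post; it assumes, as in that example, that every id has exactly one "left" and one "right" row at time=0:

## for each id, compare the two time==0 rows; ties go to "right"
pick <- function(d) {
    d0 <- d[d$time == 0, ]
    if (d0$y[d0$mode == "left"] > d0$y[d0$mode == "right"]) "left" else "right"
}
types <- sapply(split(dat, dat$id), pick)   # named vector: one type per id
dat$type <- types[as.character(dat$id)]     # spread back to every row of that id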
[R] how to retrieve user coordinates in xyplot
Hi, is there a way to retrieve the extremes of the user coordinates of the plotting region, like what par("usr") does in base graphics? I'd like to use them to print additional text at certain places inside each panel.

Thanks

John
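No reply is preserved in this part of the archive, but a minimal sketch of one way to do this (an assumption for illustration, not the poster's eventual solution): inside a panel function, lattice's current.panel.limits() returns the current panel's x and y limits, playing the role of par("usr"):

library(lattice)
xyplot(1:10 ~ 1:10,
       panel = function(x, y, ...) {
           panel.xyplot(x, y, ...)
           lims <- current.panel.limits()     # list with components xlim and ylim for this panel
           panel.text(lims$xlim[1], lims$ylim[2],
                      labels = "top-left corner", adj = c(0, 1))
       })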
[R] point characters THICKER in xyplot()
Hi, how can I make the point characters thicker (NOT larger) in xyplot() when the groups= argument is used?

dat <- data.frame(x=1:100, y=1:100, group=rep(LETTERS[1:5], each=20))

### lwd=2 doesn't work here
xyplot(y~x, groups=group, data=dat, col=1:4, pch=1:4, lwd=2)

### lwd=2 works with panel.points(), but grouping is messed up!
xyplot(y~x, groups=group, data=dat, col=1:4, pch=1:4,
       panel=function(...) {panel.points(..., lwd=2)})

### grouping is correct with panel.superpose(), but lwd=2 doesn't work!
xyplot(y~x, groups=group, data=dat, col=1:4, pch=1:4,
       panel=function(...) {panel.superpose(..., lwd=2)})

Any suggestions?

Thanks

John
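One sketch that keeps the grouping intact, borrowing the par.settings idea that Peter Ehlers suggests in the legend thread earlier in this archive; whether grid.pars reaches every symbol in your lattice version is an assumption worth checking on your own data:

xyplot(y ~ x, groups = group, data = dat, col = 1:4, pch = 1:4,
       par.settings = list(grid.pars = list(lwd = 2)))  # raise the default grid line width,
                                                        # which the plotted symbols pick up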
[R] type II & III test for mixed model
Hi, is there a package for getting type II or type III tests on mixed models (lme or lmer), just like what Anova() in the car package does for aov, lm, etc.?

Thanks

John
[R] clustering on scaled dataset or not?
Hi, just a general question: when we do hierarchical clustering, should we compute the dissimilarity matrix on the scaled dataset or on the non-scaled dataset? daisy() in the cluster package allows standardizing the variables before calculating the dissimilarity matrix, but dist() doesn't have that option at all. I'd appreciate it if you can share your thoughts.

Thanks

John
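dist() indeed has no standardization argument, but scaling the data first achieves the same effect; a small sketch (x is a hypothetical numeric matrix, and whether to scale at all remains the judgment call raised above):

x.scaled <- scale(x)                    # center and scale each column (variable)
d <- dist(x.scaled)                     # Euclidean distances on the standardized variables
hc <- hclust(d, method = "average")     # hierarchical clustering on that dissimilarity matrix
plot(hc)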
[R] logistic regression or not?
Hi, I have a dataset where the response for each person on one of the 2 treatments is a proportion (percentage of a certain number of markers being positive). I also have the number of positive & negative markers available for each person. What is the best way to analyze this kind of data?

I can think of analyzing this data using glm() with the attached dataset:

test <- read.table('test.txt', sep='\t')
fit <- glm(cbind(positive, total-positive) ~ treatment, test, family=binomial)
summary(fit)
anova(fit, test='Chisq')

First, is this still called logistic regression or something else? I thought with logistic regression the response variable is a binary factor.

Second, summary(fit) and anova(fit, test='Chisq') gave me different p values. Why is that, and which one should I use?

Third, is there an equivalent model where I can use the variable "percentage" instead of "positive" & "total"?

Finally, what is the best way to analyze this kind of dataset, where it's almost the same as ANOVA except that the response variable is a proportion (or success and failure)?

Thanks

John

     "treatment" "total" "positive" "percentage"
"1"  "exposed"   11       4         0.363636363636364
"2"  "exposed"   10       4         0.4
"3"  "exposed"    9       4         0.444
"4"  "exposed"    7       4         0.571428571428571
"5"  "exposed"    7       4         0.571428571428571
"6"  "exposed"    6       5         0.833
"8"  "exposed"   12       7         0.583
"9"  "exposed"    8       5         0.625
"10" "exposed"   13      12         0.923076923076923
"11" "exposed"   10       5         0.5
"12" "control"   10       1         0.1
"13" "control"   11       2         0.181818181818182
"14" "control"    8       0         0
"16" "control"   12       1         0.0833
"15" "control"    8       0         0
"17" "control"   10       1         0.1
"18" "control"   10       1         0.1
"19" "control"    8       1         0.125
"20" "control"    8       0         0
"21" "control"    9       1         0.111
"22" "control"   10       1         0.1
Re: [R] logistic regression or not?
Thank you Ben, Steve and Peter.

Ben, my last question was to see if there are other ways of analyzing this type of data where the response variable is a proportion, in addition to binomial regression. BTW, I also found that the following is an equivalent model directly using percentage:

glm(log(percentage/(1-percentage)) ~ treatment, data=test)

Thanks

John

From: Ben Bolker
To: r-h...@stat.math.ethz.ch
Sent: Tue, December 21, 2010 5:08:34 AM
Subject: Re: [R] logistic regression or not?

array chip <...@yahoo.com> writes:

[snip]

> I can think of analyzing this data using glm() with the attached dataset:
>
> test<-read.table('test.txt',sep='\t')
> fit<-glm(cbind(positive,total-positive)~treatment,test,family=binomial)
> summary(fit)
> anova(fit, test='Chisq')

> First, is this still called logistic regression or something else? I thought
> with logistic regression, the response variable is a binary factor?

Sometimes I've seen it called "binomial regression", or just "a binomial generalized linear model".

> Second, then summary(fit) and anova(fit, test='Chisq') gave me different p
> values, why is that? which one should I use?

summary(fit) gives you p-values from a Wald test. anova() gives you tests based on the Likelihood Ratio Test. In general the LRT is more accurate.

> Third, is there an equivalent model where I can use variable "percentage"
> instead of "positive" & "total"?

glm(percentage ~ treatment, weights=total, data=test, family=binomial)

is equivalent to the model you fitted above.

> Finally, what is the best way to analyze this kind of dataset
> where it's almost the same as ANOVA except that the response variable
> is a proportion (or success and failure)?

Don't quite know what you mean here. How is the situation "almost the same as ANOVA" different from the situation you described above? Do you mean when there are multiple factors? or ???
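A short sketch, using the data frame from the original post, that makes the Wald vs. likelihood-ratio distinction above concrete; the explicit two-model comparison at the end is an added illustration, not part of the thread:

fit <- glm(cbind(positive, total - positive) ~ treatment, data = test, family = binomial)
summary(fit)$coefficients        # Wald z-tests for the individual coefficients
anova(fit, test = "Chisq")       # likelihood-ratio (deviance) test for the treatment term

fit0 <- update(fit, . ~ 1)       # null model without treatment
anova(fit0, fit, test = "Chisq") # the same LRT written as an explicit model comparison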
Re: [R] logistic regression or not?
Ben, thanks again.

John

From: Ben Bolker
Cc: r-h...@stat.math.ethz.ch; S Ellison; peter dalgaard
Sent: Tue, December 21, 2010 9:26:29 AM
Subject: Re: [R] logistic regression or not?

On 10-12-21 12:20 PM, array chip wrote:
> Thank you Ben, Steve and Peter.
>
> Ben, my last question was to see if there are other ways of analyzing
> this type of data where the response variable is a proportion, in
> addition to binomial regression.
>
> BTW, I also found the following is also an equivalent model directly
> using percentage:
>
> glm(log(percentage/(1-percentage))~treatment,data=test)

Yes, but this is a different model. The model you have here uses Gaussian errors (it is in fact an identical model, although not necessarily quite an identical algorithm (?), to just using lm()). It will fail if you have any percentages that are 0 or 1. See Stuart's comment about how things were done in the "old days".

Beta regression (see e.g. the betareg package) is another way of handling analysis of proportions.

[rest of quoted thread snipped]
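A hedged sketch of the beta-regression route Ben mentions; the betareg() call is standard usage of that package, but the particular squeeze away from 0/1 and whether this model suits these data are assumptions, not advice from the thread:

library(betareg)

## beta regression needs responses strictly inside (0, 1); one common adjustment
## pulls exact 0s and 1s slightly toward the middle
n <- nrow(test)
test$p.adj <- (test$percentage * (n - 1) + 0.5) / n

fit.beta <- betareg(p.adj ~ treatment, data = test)
summary(fit.beta)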
[R] RData size
Hi, I noticed an RData size issue that's puzzling to me. Attached find 2 example datasets in text files. Both are 100x5, so the sizes of the two text files are the same. However, when I read them into R, the sizes are much different:

tt <- as.matrix(read.table("tt.txt", header=T, row.names=1))
save(tt, file='tt.RData')

tt.big <- as.matrix(read.table("tt.big.txt", header=T, row.names=1))
save(tt.big, file='tt.big.RData')

"tt.RData" is 2KB while "tt.big.RData" is 5KB. This is not a big deal with the example datasets, but my real datasets are much larger; the difference is 35MB vs. 1MB for the RData objects.

The difference between the 2 datasets above is that "tt.big" is a smoothed version of "tt", so there are a lot fewer unique values in tt.big than in tt. I guess this is the reason for the difference in sizes of the RData objects -- can anyone confirm?

Thanks

John

[The attached example data (SNP_A-... identifiers with five numeric columns, PC1-PC5) did not survive the archive: the numeric columns ran together and the listing is cut off mid-row.]
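No reply is preserved in this part of the archive; a small sketch (an assumption about where to look, not a confirmed explanation) for checking whether the gap comes from the in-memory representation or from how well save()'s on-disk compression works on each matrix:

object.size(tt); object.size(tt.big)       # in-memory sizes: same dimensions, same storage mode
file.info("tt.RData")$size                 # on-disk sizes reflect how well the bit patterns compress;
file.info("tt.big.RData")$size             # save() compresses by default
length(unique(as.vector(tt)))              # how many distinct values each matrix holds
length(unique(as.vector(tt.big)))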
[R] algorithm help
Hi, I am seeking help on designing an algorithm to identify the locations of stretches of 1s in a vector of 0s and 1s. Below is a simple example:

> dat <- as.data.frame(cbind(a=c(F,F,T,T,T,T,F,F,T,T,F,T,T,T,T,F,F,F,F,T),
+                            b=c(4,12,13,16,18,20,28,30,34,46,47,49,61,73,77,84,87,90,95,97)))
> dat
   a  b
1  0  4
2  0 12
3  1 13
4  1 16
5  1 18
6  1 20
7  0 28
8  0 30
9  1 34
10 1 46
11 0 47
12 1 49
13 1 61
14 1 73
15 1 77
16 0 84
17 0 87
18 0 90
19 0 95
20 1 97

In this dataset, "b" is sorted and denotes the location of each number in "a". I would like to find the starting & ending locations for each stretch of 1s within "a", and also count the number of 1s in each stretch. The hoped-for result from the algorithm would be:

stretch start end No.of.1s
      1    13  20        4
      2    34  46        2
      3    49  77        4
      4    97  97        1

I can imagine that for loops can do the job, but I feel it's not a clever way to do this. Is there an efficient algorithm that can do this fast?

Thanks for any suggestions.

John
Re: [R] algorithm help
Thanks very much, Ted. Yes, it does what I need! I made a routine to do this:

f.fragment <- function(a, b) {
    dat <- as.data.frame(cbind(a, b))

    L <- rle(dat$a)$lengths
    V <- rle(dat$a)$values
    pos <- c(1, cumsum(L))
    V1 <- c(-1, V)
    start <- 1 + pos[V1==0]
    end <- pos[V1==1]

    cbind(stretch=1:length(start), start=dat$b[start],
          end=dat$b[end], no.of.1s=L[V==1])
}

> f.fragment(dat$a, dat$b)
     stretch start end no.of.1s
[1,]       1    13  20        4
[2,]       2    34  46        2
[3,]       3    49  77        4
[4,]       4    97  97        1

John

From: "ted.hard...@wlandres.net"
Cc: r-h...@stat.math.ethz.ch
Sent: Thu, January 6, 2011 2:57:47 PM
Subject: RE: [R] algorithm help

On 06-Jan-11 22:16:38, array chip wrote:
> Hi, I am seeking help on designing an algorithm to identify the
> locations of stretches of 1s in a vector of 0s and 1s.
> [example snipped]

The basic information you need can be got using rle() ("run length encoding"). See '?rle'. In your example:

rle(dat$a)
# Run Length Encoding
#   lengths: int [1:8] 2 4 2 2 1 4 4 1
#   values : num [1:8] 0 1 0 1 0 1 0 1    ## Note: F -> 0, T -> 1

The following has a somewhat twisted logic at the end, and may [...]

L <- rle(dat$a)$lengths
V <- rle(dat$a)$values
pos <- c(1, cumsum(L))
V1 <- c(-1, V)
1 + pos[V1==0]
# [1] 3 9 12 20
## Positions in the series dat$a where each run of "T" (i.e. 1) starts

Hoping this helps,
Ted.
Re: [R] algorithm help
Thanks very much Bill, good catch!

John

From: William Dunlap
Cc: r-h...@stat.math.ethz.ch
Sent: Thu, January 6, 2011 3:52:47 PM
Subject: RE: [R] algorithm help

> [John's f.fragment() function and its output, quoted; snipped]

You need to be more careful about the first and last rows in the dataset. I think yours only works when a starts with 0 and ends with 1.

> f.fragment(c(1,1,0,0), c(11,12,13,14))
     stretch start end no.of.1s
[1,]       1    NA  12        2
> f.fragment(c(1,1,0,1), c(11,12,13,14))
     stretch start end no.of.1s
[1,]       1    14  12        2
[2,]       1    14  14        1
> f.fragment(c(0,1,0,1), c(11,12,13,14))
     stretch start end no.of.1s
[1,]       1    12  12        1
[2,]       2    14  14        1
> f.fragment(c(0,1,0,0), c(11,12,13,14))
     stretch start end no.of.1s
[1,]       1    12  12        1
[2,]       2    NA  12        1
> f.fragment(c(1,1,1,1), c(11,12,13,14))
     stretch end no.of.1s
[1,]       1  14        4
[2,]       0  14        4
> f.fragment(c(0,0,0,0), c(11,12,13,14))
     stretch start
[1,]       1    NA

The following does better. It keeps things as logical vectors as long as possible, which tends to work better when dealing with runs.

f <- function(a, b) {
    isFirstIn1Run <- c(TRUE, a[-1] != a[-length(a)]) & a==1
    isLastIn1Run  <- c(a[-1] != a[-length(a)], TRUE) & a==1
    data.frame(stretch  = seq_len(sum(isFirstIn1Run)),
               start    = b[isFirstIn1Run],
               end      = b[isLastIn1Run],
               no.of.1s = which(isLastIn1Run) - which(isFirstIn1Run) + 1)
}

> f(c(1,1,0,0), c(11,12,13,14))
  stretch start end no.of.1s
1       1    11  12        2
> f(c(1,1,0,1), c(11,12,13,14))
  stretch start end no.of.1s
1       1    11  12        2
2       2    14  14        1
> f(c(0,1,0,1), c(11,12,13,14))
  stretch start end no.of.1s
1       1    12  12        1
2       2    14  14        1
> f(c(0,1,0,0), c(11,12,13,14))
  stretch start end no.of.1s
1       1    12  12        1
> f(c(1,1,1,1), c(11,12,13,14))
  stretch start end no.of.1s
1       1    11  14        4
> f(c(0,0,0,0), c(11,12,13,14))
[1] stretch  start    end      no.of.1s
<0 rows> (or 0-length row.names)

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> [rest of quoted message snipped]
[R] effects packages for mixed model?
Hi, I am wondering if there is a similar effects package for mixed models, just like what the effects package does for linear and generalized linear models. Specifically, I am looking for a way to calculate the SAS so-called least squares means (LS means) in mixed models (I understand there is a substantial debate on whether such adjusted means should be computed in the first place).

Thank you,

John
[R] analysis strategy - baseline and repeated measure
Hi, assume that I have a repeated-measures dataset with 3 time points: baseline, day 5 and day 10. There are 4 treatment groups (vehicle, treatment 1, treatment 2 and treatment 3), with 20 subjects per treatment group. A simple, straightforward way to analyze the data is to use a mixed model:

model 1:  obj <- lmer(y ~ treatment * time + (time|subject))

where time is numeric with values 0, 5 and 10. The problem with this approach is that the model does not account for baseline imbalance between treatment groups. But if I want to include the baseline value of the response variable in the model, then I think I have to exclude the baseline data from the rows of the dataset (so that baseline becomes one variable, i.e. one column of the dataset; correct me if I am wrong). With this data transformation, I only end up with 2 time points left in the dataset (day 5 and day 10). Then a linear term on the numeric time variable is not possible in lmer(). In this situation, what I can think of is to treat the time variable as a factor (say named "time.f"), and run the following model:

model 2:  obj <- lmer(y ~ treatment * time.f + (1|subject))

where time.f is a factor with values 5 and 10.

A couple of questions:

1. Should we really be concerned about baseline imbalance, addressing it by including baseline as a variable in the model? What's the advantage of doing so versus not doing so?

2. If the objective of the study is to evaluate, at the end of the study (day 10), which treatment group produces a significant difference from the vehicle group, is model 2 a reasonable model for that?

3. In general, with repeated measures at 2 to 3 time points, are mixed models really necessary? On the mixed-models mailing list, I noticed concerns about running mixed models on just a few time points. But I feel uncomfortable running a simple ANOVA (or ANCOVA) while completely ignoring the fact that the data are correlated among time points.

4. What are better alternatives for analyzing such datasets?

Thanks

John
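A minimal sketch of "model 2" with the baseline adjustment described above; the reshaping step and the object names (dat, base, dat.post, and the column names) are assumptions for illustration, since they are not defined in the post:

library(lme4)

## assume dat has columns subject, treatment, time (0, 5, 10) and y;
## pull out the time-0 measurement as a per-subject covariate
base <- dat[dat$time == 0, c("subject", "y")]
names(base)[2] <- "baseline"
dat.post <- merge(dat[dat$time > 0, ], base, by = "subject")
dat.post$time.f <- factor(dat.post$time)

## "model 2" with baseline included as a covariate
obj2 <- lmer(y ~ baseline + treatment * time.f + (1 | subject), data = dat.post)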
[R] read SAS dataset using read.ssd()
Hi, I am using read.ssd() from the foreign package to read some SAS datasets. I have 2 types of SAS datasets, one with the "sas7bdat" extension, the other with the "ssd01" extension. I have no problem with the first type, but got the following error message with the 2nd type (with the "ssd01" extension):

test <- read.ssd("C:/Documents and Settings/Desktop", "test",
                 sascmd="C:/Program Files/SAS/SASFoundation/9.2/sas.exe")

SAS failed.  SAS program at C:\DOCUME~1\LOCALS~1\Temp\Rtmp08209e\file65b5504.sas
The log file will be file65b5504.log in the current directory
Warning message:
In read.ssd("C:/Documents and Settings/Desktop", "test",
    sascmd = "C:/Program Files/SAS/SASFoundation/9.2/sas.exe") :
  SAS return code was 2

Attached please find the log file. Thanks for any suggestions.

John
Re: [R] read SAS dataset using read.ssd()
Looks like the log file was not attached properly. Here it is again. Thanks for any suggestions.

John

----- Original Message -----
From: array chip
To: r-help@r-project.org
Sent: Mon, August 2, 2010 2:18:32 PM
Subject: [R] read SAS dataset using read.ssd()

[original message snipped]

1                           The SAS System          14:10 Monday, August 2, 2010

NOTE: Unable to open SASUSER.REGSTRY. WORK.REGSTRY will be opened instead.
NOTE: All registry changes will be lost at the end of the session.
WARNING: Unable to copy SASUSER registry to WORK registry. Because of this,
         you will not see registry customizations during this session.
NOTE: Unable to open SASUSER.PROFILE. WORK.PROFILE will be opened instead.
NOTE: All profile changes will be lost at the end of the session.
NOTE: Copyright (c) 2002-2008 by SAS Institute Inc., Cary, NC, USA.
NOTE: This session is executing on the XP_PRO platform.
NOTE: SAS initialization used:
      real time           0.98 seconds
      cpu time            0.35 seconds

1    option validvarname = v6;
     libname src2rd 'C:/Documents and Settings/Desktop';
NOTE: Libref SRC2RD was successfully assigned as follows:
      Engine:        V9
      Physical Name: C:\Documents and Settings\Desktop
2    libname rd xport 'C:\DOCUME~1\LOCALS~1\Temp\Rtmp08209e\file25af6ba1';
NOTE: Libref RD was successfully assigned as follows:
      Engine:        XPORT
      Physical Name: C:\DOCUME~1\LOCALS~1\Temp\Rtmp08209e\file25af6ba1
3    proc copy in=src2rd out=rd;
4    select test ;
ERROR: The file SRC2RD.TEST (memtype=ALL) was not found, but appears on a SELECT statement.
WARNING: Input library SRC2RD is empty.
NOTE: Statements not processed because of errors noted above.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE COPY used (Total process time):
      real time           0.06 seconds
      cpu time            0.04 seconds

ERROR: Errors printed on page 1.
NOTE: SAS Institute Inc., SAS Campus Drive, Cary, NC USA 27513-2414
NOTE: The SAS System used:
      real time           1.06 seconds
      cpu time            0.40 seconds
[R] xlsx package
Hi, I was trying to install the xlsx package for reading Excel 2007 files. The installation went smoothly, but when I tried to load the library I got the following error message:

> library(xlsx)
Loading required package: xlsxjars
Loading required package: rJava
Error : .onLoad failed in loadNamespace() for 'xlsxjars', details:
  call: .jinit()
  error: cannot obtain Class.getSimpleName method ID
Error: package 'xlsxjars' could not be loaded

The help page of the xlsx package has a note: "NOTE: You may need to add to your PATH variable the location of your JVM (e.g. C:/Program Files/Java/jre6/bin/client) to get rJava working." I did add that to the PATH variable using:

> Sys.setenv(PATH=paste(Sys.getenv("PATH"),
+            "C:\\Program Files\\Java\\jre6\\bin\\client", collapse=';'))

But the error still persists. Any suggestions?

Thanks

John