[R] testing parallelism of dose-response curves using nls()

2010-03-12 Thread array chip
Hi, I am trying to use an F test or chi-square test to check whether two 
5-parameter (A, B, xmid, scal and H) logistic curves are parallel, based on 
residual sums of squares.

What's usually done is to first fit the two curves using a constrained (or 
global) model in which all parameters are kept the same except for "xmid"; 
then fit two independent curves using unconstrained models in which all 5 
parameters are allowed to vary. The extra residual sum of squares can then be 
used to test parallelism with either a chi-square or an F test.

Now, fitting two separate curves is straightforward using nls(), but how do I 
fit a global model that allows only "xmid" to differ between the two curves?
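
For concreteness, here is a minimal sketch (not from the thread) of one way to 
write such a constrained fit with plain nls(), by indexing the curve-specific 
parameter with a grouping factor. It assumes a hypothetical data frame dat 
with columns x, y and a two-level factor curve; the starting values are 
placeholders:

## 5-parameter logistic
fpl <- function(x, A, B, xmid, scal, H) A + (B - A)/(1 + exp((xmid - x)/scal))^H

g <- as.integer(dat$curve)  # 1 or 2; indexes the curve-specific parameters

## global (constrained) model: only xmid differs between the curves
fit.con <- nls(y ~ fpl(x, A, B, xmid[g], scal, H), data = dat,
               start = list(A = 0, B = 100, xmid = c(5, 6), scal = 1, H = 1))

## unconstrained model: all five parameters differ between the curves
fit.unc <- nls(y ~ fpl(x, A[g], B[g], xmid[g], scal[g], H[g]), data = dat,
               start = list(A = c(0, 0), B = c(100, 100), xmid = c(5, 6),
                            scal = c(1, 1), H = c(1, 1)))

anova(fit.con, fit.unc)  # extra-sum-of-squares F test of parallelism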

BTW, is there a selfStart function for the 5-parameter logistic, like SSfpl 
for the 4-parameter logistic?

Thanks very much,

John



Re: [R] testing parallelism of dose-response curves using package "drc"

2010-03-16 Thread array chip
Hi Eik,

Thanks for pointing out this great package "drc". 

Can you or anyone please help me specify a constrained 5PL model in which only 
the transition parameter ("e" in L.5()) is allowed to differ?

BTW, what does the argument "pmodels=" do in the model specification? And what 
about arguments "lower=" and "upper="? I feel they might be related to 
specifying constrained models. 
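
For what it's worth, a hedged sketch of how such a constrained fit might look 
in drc, assuming a hypothetical data frame dat with columns y, x and a 
grouping factor curve: pmodels takes one specification (a formula, or a 
data-frame column) per parameter of the fct, in the parameter order of L.5() 
(b, c, d, e, f), so making only the fourth entry depend on the curve lets only 
"e" vary. The lower=/upper=-style arguments, as I understand them, instead put 
box constraints on the parameter estimates during optimization, which is a 
different kind of constraint.

library(drc)

## constrained (global) model: b, c, d, f shared; only e differs by curve
m.con <- drm(y ~ x, curveid = curve, data = dat, fct = L.5(),
             pmodels = list(~1, ~1, ~1, ~factor(curve), ~1))

## unconstrained model: all five parameters differ by curve
m.unc <- drm(y ~ x, curveid = curve, data = dat, fct = L.5())

anova(m.con, m.unc)  # F test of parallelism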

Thanks

John



--- On Sat, 3/13/10, Eik Vettorazzi  wrote:

> From: Eik Vettorazzi 
> Subject: Re: [R] testing parallelism of dose-response curves using nls()
> To: "array chip" 
> Cc: r-help@r-project.org
> Date: Saturday, March 13, 2010, 9:12 AM
> Hi John,
> you may have a look at the drc package (or look at the maintainer's web
> site, www.bioassay.dk). I think it does exactly what you want.
> HTH
> Eik
> 
> array chip wrote:
> > Hi, I am trying to use an F test or chi-square test to check whether
> > two 5-parameter (A, B, xmid, scal and H) logistic curves are
> > parallel, based on residual sums of squares.
> >
> > What's usually done is to first fit the two curves using a
> > constrained (or global) model in which all parameters are kept the
> > same except for "xmid"; then fit two independent curves using
> > unconstrained models in which all 5 parameters are allowed to vary.
> > The extra residual sum of squares can then be used to test
> > parallelism with either a chi-square or an F test.
> >
> > Now, fitting two separate curves is straightforward using nls(), but
> > how to fit a global model that allows only "xmid" to differ between
> > the two curves?
> >
> > BTW, is there a selfStart function for the 5-parameter logistic,
> > like SSfpl for the 4-parameter logistic?
> >
> > Thanks very much,
> >
> > John






[R] datadist() in Design library

2009-07-09 Thread array chip

Hi, I got an error message using datadist() from the Design package:

> library(Design,T)
> dd <- datadist(beta.final)  
> options(datadist="dd") 
> lrm(Disease ~ gsct+apcct+rarct, x=TRUE, y=TRUE)
Error in eval(expr, envir, enclos) : object "Disease" not found

All variables, including the response variable "Disease", are in the data 
frame "beta.final"; why then can "Disease" not be found? I thought that with 
datadist(), all variables are presented to the model-fitting functions. Maybe 
I am wrong?

thanks

John



[R] validate() in Design library

2009-07-09 Thread array chip

Hi, another question, about validate() in the Design library. The argument "B" 
of this function is the number of repetitions for method="bootstrap", which is 
easy to understand; but for method="crossvalidation", B is the number of 
groups of omitted observations. This is confusing; I don't understand what it 
means. Say we do 5-fold cross-validation: all samples are divided into 5 
groups of equal size, 4 groups are used for training, and the model developed 
on them is tested on the 1 group left out, with the process cycling through 
all 5 groups. What does the "B" argument mean in this example? B=5? Or B=1, 
because 1 group of samples is omitted from model development at a time?
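
For the record, "number of groups of omitted observations" reads as the number 
of folds, so 5-fold cross-validation corresponds to B=5. A minimal sketch, 
assuming a hypothetical fit f from lrm(..., x=TRUE, y=TRUE):

validate(f, method = "crossvalidation", B = 5)  # 5-fold cross-validation
validate(f, method = "boot", B = 200)           # 200 bootstrap repetitions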

Thanks

Yi



Re: [R] datadist() in Design library

2009-07-09 Thread array chip

Dear Jorge, yes, with "data=beta.final" in lrm(), it worked. But I thought one 
of the reasons for datadist() was to make it unnecessary to specify the data 
frame in lrm(). Maybe I am completely wrong here.
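
For the record: datadist() does not hand the data to the fitting function; it 
stores distribution summaries (ranges, quantiles) that Design uses later in 
summary(), plot() and predictions. A minimal sketch of the usual workflow, 
reusing the variable names from this thread:

library(Design)
dd <- datadist(beta.final)
options(datadist = "dd")

## the fit itself still needs data= (or variables visible on the search path)
f <- lrm(Disease ~ gsct + apcct + rarct, data = beta.final, x = TRUE, y = TRUE)

summary(f)  # this is where the stored datadist ranges are used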

Thanks

John

--- On Thu, 7/9/09, Jorge Ivan Velez  wrote:

> From: Jorge Ivan Velez 
> Subject: Re: [R] datadist() in Design library
> To: "array chip" 
> Cc: "R mailing list" 
> Date: Thursday, July 9, 2009, 6:55 PM
> Dear John,
> Have you tried specifying the 'data' argument, as suggested in the lrm
> help? Try this:
> 
> lrm(Disease ~ gsct + apcct + rarct, data = beta.final, x = TRUE, y = TRUE)
> 
> 
> HTH,
> Jorge
> 
> On Thu, Jul 9, 2009 at 6:46 PM, array chip wrote:
> 
> Hi, I got an error message using datadist() from the Design package:
> 
> > library(Design,T)
> > dd <- datadist(beta.final)
> > options(datadist="dd")
> > lrm(Disease ~ gsct+apcct+rarct, x=TRUE, y=TRUE)
> Error in eval(expr, envir, enclos) : object "Disease" not found
> 
> All variables, including the response variable "Disease", are in the
> data frame "beta.final"; why then can "Disease" not be found? I thought
> that with datadist(), all variables are presented to the model-fitting
> functions. Maybe I am wrong?
> 
> thanks
> 
> John






[R] prevalence in logistic regression lrm()

2009-07-10 Thread array chip

Hi, I am wondering whether there is a way to specify the prevalence of events 
in logistic regression using lrm() from the Design package. Linear 
discriminant analysis via lda() from the MASS library has an argument 
"prior=" that can be used to specify the prevalence of events when the 
dataset being analyzed does not have a representative prevalence. How can we 
incorporate this information in lrm()? My concern is that if my dataset does 
not have a representative prevalence, the probabilities generated by lrm() 
will not be meaningful.
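
lrm() itself has no prior= argument. One standard workaround, offered here 
only as a hedged sketch (it is the case-control intercept correction of King 
& Zeng 2001, not anything from this thread): shift the fitted intercept by 
the log odds ratio of the sample and population prevalences before converting 
to probabilities. Assume a hypothetical fit f from lrm(..., x=TRUE, y=TRUE) 
with a 0/1 response, and a known population prevalence pi.true:

ybar <- mean(f$y)  # prevalence of events in the sample
b <- coef(f)
## prior-correct the intercept for the true prevalence
b["Intercept"] <- b["Intercept"] -
  log(((1 - pi.true)/pi.true) * (ybar/(1 - ybar)))
## corrected predicted probabilities (f$x is the design matrix)
p.adj <- plogis(cbind(Intercept = 1, f$x) %*% b)
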
Thanks

John



[R] trying to understand lme() results

2010-03-31 Thread array chip
Hi, I have a very simple balanced randomized block design with a total of 48 
observations of the weight of a product; the product was manufactured at 4 
sites, so each site has 12 observations. I want to use lme() from the nlme 
package to estimate the standard error of the product weight.

So the data look like:

     MW site
1 54031    1
2 55286    1
3 54396    2
4 52327    2
5 55963    3
6 54893    3
7 57338    4
8 55597    4
:
:
:

The random-effects model is Y = mu + b + e, where b is the random block effect 
and e is the model error.

so I fitted a lme model as:

obj <- lme(MW ~ 1, random = ~1 | site, data = dat)

summary(obj)
Linear mixed-effects model fit by REML
Random effects:
 Formula: ~1 | site
        (Intercept) Residual
StdDev:    2064.006 1117.567

Fixed effects: MW ~ 1
               Value Std.Error DF  t-value p-value
(Intercept) 55901.31  1044.534 44 53.51796       0
:
:
Number of Observations: 48
Number of Groups: 4 

I also did:
anova(obj)
            numDF denDF  F-value p-value
(Intercept)     1    44 2864.173  <.0001

I believe my standard-error estimate is the "Residual" entry under the 
"Random effects" part of summary(), which is 1117.567.

Now my question concerns the t test under "Fixed effects". I think it is 
testing whether the overall mean weight is 0 or not, which is not interesting 
anyway. But how is the standard error of 1044.534 calculated? I thought it 
should be sqrt(MSE) = 1117.567 instead. Can anyone explain?

The same goes for the F test from anova(obj). The F statistic equals the 
square of the t statistic because the numerator has 1 df. But what are the 
mean squares of the numerator and denominator, and where can I find them? 
BTW, I think the denominator mean square (MSE) should be 1117.567^2, but this 
is not consistent with the standard error in the t test (1044.534).

Thanks a lot for any help

John



Re: [R] trying to understand lme() results

2010-04-01 Thread array chip
Thanks Dennis for the thorough explanation and correction on the design.

John

--- On Thu, 4/1/10, Dennis Murphy  wrote:

From: Dennis Murphy 
Subject: Re: [R] trying to understand lme() results
To: "array chip" 
Cc: R-help@r-project.org
Date: Thursday, April 1, 2010, 12:33 AM

Hi:


On Wed, Mar 31, 2010 at 2:31 PM, array chip wrote:

Hi, I have a very simple balanced randomized block design with a total of 48 
observations of the weight of a product; the product was manufactured at 4 
sites, so each site has 12 observations. I want to use lme() from the nlme 
package to estimate the standard error of the product weight.


It's a balanced one-way design where site is assumed to be a random factor. 
If you want to call it a block, fine, but if this were a genuine RCBD, there 
would be treatments randomly assigned to 'units' within site, and that's not 
present here.




So the data look like:

     MW site
1 54031    1
2 55286    1
3 54396    2
4 52327    2
5 55963    3
6 54893    3
7 57338    4
8 55597    4
:

The random-effects model is Y = mu + b + e, where b is the random block effect 
and e is the model error.



so I fitted a lme model as:

obj <- lme(MW ~ 1, random = ~1 | site, data = dat)

summary(obj)
Linear mixed-effects model fit by REML
Random effects:
 Formula: ~1 | site
        (Intercept) Residual
StdDev:    2064.006 1117.567

Fixed effects: MW ~ 1
               Value Std.Error DF  t-value p-value
(Intercept) 55901.31  1044.534 44 53.51796       0
:
Number of Observations: 48
Number of Groups: 4

I also did:
anova(obj)
            numDF denDF  F-value p-value
(Intercept)     1    44 2864.173  <.0001

I believe my standard-error estimate is the "Residual" entry under the 
"Random effects" part of summary(), which is 1117.567.

Yes. 




Now my question concerns the t test under "Fixed effects". I think it is 
testing whether the overall mean weight is 0 or not, which is not interesting 
anyway. But how is the standard error of 1044.534 calculated? I thought it 
should be sqrt(MSE) = 1117.567 instead. Can anyone explain?


When the data are balanced, the population variance of \bar{y}.., the sample 
grand mean, is E(MSA)/N, where MSA is the between-site mean square and N is 
the total sample size (Searle, Casella and McCulloch, _Variance Components_, 
p. 54, formula (37), derived for the balanced-data case; the corresponding 
ANOVA table, with expected mean squares, is on p. 60). The plug-in estimate 
of E(MSA) is

MSA = n * s^2(Intercept) + s^2(error) = 12 * (2064.006)^2 + 1117.567^2,

where n = 12 is the number of observations per site. The standard error of 
\bar{y}.. is then sqrt(MSA/N). Doing these calculations in R:

xx <- 12 * (2064.006)^2 + (1117.567)^2
sqrt(xx/48)
[1] 1044.533

which, within rounding error, is what lme() gives you in the test for fixed 
effects.





The same goes for the F test from anova(obj). The F statistic equals the 
square of the t statistic because the numerator has 1 df. But what are the 
mean squares of the numerator and denominator, and where can I find them? 
BTW, I think the denominator mean square (MSE) should be 1117.567^2, but this 
is not consistent with the standard error in the t test (1044.534).


lme() fits by ML or REML, so it doesn't output a conventional ANOVA table as 
part of the output. If you want to see the sums of squares and mean squares, 
use aov(). In the balanced one-way model, the observed df, SS and MS are the 
same in both the fixed-effects and random-effects models, but the expected 
mean square for treatments differs between the two models.
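
For instance, a minimal sketch (assuming the same hypothetical data frame dat 
with MW and site):

## classical one-way ANOVA table for these data: the site mean square
## estimates 12*(2064.006)^2 + 1117.567^2 and the residual mean square
## estimates 1117.567^2, matching the REML fit for balanced data
anova(lm(MW ~ factor(site), data = dat))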

HTH,
Dennis




Thanks a lot for any help



John





[R] can't read excel file with read.xls()

2010-04-02 Thread array chip
Hi, I encountered a problem: I am not able to read in an Excel spreadsheet 
using read.xls() from the xlsReadWrite package. Can anyone help? Here is 
example code:


write.xls(matrix(1:9,nrow=3),"ttt.xls")
read.xls("ttt.xls")
Error in read.xls("ttt.xls") : 
  Unexpected error. Message: Can't find the file "ttt.xls"

The "ttt.xls" file was created in the current working folder successfully. 

Thanks

John



Re: [R] can't read excel file with read.xls()

2010-04-02 Thread array chip
Thanks for the link, very useful. A simple fix to the code is to use 
normalizePath(), or the full path name:

read.xls(normalizePath("ttt.xls"))



--- On Fri, 4/2/10, Gabor Grothendieck  wrote:

> From: Gabor Grothendieck 
> Subject: Re: [R] can't read excel file with read.xls()
> To: "array chip" 
> Cc: r-h...@stat.math.ethz.ch
> Date: Friday, April 2, 2010, 7:05 PM
> Read about the bugs and workarounds on this page:
> http://rwiki.sciviews.org/doku.php?id=tips:data-io:ms_windows&s=excel
> 
> On Fri, Apr 2, 2010 at 6:56 PM, array chip wrote:
> > Hi, I encountered a problem: I am not able to read in an Excel
> > spreadsheet using read.xls() from the xlsReadWrite package. Can
> > anyone help? Here is example code:
> >
> > write.xls(matrix(1:9,nrow=3),"ttt.xls")
> > read.xls("ttt.xls")
> > Error in read.xls("ttt.xls") :
> >  Unexpected error. Message: Can't find the file "ttt.xls"
> >
> > The "ttt.xls" file was created successfully in the current working
> > folder.
> >
> > Thanks
> >
> > John






[R] 3-D response surface using wireframe()

2010-04-06 Thread array chip
I am working on plotting a response surface using wireframe(). The default 
style/orientation is

        z
        |
        |
 y      |
  \     |
   \    |
    \   |
     \  |
      \ |
       \|x
        0

Now, the orientation of axes that I want is:

     z
     |
     |
     |
     |
     |
    /0\
   /   \
  /     \
 /       \
/         \
y          z

My understanding is that screen=list(z=,y=,x=) controls the orientation of the 
axes, but even after reading the help page for the screen argument, I still 
don't understand how to use it.

screen: "A list determining the sequence of rotations to be applied to the 
data before being plotted. The initial position starts with the viewing point 
along the positive z-axis, and the x and y axes in the usual position. Each 
component of the list should be named one of "x", "y" or "z" (repetitions are 
allowed), with their values indicating the amount of rotation about that axis 
in degrees."

Can anyone explain to me how the screen argument works? And what values of 
(x, y, z) should I choose for the orientation that I want?
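
For illustration, a hedged sketch using the built-in volcano matrix (the 
particular angles are the ones that turn up later in this thread); the 
rotations are applied in sequence, each about the current screen axes:

library(lattice)
## first rotate 210 degrees about z, then -75 about x, then 5 about y
wireframe(volcano, scales = list(arrows = FALSE),
          screen = list(z = 210, x = -75, y = 5))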

Another question: wireframe() will draw all the edges of the bounding box by 
default; is there any way to control which edges are drawn and which are 
hidden?

thanks very much!

John



Re: [R] 3-D response surface using wireframe()

2010-04-07 Thread array chip
Thank you, David. The document is very helpful.

I had a typo in my 2nd example plot: one of the "z"s should be "x"; "z" is the 
vertical one.

Thanks

John

--- On Wed, 4/7/10, David Winsemius  wrote:

> From: David Winsemius 
> Subject: Re: [R] 3-D response surface using wireframe()
> To: "array chip" 
> Cc: r-help@r-project.org
> Date: Wednesday, April 7, 2010, 8:07 AM
> A search with the following strategy:
> 
> RSiteSearch("lattice wireframe rotate axes")
> 
> followed by adding requests to search earlier years' archives produced
> this link, which has a further link to a document that answers most of
> your questions, at least the ones that are comprehensible:
> 
> http://tolstoy.newcastle.edu.au/R/e2/help/07/03/12534.html
> 
> --David.
> 
> On Apr 6, 2010, at 7:12 PM, array chip wrote:
> 
> > I am working on plotting a response surface using wireframe(). The
> > default style/orientation is
> > 
> >         z
> >         |
> >         |
> >  y      |
> >   \     |
> >    \    |
> >     \   |
> >      \  |
> >       \ |
> >        \|x
> >         0
> > 
> > Now, the orientation of axes that I want is:
> > 
> >      z
> >      |
> >      |
> >      |
> >      |
> >      |
> >     /0\
> >    /   \
> >   /     \
> >  /       \
> > /         \
> > y          z
> 
> Two z axes? How interesting!
> 
> > 
> > My understanding is that screen=list(z=,y=,x=) controls the
> > orientation of the axes, but even after reading the help page for
> > the screen argument, I still don't understand how to use it.
> > 
> > screen: "A list determining the sequence of rotations to be applied
> > to the data before being plotted. The initial position starts with
> > the viewing point along the positive z-axis, and the x and y axes in
> > the usual position. Each component of the list should be named one
> > of "x", "y" or "z" (repetitions are allowed), with their values
> > indicating the amount of rotation about that axis in degrees."
> > 
> > Can anyone explain to me how the screen argument works? And what
> > values of (x, y, z) should I choose for the orientation that I want?
> > 
> > Another question: wireframe() will draw all the edges of the
> > bounding box by default; is there any way to control which edges are
> > drawn and which are hidden?
> > 
> > thanks very much!
> > 
> > John
> > 
> 
> David Winsemius, MD
> West Hartford, CT
> 
> 






Re: [R] 3-D response surface using wireframe()

2010-04-07 Thread array chip
With the help document, I finally found a set of values for x=, y= and z= in 
the "screen" argument that gives me the correct rotation of the plot. But now 
it draws the x and y axes (tick marks and labels) along the top of the plot. 
Is there a way to put the x and y axes at the bottom of the plot?

Thanks

John

--- On Wed, 4/7/10, David Winsemius  wrote:

> From: David Winsemius 
> Subject: Re: [R] 3-D response surface using wireframe()
> To: "array chip" 
> Cc: r-help@r-project.org
> Date: Wednesday, April 7, 2010, 8:07 AM
> A search with the following strategy:
> 
> RSiteSearch("lattice wireframe rotate axes")
> 
> [rest of quoted message trimmed; identical to the quote in the previous
> post of this thread]
> 
> David Winsemius, MD
> West Hartford, CT
> 
> 






Re: [R] 3-D response surface using wireframe()

2010-04-08 Thread array chip
Scott,

This is a good explanation and a good practice.

Thank you,

John

--- On Thu, 4/8/10, Waichler, Scott R  wrote:

> From: Waichler, Scott R 
> Subject: Re: 3-D response surface using wireframe()
> To: "arrayprof...@yahoo.com" 
> Cc: "r-help@r-project.org" 
> Date: Thursday, April 8, 2010, 9:51 AM
> Regarding the screen argument in wireframe(), here is what I understand
> about how it works after much trial and error:
> 
> After each rotation, new axes are in effect defined for the next
> rotation as at the start: x is to the right of the 2D view, y is
> towards the top, and z is positive out of the page towards you.
> Understanding this "reset" of the coordinate system after each rotation
> is key to understanding how the sequence of rotations will be done.
> Rotations follow the right-hand rule: positive angles follow the curved
> fingers of the right hand, with the thumb pointing in the positive
> direction of the associated axis.
> 
> I labeled a wooden block with axes and turned it in my hand to help me
> make the initial guess at the sequence of rotations I would want for a
> given view.
> 
> Scott Waichler
> Pacific Northwest National Laboratory
> P.O. Box 999, Richland, WA  99352
> scott.waich...@pnl.gov
> 509-372-4423, 509-341-4051 (cell)
> 
> 
> 






Re: [R] 3-D response surface using wireframe()

2010-04-08 Thread array chip
David,

That does the job! Thanks a lot.

Now I am very, very close to what I want. I still have a couple of small 
adjustments to make.

1. I use drape=TRUE to draw the grid and color on the surface; is there a 
parameter to adjust the density of the grid?

2. Is there a way to add a grid to the axis surfaces? I mean the sides of the 
box: between x & y, between x & z, and between y & z. And can I choose which 
3 sides of the box to add the grid to?

Thank you all for the help. It's fun to play with wireframe.

John

--- On Wed, 4/7/10, David Winsemius  wrote:

> From: David Winsemius 
> Subject: Re: [R] 3-D response surface using wireframe()
> To: "array chip" 
> Cc: r-help@r-project.org
> Date: Wednesday, April 7, 2010, 9:22 PM
> 
> On Apr 7, 2010, at 8:58 PM, array chip wrote:
> 
> > With the help document, I finally found a set of values for x=, y=
> > and z= in the "screen" argument that gives me the correct rotation
> > of the plot. But now it draws the x and y axes (tick marks and
> > labels) along the top of the plot. Is there a way to put the x and
> > y axes at the bottom of the plot?
> 
> Look at the scpos argument to specify the scales location. (Still
> lacking an example and therefore doing this from memory.)
> 
> -- 
> David
> >
> > Thanks
> >
> > John
> >
> > [rest of quoted thread trimmed]
> 
> David Winsemius, MD
> West Hartford, CT
> 
> 






Re: [R] 3-D response surface using wireframe()

2010-04-08 Thread array chip
David,

Thanks again! Sarkar's Lattice book is an excellent source for lattice. Here 
is a link to all the figures and code used in the book; your example is 
Figure 13.7.

http://lmdvr.r-forge.r-project.org/figures/figures.html

I got the first point! For the second point below, Figure 13.7 is an excellent 
example of projecting the 3D dataset onto the bounding surface, but it's not 
what I meant; I think I didn't explain what I wanted clearly. What I really 
want is simple straight grid lines across the tick marks on the 3 bounding 
surfaces of the box, not a projection of the 3D dataset. I hope I have 
explained it clearly this time.

Many thanks

John


--- On Thu, 4/8/10, David Winsemius  wrote:

> From: David Winsemius 
> Subject: Re: [R] 3-D response surface using wireframe()
> To: "array chip" 
> Cc: r-help@r-project.org
> Date: Thursday, April 8, 2010, 3:46 PM
> 
> On Apr 8, 2010, at 3:13 PM, array chip wrote:
> 
> > David,
> >
> > That does the job! Thanks a lot.
> >
> > Now I am very, very close to what I want. I still have a couple of
> > small adjustments to make.
> >
> > 1. I use drape=TRUE to draw the grid and color on the surface; is
> > there a parameter to adjust the density of the grid?
> 
> If you mean the spacing between points, then isn't that determined by
> the density of the gridded data arguments before they get to the
> wireframe function?
> 
> > 2. Is there a way to add a grid to the axis surfaces? I mean the
> > sides of the box: between x & y, between x & z, and between y & z.
> > And can I choose which 3 sides of the box to add the grid to?
> 
> See Figure 13.7 of Sarkar's Lattice text for an example of a panel
> function that collapses the contourLines of the volcano dataset onto
> the top bounding surface by using ltransform3dto3d with a z argument
> of zlim.scaled[2]. I would think that a grid could be 3dto3d
> transformed similarly.
> 
> -- 
> David.
> 
> > Thank you all for the help. It's fun to play with wireframe.
> >
> > John
> > [rest of quoted thread trimmed]

Re: [R] 3-D response surface using wireframe()

2010-04-09 Thread array chip
Hi David and Felix,

Thank you very much for your suggestions. To be honest, this is now beyond my 
understanding of lattice plots. I am relatively new to lattice plots, so I 
have no idea how a function within a function works (for example, how 
panel.3dpolygon() within panel.3d.wireframe() within wireframe() works; I 
totally have no clue).

Felix, your example code using panel.3dpolygon() for the volcano plot does 
what I want, but again, I don't know how to tweak your example to suit my 
case.

I attached an example dataset, and an example of the plot that I want to make 
(especially those grid lines on the 3 bounding surfaces of the box, and, if 
possible, with the front edges of the box removed to make it look open).

dat <- read.table("dat.txt", sep = '\t', header = TRUE, row.names = 1)

library(lattice)
wireframe(z ~ x * y, data = dat,
          scales = list(arrows = FALSE, cex = 0.9, col = "black", font = 3,
                        tick.number = 6,
                        z = list(tick.number = 10, tck = 0.8, distance = 0.8),
                        x = list(tck = 0.8, distance = 0.6),
                        y = list(tck = 0.7, distance = 0.6)),
          zlim = seq(-14, 4, by = 2),
          zlab = list(label = "Z", rot = 90, cex = 0.9),
          xlab = list(label = "X", rot = 15.5),
          ylab = list(label = "Y", rot = -33),
          drape = TRUE,
          at = seq(min(dat$z), max(dat$z), length = 50),
          col.regions = rgb(colorRamp(c("white", "red"))(seq(0, 1, length = 50)),
                            max = 255),
          colorkey = FALSE,
          par.box = list(lwd = 2),                ## line width of box
          screen = list(z = 210, x = -75, y = 5),
          scpos = list(x = 9, y = 5, z = 2))      ## where axes are drawn

Thank you all very much for the help. It's fun to learn.

John

--- On Thu, 4/8/10, Felix Andrews  wrote:

> From: Felix Andrews 
> Subject: Re: [R] 3-D response surface using wireframe()
> To: "David Winsemius" 
> Cc: "array chip" , r-help@r-project.org
> Date: Thursday, April 8, 2010, 9:56 PM
> On 9 April 2010 11:18, David Winsemius wrote:
> > On Apr 8, 2010, at 8:29 PM, array chip wrote:
> >
> >> David,
> >>
> >> Thanks again! Sarkar's Lattice book is an excellent source for
> >> lattice. Here is a link to all the figures and code used in the
> >> book; your example is Figure 13.7.
> >>
> >> http://lmdvr.r-forge.r-project.org/figures/figures.html
> >>
> >> I got the first point! For the second point below, Figure 13.7 is
> >> an excellent example of projecting the 3D dataset onto the bounding
> >> surface, but it's not what I meant; I think I didn't explain what I
> >> wanted clearly. What I really want is simple straight grid lines
> >> across the tick marks on the 3 bounding surfaces of the box, not a
> >> projection of the 3D dataset. I hope I have explained it clearly
> >> this time.
> >
> > You have not convinced me that I misunderstood what you wanted. I
> > figured that you would use something other than transforming the
> > data-driven contour lines. But if you want to use a lattice function
> > there is a panel.grid, but I still suspect it will need to be 3dto3d
> > transformed onto one of the "lim" extremes.
> 
> Might be a little easier to use panel.3dpolygon from latticeExtra.
> (or not)
> e.g. something like
> 
> wireframe(volcano, drape = TRUE, scales = list(arrows = FALSE),
>   panel.3d.wireframe = function(x, y, z, ...) {
>     panel.3dwire(x, y, z, ...)
>     panel.3dpolygon(x = rep(pretty(x), each = 3), y = min(y),
>                     z = c(range(z), NA),
>                     ..., border = "grey", lwd = 2)
>   })
> 
> [rest of quoted thread trimmed]

Re: [R] 3-D response surface using wireframe()

2010-04-09 Thread array chip
Sorry, the example plot didn't go through last time; here it is:

Thanks

John

--- On Fri, 4/9/10, array chip  wrote:

> From: array chip 
> Subject: Re: [R] 3-D response surface using wireframe()
> To: "David Winsemius" , "Felix Andrews" 
> Cc: r-help@r-project.org
> Date: Friday, April 9, 2010, 1:09 PM
> 
> [quoted message trimmed; identical to the previous post in this thread]

Re: [R] 3-D response surface using wireframe()

2010-04-09 Thread array chip
David, 

Thanks for the two previous posts from Sarkar. Actually, I am now one step 
closer: I am now able to remove the 3 outer lines of the bounding box using 
the par.box argument, even though Sarkar said in his 2008 post that par.box 
does not control individual boundaries, so maybe that has been fixed.

Replacing "par.box=list(lwd=2)" in my original code with 
"par.box=list(lwd=2,col=c(1,1,1,NA,1,1,NA,NA,1))" now removes the 3 outer 
lines of the bounding box. The only thing missing is that the 3 inner lines 
of the box (behind the plot) are dashed, not solid, and the par.box argument 
only controls the 9 visible lines of the bounding box.

As for how to draw grid lines onto the 3 surfaces, I still have no clue. But 
as you pointed out, Sarkar indicated in his 2007 post that it might be 
possible.

Thanks

John

--- On Fri, 4/9/10, David Winsemius  wrote:

> From: David Winsemius 
> Subject: Re: [R] 3-D response surface using wireframe()
> To: "array chip" 
> Cc: r-help@r-project.org
> Date: Friday, April 9, 2010, 3:48 PM
> I do not think the mail server accepts .jpg formats, which was the
> format in which I got your attachment the first time (because you
> copied me directly). I don't see much need to send a pdf, because the
> code you offered does work and the data made it through (.txt and .pdf
> are types that the mailserver accepts).
> 
> Back in 2007 Sarkar suggested that it would be possible to project
> grids on the walls of the bounding box, but since the original poster
> did not reply, it appears Sarkar did not deliver a worked solution.
> 
> http://finzi.psych.upenn.edu/R/Rhelp02/archive/95759.html
> 
> And then in 2008 he referred the questioner to the section of the
> Lattice examples I earlier cited:
> 
> http://finzi.psych.upenn.edu/Rhelp10/2008-October/176466.html
> 
> -- 
> David.
> 
> [rest of quoted thread trimmed]

[R] smooth.spline() function

2009-06-10 Thread array chip

Hi all, I found that the smooth.spline() function produces different results 
in R and S-Plus. I tried playing with different parameters of the function 
without any success. The function's implementation relies on Fortran code, so 
it seems impossible to port the code from S-Plus to R (or I may be wrong). 
Can anyone suggest anything so that I can produce the same result in R as in 
S-Plus?
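
One hedged suggestion, not from the thread: the automatic choice of the 
smoothing parameter is where implementations typically differ, so pinning the 
smoothing down explicitly makes the two fits comparable. A minimal sketch with 
hypothetical vectors x and y:

## fix the equivalent degrees of freedom instead of relying on the
## automatic (cross-validation based) selection
fit <- smooth.spline(x, y, df = 6)
## or fix the smoothing parameter itself
fit2 <- smooth.spline(x, y, spar = 0.5)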

Thanks

John



[R] sparse PCA using nsprcomp package

2013-09-05 Thread array chip
Hi all, I am using nsprcomp() from the nsprcomp package to run sparse PCA. The 
output is very much like that of regular PCA by prcomp(), in that it provides 
"sdev" for the standard deviations of the principal components (PCs).


For regular PCA by prcomp(), we can easily calculate the percent of total 
variance explained by the first k PCs using cumsum(obj$sdev^2), because the 
PCs are independent of each other, so you can simply add up their variances. 
For sparse PCA, as far as I understand, the generated PCs are no longer 
independent of each other, so you cannot simply add up variances to calculate 
the percentage of variance explained by the first k PCs. For example, in the 
elasticnet package, where spca() also performs sparse PCA, one of the outputs 
of spca() is "pev", percent explained variation, which is based on a 
so-called "adjusted" variance that accounts for the fact that the variances 
of these PCs are not independent anymore.

My question: for nsprcomp, how can I calculate the percent explained 
variation using "sdev", given that these PCs are not independent of each 
other?

Thanks!

John


Re: [R] sparse PCA using nsprcomp package

2013-09-05 Thread array chip
Hi Christian,

Thanks so much for the detailed explanation! I look forward to the new release 
of the nsprcomp package! In the meantime, I will use the function below to 
calculate the "adjusted" standard deviations. I have a few more questions; I 
hope you can shed some light on them:

1). Assume now I can calculate these "adjusted" standard deviations from 
sparse PCA. Should the percent variation explained by each sparse PC be 
calculated using the sum of all the "adjusted" variances (i.e. squares of the 
"adjusted" standard deviations) as the denominator (then the percentages will 
always add up to 1 if all sparse PCs are counted), or using the sum of the PC 
variances estimated by REGULAR PCA as the denominator (then adding up all PCs 
may not come to 1)?

2). How do you choose the 2 important parameters in nsprcomp(), ncomp and k? 
If, for example, my regular PCA showed that I need 20 PCs to account for 80% 
of the variation in my dataset, does that mean I should set ncomp=20? And are 
there any rules for setting the value of "k"?

3). Would you recommend nscumcomp() or nsprcomp() in general?

Thanks so much again,

John



 From: Christian Sigg 

Cc: r-help@r-project.org 
Sent: Thursday, September 5, 2013 2:43 PM
Subject: Re: [R] sparse PCA using nsprcomp package


Hi John

I am currently traveling and have only sporadic net access, so I can only 
answer briefly. It's also quite late; I hope what follows still makes sense...

> For regular PCA by prcomp(), we can easily calculate the percent of total 
> variance explained by the first k PCs using cumsum(obj$sdev^2), because the 
> PCs are independent of each other, so you can simply add up their variances. 
> For sparse PCA, as far as I understand, the generated PCs are no longer 
> independent of each other, so you cannot simply add up variances to 
> calculate the percentage of variance explained by the first k PCs. For 
> example, in the elasticnet package, where spca() also performs sparse PCA, 
> one of the outputs of spca() is "pev", percent explained variation, which is 
> based on a so-called "adjusted" variance that accounts for the fact that the 
> variances of these PCs are not independent anymore.

You are correct that measuring explained variance is more involved with sparse 
(or non-negative) PCA, because the principal axes no longer correspond to 
eigenvectors of the covariance matrix, and are usually not even orthogonal.

The next update for the 'nsprcomp' package is almost done, and one of the 
changes will concern the reported standard deviations. In the current version 
(0.3), the standard deviations are computed from the scores matrix X*W, where X 
is the data matrix and W is the (pseudo-)rotation matrix consisting of the 
sparse loadings. Computing variance this way has the advantage that 'sdev' is 
consistent with the scores matrix, but it has the disadvantage that some of the 
explained variance is counted more than once because of the non-orthogonality 
of the principal axes. One of the symptoms of this counting is that the 
variance of a later component can actually exceed the variance of an earlier 
component, which is not possible in regular PCA.

In the new version of the package, 'sdev' will report the _additional_ standard 
deviation of each component, i.e. the variance not explained by the previous 
components. Given a basis of the space spanned by the previous PAs, the 
variance of the PC is computed after projecting the current PA to the 
ortho-complement space of the basis. This procedure reverts back to standard 
PCA if no sparsity or non-negativity constraints are enforced on the PAs.

> My question is for nsprcomp, how can I calculate percent explained variation 
> by using "sdev" when I know these PCs are not independent of each other?

The new version of the package will do it for you. Until then, you can use 
something like the following function

asdev <- function(X, W) {
    nc <- ncol(W)
    sdev <- numeric(nc)
    Q <- qr.Q(qr(W))   # orthonormal basis for the span of the loadings W
    Xp <- X
    for (cc in seq_len(nc)) {
        ## additional standard deviation of component cc, computed on data
        ## deflated by the previous components
        sdev[cc] <- sd(Xp %*% W[ , cc])
        ## project out the cc-th basis vector from the data
        Xp <- Xp - Xp %*% Q[ , cc] %*% t(Q[ , cc])
    }
    return(sdev)
}

to compute the additional variances for given X and W.
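
A hypothetical usage sketch (variable names assumed; the total-variance 
formula matches the one given later in this thread):

library(nsprcomp)
X.c <- scale(X, center = TRUE, scale = FALSE)  # zero-mean data matrix
nspc <- nsprcomp(X.c, ncomp = 5, k = 20)       # 5 sparse PCs, 20 non-zero loadings each
sdev.adj <- asdev(X.c, nspc$rotation)
total.var <- sum(X.c^2)/(nrow(X.c) - 1)        # total variance of the data
cumsum(sdev.adj^2)/total.var                   # cumulative fraction explained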

The package documentation will explain the above in some more detail, and I 
will also have a small blog post comparing the 'nsprcomp' routine with the 
'spca' routine from the 'elasticnet' package, on the 'marty' data from the 
EMA package.

Best regards
Christian


Re: [R] sparse PCA using nsprcomp package

2013-09-11 Thread array chip
Hi Christian,

Thank you so much for sharing your thoughts; it was a real pleasure to read 
and learn! Approximately when do you expect the new release of the package?


Best,

John




 From: Christian Sigg 

Cc: "r-help@r-project.org"  
Sent: Monday, September 9, 2013 8:06 AM
Subject: Re: [R] sparse PCA using nsprcomp package
 

Hi John

> 1). Assume now I can calculate these "adjusted" standard deviations from 
> sparse PCA. Should the percent variation explained by each sparse PC be 
> calculated using the sum of all the "adjusted" variances (i.e. squares of 
> the "adjusted" standard deviations) as the denominator (then the percentages 
> will always add up to 1 if all sparse PCs are counted), or using the sum of 
> the PC variances estimated by REGULAR PCA as the denominator (then adding up 
> all PCs may not come to 1)?

It depends on what you want to do with this percentage, but to me the second 
would be more meaningful. A sparse PCA will usually be truncated (fewer than 
all possible components are computed), and due to the additional constraints on 
the principal axes you will usually explain less variance than with standard 
PCA. I would want to know what I lose in a sparse PCA w.r.t. a standard PCA.

Note that you don't actually have to compute the standard PCA if you are only 
interested in the total variance of the data, i.e. the sum of all variances. 
The total variance 

1/(n-1)*sum(diag(t(X)%*%X)) 

for the zero-mean data matrix X is invariant to a rotation of the coordinate 
system and therefore identical to

Z <- X%*%W
1/(n-1)*sum(diag(t(Z)%*%Z)) 

so you can skip computing the PCA rotation matrix W. The fastest way to compute 
the total variance is probably

1/(n-1)*sum(X^2) 

because all expressions compute the squared Frobenius norm of X. 

If you want to compare variances of individual components, then compute a 
regular PCA.

I also had a look how the spca function computes the "percentage explained 
variation". I don't yet entirely understand what is going on, but the results 
differ from using the "asdev" function I mentioned in my previous reply. Keep 
that in mind if you want to compare nsprcomp to spca.

> 2). How do you choose the 2 important parameters in nsprcomp(), ncomp and k? 
> If, for example, my regular PCA showed that I need 20 PCs to account for 80% 
> of the variation in my dataset, does that mean I should set ncomp=20? And 
> are there any rules for setting the value of "k"?

I don't have any hard answers for this question. 

There are a number of heuristics for choosing the number of components in 
regular PCA (e.g. the PCA book by Jolliffe presents several), and some of them 
should translate to sparse PCA. If you think that 20 PCs or 80% explained 
variance works well for regular PCA, I suggest also using 20 components in 
sparse PCA, then measure the explained variance and then increase the number of 
components (if necessary) to again achieve 80% explained variance.

Same for setting the cardinality parameter k. You could use a criterion such as 
BIC to optimize the trade-off between model fidelity and complexity, but I 
don't have any experience how well this works in practice. What I did so far 
was to check for loadings with small magnitudes. I set k such that all loadings 
have "substantial" magnitudes.

In the end what matters is what follows after running the algorithm. Are you 
directly interpreting the sparse PCA result, or is this an intermediate step in 
a complete data processing pipeline with a measurable goal? If the latter, 
choose the algorithm parameters that give you the best results at the end of 
the pipeline.

> 3). Would you recommend nscumcomp() or nsprcomp() in general?

It depends whether you want to perform a sequential or cumulative analysis of 
your data. If you want maximum variance in the first (second, third, ...) PC, 
and specify the precise cardinality of each principal axis, then use nsprcomp. 
If instead you want to only specify the total cardinality of all loadings and 
leave the distribution of non-zero loadings to the algorithm, use nscumcomp.

There will be substantial improvements to nscumcomp in the next release, if you 
want to use it I suggest you wait until then.

Regards
Christian

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] sort matrix based on a specific order

2013-01-10 Thread array chip
Hi, I have a character matrix with 2 columns A and B. I want to sort the 
matrix based on column B, following a specific order of characters:

mat<-cbind(c('w','x','y','z'),c('a','b','c','d'))
ind<-c('c','b','d','a')

I want "mat" to be sorted by the sequence in "ind":

 [,1] [,2]
[1,] "y"  "c" 
[2,] "x"  "b" 
[3,] "z"  "d" 
[4,] "w"  "a"

Is there any simple function that can do this?

Thanks

John


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sort matrix based on a specific order

2013-01-10 Thread array chip
Thank you all for the suggestions, fantastic!





 From: Rui Barradas 

Cc: "r-help@r-project.org"  
Sent: Thursday, January 10, 2013 10:32 AM
Subject: Re: [R] sort matrix based on a specific order


Hello,

Try the following. order() gives you a permutation of the vector
'ind', and ordering that permutation gives its inverse.


mat <- cbind(c('w','x','y','z'),c('a','b','c','d'))
ind <- c('c','b','d','a')

ord <- order(ind)
mat[order(ord), ]
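
An equivalent one-liner for this particular case, assuming the key column 
mat[,2] contains no duplicated values (true here):

mat[match(ind, mat[, 2]), ]  # pick the row whose 2nd column equals each element of 'ind'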


Hope this helps,

Rui Barradas

On 10-01-2013 18:21, array chip wrote:

> Hi, I have a character matrix with 2 columns A and B. I want to sort the
> matrix based on column B, following a specific order of characters:
>
> mat<-cbind(c('w','x','y','z'),c('a','b','c','d'))
> ind<-c('c','b','d','a')
>
> I want "mat" to be sorted by the sequence in "ind":
>
>  [,1] [,2]
> [1,] "y"  "c" 
> [2,] "x"  "b" 
> [3,] "z"  "d" 
> [4,] "w"  "a"
>
> Is there any simple function that can do this?
>
> Thanks
>
> John

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] weird merge()

2013-01-11 Thread array chip
Hi James,

I am trying to combine 11 data frames by column, not by row. My original 
message has 11 data text files attached; did they go through, so you can try my 
code?

Thanks

John





 From: J Toll 

Cc: "r-help@r-project.org"  
Sent: Friday, January 11, 2013 1:35 PM
Subject: Re: [R] weird merge()




Hi,
>
>I have some protein array data, each array in a separate text file. So I read 
>them in and try to combine them into a single data frame by using merge(). see 
>code below (If you download the attached data files into a specific folder, 
>the code below should work):
>
>
>fls<-list.files("C:\\folder_of_download",full.names=T) ## get file names
>prot<-list() ## a list to contain individual files
>ind<-1
>for (i in fls[c(1:11)]) {
>    cat(ind, " ")
>   
>    tmp<-read.delim(i,header=T,row.names=NULL,na.string='null')
>    colnames(tmp)[4]<-as.character(tmp$barcode[1])
>    prot[[ind]]<-tmp[,-(1:2)]
>    ind<-ind+1
>}
>
>        ## try to merge them together
>        ## not do this in a loop so I can see where the problem occurs
>pro<-merge(prot[[1]],prot[[2]],by.x=1,by.y=1,all=T)
>pro<-merge(pro,prot[[3]],by.x=1,by.y=1,all=T)
>pro<-merge(pro,prot[[4]],by.x=1,by.y=1,all=T)
>pro<-merge(pro,prot[[5]],by.x=1,by.y=1,all=T)
>pro<-merge(pro,prot[[6]],by.x=1,by.y=1,all=T)
>pro<-merge(pro,prot[[7]],by.x=1,by.y=1,all=T)
>pro<-merge(pro,prot[[8]],by.x=1,by.y=1,all=T)
>pro<-merge(pro,prot[[9]],by.x=1,by.y=1,all=T)
>pro<-merge(pro,prot[[10]],by.x=1,by.y=1,all=T)
>pro<-merge(pro,prot[[11]],by.x=1,by.y=1,all=T)
>
>
>I noticed that starting with file #8, the merge becomes slower and slower, and 
>by file #11 the computer was stuck!  Originally I thought something was 
>wrong with the later files, but when I changed the order of merging, the 
>slow-down still happened at the 8th file to be merged.
>
>Can anyone suggest what's going on with merging?
>
>



I'm not sure exactly what you're trying to do with all that code, but if you're 
just trying to get all eleven files into a data.frame, you could do this:

allFilesAsList <- lapply(1:11, function(i) read.delim(paste("p", i, ".txt", sep 
= "")))
oneBigDataFrame <- do.call(rbind, allFilesAsList)

You may need to fix the column names.  Is that anything like what you were 
trying to do?

James

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] weird merge()

2013-01-11 Thread array chip
Hi Dennis,

Actually, I am trying to combine them by COLUMN; that's why I am using 
merge(). The first loop simply reads these protein data into R as 11 data 
frames, each 165 x 2. Then I use merge() to combine these data frames into 1 
big data frame by column, with these individual merge() statements. I didn't 
do it in a loop because I wanted to see at what point merge() would start to 
fail. And it turns out the merge of the first 7 data frames works fine. 
Starting from the 8th data frame, it becomes slower and slower, and at the 
11th it appeared stuck on my computer.

Thanks

John







 From: Dennis Murphy 

Sent: Friday, January 11, 2013 1:25 PM
Subject: Re: [R] weird merge()

Hi John:

This doesn't look right. What are you trying to do? [BTW, the variable
names in the attachments have spaces, so most of R's read functions
should choke on them. At the very least, replace the spaces with
underscores.]

If all you are trying to do is row concatenate them (since the two or
three I looked at appear to have the same structure), then it's as
simple as

pro <- do.call(rbind, prot)

If this is what you want along with an indicator for each data file,
then the ldply() function in the plyr package might be useful as an
alternative to do.call. It should return an additional variable .id
whose value corresponds to the number (or name) of the list component.

library(plyr)
pro2 <- ldply(prot, rbind)

If you want something different, then be more explicit about what you
want, because your merge() code doesn't make a lot of sense to me.


Dennis

PS: Just a little hint: if you're using (almost) the same code
repeatedly, there's probably a more efficient way to do it in R. CS
types call this the DRY principle: Don't Repeat Yourself. I know you
know this, but a little reminder doesn't hurt :)



> Hi,
>
> I have some protein array data, each array in a separate text file. So I read 
> them in and try to combine them into a single data frame by using merge(). 
> see code below (If you download the attached data files into a specific 
> folder, the code below should work):
>
>
> fls<-list.files("C:\\folder_of_download",full.names=T) ## get file names
> prot<-list() ## a list to contain individual files
> ind<-1
> for (i in fls[c(1:11)]) {
>     cat(ind, " ")
>
>     tmp<-read.delim(i,header=T,row.names=NULL,na.string='null')
>     colnames(tmp)[4]<-as.character(tmp$barcode[1])
>     prot[[ind]]<-tmp[,-(1:2)]
>     ind<-ind+1
> }
>
>         ## try to merge them together
>         ## not do this in a loop so I can see where the problem occurs
> pro<-merge(prot[[1]],prot[[2]],by.x=1,by.y=1,all=T)
> pro<-merge(pro,prot[[3]],by.x=1,by.y=1,all=T)
> pro<-merge(pro,prot[[4]],by.x=1,by.y=1,all=T)
> pro<-merge(pro,prot[[5]],by.x=1,by.y=1,all=T)
> pro<-merge(pro,prot[[6]],by.x=1,by.y=1,all=T)
> pro<-merge(pro,prot[[7]],by.x=1,by.y=1,all=T)
> pro<-merge(pro,prot[[8]],by.x=1,by.y=1,all=T)
> pro<-merge(pro,prot[[9]],by.x=1,by.y=1,all=T)
> pro<-merge(pro,prot[[10]],by.x=1,by.y=1,all=T)
> pro<-merge(pro,prot[[11]],by.x=1,by.y=1,all=T)
>
>
> I noticed that starting with file #8, the merge becomes slower and slower, and 
> by file #11 the computer was stuck!  Originally I thought something was 
> wrong with the later files, but when I changed the order of merging, the 
> slow-down still happened at the 8th file to be merged.
>
> Can anyone suggest what's going on with merging?
>
> Thanks
>
> John
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] weird merge()

2013-01-11 Thread array chip
I just figured out the reason: the column by which the data frames are merged 
(the 1st column in each data frame, "gene.name") has no unique values; some 
values are repeated, so with each merge the data frame grows exponentially.

Sorry to bother all.

John
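
A small sketch of the mechanism (toy data, not the original files): with m 
copies of a key on the left and n copies on the right, merge() emits all m*n 
pairings, so repeated merges grow multiplicatively.

d1 <- data.frame(gene.name=c("a","a","b"), v1=1:3)
d2 <- data.frame(gene.name=c("a","a","b"), v2=4:6)
nrow(merge(d1, d2, by="gene.name"))  # 5 rows: 2*2 for "a" plus 1 for "b"
# after k such merges, the key "a" alone contributes 2^k rows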




 From: J Toll 

Cc: "r-help@r-project.org"  
Sent: Friday, January 11, 2013 1:35 PM
Subject: Re: [R] weird merge()




Hi,
>
>I have some protein array data, each array in a separate text file. So I read 
>them in and try to combine them into a single data frame by using merge(). see 
>code below (If you download the attached data files into a specific folder, 
>the code below should work):
>
>
>fls<-list.files("C:\\folder_of_download",full.names=T) ## get file names
>prot<-list() ## a list to contain individual files
>ind<-1
>for (i in fls[c(1:11)]) {
>    cat(ind, " ")
>   
>    tmp<-read.delim(i,header=T,row.names=NULL,na.string='null')
>    colnames(tmp)[4]<-as.character(tmp$barcode[1])
>    prot[[ind]]<-tmp[,-(1:2)]
>    ind<-ind+1
>}
>
>        ## try to merge them together
>        ## not do this in a loop so I can see where the problem occurs
>pro<-merge(prot[[1]],prot[[2]],by.x=1,by.y=1,all=T)
>pro<-merge(pro,prot[[3]],by.x=1,by.y=1,all=T)
>pro<-merge(pro,prot[[4]],by.x=1,by.y=1,all=T)
>pro<-merge(pro,prot[[5]],by.x=1,by.y=1,all=T)
>pro<-merge(pro,prot[[6]],by.x=1,by.y=1,all=T)
>pro<-merge(pro,prot[[7]],by.x=1,by.y=1,all=T)
>pro<-merge(pro,prot[[8]],by.x=1,by.y=1,all=T)
>pro<-merge(pro,prot[[9]],by.x=1,by.y=1,all=T)
>pro<-merge(pro,prot[[10]],by.x=1,by.y=1,all=T)
>pro<-merge(pro,prot[[11]],by.x=1,by.y=1,all=T)
>
>
>I noticed that starting with file #8, the merge becomes slower and slower, and 
>by file #11 the computer was stuck!  Originally I thought something was 
>wrong with the later files, but when I changed the order of merging, the 
>slow-down still happened at the 8th file to be merged.
>
>Can anyone suggest what's going on with merging?
>
>



I'm not sure exactly what you're trying to do with all that code, but if you're 
just trying to get all eleven files into a data.frame, you could do this:

allFilesAsList <- lapply(1:11, function(i) read.delim(paste("p", i, ".txt", sep 
= "")))
oneBigDataFrame <- do.call(rbind, allFilesAsList)

You may need to fix the column names.  Is that anything like what you were 
trying to do?

James

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] error installing KEGGSOAP

2013-01-16 Thread array chip
Hi, I am new to Bioconductor, trying to install the KEGGSOAP package, but got 
warnings when installing and an error message when trying to load the package. 
Can anyone suggest what went wrong?

many thanks

John



> source("http://bioconductor.org/biocLite.R";)
Bioconductor version 2.11 (BiocInstaller 1.8.3), ?biocLite for help
> biocLite("KEGGSOAP")
BioC_mirror: http://bioconductor.org
Using Bioconductor version 2.11 (BiocInstaller 1.8.3), R version 2.15.
Installing package(s) 'KEGGSOAP'
trying URL 
'http://bioconductor.org/packages/2.11/bioc/bin/windows/contrib/2.15/KEGGSOAP_1.32.0.zip'
Content type 'application/zip' length 69037 bytes (67 Kb)
opened URL
downloaded 67 Kb

package ‘KEGGSOAP’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
        C:\Users\yzhang\AppData\Local\Temp\RtmpawAjwx\downloaded_packages
Warning message:
installed directory not writable, cannot update packages 'acepack', 'actuar', 
'ada', 'ade4', 'ade4TkGUI',
  'agricolae', 'akima', 'ape', 'aplpack', 'arules', 'bitops', 'boot', 'Cairo', 
'car', 'caTools', 'cba',
  'chron', 'class', 'clue', 'cluster', 'coda', 'colorspace', 'CompQuadForm', 
'corpcor', 'DAAG', 'date',
  'deldir', 'descr', 'deSolve', 'devtools', 'digest', 'diptest', 'doBy', 
'DoE.wrapper', 'e1071', 'effects',
  'ENmisc', 'epiR', 'eRm', 'evaluate', 'evd', 'FactoMineR', 'fArma', 
'fAssets', 'fBasics', 'fdrtool',
  'fExoticOptions', 'fExtremes', 'fGarch', 'fields', 'flexclust', 'fMultivar', 
'fNonlinear', 'fOptions',
  'forecast', 'foreign', 'fpc', 'fracdiff', 'fRegression', 'FrF2', 
'FrF2.catlg128', 'fTrading',
  'fUnitRoots', 'gamlss', 'gamlss.data', 'gamlss.dist', 'gclus', 'gdata', 
'geoR', 'GGally', 'ggm',
  'ggplot2', 'gmodels', 'gridBase', 'gWidgets', 'gWidgetstcltk', 'HH', 
'Hmisc', 'httr', 'igraph',
  'igraph0', 'inline', 'ipred', 'isa2', 'JavaGD', 'JGR', 'kernlab', 'KernSmoot 
[... truncated]


> library(KEGGSOAP)
Loading required package: BiocGenerics

Attaching package: ‘BiocGenerics’

The following object(s) are masked from ‘package:stats’:

    xtabs

The following object(s) are masked from ‘package:base’:

    anyDuplicated, cbind, colnames, duplicated, eval, Filter, Find, get, 
intersect, lapply, Map, mapply, mget, order, paste,
    pmax, pmax.int, pmin, pmin.int, Position, rbind, Reduce, rep.int, 
rownames, sapply, setdiff, table, tapply, union, unique

failed to load HTTP resource
Error : .onLoad failed in loadNamespace() for 'KEGGSOAP', details:
  call: NULL
  error: 1: failed to load HTTP resource

Error: package/namespace load failed for ‘KEGGSOAP’


> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United 
States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C            
              
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   
base     

other attached packages:
[1] BiocGenerics_0.4.0  BiocInstaller_1.8.3

loaded via a namespace (and not attached):
[1] codetools_0.2-8 RCurl_1.91-1.1  SSOAP_0.8-0     tools_2.15.1    
XML_3.9-4.1     XMLSchema_0.7-2
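
As it turned out, KEGG retired its SOAP service at the end of 2012, which is 
consistent with the "failed to load HTTP resource" error at load time; the 
Bioconductor replacement is KEGGREST. A minimal sketch, assuming a current 
Bioconductor installation:

biocLite("KEGGREST")              # REST-based successor to KEGGSOAP
library(KEGGREST)
head(keggList("pathway", "hsa"))  # e.g. list human (hsa) pathways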

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] confidence intervals for survfit()

2013-03-14 Thread array chip
Hi, I am wondering how the confidence interval for Kaplan-Meier estimator is 
calculated by survfit(). For example, 

> summary(survfit(Surv(time,status)~1,data),times=10)
Call: survfit(formula = Surv(rtime10, rstat10) ~ 1, data = mgi)

 time n.risk n.event survival std.err lower 95% CI upper 95% CI
   10    168  55    0.761  0.0282    0.707    0.818


I am trying to reproduce the upper and lower CI by using standard error. As far 
I understand, the default method for survfit() to calculate confidence interval 
is on the log survival scale, so:


upper CI = exp(log(0.761)+qnorm(0.975)*0.0282) = 0.804
lower CI = exp(log(0.761)-qnorm(0.975)*0.0282) = 0.720


they are not the same as the output from survfit().

Am I missing something?

Thanks

John
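
A likely explanation, with a sketch to check it: summary.survfit() reports the 
standard error of S itself, while the conf.type='log' interval is built from 
the standard error of the cumulative hazard, which is std.err/S. Under that 
assumption the printed interval is reproduced:

S  <- 0.761
se <- 0.0282                    # summary() std.err, on the survival scale
seH <- se/S                     # se on the cumulative-hazard (log S) scale
exp(log(S) + qnorm(0.975)*seH)  # 0.818, the printed upper limit
exp(log(S) - qnorm(0.975)*seH)  # 0.708, the printed lower limit up to rounding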


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] meta-analysis for sensitivity

2013-04-08 Thread array chip
Hi all, I am new to meta-analysis. Is there any special package that can 
calculate "summarized" sensitivity with 95% confidence interval for a 
diagnostic test, based on sensitivities from several individual studies? 

Thanks for any suggestions.

John
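
One possible route, offered as a sketch rather than a recommendation: treat 
each study's sensitivity as a proportion (true positives out of diseased 
subjects) and pool with metaprop() from the 'meta' package; the counts below 
are made up. For pooling sensitivity and specificity jointly, the bivariate 
model in the 'mada' package may be more appropriate.

library(meta)
tp <- c(45, 30, 60)                          # hypothetical true positives per study
nd <- c(50, 40, 70)                          # hypothetical diseased subjects per study
m  <- metaprop(event=tp, n=nd, sm="PLOGIT")  # pooled on the logit scale
summary(m)                                   # pooled sensitivity with 95% CI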



 From: Jannis 
To: "r-help@r-project.org"  
Sent: Monday, April 8, 2013 11:12 AM
Subject: [R] checkUsage from codetools shows errors when function uses 
functions from loaded packages

Dear list members,


I frequently program small scripts and wrap them into functions to be able to 
check them with checkUsage. In case these functions (loaded via source or copy 
pasted to the R console) use functions from other packages, I get this error:


no visible global function definition for ‘xxx’

For example:



test = function() {
  require(plotrix)
  color.legend()
}

library(codetools)
checkUsage(test)


Can I tell codetools somehow where to look for these functions without building 
a full blown package?



Cheers
Jannis
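
One workaround worth trying (an assumption about how codetools resolves 
globals, not a documented feature): checkUsage() looks names up through the 
function's environment, so attaching the package before the check should make 
its exports visible:

library(codetools)
library(plotrix)   # attach first, so color.legend() can be found

test <- function() {
  color.legend()   # no longer flagged once plotrix is on the search path
}
checkUsage(test)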

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] survival estimates for covariate values

2014-05-14 Thread array chip
Hi all, let's say we fit a Cox model with a numeric variable "x" as the 
independent variable. Then we can calculate, say, 10-year survival for any given 
value of "x" (0 to 10 in increments of 0.1 in the example below):

> fit <- coxph(Surv(time, event)~x,dat)
> surv10yr<-
summary(survfit(fit,newdata=data.frame(x=seq(0,10,0.1))),times=10)

I am wondering if anyone can show me how to replicate this in SAS?

very much appreciated!

John

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] prediction based on conditional logistic regression clogit

2014-06-15 Thread array chip
Hi, I am using clogit() from survival package to do conditional logistic 
regression. I also need to make prediction on an independent dataset to 
calculate predicted probability. Here is an example:


> dat <- data.frame(set=rep(1:50,each=3), status=rep(c(1,0,0),50), 
> x1=rnorm(150,5,1), x2=rnorm(150,7,1.5))
> dat.test <- data.frame(set=rep(1:30,each=3), status=rep(c(1,0,0),30), 
> x1=rnorm(90,5,1), x2=rnorm(90,7,1.5))
> fit<-clogit(status~x1+x2+strata(set),dat)
> predict(fit,newdata=dat.test,type='expected')
Error in Surv(rep(1, 150L), status) : 
  Time and status are different lengths

Can anyone suggest what's wrong here?

Thanks!

John

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] prediction based on conditional logistic regression clogit

2014-06-16 Thread array chip
Thank you Peter. Any other suggestions are absolutely welcome!!

John




 From: peter dalgaard 

Cc: "r-help@r-project.org"  
Sent: Monday, June 16, 2014 2:22 AM
Subject: Re: [R] prediction based on conditional logistic regression clogit





> Hi, I am using clogit() from survival package to do conditional logistic 
> regression. I also need to make prediction on an independent dataset to 
> calculate predicted probability. Here is an example:
> 
> 
>> dat <- data.frame(set=rep(1:50,each=3), status=rep(c(1,0,0),50), 
>> x1=rnorm(150,5,1), x2=rnorm(150,7,1.5))
>> dat.test <- data.frame(set=rep(1:30,each=3), status=rep(c(1,0,0),30), 
>> x1=rnorm(90,5,1), x2=rnorm(90,7,1.5))
>> fit<-clogit(status~x1+x2+strata(set),dat)
>> predict(fit,newdata=dat.test,type='expected')
> Error in Surv(rep(1, 150L), status) : 
>   Time and status are different lengths
> 
> Can anyone suggest what's wrong here?
> 


The direct cause is that clogit() works by using the fact that the likelihood 
is equivalent to a coxph() likelihood with stratification and all observation 
lengths set to 1. Therefore the analysis is formally on Surv(rep(1, 150L), 
status) and that goes belly-up if you apply the same formula to a data set of 
different length. 

However, notice that there is no such thing as predict.clogit(), so you are 
attempting predict.coxph() on a purely formal Cox model. It is unclear to what 
extent predicted values, in the sense of coxph() are compatible with 
predictions in conditional logit models.

I'm rusty on this, but I think what you want is something like

m <- model.matrix(~ x1 + x2 - 1, data=dat.test)
pp <- exp(m %*% coef(fit))
pps <- ave(pp, dat.test$set, FUN=sum)
pp/pps

i.e. the conditional probability that an observation is a case given covariates 
and that there is one case in each set (in the data given, you have sets of 
three with one being a case, so all predicted probabilities are close to 0.33). 
For more general matched sets, I'm not really sure what one wants. Real experts 
are welcome to chime in.

-pd




> Thanks!
> 
> John
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com
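
Putting the pieces together as a self-contained sketch (same simulated shapes 
as in the question; with uninformative covariates the values should come out 
near 1/3 for sets of three):

library(survival)
dat <- data.frame(set=rep(1:50, each=3), status=rep(c(1,0,0), 50),
                  x1=rnorm(150,5,1), x2=rnorm(150,7,1.5))
dat.test <- data.frame(set=rep(1:30, each=3), status=rep(c(1,0,0), 30),
                       x1=rnorm(90,5,1), x2=rnorm(90,7,1.5))
fit <- clogit(status ~ x1 + x2 + strata(set), dat)

m   <- model.matrix(~ x1 + x2 - 1, data=dat.test)
pp  <- drop(exp(m %*% coef(fit)))      # exp(linear predictor), no intercept
pps <- ave(pp, dat.test$set, FUN=sum)  # per-set normalising constant
head(pp/pps)                           # P(case | covariates, one case per set)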

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] standard error of survfit.coxph()

2014-06-27 Thread array chip
Hi, can anyone help me to understand the standard errors printed in the output 
of survfit.coxph()?

time<-sample(1:15,100,replace=T)

status<-as.numeric(runif(100,0,1)<0.2)
x<-rnorm(100,10,2)

fit<-coxph(Surv(time,status)~x)
    ### method 1

survfit(fit, newdata=data.frame(time=time,status=status,x=x)[1:5,], 
conf.type='log')$std.err

 [,1]    [,2]    [,3]    [,4]   [,5]
 [1,] 0.0 0.0 0.0 0.0 0.
 [2,] 0.008627644 0.008567253 0.008773699 0.009354788 0.01481819
 [3,] 0.008627644 0.008567253 0.008773699 0.009354788 0.01481819
 [4,] 0.013800603 0.013767977 0.013889971 0.014379928 0.02353371
 [5,] 0.013800603 0.013767977 0.013889971 0.014379928 0.02353371
 [6,] 0.013800603 0.013767977 0.013889971 0.014379928 0.02353371
 [7,] 0.030226811 0.030423883 0.029806263 0.028918817 0.05191161
 [8,] 0.030226811 0.030423883 0.029806263 0.028918817 0.05191161
 [9,] 0.036852571 0.037159980 0.036186931 0.034645002 0.06485394
[10,] 0.044181716 0.044621159 0.043221145 0.040872939 0.07931028
[11,] 0.044181716 0.044621159 0.043221145 0.040872939 0.07931028
[12,] 0.055452631 0.056018832 0.054236881 0.051586391 0.10800413
[13,] 0.070665160 0.071363749 0.069208056 0.066655730 0.14976433
[14,] 0.124140400 0.125564637 0.121281571 0.118002021 0.30971860
[15,] 0.173132357 0.175309455 0.168821266 0.164860523 0.46393111

survfit(fit, newdata=data.frame(time=time,status=status,x=x)[1:5,], 
conf.type='log')$time
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15

    ### method 2

summary(survfit(fit, newdata=data.frame(time=time,status=status,x=x)[1:5,], 
conf.type='log'),time=10)$std.err

  1  2  3  4  5
[1,] 0.04061384 0.04106186 0.03963184 0.03715246 0.06867532

By reading the help of ?survfit.object and ?summary.survfit, the standard error 
provided in the output of method 1 (survfit()) was for cumulative 
hazard (-log(survival)), while the standard error provided in the output of method 
2 (summary.survfit()) was for survival itself, regardless of how you choose the 
value for "conf.type" ('log', 'log-log' or 'plain'). This explains why the 
standard error output is different between method 1 (10th row) and method 2.

My question is how do I get standard error estimates for log(-log(survival))?

Thanks!

John

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] standard error of survfit.coxph()

2014-06-30 Thread array chip
Thank you Terry for the explanation!

John




 From: "Therneau, Terry M., Ph.D." 

Sent: Monday, June 30, 2014 6:04 AM
Subject: Re:  standard error of survfit.coxph()


1. The computations "behind the scenes" produce the variance of the cumulative 
hazard. 
This is true for both an ordinary Kaplan-Meier and a Cox model.  
Transformations to other 
scales are done using simple Taylor series.

   H = cumulative hazard = log(S);  S=survival
   var(H) = var(log(S))  = the starting point
   S = exp(log(S)), so  var(S) is approx [deriv of exp(x)]^2 * var(log(S)) = 
S^2 var(H)
   var(log(log(S)) is approx (1/S^2) var(H)

2. At the time it was written, summary.survfit was used only for printing out 
the survival 
curve at selected times, and the audience for the printout wanted std(S).   
True, that was 
20 years ago, but I don't recall anyone ever asking for summary to do anything 
else.  Your 
request is not a bad idea.
   Note however that the primary impact of using log(S) or S or log(log(S)) 
scale is on 
the confidence intervals, and they do appear per request in the summary output.

Terry T.


On 06/28/2014 05:00 AM, r-help-requ...@r-project.org wrote:
> Message: 9
> Date: Fri, 27 Jun 2014 12:39:29 -0700

> To:"r-help@r-project.org"  
> Subject: [R] standard error of survfit.coxph()
> Message-ID:
>     <1403897969.91269.yahoomail...@web122906.mail.ne1.yahoo.com>
> Content-Type: text/plain
>
> Hi, can anyone help me to understand the standard errors printed in the 
> output of survfit.coxph()?
>
> time<-sample(1:15,100,replace=T)
>
> status<-as.numeric(runif(100,0,1)<0.2)
> x<-rnorm(100,10,2)
>
> fit<-coxph(Surv(time,status)~x)
>     ### method 1
>
> survfit(fit, newdata=data.frame(time=time,status=status,x=x)[1:5,], 
> conf.type='log')$std.err
>
>  [,1]    [,2]    [,3]    [,4]   [,5]
>  [1,] 0.0 0.0 0.0 0.0 0.
>  [2,] 0.008627644 0.008567253 0.008773699 0.009354788 0.01481819
>  [3,] 0.008627644 0.008567253 0.008773699 0.009354788 0.01481819
>  [4,] 0.013800603 0.013767977 0.013889971 0.014379928 0.02353371
>  [5,] 0.013800603 0.013767977 0.013889971 0.014379928 0.02353371
>  [6,] 0.013800603 0.013767977 0.013889971 0.014379928 0.02353371
>  [7,] 0.030226811 0.030423883 0.029806263 0.028918817 0.05191161
>  [8,] 0.030226811 0.030423883 0.029806263 0.028918817 0.05191161
>  [9,] 0.036852571 0.037159980 0.036186931 0.034645002 0.06485394
> [10,] 0.044181716 0.044621159 0.043221145 0.040872939 0.07931028
> [11,] 0.044181716 0.044621159 0.043221145 0.040872939 0.07931028
> [12,] 0.055452631 0.056018832 0.054236881 0.051586391 0.10800413
> [13,] 0.070665160 0.071363749 0.069208056 0.066655730 0.14976433
> [14,] 0.124140400 0.125564637 0.121281571 0.118002021 0.30971860
> [15,] 0.173132357 0.175309455 0.168821266 0.164860523 0.46393111
>
> survfit(fit, newdata=data.frame(time=time,status=status,x=x)[1:5,], 
> conf.type='log')$time
>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
>
>     ### method 2
>
> summary(survfit(fit, newdata=data.frame(time=time,status=status,x=x)[1:5,], 
> conf.type='log'),time=10)$std.err
>
>   1  2  3  4  5
> [1,] 0.04061384 0.04106186 0.03963184 0.03715246 0.06867532
>
> By reading the help of ?survfit.object and ?summary.survfit, the standard 
> error provided in the output of method 1 (survfit()) was for cumulative 
> hazard (-log(survival)), while the standard error provided in the output of 
> method 2 (summary.survfit()) was for survival itself, regardless of how you 
> choose the value for "conf.type" ('log', 'log-log' or 'plain'). This explains 
> why the standard error output is different between method 1 (10th row) and 
> method 2.
>
> My question is how do I get standard error estimates for log(-log(survival))?
>
> Thanks!
>
> John

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] standard error of survfit.coxph()

2014-07-21 Thread array chip
Dear Terry,

I was trying to use your explanation of the standard error estimate from 
survfit.coxph() to verify the standard error estimates for the method of 
log(log(S)), but couldn't get the estimates correct. Here is an example using 
the lung dataset:

> fit<-coxph(Surv(time,status)~wt.loss,lung)
> surv<-survfit(fit,newdata=lung[2:5,])$surv

> logstd<-survfit(fit,newdata=lung[2:5,])$std.err  
> logstd.time<-survfit(fit,newdata=lung[2:5,])$time


    ## std error of cumulative hazard at time=60

> logstd<-logstd[logstd.time==60,]
> logstd
[1] 0.01965858 0.01965858 0.01941871 0.01969248

    ## survival S at months 60

> surv<-surv[logstd.time==60,]  
> surv
    2 3 4 5 
0.9249238 0.9249238 0.9253038 0.9263392

Since survfit()$std.err was for cumulative hazard H=log(S), thus based on your 
explanation below, the standard error for log-log method is: (1/S^2)var(H)

> loglogstd<-sqrt(1/surv^2*(logstd^2))
> loglogstd

 2  3  4  5 
0.02125427 0.02125427 0.02098631 0.02125839

    ## the upper confidence interval should be
> exp(-exp(log(-log(surv))-qnorm(0.975)*loglogstd))
    2 3 4 5 
0.9278737 0.9278737 0.9282031 0.9292363

But this is different from the output using summary.survfit:

> summary(survfit(fit,newdata=lung[2:5,],conf.type='log-log'),times=60)$upper[1,]
    2 3 4 5 
0.9534814 0.9534814 0.9535646 0.9548478

Can you please suggest what I did wrong in the calculation?

Thanks very much!

John





To: "Therneau, Terry M., Ph.D." ; "r-help@r-project.org" 
 
Sent: Monday, June 30, 2014 10:46 AM
Subject: Re:  standard error of survfit.coxph()



[[elided Yahoo spam]]

John




 From: "Therneau, Terry M., Ph.D." 

Sent: Monday, June 30, 2014 6:04 AM
Subject: Re:  standard error of survfit.coxph()


1. The computations "behind the scenes" produce the variance of the cumulative 
hazard. 
This is true for both an ordinary Kaplan-Meier and a Cox model.  
Transformations to other 
scales are done using simple Taylor series.

   H = cumulative hazard = log(S);  S=survival
   var(H) = var(log(S))  = the starting point
   S = exp(log(S)), so  var(S) is approx [deriv of exp(x)]^2 * var(log(S)) = 
S^2 var(H)
   var(log(log(S)) is approx (1/S^2) var(H)

2. At the time it was written, summary.survfit was used only for
 printing out the survival 
curve at selected times, and the audience for the printout wanted std(S).   
True, that was 
20 years ago, but I don't recall anyone ever asking for summary to do anything 
else.  Your 
request is not a bad idea.
   Note however that the primary impact of using log(S) or S or log(log(S)) 
scale is on 
the confidence intervals, and they do appear per request in the summary output.

Terry T.


On 06/28/2014 05:00 AM, r-help-requ...@r-project.org wrote:
> Message: 9
> Date: Fri, 27 Jun 2014 12:39:29 -0700

> To:"r-help@r-project.org"  
> Subject: [R] standard error of survfit.coxph()
> Message-ID:
>     <1403897969.91269.yahoomail...@web122906.mail.ne1.yahoo.com>
> Content-Type: text/plain
>
> Hi, can anyone help me to understand the standard errors printed in the 
> output of survfit.coxph()?
>
> time<-sample(1:15,100,replace=T)
>
> status<-as.numeric(runif(100,0,1)<0.2)
> x<-rnorm(100,10,2)
>
> fit<-coxph(Surv(time,status)~x)
>     ### method 1
>
> survfit(fit, newdata=data.frame(time=time,status=status,x=x)[1:5,], 
> conf.type='log')$std.err
>
>  [,1]    [,2]    [,3]    [,4]   [,5]
>  [1,] 0.0 0.0 0.0 0.0 0.
>  [2,] 0.008627644 0.008567253 0.008773699 0.009354788 0.01481819
>  [3,] 0.008627644 0.008567253 0.008773699 0.009354788 0.01481819
>  [4,] 0.013800603 0.013767977 0.013889971 0.014379928 0.02353371
>  [5,] 0.013800603 0.013767977 0.013889971 0.014379928 0.02353371
>  [6,] 0.013800603 0.013767977 0.013889971 0.014379928 0.02353371
>  [7,] 0.030226811 0.030423883 0.029806263 0.028918817 0.05191161
>  [8,] 0.030226811 0.030423883 0.029806263 0.028918817 0.05191161
>  [9,] 0.036852571 0.037159980 0.036186931 0.034645002 0.06485394
> [10,] 0.044181716 0.044621159 0.043221145 0.040872939 0.07931028
> [11,] 0.044181716 0.044621159 0.043221145 0.040872939 0.07931028
> [12,] 0.055452631 0.056018832 0.054236881 0.051586391 0.10800413
> [13,] 0.070665160 0.071363749 0.069208056 0.066655730 0.14976433
> [14,] 0.124140400 0.125564637 0.121281571 0.118002021 0.30971860
> [15,] 0.173132357 0.175309455 0.168821266 0.164860523 0.46393111
>
> survfit(fit, newdata=data.frame(time=time,status=status,x=x)[1:5,], 
> conf.type='log')$time
>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
>
>     ### method 2
>
> summary(survfit(fit, newdata=data.frame(time=time,status=status,x=x)[1:5,], 
> conf.type='log'),time=10)$std.err
>
>   1  2  3  4  5
> [1,] 0.04061384 0.04106186 0.03963184 0.03715246 0.06867532

Re: [R] standard error of survfit.coxph()

2014-07-21 Thread array chip
Terry, I figured out that variance of log(-log(S)) should be (1/H^2)var(H), not 
(1/S^2)var(H)!

Thanks

John
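
With that correction the lung example above checks out; a verification sketch 
(H = -log(S), and the delta method gives se(log(-log(S))) = se(H)/H):

library(survival)
fit <- coxph(Surv(time, status) ~ wt.loss, lung)
sf  <- survfit(fit, newdata=lung[2:5, ])
i   <- sf$time == 60
S   <- sf$surv[i, ]
seH <- sf$std.err[i, ]      # se of the cumulative hazard
se.loglog <- seH/(-log(S))  # se on the log(-log(S)) scale
exp(-exp(log(-log(S)) - qnorm(0.975)*se.loglog))  # matches summary()'s log-log upper limits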






e...@mayo.edu>; "r-help@r-project.org"  
Sent: Monday, July 21, 2014 11:41 AM
Subject: Re:  standard error of survfit.coxph()



Dear Terry,

I was trying to use your explanation of the standard error estimate from 
survfit.coxph() to verify the standard error estimates for the method of 
log(log(S)), but couldn't get the estimates correct. Here is an example using 
the lung dataset:

> fit<-coxph(Surv(time,status)~wt.loss,lung)
> surv<-survfit(fit,newdata=lung[2:5,])$surv

> logstd<-survfit(fit,newdata=lung[2:5,])$std.err  
> logstd.time<-survfit(fit,newdata=lung[2:5,])$time


    ## std error of cumulative hazard at time=60

> logstd<-logstd[logstd.time==60,]
> logstd
[1] 0.01965858 0.01965858 0.01941871 0.01969248

    ## survival S at months 60

> surv<-surv[logstd.time==60,]  
> surv
    2 3 4 5 
0.9249238 0.9249238 0.9253038 0.9263392

Since survfit()$std.err was for cumulative hazard H=log(S), thus based on your 
explanation below, the standard error for log-log method is: (1/S^2)var(H)

> loglogstd<-sqrt(1/surv^2*(logstd^2))
> loglogstd

 2  3  4  5 
0.02125427 0.02125427 0.02098631 0.02125839

    ## the upper confidence interval should be
> exp(-exp(log(-log(surv))-qnorm(0.975)*loglogstd))
    2 3 4 5 
0.9278737 0.9278737 0.9282031 0.9292363

But this is different from the output using summary.survfit:

> summary(survfit(fit,newdata=lung[2:5,],conf.type='log-log'),times=60)$upper[1,]
    2 3 4 5 
0.9534814 0.9534814 0.9535646 0.9548478

Can you please suggest what I did wrong in the calculation?

Thanks very much!

John





To: "Therneau, Terry M., Ph.D." ; "r-help@r-project.org" 
 
Sent: Monday, June 30, 2014 10:46 AM
Subject: Re:  standard error of survfit.coxph()



[[elided Yahoo spam]]

John




 From: "Therneau, Terry M., Ph.D." 

Sent: Monday, June 30, 2014 6:04 AM
Subject: Re:  standard error of survfit.coxph()


1. The computations "behind the scenes" produce the variance of the cumulative 
hazard. 
This is true for both an ordinary Kaplan-Meier and a Cox model.  
Transformations to other 
scales are done using simple Taylor series.

   H = cumulative hazard = log(S);  S=survival
   var(H) = var(log(S))  = the starting point
   S = exp(log(S)), so  var(S) is approx [deriv of exp(x)]^2 * var(log(S)) = 
S^2 var(H)
   var(log(log(S)) is approx (1/S^2) var(H)

2. At the time it was written, summary.survfit was used only for
 printing out the survival 
curve at selected times, and the audience for the printout wanted std(S).   
True, that was 
20 years ago, but I don't recall anyone ever asking for summary to do anything 
else.  Your 
request is not a bad idea.
   Note however that the primary impact of using log(S) or S or log(log(S)) 
scale is on 
the confidence intervals, and they do appear per request in the summary output.

Terry T.


On 06/28/2014 05:00 AM, r-help-requ...@r-project.org wrote:
> Message: 9
> Date: Fri, 27 Jun 2014 12:39:29 -0700

> To:"r-help@r-project.org"  
> Subject: [R] standard error of survfit.coxph()
> Message-ID:
>     <1403897969.91269.yahoomail...@web122906.mail.ne1.yahoo.com>
> Content-Type: text/plain
>
> Hi, can anyone help me to understand the standard errors printed in the 
> output of survfit.coxph()?
>
> time<-sample(1:15,100,replace=T)
>
> status<-as.numeric(runif(100,0,1)<0.2)
> x<-rnorm(100,10,2)
>
>
> fit<-coxph(Surv(time,status)~x)
>     ### method 1
>
> survfit(fit, newdata=data.frame(time=time,status=status,x=x)[1:5,], 
> conf.type='log')$std.err
>
>  [,1]    [,2]    [,3]    [,4]   [,5]
>  [1,] 0.0 0.0 0.0 0.0 0.
>  [2,] 0.008627644 0.008567253 0.008773699 0.009354788 0.01481819
>  [3,] 0.008627644 0.008567253 0.008773699 0.009354788 0.01481819
>  [4,] 0.013800603 0.013767977 0.013889971 0.014379928 0.02353371
>  [5,] 0.013800603 0.013767977 0.013889971 0.014379928 0.02353371
>  [6,] 0.013800603 0.013767977 0.013889971 0.014379928 0.02353371
>  [7,] 0.030226811 0.030423883 0.029806263 0.028918817 0.05191161
>  [8,] 0.030226811 0.030423883 0.029806263 0.028918817 0.05191161
>  [9,] 0.036852571 0.037159980 0.036186931 0.034645002 0.06485394
> [10,] 0.044181716 0.044621159 0.043221145 0.040872939 0.07931028
> [11,] 0.044181716 0.044621159 0.043221145 0.040872939 0.07931028
> [12,] 0.055452631 0.056018832 0.054236881 0.051586391 0.10800413
> [13,] 0.070665160 0.071363749 0.069208056 0.066655730 0.14976433
> [14,] 0.124140400 0.125564637 0.121281571 0.118002021 0.30971860
> [15,] 0.173132357 0.175309455 0.168821266 0.164860523 0.46393111
>
> survfit(fit, newdata=data.frame(time=time,status=status,x=x)[1:5,], 
> conf.type='log')$time
>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15


[R] NRI or IDI for survival data - Hmisc package

2012-11-27 Thread array chip
Hi, I am trying to calculate net reclassification improvement (NRI) and 
Integrated Discrimination Improvement (IDI) for a survival dataset to compare 2 
risk models. It seems that the improveProb() in Hmisc package does this only 
for binary outcome, while rcorrp.cens() does take survival object, but doesn't 
output NRI or IDI. Can anyone suggest any other packages that can calculate NRI 
and IDI for survival data?

The reference for extending the NRI to survival data is:
Pencina MJ, D'Agostino RB, Steyerberg EW (2011): Extensions of net 
reclassification improvement calculations to measure usefulness of new 
biomarkers. Stat in Med 30:11-21

Thanks!

John
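
A package implementing the Pencina et al. survival extensions appeared later: 
'survIDINRI'. A sketch under the assumption of its documented interface (a 
two-column time/status matrix, covariate matrices for the old and new models, 
and a landmark time t0; all inputs here are hypothetical):

library(survIDINRI)
# out <- IDI.INF(indata=cbind(time, status), covs0=z.old, covs1=z.new, t0=5)
# IDI.INF.OUT(out)  # IDI and NRI point estimates with confidence intervals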

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] estimating survival function from Cox model

2013-12-17 Thread array chip
Hi, I have some questions on how to estimate the survival function from a Cox 
model. I know how to do this in R using survfit(). 


But let's say the model was fit in another software package, and I was only given the 
estimate of the baseline cumulative hazard "A0(t=10)" at the specified time "t=10" 
(baseline cumulative hazard refers to when covariate X=0) and the beta estimate 
"b" for the covariate used in the Cox model, "X". 


So the survival function at time 10 for a given covariate value x can be 
calculated as:

A(t=10|X=x) = exp(b*x)*A0(t=10) where A is cumulative hazard when X=x
S(t=10|X=x) = exp(-A(t=10|X=x)) where S is survival function to be calculated

Now I want to calculate confidence interval for S(t=10|X=x). I think I need to 
calculate the CI for cumulative hazard A(t=10|X=x) first and then exponentiate 
it to get CI for S, based on the relationship S = exp(-A).

To get CI for A, I need to calculate the estimate of standard error of A. I 
know the other software can give me the standard error of A0, the baseline 
cumulative hazard. Based on the relationship A = exp(b*x)*A0, I guess I'll need 
the standard error for b. But how do I calculate the standard error for A based 
on standard errors for A0 and b?

Any insights would be greatly appreciated!

John
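
A delta-method sketch for the question above, assuming the other software can 
also report Cov(b, A0) (if it cannot, the cross term has to be dropped, which 
is a further approximation). With A = exp(b*x)*A0:

  Var(A) ~ (x*A)^2*Var(b) + exp(2*b*x)*Var(A0) + 2*x*A*exp(b*x)*Cov(b, A0)

# hypothetical inputs from the other software
b <- 0.3;  se.b  <- 0.1   # beta and its standard error
A0 <- 0.2; se.A0 <- 0.05  # baseline cumulative hazard at t=10 and its se
cov.bA0 <- 0              # cross covariance; set to 0 only if truly unreported
x <- 1.5                  # covariate value of interest

A    <- exp(b*x)*A0
varA <- (x*A)^2*se.b^2 + exp(2*b*x)*se.A0^2 + 2*x*A*exp(b*x)*cov.bA0
ciA  <- A + c(-1, 1)*qnorm(0.975)*sqrt(varA)  # CI for the cumulative hazard
rev(exp(-ciA))                                # CI for S(t=10|X=x), since S = exp(-A)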

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Installation folder

2014-01-08 Thread array chip
Hi, I noticed that when I install/update packages, the installation folder is 
C:/User/My Document/R, not C:/Program Files/R. R itself is still in the Program 
Files folder. I don't know how this happened; it used to work OK. Any clues on 
how to correct the problem are appreciated!

Thanks

John

Sent from Yahoo Mail for iPhone

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Different colours for LatticeExtra graphs

2014-01-08 Thread array chip
Sent from Yahoo Mail for iPhone

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] median survival

2014-01-28 Thread array chip
Hi, if the 50% survival probability horizontal line in a Kaplan-Meier survival 
curve overlaps one of the horizontal step segments between 2 time points t1 and 
t2, the survfit() from the survival package estimates median survival as t2 
(the later time point). But I saw some articles (page 23: 
http://www.amstat.org/chapters/northeasternillinois/pastevents/presentations/summer05_Ibrahim_J.pdf)
 recommend the smallest time t such that S(t)<=0.5. 

What is the convention for the definition of median survival?

Thanks

John


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] unique rows

2014-01-28 Thread array chip
Hi, I wanted to remove redundant rows (with same entry in columns) in a data 
frame. For example, with this data frame:

> dat<-cbind(x=c('a','a','b','b','c','c'),y=c('x','x','d','s','g','g'))
> dat
 x   y  
[1,] "a" "x"
[2,] "a" "x"
[3,] "b" "d"
[4,] "b" "s"
[5,] "c" "g"
[6,] "c" "g"

after removing the redundancy, the end results should be

 x   y  
[1,] "a" "x"
[2,] "b" "d"
[3,] "b" "s"
[4,] "c" "g"

what is the best way to do this?

Thanks

John

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] unique rows

2014-01-28 Thread array chip
sorry.. don't know unique().. such a great function
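
For the record, the one-liner (unique() keeps the first occurrence of each 
duplicated row and works on matrices and data frames alike):

dat <- cbind(x=c('a','a','b','b','c','c'), y=c('x','x','d','s','g','g'))
unique(dat)  # drops rows 2 and 6, the exact duplicates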





 From: Bert Gunter 

Cc: "r-help@r-project.org"  
Sent: Tuesday, January 28, 2014 2:21 PM
Subject: Re: [R] unique rows


Inline.

-- Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
H. Gilbert Welch





> Hi, I wanted to remove redundant rows (with same entry in columns) in a data 
> frame. For example, with this data frame:
>
>> dat<-cbind(x=c('a','a','b','b','c','c'),y=c('x','x','d','s','g','g'))

## this is not a data frame.

And would you kindly explain why you posted here instead of reading ?unique.

-- Bert


>> dat
>      x   y
> [1,] "a" "x"
> [2,] "a" "x"
> [3,] "b" "d"
> [4,] "b" "s"
> [5,] "c" "g"
> [6,] "c" "g"
>
> after removing the redundancy, the end results should be
>
>      x   y
> [1,] "a" "x"
> [2,] "b" "d"
> [3,] "b" "s"
> [4,] "c" "g"
>
> what is the best way to do this?
>
> Thanks
>
> John
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] median survival

2014-01-28 Thread array chip

please ignore. actually the median survival from survfit() is the mean of the 2 
time points.
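
A toy check of that convention (a sketch; all four subjects are events, so the 
curve sits exactly at 0.5 between t=2 and t=3):

library(survival)
fit <- survfit(Surv(c(1,2,3,4), rep(1,4)) ~ 1)
print(fit)  # should report median = 2.5, the mean of the two time points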






To: R help  
Sent: Tuesday, January 28, 2014 11:27 AM
Subject: [R] median survival


Hi, if the 50% survival probability horizontal line in a Kaplan-Meier survival 
curve overlaps one of the horizontal step segments between 2 time points t1 and 
t2, the survfit() from the survival package estimates median survival as t2 
(the later time point). But I saw some articles (page 23: 
http://www.amstat.org/chapters/northeasternillinois/pastevents/presentations/summer05_Ibrahim_J.pdf)
 recommend the smallest time t such that S(t)<=0.5. 

What is the convention for the definition of median survival?

Thanks

John

    [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] power analysis is applicable or not

2013-11-12 Thread array chip
Hi, this is a statistical question rather than a pure R question. I have gotten 
much help from the R mailing list in the past, so I would like to try here and 
would appreciate any input:

I conducted a Mantel-Haenszel test to show that the performance of a diagnostic 
test did not show heterogeneity among the 4 study sites, i.e. the Mantel-Haenszel 
test p value was > 0.05, so that I could conduct a meta-analysis by combining data 
of all 4 study sites. 
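
For context, a stratified 2x2 analysis of this kind can be run in base R with 
mantelhaen.test(); the counts below are made up purely for illustration:

## hypothetical 2 x 2 x 4 table: test result x true status x study site
tab <- array(c(40, 10, 12, 38,   35, 15, 14, 36,
               42,  8, 10, 40,   30, 20, 18, 32),
             dim = c(2, 2, 4),
             dimnames = list(test  = c("pos", "neg"),
                             truth = c("disease", "healthy"),
                             site  = paste0("site", 1:4)))
mantelhaen.test(tab)   # common odds ratio across the 4 sites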

Now one of the reviewers of the manuscript did a power analysis for the 
Mantel-Haenszel test showing that with the sample sizes I have, the power of the 
Mantel-Haenszel test was only 50%. So he argued that I did not have enough power 
for the Mantel-Haenszel test.

My usage of the Mantel-Haenszel test was NOT to show a significant p value; 
instead a non-significant p value was what I was looking for, because a 
non-significant p value indicates NO heterogeneity among study sites. Power 
analysis in general is to show whether you have a large enough sample size to 
ensure that a statistically significant difference can be seen with a certain 
likelihood. But this is not how I used the Mantel-Haenszel test. So I think in my 
scenario, the power analysis is NOT applicable because I am simply using the 
test to demonstrate a non-significant p value.

Am I correct on this view?

Thanks and appreciate any thoughts.

John
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] power analysis is applicable or not

2013-11-12 Thread array chip
Hi Chris,

Thanks for sharing your thoughts.

The reviewer used the heterogeneity that I observed in my study for the power 
analysis. I understand what you have described. And I agree that with the sample 
size I have, I do not have enough power to detect the heterogeneity that I 
observed with significance.

But let's say I had a large enough sample size as calculated by the power 
analysis, so that I had 80% power to detect the heterogeneity; would it then be 
true that I would be very unlikely to declare homogeneity among study sites, so 
that I would almost never be able to combine study sites? This goes to the 
general thinking that if you have a large enough sample size, you will always be 
able to make any difference statistically significant...

On the other hand, making a statistical inference using any statistical test 
(including the Mantel-Haenszel test), I thought, is always valid regardless of 
sample size. For the heterogeneity test, I am just doing that -- making a 
statistical inference with the p value from the Mantel-Haenszel test. I am not 
sure if it is correct that it is mandatory to perform a power analysis before 
attempting a statistical test.


Please share your thoughts...

Thanks

John



 From: Christopher W. Ryan 

Sent: Tuesday, November 12, 2013 6:53 PM
Subject: Re: [R] power analysis is applicable or not


John--

Well, my simple-minded way of thinking about these issues goes something
like this:

You want to know if there is heterogeneity. You gather some data and do
your MH analysis. You never know *for sure* whether there is *really*
heterogeneity in your population; all you know is whether there is any
in your sample--you concluded there was not. Your reviewer calculated
that with the sample size you used, *even if there was heterogeneity in
your population* (unknowable by you or anyone else) then your sample
size only had a 50% probability of detecting it (a 50% probability of
coming up with a p < 0.05).  Meaning there *could have been*
heterogeneity there, at a 0.05 significance level, and you *would* have
seen it, *if* your sample size was larger.

It's when you come up with a "non-significant result" that the issue of
power is most relevant. If you already have a "significant" result, then
yes, your sample size was large enough to show a significant result.

An important question is: what *magnitude* of heterogeneity did your
reviewer assume he/she was looking for when he/she did the power
calculation?  And is that magnitude meaningful?

All this being said, power calculations are best done before recruiting
subjects or gathering data.

--Chris Ryan
SUNY Upstate Medical University
Binghamton, NY

array chip wrote:
> Hi, this is a statistical question rather than a pure R question. I have gotten 
> much help from the R mailing list in the past, so would like to try here and 
> appreciate any input:
> 
> I conducted a Mantel-Haenszel test to show that the performance of a diagnostic 
> test did not show heterogeneity among the 4 study sites, i.e. the 
> Mantel-Haenszel test p value was > 0.05, so that I could conduct a 
> meta-analysis by combining data of all 4 study sites. 
> 
> Now one of the reviewers of the manuscript did a power analysis for the 
> Mantel-Haenszel test showing that with the sample sizes I have, the power of 
> the Mantel-Haenszel test was only 50%. So he argued that I did not have enough 
> power for the Mantel-Haenszel test.
> 
> My usage of the Mantel-Haenszel test was NOT to show a significant p value; 
> instead a non-significant p value was what I was looking for, because a 
> non-significant p value indicates NO heterogeneity among study sites. Power 
> analysis in general is to show whether you have a large enough sample size to 
> ensure that a statistically significant difference can be seen with a certain 
> likelihood. But this is not how I used the Mantel-Haenszel test. So I think in 
> my scenario, the power analysis is NOT applicable because I am simply using 
> the test to demonstrate a non-significant p value.
> 
> Am I correct on this view?
> 
> Thanks and appreciate any thoughts.
> 
> John
>     [[alternative HTML version deleted]]
> 
> 
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

> 
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] logistic regression with 50 variables

2010-06-13 Thread array chip
Hi, this is not an R technical question per se. I know there are many excellent 
statisticians on this list, so here are my questions: I have a dataset with ~1800 
observations and 50 independent variables, so there are about 35 samples per 
variable. Is it wise to build a stable multiple logistic model with 50 
independent variables? Any problems with this approach? Thanks

John

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] logistic regression with 50 variables

2010-06-14 Thread array chip
Thanks Charles for the reproducible code. I started this question because I 
was asked to take a look at such a dataset, but I have doubts about whether it's 
meaningful to do a LR with 50 variables. I haven't got the dataset yet, thus have 
not tried any code. But thanks again for sharing some simulation code.

I have one question about your code. When you calculate the common standard 
error of the coefficients using sd(coef(fit)), should you exclude the intercept 
by doing sd(coef(fit)[-1])? Actually, after removing the intercept, the standard 
error calculated this way is very similar to the one reported from 
colMeans(coef(summary(fit))), for both scenarios in your example (coef = 0.14 
or 1.0).

Another question: the 50 simulated variables have very low correlations 
(ranging from -0.1 to 0.08), which may contribute to the stable model. If some 
(not all) of the 50 variables had considerable correlation, like 0.7 or 0.8, 
how would the LR model behave?
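
To get a feel for this, here is a minimal sketch extending Charles's simulation 
to correlated predictors (via MASS::mvrnorm; the correlation of 0.8 among the 
first 10 columns is an arbitrary choice for illustration):

library(MASS)
set.seed(54321)
Sigma <- diag(50)
Sigma[1:10, 1:10] <- 0.8    # correlated block among the first 10 predictors
diag(Sigma) <- 1
dat <- as.data.frame(mvrnorm(1800, mu = rep(0, 50), Sigma = Sigma))
dat$y <- rbinom(1800, 1, plogis(rowSums(dat)/7))
fit <- glm(y ~ ., dat, family = binomial)
sd(coef(fit)[-1])           # compare with ~0.056 in the uncorrelated case below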

Thanks

John



- Original Message 
From: Charles C. Berry 
To: Joris Meys 
Cc: r-help@r-project.org; Marc Schwartz 
Sent: Mon, June 14, 2010 8:32:02 AM
Subject: Re: [R] logistic regression with 50 varaibales

On Mon, 14 Jun 2010, Joris Meys wrote:

> Hi,
>
> Marc's explanation is valid to a certain extent, but I don't agree with
> his conclusion. I'd like to point out "the curse of
> dimensionality"(Hughes effect) which starts to play rather quickly.
>

Ahem!

... minimal, self-contained, reproducible code ...

The OP's situation may not be an impossible one:

> set.seed(54321)
> dat <- as.data.frame(matrix(rnorm(1800*50),nc=50))
> dat$y <- rbinom(1800,1,plogis(rowSums(dat)/7))
> fit <- glm(y~., dat, family=binomial)
> 1/7 # the true coef
[1] 0.1428571
> sd(coef(fit)) # roughly, the common standard error
[1] 0.05605597
> colMeans(coef(summary(fit))) # what glm() got
   Estimate Std. Error    z value   Pr(>|z|)
0.14590836 0.05380666 2.71067328 0.06354820
> # a trickier case:
> set.seed(54321)
> dat <- as.data.frame(matrix(rnorm(1800*50),nc=50))
> dat$y <- rbinom(1800,1,plogis(rowSums(dat))) # try again with coef==1
> fit <- glm(y~., dat, family=binomial)
> colMeans(coef(summary(fit)))
   Estimate  Std. Error     z value    Pr(>|z|)
0.982944012 0.119063631 8.222138491 0.008458002
>

Finer examination of the latter fit will show some values that differ too 
far from 1.0 to agree with the asymptotic std err.

> sd(coef(fit)) # rather bigger than 0.119
[1] 0.1827462
> range(coef(fit))
[1] -0.08128586  1.25797057

And near separability may be playing here:

> cbind(
+ table(
+ cut(
+ plogis(abs(predict(fit))),
+ c( 0, 0.9, 0.99, 0.999, 0.9999, 0.99999, 1 ) ) ) )
                 [,1]
(0,0.9]           453
(0.9,0.99]        427
(0.99,0.999]      313
(0.999,0.9999]    251
(0.9999,0.99999]  173
(0.99999,1]       183
>

Recall that the observed information contains a factor of

plogis( predict(fit) )* plogis( -predict(fit))

> hist(plogis( predict(fit) )* plogis( -predict(fit)))

So the effective sample size here was much reduced.

But to the OP's question, whether what you get is reasonable depends on 
what the setup is.

I wouldn't call the first of the above cases 'highly unstable'.

Which is not to say that one cannot generate difficult cases (esp. with 
correlated covariates and/or one or more highly influential covariates) 
and that the OP's case is not one of them.


HTH,

Chuck

> The curse of dimensionality is easily demonstrated looking at the
> proximity between your datapoints. Say we scale the interval in one
> dimension to be 1 unit. If you have 20 evenly-spaced observations, the
> distance between the observations is 0.05 units. To have a proximity
> like that in a 2-dimensional space, you need 20^2=400 observations. in
> a 10 dimensional space this becomes 20^10 ~ 10^13 datapoints. The
> distance between your observations is important, as a sparse dataset
> will definitely make your model misbehave.
>
> Even with about 35 samples per variable, using 50 independent
> variables will render a highly unstable model, as your dataspace is
> about as sparse as it can get. On top of that, interpreting a model
> with 50 variables is close to impossible, and then I didn't even start
> on interactions. No point in trying I'd say. If you really need all
> that information, you might want to take a look at some dimension
> reduction methods first.
>
> Cheers
> Joris
>
> On Mon, Jun 14, 2010 at 2:55 PM, Marc Schwartz  wrote:
>> On Jun 13, 2010, at 10:20 PM, array chip wrote:
>>
>>> Hi, this is not R technical question per se. I know there are many 
>>> excellent statisticians in this list, so here my questions: I have dataset 
>>> with ~1800 observations and 50 independent variables, so there are about 35 
>>> 

[R] xyplot help

2010-07-08 Thread array chip
Hi, I am learning xyplot. I have an example dataset attached.

plotdata<-read.table("plotdata.txt",sep='\t',header=T,row.names=1)

head(plotdata,n=4)
          y          x type
1 -4.309601 -0.7448405    A
2 -4.715421  0.7875994    A
3 -2.310638  0.5455310    A
4 -2.685803 10.4116868    A

xyplot(y~x,groups=type,plotdata, auto.key=T)

This shows different colors for different levels of "type".

Now, I want to add a fitted line to the plot, the formula is 
-1.324+0.1117*x-0.0006357*x*x

I tried the following code to do this:

xyplot(y~x,groups=type,plotdata, auto.key=T
, panel = function(x,y) { 
panel.xyplot(x,y, type='p')
x<-sort(x)
panel.lines(x,-1.324+0.1117*x-0.0006357*x*x) 

  }) 

Now, it doesn't show different colors for different levels of "type". How can I 
restore that?

Also, is there any way to put the legend at the bottom of the plot (instead of 
at the top as it is now)? And is there any way to print the legend horizontally, 
instead of vertically as it is shown right now?
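
A sketch that seems to handle both legend questions, assuming auto.key accepts 
the standard key arguments space= and columns=:

xyplot(y~x, groups=type, plotdata,
   auto.key=list(space='bottom', columns=3))  # legend at bottom, laid out horizontally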

Thanks

John


      y                     x                  type
1     -4.30960132841301     -0.744840518760792   A
2     -4.71542109959628      0.787599395527709   A
3     -2.31063823031730      0.545530976094587   A
4     -2.68580284811662     10.4116867886136     A
5      0.857277739439742    45.8678415780720     A
6      2.45727135537005     84.8497464051184     A
7      2.13169532824448    103.133586922502      A
8      1.28410512992259    105.748839904875      A
9     -1.03955702898241     -0.899013850822378   A
10     0.829280624483265     2.98715094752831    A
11    -0.50214940111739      1.72294341018182    A
12     3.11786294644213     20.4231486028650     A
13     3.34246850536472     69.3541875183234     A
14     2.70345650675251    111.307639403626      A
15     4.00616663408449    124.004258887019      A
16     5.1806910416212     128.613638097411      A
17    -6.26534533776664     -0.944438526313875   A
18    -3.78598472716979      2.03694726901381    A
19    -4.2138305943426       2.33837780519899    A
20     1.89365585649664     21.6366580986433     A
21     4.58402791158291     67.5861096660383     A
22     2.05571359621679    107.949510371836      A
23     4.05242444227599    112.126993688471      A
24     4.34316209268166    116.153400981747      A
25    -2.12763929077743     -1.08663195011827    A
26    -2.10662282720267     -0.426141043811507   A
27    -3.01995286978557     -1.10752325011792    A
28    -1.64834644316891      7.78787375907454    A
29     1.24401144964837     45.3160993898581     A
30     4.03535823827577     87.1430284366417     A
31     1.29543626134492    105.003743707372      A
32     3.84919041637386    108.601951968250      A
33    -3.2442183465         -0.382454625559666   A
34    -1.37712133953019      0.300186521658450   A
35    -2.12844632825254     -0.814335902708673   A
36     1.37311802825667      9.791747074437      A
37    -0.393168361803423    52.1011610030969     A
38     2.66057348869074    101.966194385881      A
39     1.91384920016281    112.592562878277      A
40     2.06188667920977    117.376424285145      A
41    -3.05678162571978     -0.483215996703479   A
42    -0.184239995971267     1.23596224758922    A
43    -1.42855399915072      0.248774451716021   A
44     1.49844000607205     17.8847352593804     A
45     2.53542586823079     60.0336036447702     A
46     2.95329003854665    102.482588510080      A
47     5.0685809676014     110.619630044398      A
48     4.06642471090290    120.486184775892      A
49    -0.503130553392357     0.702688068466441   A
50    -0.860471679766733     1.43490186430460    A
51    -0.17788663078253      3.11150418059122    A
52     0.986266958945663    17.0902837098301     A
53     2.19389338149745     57.8826565905442     A
54     3.92589157640871     98.2558014383485     A
55     2.53121102711589    108.302821306135      A
56     2.34432932634521    111.053272678522      A
97    -3.42191838986702     -0.885224809610779   A
98    -1.59014967363784      0.109987040681714   A
99    -1.58111592784537      3.78759624156396    A
100    0.975980130414523    21.9589171041898     A
101    2.65723036397414     63.3950961679038     A
102    3.39213524465907     93.1009307560951     A
103    4.29568471688144     99.8712158796281     A
104    4.84074935161124    100.052602454596      A
121   -1.68332987752984      0.227299766484946   A
122    0.468472823690819     1.87223738102554    A
123   -0.547570024002551     4.37927291184889    A
124    0.224584723010114    28.1273391056646     A
125    1.95018272395341     78.8401126239

Re: [R] xyplot help

2010-07-08 Thread array chip
Thank you for pointing out panel.superpose. I got it working with panel.xyplot 
as below:

xyplot(y~x,groups=type,plotdata, auto.key=list(space='bottom',points=T)
, panel = function(x,y,subscripts, groups) { 
panel.xyplot(x,y, col=groups[subscripts])
x<-sort(x)
panel.lines(x,-1.324+0.1117*x-0.0006357*x*x) 

  }) 


type='b' doesn't give me a smooth line, so it didn't produce what I want.

Thanks

John



- Original Message ----
From: Bert Gunter 
To: array chip ; r-help@r-project.org
Sent: Thu, July 8, 2010 4:25:51 PM
Subject: RE: [R] xyplot help

Read the "panel" part of ?xyplot carefully and note ?panel.superpose.
Typically, when you have a groups argument your panel function must be of
the form 

panel = function(x,y,...){
panel.superpose(x,y,...)
etc.
}

However, I think it might all happen automagically if you do the call as:

xyplot(y~x,groups=type,plotdata, type = "b", auto.key=T)

This is also referenced in the "panel" section of the xyplot help, where you
are referred to ?panel.xyplot for info on the type argument.

So, in brief, read the docs more carefully.

Bert Gunter
Genentech Nonclinical Statistics

> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On Behalf Of array chip
> Sent: Thursday, July 08, 2010 3:58 PM
> To: r-help@r-project.org
> Subject: [R] xyplot help
> 
> Hi, I am learning xyplot. I have an example dataset attached.
> 
> plotdata<-read.table("plotdata.txt",sep='\t',header=T,row.names=1)
> 
> head(plotdata,n=4)
>           y          x type
> 1 -4.309601 -0.7448405    A
> 2 -4.715421  0.7875994    A
> 3 -2.310638  0.5455310    A
> 4 -2.685803 10.4116868    A
> 
> xyplot(y~x,groups=type,plotdata, auto.key=T)
> 
> This shows different colors for different levels of "type".
> 
> Now, I want to add a fitted line to the plot, the formula is
> -1.324+0.1117*x-0.0006357*x*x
> 
> I tried the following code to do this:
> 
> xyplot(y~x,groups=type,plotdata, auto.key=T
> , panel = function(x,y) {
> panel.xyplot(x,y, type='p')
> x<-sort(x)
> panel.lines(x,-1.324+0.1117*x-0.0006357*x*x)
> 
>   })
> 
> Now, it doesn't show different colors for different levels of "type". How
> can I
> restore that?
> 
> Also, is there anyway to put the legend at bottom of the plot (instead at
> the
> top right now)? And is there anyway to print legend horizontally, instead
> of
> vertically as shown right now?
> 
> Thanks
> 
> John
> 
> 
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] "sequeeze" a data frame

2010-09-09 Thread array chip
Hi, suppose I have a data frame as below:

dat<-cbind(expand.grid(id=c(1,2,3),time=c(0,3,6),mode=c('R','L'),rep=(1:3)),y=rnorm(54))


I kind of want to "squeeze" the data frame into a new one with "y" averaged over 
"rep" for the same id, time and mode. Taking the average is easy with tapply:

tapply(dat$y, list(dat$id, dat$time, dat$mode), mean)

But I want the result to be in the same format as "dat". Certainly, we can always 
transform the result (an array) into a data frame using a lot of code. But 
is there a simple way to do this?
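
(As it turns out later in the thread, aggregate() gives the one-liner; a sketch:)

aggregate(y ~ id + time + mode, data = dat, FUN = mean)  # one row per combination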

Thanks

John

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] "sequeeze" a data frame

2010-09-09 Thread array chip
Thank you David. I probably didn't make my question clear enough. In the end, I 
want a "shorter" version of the original data frame, basically with the variable 
"rep" removed, so there is just one row per id, time and mode combination. The 
original data frame has 54 rows; I wish to have a data frame with only 18 rows 
after averaging.

Using ave() would produce the following 3 rows for example:
   id time mode rep            y       avg.y
1   1    0    R   1 -1.089540576 -0.21924078
19  1    0    R   2 -0.743374679 -0.21924078
37  1    0    R   3  1.175192908 -0.21924078

But I only need one row:

  id time mode       avg.y
1  1    0    R -0.21924078

Thanks again,

John



- Original Message 
From: David Winsemius 
To: array chip 
Cc: r-help@r-project.org
Sent: Thu, September 9, 2010 3:21:15 PM
Subject: Re: [R] "squeeze" a data frame


On Sep 9, 2010, at 5:47 PM, array chip wrote:

> Hi, suppose I have a data frame as below:
> 
>dat<-cbind(expand.grid(id=c(1,2,3),time=c(0,3,6),mode=c('R','L'),rep=(1:3)),y=rnorm(54))
> 
> 
> I kind of want to "squeeze" the data frame into a new one with averaged "y" 
>over
> "rep" for the same id, time and mode. taking average is easy with tapply:
> 
> tapply(dat$y, list(dat$id, dat$time, dat$mode), mean)

Try:

dat$avg.y <- ave(dat$y, dat$id, dat$time, dat$mode, FUN=mean)

?ave

The syntax of ave is different than "tapply" or "by". The grouping factors are 
not presented as a list and the FUN argument comes after the ,..., so it needs 
to be named if different than the default of "mean".

> 
> But I want the result to be in the same format as "dat". Certainly, we always
> can transform the result (array) into the data frame using a lot of codes. But
> is there a simple way to do this?
> 
> Thanks

David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] "sequeeze" a data frame

2010-09-09 Thread array chip
Thank you Peter, yes this is what I need!

John



- Original Message 
From: Peter Alspach 
To: array chip ; David Winsemius 

Cc: "r-help@r-project.org" 
Sent: Thu, September 9, 2010 4:26:53 PM
Subject: RE: [R] "squeeze" a data frame

Tena koe John

?aggregate

maybe?

HTH 

Peter Alspach

> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
> project.org] On Behalf Of array chip
> Sent: Friday, 10 September 2010 11:13 a.m.
> To: David Winsemius
> Cc: r-help@r-project.org
> Subject: Re: [R] "squeeze" a data frame
> 
> Thank you David. I probably didn't make my question clear enough. In
> the end, I
> want a "shorter" version of the original data frame, basically with
> variable
> "rep" removed, then just one row per id, time and mode combination. The
> original
> data frame has 54 rows; I wish to have a data frame with only 18 rows
> after
> averaging.
> 
> Using ave() would produce the following 3 rows for example:
>    id time mode rep            y       avg.y
> 1   1    0    R   1 -1.089540576 -0.21924078
> 19  1    0    R   2 -0.743374679 -0.21924078
> 37  1    0    R   3  1.175192908 -0.21924078
> 
> But I only need one row:
> 
>   id time mode       avg.y
> 1  1    0    R -0.21924078
> 
> Thanks again,
> 
> John
> 
> 
> 
> - Original Message 
> From: David Winsemius 
> To: array chip 
> Cc: r-help@r-project.org
> Sent: Thu, September 9, 2010 3:21:15 PM
> Subject: Re: [R] "squeeze" a data frame
> 
> 
> On Sep 9, 2010, at 5:47 PM, array chip wrote:
> 
> > Hi, suppose I have a data frame as below:
> >
> >dat<-
> cbind(expand.grid(id=c(1,2,3),time=c(0,3,6),mode=c('R','L'),rep=(1:3)),
> y=rnorm(54))
> >
> >
> > I kind of want to "squeeze" the data frame into a new one with
> averaged "y"
> >over
> > "rep" for the same id, time and mode. taking average is easy with
> tapply:
> >
> > tapply(dat$y, list(dat$id, dat$time, dat$mode), mean)
> 
> Try:
> 
> dat$avg.y <- ave(dat$y, dat$id, dat$time, dat$mode, FUN=mean)
> 
> ?ave
> 
> The syntax of ave is different than "tapply" or "by". The grouping
> factors are
> not presented as a list and the FUN argument comes after the ,..., so
> it needs
> to be named if different than the default of "mean".
> 
> >
> > But I want the result to be in the same format as "dat". Certainly,
> we always
> > can transform the result (array) into the data frame using a lot of
> codes. But
> > is there a simple way to do this?
> >
> > Thanks
> 
> David Winsemius, MD
> West Hartford, CT
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

The contents of this e-mail are confidential and may be ...{{dropped:17}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lme, groupedData, random intercept and slope

2010-09-10 Thread array chip
From the output, I think it's both.
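
The random-effects formula in the summary, ~time | Subject, is what lme() picks 
up from the groupedData object; an equivalent explicit call, as a sketch:

library(nlme)
fit10 <- lme(y ~ time, random = ~time | Subject, data = zz)  # random intercept AND slope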



- Original Message 
From: John Sorkin 
To: r-help@r-project.org
Sent: Fri, September 10, 2010 5:25:44 AM
Subject: [R] lme, groupedData, random intercept and slope

Windows Vista
R 2.10.1




Does the following use of groupedData and lme produce an analysis with both 
random intercept and slope, or only random slope?


zz<-groupedData(y~time |  Subject,data=data.frame(data),
  labels = list( x = "Time",
y = "y" ),
  units = list( x = "(yr)", y = "(mm)")
)
plot(zz)

fit10<-lme(zz)
summary(fit10)

Linear mixed-effects model fit by REML
Data: zz 
AIC   BIC  logLik
  -123.1942 -115.2010 67.5971

Random effects:
Formula: ~time | Subject
Structure: General positive-definite
            StdDev       Corr  
(Intercept) 6.054897e+00 (Intr)
time        4.160662e-05 1     
Residual    9.775954e-04       

Fixed effects: y ~ time 
Value Std.Error DF   t-value p-value
(Intercept) 15.000217  1.914727 19     7.834       0
time        -1.000051  0.000219 19 -4566.598       0
Correlation: 
 (Intr)
time 0.059 

Standardized Within-Group Residuals:
Min  Q1 Med  Q3 Max 
-1.73706837 -0.36289558  0.06892484  0.59777067  1.69095476 

Number of Observations: 30
Number of Groups: 10 


John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)

Confidentiality Statement:
This email message, including any attachments, is for th...{{dropped:12}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lmer fixed effects, SE, t . . . and p

2010-09-10 Thread array chip
But as far as I know, profile() seems to be de-activated in the lme4 package.



- Original Message 
From: Gavin Simpson 
To: John Sorkin 
Cc: r-help@r-project.org; Bert Gunter 
Sent: Fri, September 10, 2010 2:05:37 AM
Subject: Re: [R] lmer fixed effects, SE, t . . . and p

On Thu, 2010-09-09 at 23:40 -0400, John Sorkin wrote:
> Bert,
> I appreciate you comments, and I have read Doug Bates writing about p
> values in mixed effects regression. It is precisely because I read
> Doug's material that I asked "how are we to interpret the estimates"
> rather than "how can we compute a p value". My question is a simple
> question whose answer is undoubtedly complex, but one that needs an
> answer. Without p values, or confidence intervals, I am not certain
> what to make of the results of my analysis. Does my analysis suggest,
> or does it not suggest that there is a relation between time and y? If
> I can't answer this question after running the analysis, I don't have
> any more information than I did before I ran the analysis, and a fair
> question would be why did I run the analysis? I am asking for help not
> in calculation a p value or a CI, but rather to know what I can and
> can't say about the results of the analysis. If this basic question
> can not be answered, I am at a loss to interpret my results. 
> Thank you,
> John

Doug talks quite a lot about profiling lmer fits using 'profile
deviance' to investigate variability in fixed effects. For example, see
section 1.5 in the draft of chapter 1 of Doug's book on mixed models:

http://lme4.r-forge.r-project.org/book/

HTH

G

> John David Sorkin M.D., Ph.D.
> Chief, Biostatistics and Informatics
> University of Maryland School of Medicine Division of Gerontology
> Baltimore VA Medical Center
> 10 North Greene Street
> GRECC (BT/18/GR)
> Baltimore, MD 21201-1524
> (Phone) 410-605-7119
> (Fax) 410-605-7913 (Please call phone number above prior to faxing)>>> Bert 
>Gunter  9/9/2010 11:21 PM >>>
> John:
> 
> Search on this issue in the list archives. Doug Bates has addressed it
> at length. Basically, he does not calculate CI's or p-values because
> he does not know how to reliably do so.
> 
> However, the key remark in your query was:
> 
> > (2) lmer does not give p values or confidence intervals for the fixed 
>effects. How we are to interpret the estimates given that no p value or CI is 
>given for the estimates?
> 
> Think about it. A statistical analysis -- ANY statistical analysis --
> treats the data in isolation: it is not informed by physics,
> thermodynamics, biology,  other similar data, prior experience, or,
> indeed, any part of the body of relevant scientific knowledge. Do you
> really think that any such analysis, especially when predicated upon
> often tenuous or even (necessarily) unverifiable assumptions and
> simplifications should be considered authoritative? Classical
> statistical inference is just another piece of the puzzle, and not
even particularly useful when, as is typically the case, hypotheses
> are formulated AFTER seeing the data (this invalidates the probability
> calculations -- hypotheses must be formulated before seeing the data
> to be meaningfully assessed). Leo Breiman called this statistics'
> "quiet scandal" something like 20 years ago, and he was no dummy.
> 
> It is comforting, perhaps, but illusory to believe that statistical
> inference can be relied on to give sound, objective scientific
> results. True, without such a framework, science seems rather
> subjective, perhaps closer to religion and arbitrary cultural
> archetypes than we care to admit. But see Thomas Kuhn and Paul
> Feuerabend for why this is neither surprising nor necessarily a bad
> thing.
> 
> Cheers,
> Bert Gunter
> 
> 
> 
> 
> On Thu, Sep 9, 2010 at 8:00 PM, John Sorkin  
>wrote:
> > windows Vista
> > R 2.10.1
> >
> >
> > (1) How can I get the complete table of for the fixed effects from lmer. As 
>can be seen from the example below, fixef(fit2) only give the estimates and 
>not 
>the SE or t value
> >
> >> fit3<- lmer(y~time + (1|Subject) + (time|Subject),data=data.frame(data))
> >> summary(fit3)
> > Linear mixed model fit by REML
> > Formula: y ~ time + (1 | Subject) + (time | Subject)
> >   Data: data.frame(data)
> >AICBIC logLik deviance REMLdev
> >  -126.2 -116.4   70.1   -152.5  -140.2
> > Random effects:
> >  Groups   NameVariance   Std.Dev.   Corr
> >  Subject  (Intercept) 2.9311e+01 5.41396385
> >  Subject  (Intercept) 0.e+00 0.
> >  time0.e+00 0.   NaN
> >  Residual 8.1591e-07 0.00090328
> > Number of obs: 30, groups: Subject, 10
> >
> > Fixed effects:
> > Estimate Std. Error t value
> > (Intercept) 14.998216   1.712046   9
> > time        -0.999779   0.000202   -4950
> >
> > Correlation of Fixed Effects:
> > (Intr)
> > time -0.001
> >> fixef(fit3)
> > (Intercept)        time
> >  14.9982158  -0.9997793
> >
> > (2) lmer does not give p values

[R] lme4a package loading error

2010-09-10 Thread array chip
Thanks for the reminder. So I found the lme4a package in Doug's UseR! 2010 
presentation folder: 
http://lme4.r-forge.r-project.org/slides/2010-07-20-Gaithersburg/pkg/

However, after installation, I got the following error message when trying to 
load the library:

library(Matrix)
> library(Rcpp)
> library(minqa)
> library(lme4a)
Error : classes "modelMatrix", "denseModelMatrix", "sparseModelMatrix", 
"ddenseModelMatrix", "dsparseModelMatrix", "predModule", "dPredModule", 
"sPredModule", "respModule", "glmRespMod", "nlsRespMod" are not exported by 
'namespace:Matrix'
Error: package/namespace load failed for 'lme4a'

Here is my sessionInfo()
> sessionInfo()
R version 2.11.1 (2010-05-31) 
i386-pc-mingw32 

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252  
 

[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C 
 

[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base 

other attached packages:
[1] minqa_1.1.9Rcpp_0.8.6 Matrix_0.999375-43 lattice_0.18-8

loaded via a namespace (and not attached):
[1] grid_2.11.1nlme_3.1-96splines_2.11.1 stats4_2.11.1  tools_2.11.1  

Any suggestions would be appreciated.

John





- Original Message 
From: Gavin Simpson 
To: array chip 
Cc: John Sorkin ; r-help@r-project.org; Bert 
Gunter 

Sent: Fri, September 10, 2010 10:00:15 AM
Subject: Re: [R] lmer fixed effects, SE, t . . . and p

On Fri, 2010-09-10 at 09:51 -0700, array chip wrote:
> But as far as I know, profile() seems to be de-activated in the lme4 package.

It is beta software. The lme4a version of the lme4 "package" might have
had profile re-enabled, IIRC. 

G

> - Original Message 
> From: Gavin Simpson 
> To: John Sorkin 
> Cc: r-help@r-project.org; Bert Gunter 
> Sent: Fri, September 10, 2010 2:05:37 AM
> Subject: Re: [R] lmer fixed effects, SE, t . . . and p
> 
> On Thu, 2010-09-09 at 23:40 -0400, John Sorkin wrote:
> > Bert,
> > I appreciate you comments, and I have read Doug Bates writing about p
> > values in mixed effects regression. It is precisely because I read
> > Doug's material that I asked "how are we to interpret the estimates"
> > rather than "how can we compute a p value". My question is a simple
> > question whose answer is undoubtedly complex, but one that needs an
> > answer. Without p values, or confidence intervals, I am not certain
> > what to make of the results of my analysis. Does my analysis suggest,
> > or does it not suggest that there is a relation between time and y? If
> > I can't answer this question after running the analysis, I don't have
> > any more information than I did before I ran the analysis, and a fair
> > question would be why did I run the analysis? I am asking for help not
> > in calculation a p value or a CI, but rather to know what I can and
> > can't say about the results of the analysis. If this basic question
> > can not be answered, I am at a loss to interpret my results. 
> > Thank you,
> > John
> 
> Doug talks quite a lot about profiling lmer fits using 'profile
> deviance' to investigate variability in fixed effects. For example, see
> section 1.5 in the draft of chapter 1 of Doug's book on mixed models:
> 
> http://lme4.r-forge.r-project.org/book/
> 
> HTH
> 
> G
> 
> > John David Sorkin M.D., Ph.D.
> > Chief, Biostatistics and Informatics
> > University of Maryland School of Medicine Division of Gerontology
> > Baltimore VA Medical Center
> > 10 North Greene Street
> > GRECC (BT/18/GR)
> > Baltimore, MD 21201-1524
> > (Phone) 410-605-7119
> > (Fax) 410-605-7913 (Please call phone number above prior to faxing)>>> Bert 
> >Gunter  9/9/2010 11:21 PM >>>
> > John:
> > 
> > Search on this issue in the list archives. Doug Bates has addressed it
> > at length. Basically, he does not calculate CI's or p-values because
> > he does not know how to reliably do so.
> > 
> > However, the key remark in your query was:
> > 
> > > (2) lmer does not give p values or confidence intervals for the fixed 
> >effects. How we are to interpret the estimates given that no p value or CI 
> >is 

> >given for the estimates?
> > 
> > Think about it. A statistical analysis -- ANY statistical analysis --
> > treats the data in isolation: it is not informed by physics,
> > thermodynamics, biology,  other similar data, prior experience, or,

Re: [R] lme4a package loading error

2010-09-10 Thread array chip
Thank you for pointing out R-forge. I tried the link from R-forge for lme4a; it 
didn't work at the time I tried (it returned "PAGE NOT FOUND").

However, the link for lme4b worked, and I installed the lme4b package, which 
loads successfully. lme4b has lmer1() instead of lmer().

However, when trying to run lmer1(), it gives this warning message:

Warning message:
model.Matrix() has been moved from package 'Matrix' to the new package
'MatrixModels' which is now loaded (if installed).
  Its use from package 'Matrix' is deprecated.
Do use it from 'MatrixModels' instead. 

I think this may be why I got the error message when trying to load lme4a, which 
may not have been updated to look for the MatrixModels package instead of the 
Matrix package.

The reason I posted to both the R and R-mixed-models mailing lists is that this 
question seems more suitable for R-mixed-models, but responses there are pretty 
slow...

Thanks,

John


 


----- Original Message 
From: Gavin Simpson 
To: array chip 
Cc: John Sorkin ; r-help@r-project.org; Bert 
Gunter 

Sent: Fri, September 10, 2010 10:46:16 AM
Subject: Re: lme4a package loading error

On Fri, 2010-09-10 at 10:23 -0700, array chip wrote:
> Thanks for reminding this. So I found lme4a package from Doug's UserR!2010 
> presentation folder: 
> http://lme4.r-forge.r-project.org/slides/2010-07-20-Gaithersburg/pkg/

What is wrong with the one on the packages tab of the lme4 project page:

https://r-forge.r-project.org/R/?group_id=60

?

You might need to make sure you have the latest Matrix as well to run
lme4a. Update Matrix via update.packages() or install the latest version
from r-forge and see if that helps.

Also, try not to cross-post to multiple lists. Stick with one, or move
the thread onto the new list.

HTH

G

> However, after installation, I got the following error message when trying to 
> load the library:
> 
> library(Matrix)
> > library(Rcpp)
> > library(minqa)
> > library(lme4a)
> Error : classes "modelMatrix", "denseModelMatrix", "sparseModelMatrix", 
> "ddenseModelMatrix", "dsparseModelMatrix", "predModule", "dPredModule", 
> "sPredModule", "respModule", "glmRespMod", "nlsRespMod" are not exported by 
> 'namespace:Matrix'
> Error: package/namespace load failed for 'lme4a'
> 
> Here is my sessionInfo()
> > sessionInfo()
> R version 2.11.1 (2010-05-31) 
> i386-pc-mingw32 
> 
> locale:
> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United 
> States.1252  
>
> 
> [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                       
>  
>  
>
> 
> [5] LC_TIME=English_United States.1252    
> 
> attached base packages:
> [1] stats    graphics  grDevices utils    datasets  methods  base    
> 
> other attached packages:
> [1] minqa_1.1.9        Rcpp_0.8.6        Matrix_0.999375-43 lattice_0.18-8    
> 
> loaded via a namespace (and not attached):
> [1] grid_2.11.1    nlme_3.1-96    splines_2.11.1 stats4_2.11.1  tools_2.11.1  
> 
> Any suggestions would be appreciated.
> 
> John
> 
> 
> 
> 
> 
> - Original Message 
> From: Gavin Simpson 
> To: array chip 
> Cc: John Sorkin ; r-help@r-project.org; Bert 
>Gunter 
>
> 
> Sent: Fri, September 10, 2010 10:00:15 AM
> Subject: Re: [R] lmer fixed effects, SE, t . . . and p
> 
> On Fri, 2010-09-10 at 09:51 -0700, array chip wrote:
> > But as far as I know, profile() seems to be de-activated in the lme4 
package.
> 
> It is beta software. The lme4a version of the lme4 "package" might have
> had profile re-enabled, IIRC. 
> 
> G
> 
> > - Original Message 
> > From: Gavin Simpson 
> > To: John Sorkin 
> > Cc: r-help@r-project.org; Bert Gunter 
> > Sent: Fri, September 10, 2010 2:05:37 AM
> > Subject: Re: [R] lmer fixed effects, SE, t . . . and p
> > 
> > On Thu, 2010-09-09 at 23:40 -0400, John Sorkin wrote:
> > > Bert,
> > > I appreciate you comments, and I have read Doug Bates writing about p
> > > values in mixed effects regression. It is precisely because I read
> > > Doug's material that I asked "how are we to interpret the estimates"
> > > rather than "how can we compute a p value". My question is a simple
> > > question whose answer is undoubtedly complex, but one that needs an
> > > answer. Without p values, or confidence intervals, I am not certain
> > > what to make of the results of my analysis. Does my analysis suggest,
> > > or does it not suggest that there is a relation between time and y? If
&

[R] xyplot legends

2010-09-13 Thread array chip
Hi all,

When I plot both lines and points using type=c('l', 'p') in xyplot(), and I want 
to include both of them in the legend using keys=list(lines=list(col=1:3), 
points=list(pch=1:3)), the lines and points are plotted side by side in the 
legend. Is there any way to plot the points in the middle of the lines instead? 


that is the default is plotting like this:
- o

but I want something like this:
---o---

Thanks for any suggestions

John

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] xyplot axis line width

2010-09-13 Thread array chip
Hi, another question: is there any argument that controls the line width of the 
axis box in xyplot()? I tried lwd=2 or lwd.axis=2 in xyplot() or within the 
scales=list() argument, without success.

Thanks

John

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] xyplot legends

2010-09-13 Thread array chip
Thanks David. It almost does what I wanted, except it's plotting the point 
characters 3 times for each line (left, middle and right): o---o---o

I can live with that if there is no way to get rid of the point characters at 
the ends.

Thanks very much!

John





- Original Message 
From: David Winsemius 
To: array chip 
Cc: r-help@r-project.org
Sent: Mon, September 13, 2010 4:05:04 PM
Subject: Re: [R] xyplot legends


On Sep 13, 2010, at 6:25 PM, array chip wrote:

> Hi all, I
> 
> When I plot both lines and points using type=c('l', 'p') in xyplot(), if I 
want
> to include in legend both of them using keys=list(lines=list(col=1:3),
> points=list(pch=1:3)), the lines and points are plotted side by side in 
legend.
> Is there anyway to plot the points in the middle of the lines instead?

It's "key", not "keys"

See if this is closer to what you had in mind:

key=list(type="o", lines=list(col=1:3, pch=1:3, size=4))

> 
> 
> that is the default is plotting like this:
> - o
> 
> but I want something like this:
> ---o---

The best I could do was:
-o--o--o-

--David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] xyplot axis line width

2010-09-13 Thread array chip
Thanks for the hint. But I want to change the line thickness; mgp concerns the 
spacing. Also, I think par() doesn't have any effect on lattice plots.

It seems it should be an easy thing to make the axis line thicker; does anyone 
have any suggestions?

Thanks

John



- Original Message 
From: Daisy Englert Duursma 
To: array chip 
Cc: r-help@r-project.org
Sent: Mon, September 13, 2010 8:05:53 PM
Subject: Re: [R] xyplot axis line width

check out ?par for all the details on plotting

‘mgp’ The margin line (in ‘mex’ units) for the axis title,
        axis labels and axis line.  Note that ‘mgp[1]’ affects
        ‘title’ whereas ‘mgp[2:3]’ affect ‘axis’.  The
        default is ‘c(3, 1, 0)’.





On Tue, Sep 14, 2010 at 8:56 AM, array chip  wrote:
> Hi, another question: is there any argument that controls the line width of 
>axis
> box of xyplot()? I tried lwd=2 or lwd.axis=2 in xyplot() or within 
>scales=list()
> argument, without success.
>
> Thanks
>
> John
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] xyplot axis line width

2010-09-14 Thread array chip
Thank you Deepayan. This is exactly what I needed.

John



- Original Message 
From: Deepayan Sarkar 
To: array chip 
Cc: r-help@r-project.org
Sent: Tue, September 14, 2010 4:52:29 AM
Subject: Re: [R] xyplot axis line width

On Tue, Sep 14, 2010 at 4:26 AM, array chip  wrote:
> Hi, another question: is there any argument that controls the line width of 
>axis
> box of xyplot()? I tried lwd=2 or lwd.axis=2 in xyplot() or within 
>scales=list()
> argument, without success.

xyplot(1:10 ~ 1:10, par.settings = list(axis.line = list(lwd = 2)))

If you do not want this to affect the tick marks as well, then you
additionally need

scales = list(lwd = 1)

-Deepayan

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] lmer() vs. lme() gave different variance component estimates

2010-09-17 Thread array chip
Hi, I asked this on the mixed-models mailing list, but that list is not very 
active, so I'd like to try the general R mailing list. Sorry if anyone receives 
a double post.


Hi, I have a dataset of animals receiving some eye treatments. There are 8 
treatments; each animal's right and left eyes were measured with a score 
(ranging from 0 to 7) 4 times after treatment, so eyes are nested within 
animal. Dataset attached.

> dat<-read.table("dat.txt",sep='\t',header=T,row.names=1)
> dat$id<-factor(dat$id)
> str(dat)
'data.frame':   640 obs. of  5 variables:
$ score: int  0 2 0 7 4 7 0 2 0 7 ...
$ id   : Factor w/ 80 levels "1","3","6","10",..: 7 48 66 54 18 26 38 52 39 63 
...
$ rep  : int  1 1 1 1 1 1 1 1 1 1 ...
$ eye  : Factor w/ 2 levels "L","R": 2 2 2 2 2 2 2 2 2 2 ...
$ trt  : Factor w/ 8 levels "A","B","C","Control",..: 1 1 1 1 1 1 1 1 1 1 ...

I fit a mixed model using both lmer() from lme4 package and lme() from nlme 
package:

> lmer(score~trt+(1|id/eye),dat)

Linear mixed model fit by REML 
Formula: score ~ trt + (1 | id/eye) 
   Data: dat 
   AIC   BIC logLik deviance REMLdev
446.7 495.8 -212.4    430.9   424.7
Random effects:
Groups   Name        Variance   Std.Dev.
eye:id   (Intercept) 6.9208e+00 2.630742315798
id       (Intercept) 1.4471e-16 0.000000012030
Residual             1.8750e-02 0.136930641909
Number of obs: 640, groups: eye:id, 160; id, 80

> summary(lme(score~trt, random=(~1|id/eye), dat))

Linear mixed-effects model fit by REML
Data: dat 
        AIC      BIC    logLik
  425.1569 474.0947 -201.5785

Random effects:
Formula: ~1 | id
(Intercept)
StdDev:1.873576

Formula: ~1 | eye %in% id
(Intercept)  Residual
StdDev:1.896126 0.1369306

As you can see, the variance component estimates of the random effects are quite 
different between the 2 model fits. From the data, I know that the variance 
component for "id" can't be near 0, which is what the lmer() fit produced, so I 
think the lme() fit is correct while the lmer() fit is off. This can also be seen 
from the AIC, BIC etc.: the lme() fit has better values than the lmer() fit. 


I guess this might be because lmer() didn't converge very well; is there any way 
to adjust it so that lmer() converges better and gives similar results to lme()?

Thanks

John


  score   id  rep eye trt
1   0   15  1   R   A
2   2   72  1   R   A
3   0   102 1   R   A
4   7   82  1   R   A
5   4   28  1   R   A
6   7   42  1   R   A
7   0   60  1   R   A
8   2   79  1   R   A
9   0   61  1   R   A
10  7   95  1   R   A
11  0   15  2   R   A
12  2   72  2   R   A
13  0   102 2   R   A
14  7   82  2   R   A
15  4   28  2   R   A
16  7   42  2   R   A
17  0   60  2   R   A
18  2   79  2   R   A
19  0   61  2   R   A
20  7   95  2   R   A
21  0   15  3   R   A
22  2   72  3   R   A
23  0   102 3   R   A
24  7   82  3   R   A
25  4   28  3   R   A
26  7   42  3   R   A
27  0   60  3   R   A
28  2   79  3   R   A
29  0   61  3   R   A
30  7   95  3   R   A
31  0   15  4   R   A
32  2   72  4   R   A
33  0   102 4   R   A
34  7   82  4   R   A
35  4   28  4   R   A
36  7   42  4   R   A
37  0   60  4   R   A
38  2   79  4   R   A
39  0   61  4   R   A
40  7   95  4   R   A
41  3   15  1   L   A
42  0   72  1   L   A
43  0   102 1   L   A
44  5   82  1   L   A
45  0   28  1   L   A
46  7   42  1   L   A
47  0   60  1   L   A
48  0   79  1   L   A
49  0   61  1   L   A
50  7   95  1   L   A
51  3   15  2   L   A
52  0   72  2   L   A
53  0   102 2   L   A
54  5   82  2   L   A
55  0   28  2   L   A
56  7   42  2   L   A
57  0   60  2   L   A
58  0   79  2   L   A
59  0   61  2   L   A
60  7   95  2   L   A
61  3   15  3   L   A
62  0   72  3   L   A
63  0   102 3   L   A
64  5   82  3  

Re: [R] lmer() vs. lme() gave different variance component estimates

2010-09-17 Thread array chip


Thank you Peter. Actually 3 people from the mixed-models mailing list tried my 
code using lmer(). They got the same results as what I got from lme(), so they 
couldn't replicate my lmer() results:

Random effects:
Groups   Name        Variance Std.Dev.
eye:id   (Intercept) 3.59531  1.89613 
id       (Intercept) 3.51025  1.87357 
Residual             0.01875  0.13693 
Number of obs: 640, groups: eye:id, 160; id, 80

The only difference they can think of is that they are using Mac and FreeBSD 
while mine is a PC. I can't imagine that being the reason. I re-installed the 
lme4 package, but still got weird results with lmer().

Per your suggestion, here are the results for aov():

summary(aov(score~trt+Error(id/eye), data=dat))

Error: id
          Df Sum Sq Mean Sq F value    Pr(>F)    
trt        7 1353.6 193.378   4.552 0.0002991 ***
Residuals 72 3058.7  42.482                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Error: id:eye
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals 80   1152    14.4               

Error: Within
           Df Sum Sq Mean Sq F value Pr(>F)
Residuals 480      9 0.01875              

As can be seen, the within-subject variance estimate (0.01875) is the same 
between aov, lmer and lme. But I am not sure how to relate the results between 
aov and lmer/lme for the other 2 variance components (id and eye %in% id).

Thanks

John






- Original Message 
From: Peter Dalgaard 
To: array chip 
Cc: r-help@r-project.org
Sent: Fri, September 17, 2010 1:05:27 PM
Subject: Re: [R] lmer() vs. lme() gave different variance component estimates

On 09/17/2010 09:14 PM, array chip wrote:
> Hi, I asked this on mixed model mailing list, but that list is not very 
> active, 
>
> so I'd like to try the general R mailing list. Sorry if anyone receives the 
> double post.
> 
> 
> Hi, I have a dataset of animals receiving some eye treatments. There are 8 
> 
> treatments, each animal's right and left eye was measured with some scores 
> (ranging from 0 to 7) 4 times after treatment. So there are nesting groups 
> eyes 
>
> within animal. Dataset attached
> 
>> dat<-read.table("dat.txt",sep='\t',header=T,row.names=1)
>> dat$id<-factor(dat$id)
>> str(dat)
> 'data.frame':   640 obs. of  5 variables:
> $ score: int  0 2 0 7 4 7 0 2 0 7 ...
> $ id   : Factor w/ 80 levels "1","3","6","10",..: 7 48 66 54 18 26 38 52 39 
> 63 

> ...
> $ rep  : int  1 1 1 1 1 1 1 1 1 1 ...
> $ eye  : Factor w/ 2 levels "L","R": 2 2 2 2 2 2 2 2 2 2 ...
> $ trt  : Factor w/ 8 levels "A","B","C","Control",..: 1 1 1 1 1 1 1 1 1 1 ...
> 
> I fit a mixed model using both lmer() from lme4 package and lme() from nlme 
> package:
> 
>> lmer(score~trt+(1|id/eye),dat)
> 
> Linear mixed model fit by REML 
> Formula: score ~ trt + (1 | id/eye) 
>Data: dat 
>AIC   BIC logLik deviance REMLdev
> 446.7 495.8 -212.4    430.9   424.7
> Random effects:
> Groups   Name        Variance   Std.Dev.
> eye:id   (Intercept) 6.9208e+00 2.630742315798
> id       (Intercept) 1.4471e-16 0.000000012030
> Residual             1.8750e-02 0.136930641909
> Number of obs: 640, groups: eye:id, 160; id, 80
> 
>> summary(lme(score~trt, random=(~1|id/eye), dat))
> 
> Linear mixed-effects model fit by REML
> Data: dat 
>         AIC      BIC    logLik
>   425.1569 474.0947 -201.5785
> 
> Random effects:
> Formula: ~1 | id
> (Intercept)
> StdDev:1.873576
> 
> Formula: ~1 | eye %in% id
> (Intercept)  Residual
> StdDev:1.896126 0.1369306
> 
> As you can see, the variance components estimates of random effects are quite 
> different between the 2 model fits. From the data, I know that the variance 
> component for "id" can't be near 0, which is what lmer() fit produced, so I 
> think the lme() fit is correct while lmer() fit is off. This can also be seen 
> from AIC, BIC etc. lme() fit has better values than lmer() fit. 
> 
> 
> I guess this might be due to lmer() didn't converge very well, is there 
> anyway 

> to adjust to make lmer() converge better to get similar results as lme()?

That's your guess... I'd be more careful about jumping to conclusions.
If this is a balanced data set, perhaps you could supply the result of

summary(aov(score~trt+Error(id/eye), data=dat))

The correct estimates should be computable from the ANOVA table.


-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lmer() vs. lme() gave different variance component estimates

2010-09-20 Thread array chip
Thank you Peter for your explanation of the relationship between aov and lme. 
It makes perfect sense. 


When you said "you might have computed the average of all 8 measurements on 
each animal and computed a 1-way ANOVA" for the treatment effect, would this be 
the case only for a balanced design, or is it also true for unbalanced data?

Another question: if 1-way ANOVA is equivalent to the mixed model for testing 
the treatment effect, what would be the reason to use a mixed model? Just to 
estimate the variance components? If the interest is not in the estimation of 
variance components, is there then no need to run mixed models to test 
treatment effects?

And my last question: I am glad to find that glht() from the multcomp package 
works well with an lmer() fit for multiple comparisons. Given Professor Bates's 
view that denominator degrees of freedom are not well defined in mixed models, 
are the results from glht() reasonable/meaningful? If not, will the suggested 
1-way ANOVA used together with glht() give us correct post-hoc multiple 
comparison results?

Thank you very much!

John





- Original Message 
From: Peter Dalgaard 
To: array chip 
Cc: r-help@r-project.org; r-sig-mixed-mod...@r-project.org
Sent: Sat, September 18, 2010 1:35:45 AM
Subject: Re: [R] lmer() vs. lme() gave different variance component estimates


For a nested design, the relation is quite straightforward: The residual
MS are the variances of sample means scaled to be comparable with the
residuals (so that in the absence of random components, all
MS are equal to within the F-ratio variability). So to get the id:eye
variance component, subtract the Within MS from the id:eye MS and divide
by the number of replicates (4 in this case since you have 640
observations on 160 eyes) (14.4 - 0.01875)/4 = 3.59, and similarly, the
id variance is the MS for id minus that for id:eye scaled by 8:
(42.482-14.4)/8 = 3.51.

I.e. it is reproducing the lmer results above, but of course not those
from your original post.
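
In code, the arithmetic is just the following (a sketch plugging in the mean 
squares from the aov() table posted earlier):

ms.id     <- 42.482    # MS for id      (8 observations per animal)
ms.id.eye <- 14.4      # MS for id:eye  (4 observations per eye)
ms.within <- 0.01875   # residual (within) MS
(ms.id.eye - ms.within) / 4   # id:eye variance component, ~3.595
(ms.id - ms.id.eye) / 8       # id variance component,     ~3.510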

(Notice, by the way, that if you are only interested in the treatment
effect, you might as well have computed the average of all 8
measurements on each animal and computed a 1-way ANOVA).

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lmer() vs. lme() gave different variance component estimates

2010-09-20 Thread array chip
Thank you Peter and Ben for your comments.

John


- Original Message 
From: Peter Dalgaard 
To: array chip 
Cc: r-help@r-project.org; r-sig-mixed-mod...@r-project.org
Sent: Mon, September 20, 2010 12:28:43 PM
Subject: Re: [R] lmer() vs. lme() gave different variance component estimates

On 09/20/2010 08:09 PM, array chip wrote:
> Thank you Peter for your explanation of relationship between aov and lme. It 
> makes perfect sense. 
> 
> 
> When you said "you might have computed the average of all 8
> measurements on each animal and computed a 1-way ANOVA" for treatment effect, 
> would this be the case for balanced design, or it is also true for unbalanced 
> data?

It is only exactly true for a balanced design, although it can be a
practical expedient in nearly-balanced cases, especially if there is a
clearly dominant animal variation. In strongly unbalanced data, you get
reduced efficiency because animals with less data should be downweighted
(not proportionally if there is substantial between-animal variation,
though). And of course the whole thing relies on the fact that you have
individuals nested in treatment (no animals had multiple treatments).

> 
> Another question is if 1-way ANOVA is equivalent to mixed model for testing 
> treatment effect, what would be reason why mixed model is used? Just to 
>estimate 
>
> the variance components? If the interest is not in the estimation of variance 
> components, then there is no need to run mixed models to test treatment 
>effects?

Not too far off the mark. In more complex cases, there is the advantage
that the mixed model helps figure out a sensible analysis for you.


> And my last question is I am glad to find that glht() from multcomp package 
> works well with a lmer() fit for multiple comparisons. Given Professor 
> Bates's 

> view that denominator degrees of freedom are not well defined in mixed 
> models, 

> are the results from glht() reasonable/meaningful? If not, will the suggested 
> 1-way ANOVA used together with glht() give us correct post-hoc multiple 
> comparison results?

I think Doug's view is that DFs are not _reliably estimated_ with any of
the current procedures. In the balanced cases, they are very well
defined (well, give or take the issues with "negative variances"), and I
would expect glht() to give meaningful results. Do check the residuals
for at least approximate normality, though.


> 
> Thank you very much!
> 
> John
> 
> 
> 
> 
> 
> - Original Message 
> From: Peter Dalgaard 
> To: array chip 
> Cc: r-help@r-project.org; r-sig-mixed-mod...@r-project.org
> Sent: Sat, September 18, 2010 1:35:45 AM
> Subject: Re: [R] lmer() vs. lme() gave different variance component estimates
> 
> 
> For a nested design, the relation is quite straightforward: The residual
> MS are the variances of sample means scaled to be comparable with the
> residuals (so that in the absence of random components, all
> MS are equal to within the F-ratio variability). So to get the id:eye
> variance component, subtract the Within MS from the id:eye MS and divide
> by the number of replicates (4 in this case since you have 640
> observations on 160 eyes) (14.4 - 0.01875)/4 = 3.59, and similarly, the
> id variance is the MS for id minus that for id:eye scaled by 8:
> (42.482-14.4)/8 = 3.51.
> 
> I.e. it is reproducing the lmer results above, but of course not those
> from your original post.
> 
> (Notice, by the way, that if you are only interested in the treatment
> effect, you might as well have computed the average of all 8
> measurements on each animal and computed a 1-way ANOVA).
> 


-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] how to make point character thicker in xyplot?

2010-09-23 Thread array chip
Is there any way to make the plotting point character thicker in xyplot? I mean 
not larger, which can be achieved by "cex=2", but thicker. I tried lwd=2, but it 
didn't work. I know "lwd" works in regular plot(), not only for lines but also 
for points. For example

plot(1:10, lwd=2)

Thanks

John

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to make point character thicker in xyplot?

2010-09-23 Thread array chip
Thank you Greg. I also got it to work by using panel.points(lwd=2) instead of 
using panel.xyplot().
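
For the archive, a minimal sketch of that fix:

library(lattice)
xyplot(1:10 ~ 1:10, panel = function(...) panel.points(..., lwd = 2))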






- Original Message 
From: Greg Snow 
To: array chip ; "r-help@r-project.org" 

Sent: Thu, September 23, 2010 2:48:06 PM
Subject: RE: [R] how to make point character thicker in xyplot?

There is probably a simpler way, but if you want full customization, look at 
panel.my.symbols in the TeachingDemos package.

-Original Message-----
From: array chip 
Sent: Thursday, September 23, 2010 2:52 PM
To: r-help@r-project.org 
Subject: [R] how to make point character thicker in xyplot?


Is there any way to make the plotting point character thicker in xyplot? I mean
not larger, which can be achieved by "cex=2", but thicker. I tried lwd=2, but it
didn't work. I know "lwd" works in regular plot(), not only for lines but also
for points. For example

plot(1:10, lwd=2)

Thanks

John

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] how to make point character thicker in the legend of xyplot?

2010-09-23 Thread array chip
Now I got the point character thicker using panel.points(lwd=2), but I can't 
make it thicker in the legend of the plot. Here is an example:

xyplot(1:10~1:10,groups=rep(1:5,2),col=1:2,pch=0:1,type='p',cex=2,panel=panel.points,lwd=3,

key=list(x=0.7,y=0.2,corner=c(0,0),type='p',
points=list(col=1:2,pch=0:1,cex=2,lwd=2),
text=list(lab=c('A','B'),cex=1.5,font=2)))

Any suggestions?

Thanks

John



- Original Message 
From: array chip 
To: Greg Snow 
Cc: r-help@r-project.org
Sent: Thu, September 23, 2010 4:03:00 PM
Subject: Re: [R] how to make point character thicker in xyplot?

Thank you Greg. I also got it to work by using panel.points(lwd=2) instead of 
using panel.xyplot().






- Original Message 
From: Greg Snow 
To: array chip ; "r-help@r-project.org" 

Sent: Thu, September 23, 2010 2:48:06 PM
Subject: RE: [R] how to make point character thicker in xyplot?

There is probably a simpler way, but if you want full customization, look at 
panel.my.symbols in the TeachingDemos package.

-Original Message-
From: array chip 
Sent: Thursday, September 23, 2010 2:52 PM
To: r-help@r-project.org 
Subject: [R] how to make point character thicker in xyplot?


Is there any way to make the plotting point character thicker in xyplot? I mean
not larger, which can be achieved by "cex=2", but thicker. I tried lwd=2, but it
didn't work. I know "lwd" works in regular plot(), not only for lines but also
for points. For example

plot(1:10, lwd=2)

Thanks

John

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to make point character thicker in the legend of xyplot?

2010-09-23 Thread array chip
Yes, it does what I want. Thank you Peter! Just wondering, what else does 
grid.pars control? Not just the symbol in the legend, right?

John



- Original Message 
From: Peter Ehlers 
To: array chip 
Cc: "r-help@r-project.org" 
Sent: Thu, September 23, 2010 4:34:44 PM
Subject: Re: [R] how to make point character thicker in the legend of xyplot?

On 2010-09-23 17:15, array chip wrote:
> Now I got the point character thicker using panel.points(lwd=2), but I can't make it
> thicker in the legend of the plot. Here is an example:
>
> xyplot(1:10~1:10,groups=rep(1:5,2),col=1:2,pch=0:1,type='p',cex=2,panel=panel.points,lwd=3,
>
> key=list(x=0.7,y=0.2,corner=c(0,0),type='p',
> points=list(col=1:2,pch=0:1,cex=2,lwd=2),
> text=list(lab=c('A','B'),cex=1.5,font=2)))
>
> Any suggestions?
>

You can add this line to your xyplot call:

  par.settings = list(grid.pars = list(lwd = 2)),
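
For reference, a sketch of the original call with that setting folded in:

xyplot(1:10~1:10, groups=rep(1:5,2), col=1:2, pch=0:1, type='p', cex=2,
       panel=panel.points, lwd=3,
       par.settings=list(grid.pars=list(lwd=2)),
       key=list(x=0.7, y=0.2, corner=c(0,0), type='p',
                points=list(col=1:2, pch=0:1, cex=2, lwd=2),
                text=list(lab=c('A','B'), cex=1.5, font=2)))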


   -Peter Ehlers

> Thanks
>
> John
>
>
>
> - Original Message 
> From: array chip
> To: Greg Snow
> Cc: r-help@r-project.org
> Sent: Thu, September 23, 2010 4:03:00 PM
> Subject: Re: [R] how to make point character thicker in xyplot?
>
> Thank you Greg. I also got it to work by using panel.points(lwd=2) instead of
> using panel.xyplot().
>
>
>
>
>
>
> - Original Message 
> From: Greg Snow
> To: array chip; "r-help@r-project.org"
> 
> Sent: Thu, September 23, 2010 2:48:06 PM
> Subject: RE: [R] how to make point character thicker in xyplot?
>
> There is probably a simpler way, but if you want full customization, look at
> panel.my.symbols in the TeachingDemos package.
>
> -Original Message-
> From: array chip
> Sent: Thursday, September 23, 2010 2:52 PM
> To: r-help@r-project.org
> Subject: [R] how to make point character thicker in xyplot?
>
>
> Is there any way to make the plotting point character thicker in xyplot? I mean
> not larger, which can be achieved by "cex=2", but thicker. I tried lwd=2, but it
> didn't work. I know "lwd" works in regular plot(), not only for lines but also
> for points. For example
>
> plot(1:10, lwd=2)
>
> Thanks
>
> John
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] scientific vs. fixed notation in xyplot()

2010-09-27 Thread array chip
Hi, I am using xyplot() to plot on the log scale via the scales=list(log=T) 
argument. For example:

xyplot(1:10~1:10, scales=list(log=T))

But the axis labels are printed as scientific notation (10^0.0, etc), instead 
of 
fixed notation. How can I change that to fixed notation?

options(scipen=4) doesn't work on xyplot()

Thanks

John

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] scientific vs. fixed notation in xyplot()

2010-09-27 Thread array chip
Thanks for the suggestion. But my example is just an example; I would prefer a 
generalized solution, like what options(scipen=4) does in base graphics, which 
usually gives pretty axis labels as well.

Any suggestions?

Thanks

John



From: Henrique Dallazuanna 

Cc: r-help@r-project.org
Sent: Mon, September 27, 2010 12:16:31 PM
Subject: Re: [R] scientific vs. fixed notation in xyplot()

Try this:

 xyplot(1:10~1:10, scales=list(log = T, labels = round(log(1:10), 4)))





>Hi, I am using xyplot() to plot on the log scale via the scales=list(log=T)
>argument. For example:
>
>xyplot(1:10~1:10, scales=list(log=T))
>
>But the axis labels are printed as scientific notation (10^0.0, etc), instead 
of
>fixed notation. How can I change that to fixed notation?
>
>options(scipen=4) doesn't work on xyplot()
>
>Thanks
>
>John
>
>__
>R-help@r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
>


-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O



  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] scientific vs. fixed notation in xyplot()

2010-09-27 Thread array chip
Hi, I found an old thread which referred to 2 examples (Fig 8.4 & Fig 8.5) from 
Dr. Sarkar's lattice book that address this problem using "xscale.components" 
and "yscale.components":

http://lmdvr.r-forge.r-project.org/figures/figures.html

I am able to change the label of the x-axis, but still can't change the y-axis 
label. I think it's straightforward, but I can't figure out what I am missing 
in the code. Can anyone suggest?

logTicks <- function (lim, loc = c(1, 5)) {
    ii <- floor(log10(range(lim))) + c(-1, 2)
    main <- 10^(ii[1]:ii[2])
    r <- as.numeric(outer(loc, main, "*"))
    r[lim[1] <= r & r <= lim[2]]
}


xscale.components.log10 <- function(lim, ...) {
    ans <- xscale.components.default(lim = lim, ...)
    tick.at <- logTicks(10^lim, loc = 1)
    ans$bottom$ticks$at <- log(tick.at, 10)
    ans$bottom$labels$at <- log(tick.at, 10)
    ans$bottom$labels$labels <- as.character(tick.at)
    ans
}

yscale.components.log10 <- function(lim, ...) {
    ans <- yscale.components.default(lim = lim, ...)
    tick.at <- logTicks(10^lim, loc = 1)
    ans$bottom$ticks$at <- log(tick.at, 10)
    ans$bottom$labels$at <- log(tick.at, 10)
    ans$bottom$labels$labels <- as.character(tick.at)
    ans
}

xyplot(1:1~1:1, scales=list(log=T),xscale.components = 
xscale.components.log10,yscale.components = yscale.components.log10)

Any suggestions?

Thanks

John







- Original Message 
From: Don McKenzie 
To: Henrique Dallazuanna 
Cc: R-help Forum 
Sent: Mon, September 27, 2010 1:00:04 PM
Subject: Re: [R] scientific vs. fixed notation in xyplot()

This is quite elegant (thanks) and brings up a problem I could not solve a while 
back, although Dr. Sarkar did his best to help.
How do I do the same thing in a panel plot?

e.g., toy example

temp.df <- 
data.frame(X=seq(1,100,by=1),Y=seq(1,50.5,by=.5),class=rep(c("A","B"),each=50))
xyplot(Y ~ X | class,data=temp.df,scales=list(x=round(log(1:100), 
4),y=round(log(1:50.5), 4),log=T))

gives me the right points on the page but still gives axis labels in scientific 
notation.

If I try to specify "labels" as a list I get an error message

> xyplot(Y ~ X | class,data=temp.df,scales=list(log = T, labels = 
>list(x=round(log(1:100), 4),y=round(log(1:50.5), 4

Error in construct.scales(log = TRUE, labels = list(x = c(0, 0.6931, 1.0986,  :
  the at and labels components of scales may not be lists when relation = same

Syntax problem in this last command?

Thanks


On 27-Sep-10, at 12:16 PM, Henrique Dallazuanna wrote:

> Try this:
> 
>  xyplot(1:10~1:10, scales=list(log = T, labels = round(log(1:10), 4)))
> 
> 
> On Mon, Sep 27, 2010 at 4:10 PM, array chip  wrote:
> 
>> Hi, I am using xyplot() to plot on the log scale via the scales=list(log=T)
>> argument. For example:
>> 
>> xyplot(1:10~1:10, scales=list(log=T))
>> 
>> But the axis labels are printed as scientific notation (10^0.0, etc),
>> instead of
>> fixed notation. How can I change that to fixed notation?
>> 
>> options(scipen=4) doesn't work on xyplot()
>> 
>> Thanks
>> 
>> John
> 
> --Henrique Dallazuanna
> Curitiba-Paraná-Brasil
> 25° 25' 40" S 49° 16' 22" O
> 
> [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Don McKenzie, Research Ecologist
Pacific WIldland Fire Sciences Lab
US Forest Service

Affiliate Professor
School of Forest Resources, College of the Environment
CSES Climate Impacts Group
University of Washington

desk: 206-732-7824
cell: 206-321-5966
d...@uw.edu
donaldmcken...@fs.fed.us

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] scientific vs. fixed notation in xyplot()

2010-09-27 Thread array chip
I found where the problem is with the y-axis; yscale.components.log10 should be

yscale.components.log10 <- function(lim, ...) {
    ans <- yscale.components.default(lim = lim, ...)
    tick.at <- logTicks(10^lim, loc = 1)
    ans$left$ticks$at <- log(tick.at, 10)
    ans$left$labels$at <- log(tick.at, 10)
    ans$left$labels$labels <- as.character(tick.at)
    ans
}
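
With that fix in place, a sketch of the full call (assuming the logTicks()
and xscale.components.log10 definitions posted earlier in the thread):

xyplot(1:10~1:10, scales=list(log=T),
       xscale.components = xscale.components.log10,
       yscale.components = yscale.components.log10)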






- Original Message ----
From: array chip 
To: Don McKenzie ; Henrique Dallazuanna 

Cc: R-help Forum 
Sent: Mon, September 27, 2010 3:09:20 PM
Subject: Re: [R] scientific vs. fixed notation in xyplot()

Hi, I found an old thread which referred to 2 examples (Fig 8.4 & Fig 8.5) from 
Dr. Sarkar's lattice book that address this problem using "xscale.components" 
and "yscale.components":

http://lmdvr.r-forge.r-project.org/figures/figures.html

I am able to change the label of the x-axis, but still can't change the y-axis 
label. I think it's straightforward, but I can't figure out what I am missing 
in the code. Can anyone suggest?

logTicks <- function (lim, loc = c(1, 5)) { 
ii <- floor(log10(range(lim))) + c(-1, 2) 
main <- 10^(ii[1]:ii[2]) 
r <- as.numeric(outer(loc, main, "*")) 
r[lim[1] <= r & r <= lim[2]] } 


xscale.components.log10 <- function(lim, ...) { 
ans <- xscale.components.default(lim = lim, ...)
tick.at <- logTicks(10^lim,loc=1)
ans$bottom$ticks$at <- log(tick.at, 10)
ans$bottom$labels$at <- log(tick.at, 10)
ans$bottom$labels$labels <- as.character(tick.at)
ans } 

yscale.components.log10 <- function(lim, ...) { 
ans <- yscale.components.default(lim = lim, ...)
tick.at <- logTicks(10^lim,loc=1)
ans$bottom$ticks$at <- log(tick.at, 10)
ans$bottom$labels$at <- log(tick.at, 10)
ans$bottom$labels$labels <- as.character(tick.at)
ans } 

xyplot(1:1~1:1, scales=list(log=T),xscale.components = 
xscale.components.log10,yscale.components = yscale.components.log10)

Any suggestions?

Thanks

John







- Original Message 
From: Don McKenzie 
To: Henrique Dallazuanna 
Cc: R-help Forum 
Sent: Mon, September 27, 2010 1:00:04 PM
Subject: Re: [R] scientific vs. fixed notation in xyplot()

This is quite elegant (thanks) and brings up a problem I could not solve a while 
back, although Dr. Sarkar did his best to help.
How do I do the same thing in a panel plot?

e.g., toy example

temp.df <- 
data.frame(X=seq(1,100,by=1),Y=seq(1,50.5,by=.5),class=rep(c("A","B"),each=50))
xyplot(Y ~ X | class,data=temp.df,scales=list(x=round(log(1:100), 
4),y=round(log(1:50.5), 4),log=T))

gives me the right points on the page but still gives axis labels in scientific 
notation.

If I try to specify "labels" as a list I get an error message

> xyplot(Y ~ X | class,data=temp.df,scales=list(log = T, labels = 
>list(x=round(log(1:100), 4),y=round(log(1:50.5), 4

Error in construct.scales(log = TRUE, labels = list(x = c(0, 0.6931, 1.0986,  :
  the at and labels components of scales may not be lists when relation = same

Syntax problem in this last command?

Thanks


On 27-Sep-10, at 12:16 PM, Henrique Dallazuanna wrote:

> Try this:
> 
>  xyplot(1:10~1:10, scales=list(log = T, labels = round(log(1:10), 4)))
> 
> 
> On Mon, Sep 27, 2010 at 4:10 PM, array chip  wrote:
> 
>> Hi, I am using xyplot() to plot on the log scale via the scales=list(log=T)
>> argument. For example:
>> 
>> xyplot(1:10~1:10, scales=list(log=T))
>> 
>> But the axis labels are printed as scientific notation (10^0.0, etc),
>> instead of
>> fixed notation. How can I change that to fixed notation?
>> 
>> options(scipen=4) doesn't work on xyplot()
>> 
>> Thanks
>> 
>> John
> 
> --Henrique Dallazuanna
> Curitiba-Paraná-Brasil
> 25° 25' 40" S 49° 16' 22" O
> 
> [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Don McKenzie, Research Ecologist
Pacific WIldland Fire Sciences Lab
US Forest Service

Affiliate Professor
School of Forest Resources, College of the Environment
CSES Climate Impacts Group
University of Washington

desk: 206-732-7824
cell: 206-321-5966
d...@uw.edu
donaldmcken...@fs.fed.us

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] scientific vs. fixed notation in xyplot()

2010-09-27 Thread array chip
Thank you Dr. Sarkar. yscale.components.log10.3 is a pretty good choice.

John



- Original Message 
From: Deepayan Sarkar 
To: array chip 
Cc: Don McKenzie ; Henrique Dallazuanna 
; R-help Forum 
Sent: Mon, September 27, 2010 8:07:15 PM
Subject: Re: [R] scientific vs. fixed notation in xyplot()

On Mon, Sep 27, 2010 at 5:28 PM, array chip  wrote:
> I found where the problem is with y-axis, the yscale.components.log10 should 
be
>
> yscale.components.log10 <- function(lim, ...) {
>    ans <- yscale.components.default(lim = lim, ...)
>    tick.at <- logTicks(10^lim,loc=1)
>    ans$left$ticks$at <- log(tick.at, 10)
>    ans$left$labels$at <- log(tick.at, 10)
>    ans$left$labels$labels <- as.character(tick.at)
>    ans }

A few such functions are now available in latticeExtra. E.g.,

library(latticeExtra)

xyplot(1:10~1:10, scales=list(log=T),
      yscale.components = yscale.components.log10.3)

xyplot(1:10~1:10, scales=list(log=T),
      yscale.components = yscale.components.log10ticks)

-Deepayan

>
> - Original Message 
> From: array chip 
> To: Don McKenzie ; Henrique Dallazuanna
> 
> Cc: R-help Forum 
> Sent: Mon, September 27, 2010 3:09:20 PM
> Subject: Re: [R] scientific vs. fixed notation in xyplot()
>
> Hi, I found an old thread which refered to 2 examples (Fig 8.4 & Fig 8.5) from
> Dr. Sarkar's lattice book that address this problem using "xscale.components"
> and "yscale.components":
>
> http://lmdvr.r-forge.r-project.org/figures/figures.html
>
> I am able to change the label of x-axis, but still can't change the y-axis
> label. I think it's straight forward, but can't figure out what I am missing 
in
> the code, Can anyone suggest?
>
> logTicks <- function (lim, loc = c(1, 5)) {
>    ii <- floor(log10(range(lim))) + c(-1, 2)
>    main <- 10^(ii[1]:ii[2])
>    r <- as.numeric(outer(loc, main, "*"))
>    r[lim[1] <= r & r <= lim[2]] }
>
>
> xscale.components.log10 <- function(lim, ...) {
>    ans <- xscale.components.default(lim = lim, ...)
>    tick.at <- logTicks(10^lim,loc=1)
>    ans$bottom$ticks$at <- log(tick.at, 10)
>    ans$bottom$labels$at <- log(tick.at, 10)
>    ans$bottom$labels$labels <- as.character(tick.at)
>    ans }
>
> yscale.components.log10 <- function(lim, ...) {
>    ans <- yscale.components.default(lim = lim, ...)
>    tick.at <- logTicks(10^lim,loc=1)
>    ans$bottom$ticks$at <- log(tick.at, 10)
>    ans$bottom$labels$at <- log(tick.at, 10)
>    ans$bottom$labels$labels <- as.character(tick.at)
>    ans }
>
> xyplot(1:1~1:1, scales=list(log=T),xscale.components =
> xscale.components.log10,yscale.components = yscale.components.log10)
>
> Any suggestions?
>
> Thanks
>
> John
>
>
>
>
>
>
>
> - Original Message 
> From: Don McKenzie 
> To: Henrique Dallazuanna 
> Cc: R-help Forum 
> Sent: Mon, September 27, 2010 1:00:04 PM
> Subject: Re: [R] scientific vs. fixed notation in xyplot()
>
> This is quite elegant (thanks) and brings up a problem I could not solve a while
> back, although Dr. Sarkar did his best to help.
> How do I do the same thing in a panel plot?
>
> e.g., toy example
>
> temp.df <-
> 
data.frame(X=seq(1,100,by=1),Y=seq(1,50.5,by=.5),class=rep(c("A","B"),each=50))
> xyplot(Y ~ X | class,data=temp.df,scales=list(x=round(log(1:100),
> 4),y=round(log(1:50.5), 4),log=T))
>
> gives me the right points on the page but still gives axis labels in 
scientific
> notation.
>
> If I try to specify "labels" as a list I get an error message
>
>> xyplot(Y ~ X | class,data=temp.df,scales=list(log = T, labels =
>>list(x=round(log(1:100), 4),y=round(log(1:50.5), 4
>
> Error in construct.scales(log = TRUE, labels = list(x = c(0, 0.6931, 1.0986, 
 :
>  the at and labels components of scales may not be lists when relation = same
>
> Syntax problem in this last command?
>
> Thanks
>
>
> On 27-Sep-10, at 12:16 PM, Henrique Dallazuanna wrote:
>
>> Try this:
>>
>>  xyplot(1:10~1:10, scales=list(log = T, labels = round(log(1:10), 4)))
>>
>>
>> On Mon, Sep 27, 2010 at 4:10 PM, array chip  wrote:
>>
>>> Hi, I am using xyplot() to plot on the log scale via the scales=list(log=T)
>>> argument. For example:
>>>
>>> xyplot(1:10~1:10, scales=list(log=T))
>>>
>>> But the axis labels are printed as scientific notation (10^0.0, etc),
>>> instead of
>>> fixed notation. How can I change that to fixed notation?
>>>
>>> options(scipen=4) doesn't work on xyplot()

[R] add a new column to data frame

2010-10-01 Thread array chip
Hi, I am wondering if anyone can propose a simple/best way to do the following:

Let's say I have a data frame 

dat <- 
cbind(expand.grid(mode=c('right','left'),time=0:3,id=c('p1','p2','p3')),y=c(3,5,rep(4,6),6,2,rep(3,6),4,4,rep(2,6)))


dat
mode time id y
1  right0 p1 3
2   left0 p1 5
3  right1 p1 4
4   left1 p1 4
5  right2 p1 4
6   left2 p1 4
7  right0 p2 6
8   left0 p2 2
9  right1 p2 3
10  left1 p2 3
11 right2 p2 3
12  left2 p2 3
13 right0 p3 4
14  left0 p3 4
15 right1 p3 2
16  left1 p3 2
17 right2 p3 2
18  left2 p3 2

Now I want to add a new column "type" to this data frame with values of either 
"left" or "right" for each "id" based on the following logic:

For each "id", the value of "type" column is the value of "mode" for which the 
value of "y" is the maximum of "y" based on time=0. For example for id=p1, the 
value of "type" is "left" because for the 2 "y" values (3 & 5) based on time=0, 
mode=left has the bigger "y" value (5). But if the 2 "y" values are the same 
for 
"mode", then let type=right.

In the end the new data frame is:

mode time id y  type
1  right0 p1 3  left
2   left0 p1 5  left
3  right1 p1 4  left
4   left1 p1 4  left
5  right2 p1 4  left
6   left2 p1 4  left
7  right0 p2 6 right
8   left0 p2 2 right
9  right1 p2 3 right
10  left1 p2 3 right
11 right2 p2 3 right
12  left2 p2 3 right
13 right0 p3 4 right
14  left0 p3 4 right
15 right1 p3 2 right
16  left1 p3 2 right
17 right2 p3 2 right
18  left2 p3 2 right

Any suggestions would be appreciated.

John


  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] add a new column to data frame

2010-10-01 Thread array chip
Thank you Phil. The only problem with the code is that when the 2 y values are 
equal, it should default to "right"; your code defaults to the first maximum. 
But it's OK, I see the logic here and can figure it out. I wondered whether 
there is some one-line way to do this; I guess it's not possible.
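
For the archive, a sketch that encodes the tie rule explicitly (assuming the
dat from the original post, with exactly one "left" and one "right" row per
id at time 0):

dat0 <- subset(dat, time == 0)
typ <- sapply(split(dat0, dat0$id),
              function(x) if (x$y[x$mode == "left"] > x$y[x$mode == "right"])
                  "left" else "right")   # ties fall through to "right"
dat$type <- typ[as.character(dat$id)]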

Thank you

John





From: Phil Spector 

Cc: r-help@r-project.org
Sent: Fri, October 1, 2010 10:58:55 AM
Subject: Re: [R] add a new column to data frame

Here are two ways:

dat0 = subset(dat,time==0)

one = as.data.frame(as.table(by(dat0,dat0$id,
 function(x)as.character(x$mode)[which.max(x$y)])))
names(one) = c('id','type')
merge(dat,one)

two = sapply(split(dat0,dat0$id),
  function(x)as.character(x$mode)[which.max(x$y)])
merge(dat,data.frame(id=names(two),type=two))


- Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley
spec...@stat.berkeley.edu




On Fri, 1 Oct 2010, array chip wrote:

> Hi, I am wondering if anyone can propose a simple/best way to do the following:
>
> Let's say I have a data frame
>
> dat <-
> cbind(expand.grid(mode=c('right','left'),time=0:3,id=c('p1','p2','p3')),y=c(3,5,rep(4,6),6,2,rep(3,6),4,4,rep(2,6)))
>
>
> dat
>mode time id y
> 1  right0 p1 3
> 2   left0 p1 5
> 3  right1 p1 4
> 4   left1 p1 4
> 5  right2 p1 4
> 6   left2 p1 4
> 7  right0 p2 6
> 8   left0 p2 2
> 9  right1 p2 3
> 10  left1 p2 3
> 11 right2 p2 3
> 12  left2 p2 3
> 13 right0 p3 4
> 14  left0 p3 4
> 15 right1 p3 2
> 16  left1 p3 2
> 17 right2 p3 2
> 18  left2 p3 2
>
> Now I want to add a new column "type" to this data frame with values of either
> "left" or "right" for each "id" based on the following logic:
>
> For each "id", the value of "type" column is the value of "mode" for which the
> value of "y" is the maximum of "y" based on time=0. For example for id=p1, the
> value of "type" is "left" because for the 2 "y" values (3 & 5) based on 
time=0,
> mode=left has the bigger "y" value (5). But if the 2 "y" values are the same 
>for
> "mode", then let type=right.
>
> In the end the new data frame is:
>
>mode time id y  type
> 1  right0 p1 3  left
> 2   left0 p1 5  left
> 3  right1 p1 4  left
> 4   left1 p1 4  left
> 5  right2 p1 4  left
> 6   left2 p1 4  left
> 7  right0 p2 6 right
> 8   left0 p2 2 right
> 9  right1 p2 3 right
> 10  left1 p2 3 right
> 11 right2 p2 3 right
> 12  left2 p2 3 right
> 13 right0 p3 4 right
> 14  left0 p3 4 right
> 15 right1 p3 2 right
> 16  left1 p3 2 right
> 17 right2 p3 2 right
> 18  left2 p3 2 right
>
> Any suggestions would be appreciated.
>
> John
>
>
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] how to retrieve user coordinates in xyplot

2010-10-08 Thread array chip
Hi, is there a way to retrieve the extremes of the user coordinates of the 
plotting region, like what par("usr") does in base graphics? I'd like to use 
them to print additional text at certain places inside each panel. Thanks

John
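
For the archive, a sketch using lattice's current.panel.limits(), which plays
the role of par("usr") inside a panel function:

library(lattice)
xyplot(1:10 ~ 1:10, panel = function(...) {
    panel.xyplot(...)
    lims <- current.panel.limits()   # list with components xlim and ylim
    panel.text(lims$xlim[1], lims$ylim[2], "label", adj = c(0, 1))
})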



  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] point characters THICKER in xyplot()

2010-10-08 Thread array chip
Hi, how can I make the point characters thicker (NOT larger) in xyplot when the 
groups= argument is used?

dat<-data.frame(x=1:100,y=1:100,group=rep(LETTERS[1:5],each=20))

### lwd=2 doesn't work here
xyplot(y~x,groups=group,data=dat,col=1:4,pch=1:4,lwd=2)

### lwd=2 works with panel.points(), but grouping is messed up!
xyplot(y~x,groups=group,data=dat,col=1:4,pch=1:4,
panel=function(...) {panel.points(...,lwd=2)})

### group is correct with panel.superpose(), but lwd=2 doesn't work!
xyplot(y~x,groups=group,data=dat,col=1:4,pch=1:4,
panel=function(...) {panel.superpose(...,lwd=2)})

Any suggestions?

Thanks

John
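
For the archive, a sketch based on the par.settings/grid.pars fix from the
legend thread earlier in this digest (assuming grid.pars also governs the
panel symbols):

xyplot(y ~ x, groups = group, data = dat, col = 1:4, pch = 1:4,
       par.settings = list(grid.pars = list(lwd = 2)))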


  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] type II & III test for mixed model

2010-10-13 Thread array chip
Hi, is there a package for getting type II or type III tests on mixed models 
(lme or lmer), just like what Anova() in the car package does for aov, lm, etc.?

Thanks

John



  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] clustering on scaled dataset or not?

2010-10-28 Thread array chip
Hi, just a general question: when we do hierarchical clustering, should we 
compute the dissimilarity matrix from the scaled or the unscaled dataset? 
daisy() in the cluster package allows standardizing the variables before 
calculating the dissimilarity matrix, but dist() doesn't have that option at 
all. I'd appreciate it if you could share your thoughts.
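
One option, as a sketch (assuming a numeric matrix x): scale the variables
first and feed the result to dist(). Note this standardizes by standard
deviation, whereas daisy(..., stand = TRUE) uses the mean absolute deviation,
so the two are similar in spirit but not identical:

hc <- hclust(dist(scale(x)))   # cluster on the standardized data
plot(hc)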

Thanks

John



  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] logistic regression or not?

2010-12-20 Thread array chip
Hi, I have a dataset where the response for each person on one of the 2 
treatments was a proportion (the percentage of a certain number of markers 
that were positive); I also have the number of positive & negative markers 
available for each person. What is the best way to analyze this kind of data?

I can think of analyzing this data using glm() with the attached dataset:

test<-read.table('test.txt',sep='\t')
fit<-glm(cbind(positive,total-positive)~treatment,test,family=binomial)
summary(fit)
anova(fit, test='Chisq')

First, is this still called logistic regression or something else? I thought 
with logistic regression, the response variable is a binary factor?

Second, summary(fit) and anova(fit, test='Chisq') gave me different p values; 
why is that, and which one should I use?

Third, is there an equivalent model where I can use variable "percentage" 
instead of "positive" & "total"?

Finally, what is the best way to analyze this kind of dataset where it's almost 
the same as ANOVA except that the response variable is a proportion (or success 
and failure)?

Thanks

John



  "treatment" "total" "positive"  "percentage"
"1" "exposed"   11  4   0.363636363636364
"2" "exposed"   10  4   0.4
"3" "exposed"   9   4   0.444
"4" "exposed"   7   4   0.571428571428571
"5" "exposed"   7   4   0.571428571428571
"6" "exposed"   6   5   0.833
"8" "exposed"   12  7   0.583
"9" "exposed"   8   5   0.625
"10""exposed"   13  12  0.923076923076923
"11""exposed"   10  5   0.5
"12""control"   10  1   0.1
"13""control"   11  2   0.181818181818182
"14""control"   8   0   0
"16""control"   12  1   0.0833
"15""control"   8   0   0
"17""control"   10  1   0.1
"18""control"   10  1   0.1
"19""control"   8   1   0.125
"20""control"   8   0   0
"21""control"   9   1   0.111
"22""control"   10  1   0.1
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] logistic regression or not?

2010-12-21 Thread array chip
Thank you Ben, Steve and Peter. 

Ben, my last question was to see if there are other ways of analyzing this type 
of data, where the response variable is a proportion, in addition to binomial 
regression. 


BTW, I also found that the following is an equivalent model using the 
percentage directly:

glm(log(percentage/(1-percentage))~treatment,data=test)

Thanks

John

 




From: Ben Bolker 
To: r-h...@stat.math.ethz.ch
Sent: Tue, December 21, 2010 5:08:34 AM
Subject: Re: [R] logistic regression or not?

array chip  yahoo.com> writes:

[snip]

> I can think of analyzing this data using glm() with the attached dataset:
> 
> test<-read.table('test.txt',sep='\t')
> fit<-glm(cbind(positive,total-positive)~treatment,test,family=binomial)
> summary(fit)
> anova(fit, test='Chisq')

> First, is this still called logistic regression or something else? I thought 
> with logistic regression, the response variable is a binary factor?

  Sometimes I've seen it called "binomial regression", or just 
"a binomial generalized linear model"

> Second, then summary(fit) and anova(fit, test='Chisq') gave me different p 
> values, why is that? which one should I use?

  summary(fit) gives you p-values from a Wald test.
  anova() gives you tests based on the Likelihood Ratio Test.
  In general the LRT is more accurate.

> Third, is there an equivalent model where I can use variable "percentage"
> instead of "positive" & "total"?

  glm(percentage~treatment,weights=total,data=tests,family=binomial)

is equivalent to the model you fitted above.
> 
> Finally, what is the best way to analyze this kind of dataset 
> where it's almost the same as ANOVA except that the response variable
>  is a proportion (or success and failure)?

  Don't quite know what you mean here.  How is the situation "almost
the same as ANOVA" different from the situation you described above?
Do you mean when there are multiple factors? or ???

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] logistic regression or not?

2010-12-21 Thread array chip
Ben, thanks again.

John





From: Ben Bolker 

Cc: r-h...@stat.math.ethz.ch; S Ellison ; peter dalgaard 

Sent: Tue, December 21, 2010 9:26:29 AM
Subject: Re: [R] logistic regression or not?

On 10-12-21 12:20 PM, array chip wrote:
> Thank you Ben, Steve and Peter.
>  
> Ben, my last question was to see if there are other ways of analyzing
> this type of data where the response variable is a proportion, in
> addition to binomial regression.
>  
> BTW, I also found the following is also an equivalent model directly
> using percentage:
>  
> glm(log(percentage/(1-percentage))~treatment,data=test)
>  
> Thanks
>  
> John
> 

  Yes, but this is a different model.

  The model you have here uses Gaussian errors (it is in fact an
identical model, although not necessarily quite an identical algorithm
(?), to just using lm()).  It will fail if you have any percentages that
are 0 or 1.  See Stuart's comment about how things were done in the "old
days".

  Beta regression (see e.g. the betareg package) is another way of
handling analysis of proportions.
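
A sketch of that route (assuming the test data frame from earlier in the
thread; betareg cannot handle proportions of exactly 0 or 1, so they are
squeezed toward 0.5 first):

library(betareg)
n <- nrow(test)
test$p.sq <- (test$percentage * (n - 1) + 0.5) / n   # move 0s and 1s inward
fit.beta <- betareg(p.sq ~ treatment, data = test)
summary(fit.beta)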

> 
> 
> *From:* Ben Bolker 
> *To:* r-h...@stat.math.ethz.ch
> *Sent:* Tue, December 21, 2010 5:08:34 AM
> *Subject:* Re: [R] logistic regression or not?
> 
> array chip  yahoo.com <http://yahoo.com/>> writes:
> 
> [snip]
> 
>> I can think of analyzing this data using glm() with the attached dataset:
>>
>> test<-read.table('test.txt',sep='\t')
>> fit<-glm(cbind(positive,total-positive)~treatment,test,family=binomial)
>> summary(fit)
>> anova(fit, test='Chisq')
> 
>> First, is this still called logistic regression or something else? I
> thought
>> with logistic regression, the response variable is a binary factor?
> 
>  Sometimes I've seen it called "binomial regression", or just
> "a binomial generalized linear model"
> 
>> Second, then summary(fit) and anova(fit, test='Chisq') gave me
> different p
>> values, why is that? which one should I use?
> 
>  summary(fit) gives you p-values from a Wald test.
>  anova() gives you tests based on the Likelihood Ratio Test.
>  In general the LRT is more accurate.
> 
>> Third, is there an equivalent model where I can use variable "percentage"
>> instead of "positive" & "total"?
> 
>  glm(percentage~treatment,weights=total,data=tests,family=binomial)
> 
> is equivalent to the model you fitted above.
>>
>> Finally, what is the best way to analyze this kind of dataset
>> where it's almost the same as ANOVA except that the response variable
>>  is a proportion (or success and failure)?
> 
>  Don't quite know what you mean here.  How is the situation "almost
> the same as ANOVA" different from the situation you described above?
> Do you mean when there are multiple factors? or ???
> 
> __
> R-help@r-project.org <mailto:R-help@r-project.org> mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 


  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] RData size

2011-01-04 Thread array chip
Hi, I noticed an RData size issue that's puzzling to me. Attached find 2 example 
datasets as text files. Both are 100x5, so the sizes of the two text files are 
the same. However, when I read them into R, the sizes are much different:

tt<-as.matrix(read.table("tt.txt",header=T,row.names=1))
save(tt,file='tt.RData')

tt.big<-as.matrix(read.table("tt.big.txt",header=T,row.names=1))
save(tt.big,file='tt.big.RData')

"tt.RData" is 2KB while "tt.big.RData" is 5KB. This is not a big deal with the 
example datasets, but my real datasets are much larger, the difference is 35MB 
vs. 1MB for the RData objects. 


The difference between the 2 datasets above is that "tt.big" is a smoothed 
version of "tt", so there are far fewer unique values in tt.big than in tt. I 
guess this is the reason for the difference in sizes of the RData objects; can 
anyone confirm? 
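
A quick sketch (with made-up data) of the effect: save() gzip-compresses the
serialized object by default, and repeated values compress far better than
distinct ones:

set.seed(1)
x <- matrix(rnorm(1e4), ncol = 5)   # essentially all values unique
y <- round(x, 1)                    # "smoothed": few unique values
save(x, file = "x.RData")
save(y, file = "y.RData")
file.info(c("x.RData", "y.RData"))$size   # y.RData should be much smaller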


Thanks

John


               PC1       PC2       PC3       PC4       PC5
SNP_A-1677174  1.683848  1.975826  2.355682  1.587903  1.157294
SNP_A-1718890  2.061455  1.926552  2.19278   2.071653  2.712994
SNP_A-1678466  2.105124  1.975826  2.128437  1.587903  1.157294
SNP_A-1676440  2.105124  1.975826  2.128437  1.587903  1.157294
SNP_A-1662392  2.105124  1.975826  2.128437  1.587903  1.157294
SNP_A-1685736  2.105124  1.975826  2.128437  1.587903  1.157294
SNP_A-1681384  2.105124  1.975826  2.128437  1.587903  1.157294
SNP_A-1668776  2.061455  1.926552  2.19278   1.554699  0.616379
SNP_A-1642581  1.560647  1.297883  1.458162  1.587903  1.157294
SNP_A-1669029  1.560647  1.297883  1.458162  1.587903  1.157294
SNP_A-1723597  2.061455  1.926552  2.19278   1.554699  0.616379
SNP_A-1718237  1.560647  1.297883  1.458162  1.016043  1.157294
SNP_A-1748467  1.560647  1.297883  1.458162  1.016043  1.157294
SNP_A-1728870  2.061455  1.926552  2.19278   1.554699  0.616379
SNP_A-1669946  2.061455  1.926552  2.19278   1.554699  0.616379
SNP_A-1708099  2.061455  1.926552  2.19278   1.554699  0.616379
SNP_A-1716798  2.061455  1.926552  2.19278   1.554699  0.616379
SNP_A-1701872  2.061455  1.926552  2.19278   1.554699  0.616379
SNP_A-1705537  2.365606  2.652155  2.582983  2.002516  1.023379
SNP_A-1683756  2.365606  2.652155  2.582983  2.002516  1.023379
SNP_A-1697748  1.625097  1.543559  1.543483  1.554699  0.616379
SNP_A-1699138  1.625097  1.543559  1.543483  1.554699  0.616379
SNP_A-1696782  2.365606  2.652155  2.582983  2.002516  1.398077
SNP_A-1750020  1.625097  1.543559  1.543483  1.554699  0.616379
SNP_A-1673422  2.365606  2.652155  2.582983  1.672339  1.398077
SNP_A-1670878  1.569111  1.53458   1.328269  1.672339  2.015989
SNP_A-1743511  1.569111  1.53458   1.328269  1.672339  2.015989
SNP_A-1646481  1.962492  1.929051  2.417274  1.554699  2.752331
SNP_A-1736635  1.569111  1.53458   1.328269  1.672339  3.106939
SNP_A-1666738  1.569111  1.53458   2.144473  1.672339  3.106939
SNP_A-1721407  1.569111  1.53458   2.144473  1.672339  3.106939
SNP_A-1686722  1.569111  1.53458   2.144473  1.672339  3.106939
SNP_A-1701882  1.962492  1.929051  2.417274  1.554699  2.752331
SNP_A-1701982  1.962492  1.929051  2.417274  1.554699  2.752331
SNP_A-1705703  2.207068  2.52      2.144473  1.672339  3.106939
SNP_A-1737197  2.207068  2.52      2.144473  1.672339  3.106939
SNP_A-1754224  2.207068  2.52      2.144473  1.672339  3.106939
SNP_A-1749896  1.962492  1.929051  2.417274  1.554699  1.85133
SNP_A-1750090  1.962492  1.929051  2.417274  1.554699  1.85133
SNP_A-1642231  2.207068  2.52      2.144473  1.672339  2.196275
SNP_A-1714480  1.962492  1.929051  2.417274  1.554699  1.85133
SNP_A-1712765  2.207068  2.52      2.144473  1.672339  2.196275
SNP_A-1710770  2.207068  2.52      2.144473  1.672339  1.518637
SNP_A-1649717  1.962492  1.929051  1.889378  1.554699  1.85133
SNP_A-1684884  2.207068  2.52      2.144473  1.672339  1.518637
SNP_A-1658786  2.207068  2.52      2.144473  1.672339  1.518637
SNP_A-1688090  2.207068  2.52      2.144473  1.672339

[R] algorithm help

2011-01-06 Thread array chip
Hi, I am seeking help designing an algorithm to identify the locations of 
stretches of 1s in a vector of 0s and 1s. Below is a simple example:

> dat<-as.data.frame(cbind(a=c(F,F,T,T,T,T,F,F,T,T,F,T,T,T,T,F,F,F,F,T)
  ,b=c(4,12,13,16,18,20,28,30,34,46,47,49,61,73,77,84,87,90,95,97)))

> dat
   a  b
1  0  4
2  0 12
3  1 13
4  1 16
5  1 18
6  1 20
7  0 28
8  0 30
9  1 34
10 1 46
11 0 47
12 1 49
13 1 61
14 1 73
15 1 77
16 0 84
17 0 87
18 0 90
19 0 95
20 1 97

In this dataset, "b" is sorted and denotes the location for each number in "a". 
So I would like to find the starting & ending locations for each stretch of 1s 
within "a", also counting the number of 1s in each stretch as well. Hope the 
results from the algorithm would be:

stretch   start   end   No.of.1s
1 13  204
2 34  462
3 49  774
4 97  971

I can imagine for loops could do the job, but I feel that's not a clever way 
to do this. Is there an efficient algorithm that can do this fast?

Thanks for any suggestions.

John


  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] algorithm help

2011-01-06 Thread array chip
Thanks very much, Ted. Yes, it does what I need!

I made a routine to do this:

f.fragment <- function(a, b) {
    dat <- as.data.frame(cbind(a, b))

    L <- rle(dat$a)$lengths
    V <- rle(dat$a)$values
    pos <- c(1, cumsum(L))
    V1 <- c(-1, V)
    start <- 1 + pos[V1 == 0]
    end <- pos[V1 == 1]

    cbind(stretch = 1:length(start), start = dat$b[start],
          end = dat$b[end], no.of.1s = L[V == 1])
}

f.fragment(dat$a,dat$b)

     stretch start end no.of.1s
[1,]       1    13  20        4
[2,]       2    34  46        2
[3,]       3    49  77        4
[4,]       4    97  97        1

John





From: "ted.hard...@wlandres.net" 

Cc: r-h...@stat.math.ethz.ch
Sent: Thu, January 6, 2011 2:57:47 PM
Subject: RE: [R] algorithm help

On 06-Jan-11 22:16:38, array chip wrote:
> Hi, I am seeking help on designing an algorithm to identify the
> locations of stretches of 1s in a vector of 0s and 1s. Below is
> a simple example:
> 
>> dat<-as.data.frame(cbind(a=c(F,F,T,T,T,T,F,F,T,T,F,T,T,T,T,F,F,F,F,T)
>   ,b=c(4,12,13,16,18,20,28,30,34,46,47,49,61,73,77,84,87,90,95,97)))
> 
>> dat
>a  b
> 1  0  4
> 2  0 12
> 3  1 13
> 4  1 16
> 5  1 18
> 6  1 20
> 7  0 28
> 8  0 30
> 9  1 34
> 10 1 46
> 11 0 47
> 12 1 49
> 13 1 61
> 14 1 73
> 15 1 77
> 16 0 84
> 17 0 87
> 18 0 90
> 19 0 95
> 20 1 97
> 
> In this dataset, "b" is sorted and denotes the location for each
> number in "a". 
> So I would like to find the starting & ending locations for each
> stretch of 1s within "a", also counting the number of 1s in each
> stretch as well.
> Hope the results from the algorithm would be:
> 
> stretch   start   end   No.of.1s
> 1         13      20    4
> 2         34      46    2
> 3         49      77    4
> 4         97      97    1
> 
> I can imagine using for loops can do the job, but I feel it's not a
> clever way to do this. Is there an efficient algorithm that can do
> this fast?
> 
> Thanks for any suggestions.
> John

The basic information you need can be got using rle() ("run length
encoding"). See '?rle'. In your example:

  rle(dat$a)
  # Run Length Encoding
  #   lengths: int [1:8] 2 4 2 2 1 4 4 1
  #   values : num [1:8] 0 1 0 1 0 1 0 1
  ## Note: F -> 0, T -> 1

The following has a somewhat twisted logic at the end, and may
[[elided Yahoo spam]]

  L <- rle(dat$a)$lengths
  V <- rle(dat$a)$values
  pos <- c(1,cumsum(L))
  V1 <- c(-1,V)
  1+pos[V1==0]
  # [1]  3  9 12 20
  ## Positions in the series dat$a where each run of "T" (i.e. 1)
  ##   starts

Hoping this helps,
Ted.


E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 06-Jan-11   Time: 22:57:44
-- XFMail --



  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] algorithm help

2011-01-06 Thread array chip
Thanks very much Bill, good catch!

John





From: William Dunlap 

Cc: r-h...@stat.math.ethz.ch
Sent: Thu, January 6, 2011 3:52:47 PM
Subject: RE: [R] algorithm help

> -Original Message-
> From: r-help-boun...@r-project.org 
> [mailto:r-help-boun...@r-project.org] On Behalf Of array chip
> Sent: Thursday, January 06, 2011 3:29 PM
> To: ted.hard...@wlandres.net
> Cc: r-h...@stat.math.ethz.ch
> Subject: Re: [R] algorithm help
> 
[[elided Yahoo spam]]
> 
> I made a routine to do this:
> 
> f.fragment<-function(a,b) {
> dat<-as.data.frame(cbind(a,b))
> 
> L <- rle(dat$a)$lengths
> V <- rle(dat$a)$values
> pos <- c(1,cumsum(L))
> V1 <- c(-1,V)
> start<-1+pos[V1==0]
> end<-pos[V1==1]
>  
> cbind(stretch=1:length(start),start=dat$b[start]
>  ,end=dat$b[end],no.of.1s=L[V==1])
> 
> }
> 
> f.fragment(dat$a,dat$b)
> 
>  stretch start end no.of.1s
> [1,]   113  204
> [2,]   234  462
> [3,]   349  774
> [4,]   497  971

You need to be more careful about the first
and last rows in the dataset.  I think yours
only works when a starts with 0 and ends with 1.

  > f.fragment(c(1,1,0,0), c(11,12,13,14))
       stretch start end no.of.1s
  [1,]       1    NA  12        2
  > f.fragment(c(1,1,0,1), c(11,12,13,14))
       stretch start end no.of.1s
  [1,]       1    14  12        2
  [2,]       1    14  14        1
  > f.fragment(c(0,1,0,1), c(11,12,13,14))
       stretch start end no.of.1s
  [1,]       1    12  12        1
  [2,]       2    14  14        1
  > f.fragment(c(0,1,0,0), c(11,12,13,14))
       stretch start end no.of.1s
  [1,]       1    12  12        1
  [2,]       2    NA  12        1
  > f.fragment(c(1,1,1,1), c(11,12,13,14))
       stretch end no.of.1s
  [1,]       1  14        4
  [2,]       0  14        4
  > f.fragment(c(0,0,0,0), c(11,12,13,14))
       stretch start
  [1,]       1    NA

The following does better.  It keeps things as
logical vectors as long as possible, which tends
to work better when dealing with runs.
  f <- function(a, b) {
      isFirstIn1Run <- c(TRUE, a[-1] != a[-length(a)]) & a == 1
      isLastIn1Run <- c(a[-1] != a[-length(a)], TRUE) & a == 1
      data.frame(stretch = seq_len(sum(isFirstIn1Run)),
                 start = b[isFirstIn1Run],
                 end = b[isLastIn1Run],
                 no.of.1s = which(isLastIn1Run) - which(isFirstIn1Run) + 1)
  }
  > f(c(1,1,0,0), c(11,12,13,14))
    stretch start end no.of.1s
  1       1    11  12        2
  > f(c(1,1,0,1), c(11,12,13,14))
    stretch start end no.of.1s
  1       1    11  12        2
  2       2    14  14        1
  > f(c(0,1,0,1), c(11,12,13,14))
    stretch start end no.of.1s
  1       1    12  12        1
  2       2    14  14        1
  > f(c(0,1,0,0), c(11,12,13,14))
    stretch start end no.of.1s
  1       1    12  12        1
  > f(c(1,1,1,1), c(11,12,13,14))
    stretch start end no.of.1s
  1       1    11  14        4
  > f(c(0,0,0,0), c(11,12,13,14))
  [1] stretch  start    end      no.of.1s
  <0 rows> (or 0-length row.names)

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 


> 
> John
> 
> 
> 
> 
> ____
> From: "ted.hard...@wlandres.net" 
> 
> Cc: r-h...@stat.math.ethz.ch
> Sent: Thu, January 6, 2011 2:57:47 PM
> Subject: RE: [R] algorithm help
> 
> On 06-Jan-11 22:16:38, array chip wrote:
> > Hi, I am seeking help on designing an algorithm to identify the
> > locations of stretches of 1s in a vector of 0s and 1s. Below is
> > a simple example:
> > 
> >> 
> dat<-as.data.frame(cbind(a=c(F,F,T,T,T,T,F,F,T,T,F,T,T,T,T,F,F,F,F,T)
> >   ,b=c(4,12,13,16,18,20,28,30,34,46,47,49,61,73,77,84,87,90,95,97)))
> > 
> >> dat
> >a  b
> > 1  0  4
> > 2  0 12
> > 3  1 13
> > 4  1 16
> > 5  1 18
> > 6  1 20
> > 7  0 28
> > 8  0 30
> > 9  1 34
> > 10 1 46
> > 11 0 47
> > 12 1 49
> > 13 1 61
> > 14 1 73
> > 15 1 77
> > 16 0 84
> > 17 0 87
> > 18 0 90
> > 19 0 95
> > 20 1 97
> > 
> > In this dataset, "b" is sorted and denotes the location for each
> > number in "a". 
> > So I would like to find the starting & ending locations for each
> > stretch of 1s within "a", also counting the number of 1s in each
> > stretch as well.
> > Hope the results from the algorithm would be:
> > 
> > stretch   start   end   No.of.1s
> > 1         13      20    4
> > 2         34      46    2
> > 3         49      77    4
> > 4

[R] effects packages for mixed model?

2011-01-16 Thread array chip
Hi, I am wondering if there is a package for mixed models similar to the 
effects package, i.e. doing for mixed models what the effects package does for 
linear and generalized linear models? Specifically I am looking for a way to 
calculate the SAS so-called least squares means (LS means) in mixed models (I 
understand there is a substantial debate on whether such adjusted means should 
be computed in the first place).

Thank you,

John


  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] analysis strategy - baseline and repeated measure

2011-01-17 Thread array chip
Hi, assume that I have a repeated measures dataset with 3 time points: baseline, 
day 5 and day 10. There are 4 treatment groups (vehicle, treatment 1, treatment 
2 and treatment 3), 20 subjects per treatment group. A simple, straightforward 
way to analyze the data is to use a mixed model:

model 1:
obj <- lmer(y ~ treatment * time + (time|subject)) where time is numeric with 
values 0, 5 and 10.

The problem with this approach is that the model does not account for baseline 
imbalance between treatment groups. But if I want to include the baseline value 
of the response variable in the model, then I think I have to exclude the 
baseline data from the rows of the dataset (so that baseline becomes one 
variable, i.e. one column of the dataset; correct me if I am wrong). With this 
dataset transformation, I only end up with 2 time points left in the dataset 
(day 5 and day 10). Then a linear term on the numeric time variable is not 
possible in lmer(). 


In this situation, what I can think of is to treat the time variable as a 
factor (say named "time.f"), and run the following model:

model 2:
obj<- lmer(y ~ treatment * time.f +(1|subject)) where time.f is a factor with 
value 5 and 10.

Couple of questions:

1. Should we really be concerned about the baseline imbalance, and address it 
by including baseline as a variable in the model? What's the advantage of doing 
so versus not doing so? 

2. If the objective of the study is to evaluate, at the end of the study (day 
10), which treatment group produces a significant difference from the vehicle 
group, is model 2 a reasonable model to do that?

3. In general, with repeated measures at 2 to 3 time points, are mixed models 
really necessary? On the mixed-model mailing list, I noticed there are concerns 
about running mixed models on just a few time points. But I feel uncomfortable 
running a simple ANOVA (or ANCOVA) while completely ignoring the fact that the 
data are correlated across time points.

4. What are better alternatives for analyzing such datasets?

Thanks

John



  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] read SAS dataset using read.ssd()

2010-08-02 Thread array chip
Hi, I am using read.ssd() from the foreign package to read some SAS datasets. 
I have 2 types of SAS datasets, one with the "sas7bdat" extension, the other 
with the "ssd01" extension. I have no problem with the first dataset type, but 
got the following error message with the 2nd dataset type (with the "ssd01" 
extension):

test<-read.ssd("C:/Documents and Settings/Desktop","test",sascmd="C:/Program 
Files/SAS/SASFoundation/9.2/sas.exe")

SAS failed.  SAS program at 
C:\DOCUME~1\LOCALS~1\Temp\Rtmp08209e\file65b5504.sas 

The log file will be file65b5504.log in the current directory
Warning message:
In read.ssd("C:/Documents and Settings/Desktop", "test", sascmd = "C:/Program 
Files/SAS/SASFoundation/9.2/sas.exe") :
  SAS return code was 2


Attached please find the log file. Thanks for any suggestions.

John



  __
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] read SAS dataset using read.ssd()

2010-08-02 Thread array chip
It looks like the log file did not attach properly. Here it is again. Thanks
for any suggestions.

John



- Original Message 
From: array chip 
To: r-help@r-project.org
Sent: Mon, August 2, 2010 2:18:32 PM
Subject: [R] read SAS dataset using read.ssd()

Hi, I am using read.ssd() from the foreign package to read some SAS datasets. I
have 2 types of SAS datasets: one with the "sas7bdat" extension, the other with
the "ssd01" extension. I have no problem with the first type, but I get the
following error message with the second type (with the "ssd01" extension):

test<-read.ssd("C:/Documents and Settings/Desktop","test",sascmd="C:/Program 
Files/SAS/SASFoundation/9.2/sas.exe")

SAS failed.  SAS program at 
C:\DOCUME~1\LOCALS~1\Temp\Rtmp08209e\file65b5504.sas 


The log file will be file65b5504.log in the current directory
Warning message:
In read.ssd("C:/Documents and Settings/Desktop", "test", sascmd = "C:/Program 
Files/SAS/SASFoundation/9.2/sas.exe") :
  SAS return code was 2


Attached please find the log file. Thanks for any suggestions.

John


  1  The SAS System    14:10 Monday, August 2, 2010

NOTE: Unable to open SASUSER.REGSTRY. WORK.REGSTRY will be opened instead.
NOTE: All registry changes will be lost at the end of the session.

WARNING: Unable to copy SASUSER registry to WORK registry. Because of this, you 
will not see registry customizations during this 
 session.
NOTE: Unable to open SASUSER.PROFILE. WORK.PROFILE will be opened instead.
NOTE: All profile changes will be lost at the end of the session.
NOTE: Copyright (c) 2002-2008 by SAS Institute Inc., Cary, NC, USA. 
NOTE: This session is executing on the XP_PRO  platform.



NOTE: SAS initialization used:
  real time   0.98 seconds
  cpu time            0.35 seconds
  
1  option validvarname = v6;libname src2rd 'C:/Documents and Settings/Desktop';
NOTE: Libref SRC2RD was successfully assigned as follows: 
  Engine:V9 
  Physical Name: C:\Documents and Settings\Desktop
2  libname rd xport 'C:\DOCUME~1\LOCALS~1\Temp\Rtmp08209e\file25af6ba1';
NOTE: Libref RD was successfully assigned as follows: 
  Engine:XPORT 
  Physical Name: C:\DOCUME~1\LOCALS~1\Temp\Rtmp08209e\file25af6ba1
3  proc copy in=src2rd out=rd;
4  select test ;
ERROR: The file SRC2RD.TEST (memtype=ALL) was not found, but appears on a 
SELECT statement.
WARNING: Input library SRC2RD is empty.
NOTE: Statements not processed because of errors noted above.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE COPY used (Total process time):
  real time   0.06 seconds
  cpu time            0.04 seconds
  

ERROR: Errors printed on page 1.

NOTE: SAS Institute Inc., SAS Campus Drive, Cary, NC USA 27513-2414
NOTE: The SAS System used:
  real time   1.06 seconds
  cpu time            0.40 seconds
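
Reading the log myself, the relevant lines seem to be the pair "ERROR: The file
SRC2RD.TEST (memtype=ALL) was not found" / "WARNING: Input library SRC2RD is
empty": the default V9 libname engine apparently sees nothing it can read in
that directory. A quick sanity check from R (just a sketch, reusing the path
from above):

## list what is actually in the directory; if this shows "test.ssd01" but no
## "test.sas7bdat", the library really is empty from the V9 engine's point of
## view -- .ssd01 appears to be an older (SAS 6-era) dataset format
list.files("C:/Documents and Settings/Desktop", pattern = "^test\\.",
           ignore.case = TRUE)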
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] xlsx package

2010-08-03 Thread array chip
Hi, I was trying to install the xlsx package for reading Excel 2007 files. The
installation went smoothly, but when I tried to load the library, I got the
following error message:

> library(xlsx)
Loading required package: xlsxjars
Loading required package: rJava
Error : .onLoad failed in loadNamespace() for 'xlsxjars', details:
  call: .jinit()
  error: cannot obtain Class.getSimpleName method ID
Error: package 'xlsxjars' could not be loaded


The help page of the xlsx package has a note "NOTE: You may need to add to your 
PATH variable the location of your JVM  (e.g. C:/Program 
Files/Java/jre6/bin/client) to get rJava working.  "

I did add that to the PATH variable using:

> Sys.setenv(PATH=paste(Sys.getenv("PATH"), "C:\\Program Files\\Java\\jre6\\bin\\client", collapse=';'))

But the error still persists. Any suggestions?
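
One detail I am unsure about: paste() joins its arguments with sep=, not
collapse= (collapse= only collapses a single vector), and Windows PATH entries
are ";"-separated. So a corrected attempt -- still assuming jre6/bin/client is
the right JVM directory, and run in a fresh session before rJava is first
loaded -- would be:

## append with sep=";" so the entry actually lands in PATH, and do this
## before library(xlsx)/rJava, since the JVM is located at initialization
Sys.setenv(PATH = paste(Sys.getenv("PATH"),
                        "C:\\Program Files\\Java\\jre6\\bin\\client",
                        sep = ";"))
library(xlsx)

## also worth confirming the JVM's bitness matches the R build
## (a 32-/64-bit mismatch is said to produce similar .jinit errors)
R.version$arch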

Thanks

John

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

