[Rd] Type: Frontend ignored in 'Bundle' packages

2008-07-11 Thread Dirk Eddelbuettel

A few weeks ago I started a thread expressing the desire to extend 'R CMD
INSTALL' to also provide a hook for a post-build install step. A Gmane-based
view is here:
http://thread.gmane.org/gmane.comp.lang.r.devel/16827
but it misses at least one further message in support, from Greg Warnes.

There has been no follow-up from R Core.

Now, I tried to code around this limitation by
 - switching to 'Bundle'-style packaging,
 - using 'Type: Frontend' for one part, on which the other part would depend,
 - and using a normal package structure for the other part.

But 'R CMD check' fails to recognise 'Type: Frontend' in bundle packages: it
tries to treat that part as a normal package, and fails. So I cannot
successfully run 'R CMD check' on the bundle as a whole, and hence cannot
upload it to CRAN.

Is this something that should get fixed in the R tools?  Shall I look into a
patch? 

That said, I would still like to see 'make install'-like functionality in
the R CMD INSTALL process. Could this be revisited?

Regards, Dirk

-- 
Three out of two people have difficulties with fractions.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] matching predictors and dummies

2008-07-11 Thread Jeroen Ooms

I am trying to make a little web interface for the lm() function. It
calculates both ANOVA F-tests and parameter estimates and returns them in a
nice table. However, I have a problem matching the ANOVA predictors with the
regression coefficients. For numeric predictors there is no problem: the
coefficients have the same names as the predictors. However, when a factor
IV is specified, lm() automatically converts this factor to dummy variables,
which (of course) have different names than the original predictor. The lm
model that is returned contains a separate parameter for every dummy
variable.

Then, when you use anova(lm.model), the function seems to know which of the
parameters are dummies of one and the same factor, and tests them together
in the ANOVA. The anova() function returns the variance explained by the
original factor, i.e. by all of its dummies jointly; it no longer shows the
separate dummy variables. Of course, this is exactly what you want in an
analysis of variance.
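A small illustration of the behaviour described above, using the built-in airquality data (the same model Chuck uses later in the thread). The point is only to show the mismatch: coef() has one entry per dummy, while anova() has one row per model term.

```r
fit <- lm(Ozone ~ Temp + factor(Month), data = airquality)

## One coefficient per design-matrix column: the intercept, Temp, and
## four dummies for factor(Month) (month 5 is the baseline level).
names(coef(fit))

## One ANOVA row per model term: the four Month dummies are tested jointly.
anova(fit)
```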

My question is: where in the lm or glm object is it stored which of the
parameters are dummies of the same factor? The only thing I could think of
was lm.model$xlevels; however, manipulating these names in the lm model
did not confuse anova() at all, so I guess there is a better way.

An additional question: is it possible to specify the names of the
dummy variables that lm/glm creates when a factor is specified as an IV?
-- 
View this message in context: 
http://www.nabble.com/matching-predictors-and-dummies-tp18405023p18405023.html
Sent from the R devel mailing list archive at Nabble.com.



Re: [Rd] Suggestion: 20% speed up of which() with two-character mod

2008-07-11 Thread Charles C. Berry

On Thu, 10 Jul 2008, Henrik Bengtsson wrote:


Hi,

By replacing 'll' with 'wh' in the source code of base::which(), one
gets a ~20% speedup for *named logical vectors*.



The amount of speedup depends on how sparse the TRUE values are.

When the proportion of TRUEs gets small, the speedup is more than twofold
on my MacBook. For high proportions of TRUE, the speedup is closer to the
20% you cite.
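A rough way to see the dependence on sparsity is to time both variants at several proportions of TRUE. This is a sketch of my own, not Chuck's benchmark: which_ll() and which_wh() are hypothetical vector-only reductions of the two versions (the arr.ind branch is dropped), and the proportions, N and B are arbitrary. One plausible explanation for the trend is that the logical subscript names(x)[ll] walks all length(x) elements, whereas the integer subscript names(x)[wh] only visits the length(wh) hits, so the saving grows as TRUEs become rare.

```r
## Hypothetical, vector-only versions of the two variants; the arr.ind
## branch of base::which() is left out entirely.
which_ll <- function(x) {
    wh <- seq_along(x)[ll <- x & !is.na(x)]
    names(wh) <- names(x)[ll]   # logical subscript over all length(x) elements
    wh
}
which_wh <- function(x) {
    wh <- seq_along(x)[x & !is.na(x)]
    names(wh) <- names(x)[wh]   # integer subscript over length(wh) elements
    wh
}

N <- 1e6
B <- 5
set.seed(1)
for (p in c(0.01, 0.1, 0.5, 0.9)) {
    x <- runif(N) < p            # roughly a fraction p of TRUEs, no NAs
    names(x) <- seq_along(x)
    t_ll <- system.time(for (b in seq_len(B)) r_ll <- which_ll(x))[["elapsed"]]
    t_wh <- system.time(for (b in seq_len(B)) r_wh <- which_wh(x))[["elapsed"]]
    stopifnot(identical(r_ll, r_wh))
    cat(sprintf("p = %.2f   elapsed ratio ll/wh = %.2f\n", p, t_ll / t_wh))
}
```

Timings of this size are noisy (garbage collection in particular), so the ratios are only indicative and will differ between machines.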


HTH,

Chuck



CURRENT CODE:

which <- function(x, arr.ind = FALSE)
{
    if(!is.logical(x))
        stop("argument to 'which' is not logical")
    wh <- seq_along(x)[ll <- x & !is.na(x)]   # keeps the logical mask 'll'
    m <- length(wh)
    dl <- dim(x)
    if (is.null(dl) || !arr.ind) {
        names(wh) <- names(x)[ll]             # logical subscript over all of x
    }
    ...                                       # remainder elided in the post
    wh
}

SUGGESTED CODE: (Remove 'll' and use 'wh')

which2 <- function(x, arr.ind = FALSE)
{
    if(!is.logical(x))
        stop("argument to 'which' is not logical")
    wh <- seq_along(x)[x & !is.na(x)]         # no temporary mask kept
    m <- length(wh)
    dl <- dim(x)
    if (is.null(dl) || !arr.ind) {
        names(wh) <- names(x)[wh]             # integer subscript: only the hits
    }
    ...                                       # remainder elided in the post
    wh
}

That's all.
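The suggestion relies on the logical mask and the integer positions selecting the same names, including when x contains NA or has no names. A quick check on a toy vector (my own example, not from the post):

```r
x  <- c(a = TRUE, b = NA, c = FALSE, d = TRUE)
ll <- x & !is.na(x)          # logical mask: TRUE only where x is TRUE (NA -> FALSE)
wh <- seq_along(x)[ll]       # integer positions of the TRUEs
stopifnot(identical(names(x)[ll], names(x)[wh]))   # both select "a" and "d"

## Unnamed input: names(y) is NULL, and subsetting NULL gives NULL either way
y <- c(TRUE, FALSE, TRUE)
stopifnot(is.null(names(y)[seq_along(y)[y]]))
```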

BENCHMARKING:

# To measure both in same environment
which1 <- base::which;
environment(which1) <- globalenv();  # Needed?

N <- 1e6;
set.seed(0xbeef);
x <- sample(c(TRUE, FALSE), size=N, replace=TRUE);
names(x) <- seq_along(x);
B <- 10;
t1 <- system.time({ for (bb in 1:B) idxs1 <- which1(x); });
t2 <- system.time({ for (bb in 1:B) idxs2 <- which2(x); });
stopifnot(identical(idxs1, idxs2));
print(t1/t2);
# Fair benchmarking
t2 <- system.time({ for (bb in 1:B) idxs2 <- which2(x); });
t1 <- system.time({ for (bb in 1:B) idxs1 <- which1(x); });
print(t1/t2);
##      user   system  elapsed
##  1.283186 1.052632 1.25

You get similar results if you put the for loop outside the system.time()
call (and sum up the timings).

Cheers

Henrik




Charles C. Berry                            (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED]  UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901



Re: [Rd] matching predictors and dummies

2008-07-11 Thread Charles C. Berry

On Fri, 11 Jul 2008, Jeroen Ooms wrote:




My question is: where in the lm or glm object is it stored which of the
parameters are dummies of the same factor? The only thing I could think of
was lm.model$xlevels; however, manipulating these names in the lm model
did not confuse anova() at all, so I guess there is a better way.


See
?terms
?terms.object

and run

example( terms.object )

or something like

terms( lm( Ozone ~ Temp + factor(Month), airquality ) )
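Building on the terms object, one way to map every coefficient to its ANOVA term is the "assign" attribute of the model matrix, which gives, for each column, the index of the term it came from (0 for the intercept). As far as I know, anova.lm() groups columns using this same assign information, which would explain why editing $xlevels did not affect it. The variable names below are my own; model.matrix() works on glm fits as well.

```r
fit <- lm(Ozone ~ Temp + factor(Month), data = airquality)

mm   <- model.matrix(fit)                  # design matrix actually fitted
asgn <- attr(mm, "assign")                 # term index per column; 0 = intercept
labs <- attr(terms(fit), "term.labels")    # "Temp", "factor(Month)"

## For every coefficient, the model term (i.e. ANOVA row) it belongs to
coef_term <- setNames(c("(Intercept)", labs)[asgn + 1L], colnames(mm))
print(coef_term)

## Coefficient names grouped by term, e.g. to nest them under ANOVA rows
split(names(coef_term), coef_term)
```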



An additional question: is it possible to specify the names of the
dummy variables that lm/glm creates when a factor is specified as an IV?
--


I'm guessing a custom contrast function would do this. Have a look at

?contrasts
page( contr.treatment, 'print' )
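A sketch of the contrast route, assuming treatment coding is what you want: model.matrix() names each dummy by pasting the variable name onto the column names of the factor's contrast matrix, so giving that matrix the column names you like renames the coefficients. The month labels and the "_vs_May" naming scheme are purely illustrative.

```r
d <- airquality
d$Month <- factor(d$Month, levels = 5:9,
                  labels = c("May", "Jun", "Jul", "Aug", "Sep"))

## Treatment contrasts with May as baseline; rename the contrast columns
## (hypothetical scheme) so the dummies become Month_Jun_vs_May, etc.
cm <- contr.treatment(levels(d$Month))
colnames(cm) <- paste0("_", colnames(cm), "_vs_May")
contrasts(d$Month) <- cm

fit <- lm(Ozone ~ Temp + Month, data = d)
names(coef(fit))
```

Only the labels change: the columns span the same space as the default dummies, so the fitted values and the ANOVA table are the same as for factor(Month).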

Or just hack the names attribute of the relevant pieces in the object 
returned by lm/glm.


You do know to use str(), right?


HTH,

Chuck





