Oliver Bandel wrote:
Sarah Goslee <sarah.goslee <at> gmail.com> writes:
I think we need the reproducible example requested in
the posting guide.
====================
for ( datum in names(weblog_by_date) )
{
  print(datum)
  selected <- weblog_by_date[[datum]]
  res_size_by_host <- tapply( selected$size, selected$host, sum)
  mycat <- function(a,b) cat(paste(a, "==>", b, "\n"))
  mapply( mycat, selected$size, selected$host )
  print( res_size_by_host )
  print( "is there any NA?!")
  print( any( is.na(selected$size)) )
}
====================
Why do so many people have such trouble with the word "reproducible"? We
can't reproduce that without access to weblog_by_date!
Anyway, I think it is tapply that is behaving in a way you did not expect:
> x <- factor(1,levels=1:2)
> tapply(1,x,sum)
1 2
1 NA
which is kind of surprising since the sum over an empty set is usually
zero. However, that _is_ what the documentation for tapply says:
When 'FUN' is present, 'tapply' calls 'FUN' for each cell that has
any data in it. If 'FUN' returns a single atomic value for each
such cell (e.g., functions 'mean' or 'var') and when 'simplify' is
'TRUE', 'tapply' returns a multi-way array containing the values,
and 'NA' for the empty cells.
A passable workaround is:
> sapply(split(1,x),sum)
1 2
1 0
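To put the two behaviours side by side, here is a minimal sketch using the same one-observation factor as above. The droplevels() variant is an addition of mine, not something proposed in the thread; it avoids the empty cell altogether instead of filling it with zero, so pick whichever matches what you want to see for unused levels.

```r
# Same minimal case as above: one observation, two factor levels.
x <- factor(1, levels = 1:2)

# tapply() leaves NA in the cell for the unused level "2".
by_tapply <- tapply(1, x, sum)

# split() creates an empty group for level "2", and sum() of an
# empty vector is 0.
by_split <- sapply(split(1, x), sum)

# Alternative (my addition): drop unused levels first, so that no
# empty cell exists in the result at all.
by_dropped <- tapply(1, droplevels(x), sum)

print(by_tapply)
print(by_split)
print(by_dropped)
```

The split/sapply form reports every level with a zero for the empty ones, while the droplevels form reports only the levels that actually occur.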
At the end of the printouts, it gives me:
=======================
94.101.145.110 94.23.3.220
NA NA
[1] "is there any NA?!"
[1] FALSE
=======================
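Since weblog_by_date was not posted, the following is only a guess at how output like this can arise. It assumes the list was built by split()ting a data frame by date and that host is a factor; the column names follow the posted loop, but the hosts, dates and sizes are made up for illustration.

```r
# Hypothetical stand-in for the poster's data; all values are invented.
weblog <- data.frame(
  date = c("2011-01-01", "2011-01-01", "2011-01-02"),
  host = factor(c("10.0.0.1", "10.0.0.1", "10.0.0.2")),
  size = c(100, 250, 75)
)

# split() keeps all levels of 'host' in every piece, even for hosts
# that have no rows on that date.
weblog_by_date <- split(weblog, weblog$date)

selected <- weblog_by_date[["2011-01-01"]]
res_size_by_host <- tapply(selected$size, selected$host, sum)

print(res_size_by_host)           # NA appears for the host with no rows that day
print(any(is.na(selected$size)))  # the size column itself contains no NA
```

If that assumption holds, the NA entries come from unused factor levels in the per-date subset and not from missing values in the size column.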
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalga...@biostat.ku.dk) FAX: (+45) 35327907
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.