2008/7/25 Frank E Harrell Jr <[EMAIL PROTECTED]> > David Hajage wrote: > >> Hello R users, >> >> I have several problems with the functions summary.formula and latex in >> package Hmisc. Here an example : >> >> #### >> library(Hmisc) >> sex <- factor(sample(c("m","f", "?"), 50, rep=TRUE)) >> age <- rnorm(50, 50, 5) >> treatment <- factor(sample(c("Drug","Placebo"), 50, rep=TRUE)) >> symp <- c('Headache','Stomach Ache','Hangnail', >> 'Muscle Ache','Depressed') >> symptom1 <- sample(symp, 50,TRUE) >> symptom2 <- sample(symp, 50,TRUE) >> symptom3 <- sample(symp, 50,TRUE) >> Symptoms <- mChoice(symptom1, symptom2, symptom3, label='Primary >> Symptoms') >> >> f <- summary(treatment ~ age + sex + Symptoms, method="reverse", >> test=TRUE) >> print(f, digits = 5, pdig = 2, pctdig = 3, eps = 0.5, prmsd = F) >> print(f, digits = 5, pdig = 2, pctdig = 3, eps = 0.5, prmsd = T) >> latex(f, long = T, pdig = 2, pctdig = 3, eps = 0.5, prmsd = F, file = "") >> ### >> >> Here the problems : >> - The first print(f, ...) doesn't replace all p-value <0.5 by "P<0.5" : >> >> >> +----------------------------+-------------------------+-------------------------+-----------------------------+ >> | |Drug >> |Placebo | Test | >> | |(N=31) >> |(N=19) |Statistic | >> >> +----------------------------+-------------------------+-------------------------+-----------------------------+ >> |age | 45.926/48.750/54.019| >> 47.344/50.728/53.696| F=0.9 d.f.=1,48 P<0.5 | >> >> +----------------------------+-------------------------+-------------------------+-----------------------------+ >> |sex : ? | 25.806% ( 8) | 26.316% ( >> 5) | Chi-square=0.3 d.f.=2 P=0.86| >> >> +----------------------------+-------------------------+-------------------------+-----------------------------+ >> | f | 38.710% (12) | 31.579% ( >> 6) | | >> >> +----------------------------+-------------------------+-------------------------+-----------------------------+ >> | m | 35.484% (11) | 42.105% ( >> 8) | | >> >> +----------------------------+-------------------------+-------------------------+-----------------------------+ >> |Primary Symptoms : Depressed| 41.935% (13) | 63.158% >> (12) | Chi-square=2.12 d.f.=1 P<0.5| >> >> +----------------------------+-------------------------+-------------------------+-----------------------------+ >> | Hangnail | 48.387% (15) | 42.105% ( >> 8) |Chi-square=0.19 d.f.=1 P=0.67| >> >> +----------------------------+-------------------------+-------------------------+-----------------------------+ >> | Stomach Ache | 45.161% (14) | 68.421% >> (13) | Chi-square=2.57 d.f.=1 P<0.5| >> >> +----------------------------+-------------------------+-------------------------+-----------------------------+ >> | Muscle Ache | 54.839% (17) | 26.316% ( >> 5) | Chi-square=3.89 d.f.=1 P<0.5| >> >> +----------------------------+-------------------------+-------------------------+-----------------------------+ >> | Headache | 51.613% (16) | 36.842% ( >> 7) | Chi-square=1.03 d.f.=1 P<0.5| >> >> +----------------------------+-------------------------+-------------------------+-----------------------------+ >> > > I did not see an instance of P < 0.5 that was not replaced by "P<0.5". This > is an unusual cutoff, and you are printing many more digits of precision > than offered by the data.
For example 'Hangnail', p-value is printed as 'P=0.67' instead of 'P<0.5'. I know this is an unusual cutoff, I don't remember why I chose that for this example... > > >> - The second print(f, ..., prmsd = T) has the same problem for p-value. >> There is also 4 decimals in the first line instead of 3 : >> > > Try modifying the digits argument. I tryed to modify the argument digit (with digit = 3, digit = 1 and digit = 10), but there is always 4 decimals in the first line. > > > >> >> +----------------------------+------------------------------------------+------------------------------------------+------------------------------------+ >> | |Drug >> |Placebo | >> Test | >> | |(N=31) >> |(N=19) >> |Statistic | >> >> +----------------------------+------------------------------------------+------------------------------------------+------------------------------------+ >> |age |45.9263/48.7501/54.0194 49.1817+/- >> 5.2751|47.3436/50.7276/53.6963 51.1313+/- 5.5868| F=0.9 >> d.f.=1,48 >> P<0.5 | >> >> +----------------------------+------------------------------------------+------------------------------------------+------------------------------------+ >> |sex : ? | 25.806% ( 8) >> | 26.316% ( 5) | Chi-square=0.3 d.f.=2 >> P=0.86 | >> >> +----------------------------+------------------------------------------+------------------------------------------+------------------------------------+ >> | f | 38.710% (12) >> | 31.579% ( 6) >> | | >> >> +----------------------------+------------------------------------------+------------------------------------------+------------------------------------+ >> | m | 35.484% (11) >> | 42.105% ( 8) >> | | >> >> +----------------------------+------------------------------------------+------------------------------------------+------------------------------------+ >> |Primary Symptoms : Depressed| 41.935% (13) >> | 63.158% (12) | Chi-square=2.12 d.f.=1 >> P<0.5 | >> >> +----------------------------+------------------------------------------+------------------------------------------+------------------------------------+ >> | Hangnail | 48.387% (15) >> | 42.105% ( 8) | Chi-square=0.19 d.f.=1 >> P=0.67| >> >> +----------------------------+------------------------------------------+------------------------------------------+------------------------------------+ >> | Stomach Ache | 45.161% (14) >> | 68.421% (13) | Chi-square=2.57 d.f.=1 >> P<0.5 | >> >> +----------------------------+------------------------------------------+------------------------------------------+------------------------------------+ >> | Muscle Ache | 54.839% (17) >> | 26.316% ( 5) | Chi-square=3.89 d.f.=1 >> P<0.5 | >> >> +----------------------------+------------------------------------------+------------------------------------------+------------------------------------+ >> | Headache | 51.613% (16) >> | 36.842% ( 7) | Chi-square=1.03 d.f.=1 >> P<0.5 | >> >> +----------------------------+------------------------------------------+------------------------------------------+------------------------------------+ >> >> - In the latex(f, ...), only the first-line p-value is replaced by "< >> 0.5", >> and the quantiles for age have too much decimal : >> > > This is a bug. An override source is below, to use until the next release. Thank you, I will try it very soon. > > > >> % latex.default(cstats, title = title, caption = caption, rowlabel = >> rowlabel, col.just = col.just, numeric.dollar = FALSE, insert.bottom >> = >> legend, rowname = lab, dcolumn = dcolumn, extracolheads = >> extracolheads, extracolsize = Nsize, ...) >> % >> \begin{table}[!tbp] >> \caption{Descriptive Statistics by treatment\label{f}} >> \begin{center} >> \begin{tabular}{lccc}\hline\hline >> \multicolumn{1}{l}{}& >> \multicolumn{1}{c}{Drug}& >> \multicolumn{1}{c}{Placebo}& >> \multicolumn{1}{c}{Test Statistic} >> \\ &\multicolumn{1}{c}{{\scriptsize >> $N=31$}}&\multicolumn{1}{c}{{\scriptsize $N=19$}}&\\ \hline >> age&{\scriptsize 45.92627049~}{48.75014585 }{\scriptsize 54.01942542} >> &{\scriptsize 47.34358486~}{50.72757132 }{\scriptsize 53.69634575} &$ >> F_{1,48}=0.9 ,~ P<0.5 ^{1} $\\ >> sex&&&$ \chi^{2}_{2}=0.3 ,~ P=0.859 ^{2} $\\ >> ~~~~?&25.806\%~{\scriptsize~(~8)}&26.316\%~{\scriptsize~(~5)}&\\ >> ~~~~f&38.710\%~{\scriptsize~(12)}&31.579\%~{\scriptsize~(~6)}&\\ >> ~~~~m&35.484\%~{\scriptsize~(11)}&42.105\%~{\scriptsize~(~8)}&\\ >> Primary~Symptoms&&&\\ >> ~~~~Depressed&41.935\%~{\scriptsize~(13)}&63.158\%~{\scriptsize~(12)}&$ >> \chi^{2}_{1}=2.12 ,~ P=0.145 ^{2} $\\ >> ~~~~Hangnail&48.387\%~{\scriptsize~(15)}&42.105\%~{\scriptsize~(~8)}&$ >> \chi^{2}_{1}=0.19 ,~ P=0.665 ^{2} $\\ >> ~~~~Stomach~Ache&45.161\%~{\scriptsize~(14)}&68.421\%~{\scriptsize~(13)}&$ >> \chi^{2}_{1}=2.57 ,~ P=0.109 ^{2} $\\ >> ~~~~Muscle~Ache&54.839\%~{\scriptsize~(17)}&26.316\%~{\scriptsize~(~5)}&$ >> \chi^{2}_{1}=3.89 ,~ P=0.049 ^{2} $\\ >> ~~~~Headache&51.613\%~{\scriptsize~(16)}&36.842\%~{\scriptsize~(~7)}&$ >> \chi^{2}_{1}=1.03 ,~ P=0.309 ^{2} $\\ >> \hline >> \end{tabular} >> >> \end{center} >> >> \noindent {\scriptsize $a$\ }{$b$\ }{\scriptsize $c$\ } represent the >> lower >> quartile $a$, the median $b$, and the upper quartile $c$\ for continuous >> variables.\\ >> Numbers after percents are frequencies.\\ >> \indent Tests used: $^{1}$Wilcoxon test; $^{2}$Pearson test >> \end{table} >> >> I would like to understand : >> - why does option 'pctdig' affect both percentages and quantiles in the >> 'print' command (pctdig : 'number of digits to the right of the decimal >> place for *printing percentages*. The default is zero, so percents will be >> rounded to the nearest percent'); >> - how can I change the number of decimal for continuous variables in the >> 'latex' command ? >> > > I hope to address these 2 questions soon. Thank you very much. > > > Frank > > - why doesn't option 'eps' affect all p-value ? >> >> I've just discover these wonderfull functions. Thank you to Frank Harrell >> for this package. >> >> DH >> >> > latex.summary.formula.reverse <- > function(object, title=first.word(deparse(substitute(object))), > digits, prn = any(n!=N), pctdig=0, > npct=c('numerator','both','denominator','none'), > npct.size='scriptsize', Nsize='scriptsize', > exclude1=TRUE, vnames=c("labels","names"), prUnits=TRUE, > middle.bold=FALSE, outer.size="scriptsize", > caption, rowlabel="", > insert.bottom=TRUE, dcolumn=FALSE, formatArgs=NULL, round=NULL, > prtest=c('P','stat','df','name'), prmsd=FALSE, msdsize=NULL, > long=dotchart, pdig=3, eps=.001, auxCol=NULL, dotchart=FALSE, > ...) > { > x <- object > npct <- match.arg(npct) > vnames <- match.arg(vnames) > if(is.logical(prtest) && !prtest) > prtest <- 'none' > > stats <- x$stats > nv <- length(stats) > cstats <- lab <- character(0) > nn <- integer(0) > type <- x$type > n <- x$n > N <- x$N > nams <- names(stats) > labels <- x$labels > Units <- x$units > nw <- if(lg <- length(x$group.freq)) lg > else 1 #23Nov98 > gnames <- names(x$group.freq) > test <- x$testresults > if(!length(test)) > prtest <- 'none' > > gt1.test <- > if(all(prtest=='none')) > FALSE > else > length(unique(sapply(test,function(a)a$testname))) > 1 > > if(!missing(digits)) { #.Options$digits <- digits 6Aug00 > oldopt <- options(digits=digits) > on.exit(options(oldopt)) > } > > if(missing(caption)) > caption <- paste("Descriptive Statistics", > if(length(x$group.label)) > paste(" by",x$group.label) > else > paste(" $(N=",x$N,")$",sep=""), sep="") > > bld <- if(middle.bold) '\\bf ' > else '' > > cstats <- NULL > testUsed <- auxc <- character(0) > > for(i in 1:nv) { > if(length(auxCol)) > auxc <- c(auxc, auxCol[[1]][i]) > > nn <- c(nn, n[i]) ## 12aug02 > nam <- if(vnames=="names") nams[i] > else labels[i] > > if(prUnits && nchar(Units[i]) > 0) > nam <- paste(nam, '~\\hfill\\tiny{',translate(Units[i],'*',' > '),'}',sep='') > > tr <- if(length(test) && all(prtest!='none')) test[[nams[i]]] > else NULL > > if(length(test) && all(prtest!='none')) > testUsed <- unique(c(testUsed, tr$testname)) > > if(type[i]==1 || type[i]==3) { > cs <- formatCats(stats[[i]], nam, tr, type[i], > if(length(x$group.freq)) x$group.freq else x$n[i], > npct, pctdig, exclude1, long, prtest, > latex=TRUE, testUsed=testUsed, > npct.size=npct.size, > pdig=pdig, eps=eps, > footnoteTest=gt1.test, dotchart=dotchart) > nn <- c(nn, rep(NA, nrow(cs)-1)) > } else cs <- formatCons(stats[[i]], nam, tr, x$group.freq, prmsd, > prtest=prtest, formatArgs=formatArgs, > round=round, > latex=TRUE, testUsed=testUsed, > middle.bold=middle.bold, > outer.size=outer.size, msdsize=msdsize, > pdig=pdig, eps=eps, footnoteTest=gt1.test) > > cstats <- rbind(cstats, cs) > if(length(auxc) && nrow(cstats) > 1) > auxc <- c(auxc, rep(NA, nrow(cs)-1)) > } > > lab <- dimnames(cstats)[[1]] > gl <- names(x$group.freq) > ##gl <- if(length(gl)) paste(gl, " $(N=",x$group.freq,")$",sep="") else " > " > ## Thanks: Eran Bellin <[EMAIL PROTECTED]> 3Aug01 > if(!length(gl)) > gl <- " " > > lab <- sedit(lab,c(" ","&"),c("~","\\&")) #was format(lab) 21Jan99 > lab <- latexTranslate(lab, greek=.R.) > gl <- latexTranslate(gl, greek=.R.) > ## if(any(gl != " ")) gl <- paste(gl, " $(N=",x$group.freq,")$",sep="") # > 3Aug01 > ## Added any( ) 26Mar02 21jan03 > extracolheads <- > if(any(gl != " ")) > c(if(prn)'', paste('$N=',x$group.freq,'$',sep='')) > else NULL # 21jan03 > > if(length(test) && !all(prtest=='none')) { > gl <- c(gl, > if(length(prtest)==1 && prtest!='stat') > if(prtest=='P') 'P-value' > else prtest > else 'Test Statistic') > > if(length(extracolheads)) extracolheads <- c(extracolheads,'') # 21jan03 > } > > dimnames(cstats) <- list(NULL,gl) > ## was dimnames(cstats) <- list(lab, gl) 12aug02 > cstats <- data.frame(cstats, check.names=FALSE) > > ## Added row.names=lab below 10jul02 - S+ was dropping dimnames[[1]] > ##attr(cstats,'row.names') <- lab 12aug02 > col.just <- rep("c",length(gl)) > if(dcolumn && all(prtest!='none') && > gl[length(gl)] %in% c('P-value','Test Statistic')) > col.just[length(col.just)] <- '.' > > if(prn) { > cstats <- data.frame(N=nn, cstats, check.names=FALSE) > col.just <- c("r",col.just) > } > > if(!insert.bottom) > legend <- NULL > else { > legend <- paste(if(any(type==2)) { > paste("\\noindent {\\",outer.size," $a$\\ > }{",bld,"$b$\\ }{\\", > outer.size," $c$\\ } represent the lower > quartile $a$, the median $b$, and the upper quartile $c$\\ for continuous > variables.", > if(prmsd) '~~$x\\pm s$ represents $\\bar{X}\\pm > 1$ SD.' > else '', > '\\\\', sep="") > } else NULL, > if(prn) '$N$\\ is the number of non--missing > values.\\\\', > if(any(type==1) && npct=='numerator') > 'Numbers after percents are frequencies.\\\\', > sep="\n") > legend <- NULL > if(any(type==2)) { > legend <- paste("\\noindent {\\", outer.size, " $a$\\ }{", bld, > "$b$\\ }{\\", outer.size, > " $c$\\ } represent the lower quartile $a$, the median > $b$, and the upper quartile $c$\\ for continuous variables.", > if(prmsd) '~~$x\\pm s$ represents $\\bar{X}\\pm 1$ > SD.' > else '', > '\\\\\n', sep="") > } > > if(prn) { > legend <- paste(legend, > '$N$\\ is the number of non--missing values.\\\\\n', > sep='') > } > > if(any(type==1) && npct=='numerator') { > legend <- paste(legend, > 'Numbers after percents are frequencies.\\\\\n', > sep='') > } > > if(length(testUsed)) > legend <-paste(legend, > if(length(testUsed)==1)'\\noindent Test used:' > else '\\indent Tests used:', > if(length(testUsed)==1) paste(testUsed,'test') > else > paste(paste('$^{',1:length(testUsed),'}$',testUsed, > ' test',sep=''),collapse='; ')) > > ## added rowname=lab 12aug02 added '\n\n' 4mar03 for ctable=T > } > > if(length(auxc)) { > if(length(auxc) != nrow(cstats)) > stop(paste('length of auxCol (',length(auxCol[[1]]), > ') is not equal to number or variables in table (', > nv,').', sep='')) > auxcc <- format(auxc) > auxcc[is.na(auxc)] <- '' > cstats <- cbind(auxcc, cstats) > nax <- names(auxCol) > heads <- get2rowHeads(nax) > names(cstats)[1] <- heads[[1]] > if(length(col.just)) col.just <- c('r', col.just) > if(length(extracolheads)) extracolheads <- c(heads[2], extracolheads) > } > resp <- latex.default(cstats, title=title, caption=caption, > rowlabel=rowlabel, > col.just=col.just, numeric.dollar=FALSE, > insert.bottom=legend, rowname=lab, dcolumn=dcolumn, > extracolheads=extracolheads, extracolsize=Nsize, > ...) > > if(dotchart) > resp$style <- unique(c(resp$style, 'calc', 'epic', 'color')) > > resp > } > > > -- > Frank E Harrell Jr Professor and Chair School of Medicine > Department of Biostatistics Vanderbilt University > [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.