Re: [R] Package Hmisc, functions summary.formula() and latex(), options pdig, pctdig, eps and prmsd

David Hajage Fri, 25 Jul 2008 05:55:06 -0700

2008/7/25 Frank E Harrell Jr <[EMAIL PROTECTED]>

> David Hajage wrote:
>
>> Hello R users,
>>
>> I have several problems with the functions summary.formula and latex in
>> the Hmisc package. Here is an example:
>>
>> ####
>> library(Hmisc)
>> sex <- factor(sample(c("m","f", "?"), 50, rep=TRUE))
>> age <- rnorm(50, 50, 5)
>> treatment <- factor(sample(c("Drug","Placebo"), 50, rep=TRUE))
>> symp <- c('Headache','Stomach Ache','Hangnail',
>>          'Muscle Ache','Depressed')
>> symptom1 <- sample(symp, 50,TRUE)
>> symptom2 <- sample(symp, 50,TRUE)
>> symptom3 <- sample(symp, 50,TRUE)
>> Symptoms <- mChoice(symptom1, symptom2, symptom3, label='Primary Symptoms')
>>
>> f <- summary(treatment ~ age + sex + Symptoms, method="reverse", test=TRUE)
>> print(f, digits = 5, pdig = 2, pctdig = 3, eps = 0.5, prmsd = FALSE)
>> print(f, digits = 5, pdig = 2, pctdig = 3, eps = 0.5, prmsd = TRUE)
>> latex(f, long = TRUE, pdig = 2, pctdig = 3, eps = 0.5, prmsd = FALSE, file = "")
>> ###
>>
>> Here are the problems:
>>  - The first print(f, ...) does not replace all p-values < 0.5 with "P<0.5":
>>
>>
>> +----------------------------+-------------------------+-------------------------+-----------------------------+
>> |                            |Drug                     |Placebo                  |  Test                       |
>> |                            |(N=31)                   |(N=19)                   |Statistic                    |
>> +----------------------------+-------------------------+-------------------------+-----------------------------+
>> |age                         |     45.926/48.750/54.019|     47.344/50.728/53.696|    F=0.9 d.f.=1,48 P<0.5    |
>> +----------------------------+-------------------------+-------------------------+-----------------------------+
>> |sex : ?                     |         25.806% ( 8)    |         26.316% ( 5)    | Chi-square=0.3 d.f.=2 P=0.86|
>> +----------------------------+-------------------------+-------------------------+-----------------------------+
>> |    f                       |         38.710% (12)    |         31.579% ( 6)    |                             |
>> +----------------------------+-------------------------+-------------------------+-----------------------------+
>> |    m                       |         35.484% (11)    |         42.105% ( 8)    |                             |
>> +----------------------------+-------------------------+-------------------------+-----------------------------+
>> |Primary Symptoms : Depressed|         41.935% (13)    |         63.158% (12)    | Chi-square=2.12 d.f.=1 P<0.5|
>> +----------------------------+-------------------------+-------------------------+-----------------------------+
>> |    Hangnail                |         48.387% (15)    |         42.105% ( 8)    |Chi-square=0.19 d.f.=1 P=0.67|
>> +----------------------------+-------------------------+-------------------------+-----------------------------+
>> |    Stomach Ache            |         45.161% (14)    |         68.421% (13)    | Chi-square=2.57 d.f.=1 P<0.5|
>> +----------------------------+-------------------------+-------------------------+-----------------------------+
>> |    Muscle Ache             |         54.839% (17)    |         26.316% ( 5)    | Chi-square=3.89 d.f.=1 P<0.5|
>> +----------------------------+-------------------------+-------------------------+-----------------------------+
>> |    Headache                |         51.613% (16)    |         36.842% ( 7)    | Chi-square=1.03 d.f.=1 P<0.5|
>> +----------------------------+-------------------------+-------------------------+-----------------------------+
>>
>
> I did not see an instance of P < 0.5 that was not replaced by "P<0.5". This
> is an unusual cutoff, and you are printing many more digits of precision
> than offered by the data.



For example, for 'Hangnail' the p-value is printed as 'P=0.67' instead of
'P<0.5'. I know this is an unusual cutoff; I don't remember why I chose it for
this example...
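
As a reference point, here is a minimal sketch of how a cutoff such as eps is
usually applied when formatting p-values. The helper name fmt_p is hypothetical
and this is not Hmisc's internal code; it only assumes that values strictly
below eps are collapsed to "P<eps" and everything else is printed with pdig
decimals. Under that assumption, 0.67 and 0.86 would be printed in full,
because they are not below 0.5.

```r
# Hypothetical helper, not part of Hmisc: collapse p-values below `eps`
# to "P<eps" and print the remaining ones with `pdig` decimal places.
fmt_p <- function(p, pdig = 2, eps = 0.5) {
  ifelse(p < eps,
         paste0("P<", format(eps)),
         paste0("P=", formatC(p, format = "f", digits = pdig)))
}

# p-values copied from the tables in this thread
p <- c(sex = 0.86, Hangnail = 0.67, Headache = 0.309, Depressed = 0.145)
fmt_p(p)
```

With these inputs the first two values come back as "P=0.86" and "P=0.67",
and the last two as "P<0.5".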


>
>
>>  - The second print(f, ..., prmsd = TRUE) has the same problem for p-values.
>> There are also 4 decimals in the first line instead of 3:
>>
>
> Try modifying the digits argument.

I tried modifying the digits argument (with digits = 3, digits = 1 and
digits = 10), but there are always 4 decimals in the first line.
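
One base R behaviour that may be relevant here, although this is an assumption
(I have not checked how Hmisc formats these values internally, and it would not
by itself explain why changing digits had no effect): in format(), digits
counts significant digits rather than decimal places, and when several numbers
are formatted together, enough decimals are used for the smallest one to show
that many significant digits. A small sketch using values from the 'age' row:

```r
# Base R only; values copied from the 'age' row for the Drug group above.
q1 <- 45.92627049   # lower quartile
s  <- 5.2751        # standard deviation, as printed in the table

alone    <- format(q1, digits = 5)        # formatted on its own
together <- format(c(q1, s), digits = 5)  # formatted together with the smaller SD

print(alone)
print(together)
```

Formatted alone, the quartile shows 3 decimals (45.926); formatted together
with the standard deviation, both values show 4 decimals (45.9263 and 5.2751).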


>
>
>
>>
>> +----------------------------+------------------------------------------+------------------------------------------+------------------------------------+
>> |                            |Drug                                      |Placebo                                   |  Test                              |
>> |                            |(N=31)                                    |(N=19)                                    |Statistic                           |
>> +----------------------------+------------------------------------------+------------------------------------------+------------------------------------+
>> |age                         |45.9263/48.7501/54.0194  49.1817+/- 5.2751|47.3436/50.7276/53.6963  51.1313+/- 5.5868|           F=0.9 d.f.=1,48 P<0.5    |
>> +----------------------------+------------------------------------------+------------------------------------------+------------------------------------+
>> |sex : ?                     |               25.806% ( 8)               |               26.316% ( 5)               |       Chi-square=0.3 d.f.=2 P=0.86 |
>> +----------------------------+------------------------------------------+------------------------------------------+------------------------------------+
>> |    f                       |               38.710% (12)               |               31.579% ( 6)               |                                    |
>> +----------------------------+------------------------------------------+------------------------------------------+------------------------------------+
>> |    m                       |               35.484% (11)               |               42.105% ( 8)               |                                    |
>> +----------------------------+------------------------------------------+------------------------------------------+------------------------------------+
>> |Primary Symptoms : Depressed|               41.935% (13)               |               63.158% (12)               |       Chi-square=2.12 d.f.=1 P<0.5 |
>> +----------------------------+------------------------------------------+------------------------------------------+------------------------------------+
>> |    Hangnail                |               48.387% (15)               |               42.105% ( 8)               |       Chi-square=0.19 d.f.=1 P=0.67|
>> +----------------------------+------------------------------------------+------------------------------------------+------------------------------------+
>> |    Stomach Ache            |               45.161% (14)               |               68.421% (13)               |       Chi-square=2.57 d.f.=1 P<0.5 |
>> +----------------------------+------------------------------------------+------------------------------------------+------------------------------------+
>> |    Muscle Ache             |               54.839% (17)               |               26.316% ( 5)               |       Chi-square=3.89 d.f.=1 P<0.5 |
>> +----------------------------+------------------------------------------+------------------------------------------+------------------------------------+
>> |    Headache                |               51.613% (16)               |               36.842% ( 7)               |       Chi-square=1.03 d.f.=1 P<0.5 |
>> +----------------------------+------------------------------------------+------------------------------------------+------------------------------------+
>>
>>  - In the latex(f, ...) output, only the first-line p-value is replaced by
>> "< 0.5", and the quantiles for age have too many decimals:
>>
>
> This is a bug.  An override source is below, to use until the next release.

Thank you, I will try it very soon.

>
>
>
>> % latex.default(cstats, title = title, caption = caption, rowlabel = rowlabel,
>> %      col.just = col.just, numeric.dollar = FALSE, insert.bottom = legend,
>> %      rowname = lab, dcolumn = dcolumn, extracolheads = extracolheads,
>> %      extracolsize = Nsize, ...)
>> %
>> \begin{table}[!tbp]
>>  \caption{Descriptive Statistics by treatment\label{f}}
>>  \begin{center}
>>  \begin{tabular}{lccc}\hline\hline
>> \multicolumn{1}{l}{}&
>> \multicolumn{1}{c}{Drug}&
>> \multicolumn{1}{c}{Placebo}&
>> \multicolumn{1}{c}{Test Statistic}
>> \\   &\multicolumn{1}{c}{{\scriptsize $N=31$}}&\multicolumn{1}{c}{{\scriptsize $N=19$}}&\\ \hline
>> age&{\scriptsize 45.92627049~}{48.75014585 }{\scriptsize 54.01942542} &{\scriptsize 47.34358486~}{50.72757132 }{\scriptsize 53.69634575} &$ F_{1,48}=0.9 ,~ P<0.5 ^{1} $\\
>> sex&&&$ \chi^{2}_{2}=0.3 ,~ P=0.859 ^{2} $\\
>> ~~~~?&25.806\%~{\scriptsize~(~8)}&26.316\%~{\scriptsize~(~5)}&\\
>> ~~~~f&38.710\%~{\scriptsize~(12)}&31.579\%~{\scriptsize~(~6)}&\\
>> ~~~~m&35.484\%~{\scriptsize~(11)}&42.105\%~{\scriptsize~(~8)}&\\
>> Primary~Symptoms&&&\\
>> ~~~~Depressed&41.935\%~{\scriptsize~(13)}&63.158\%~{\scriptsize~(12)}&$ \chi^{2}_{1}=2.12 ,~ P=0.145 ^{2} $\\
>> ~~~~Hangnail&48.387\%~{\scriptsize~(15)}&42.105\%~{\scriptsize~(~8)}&$ \chi^{2}_{1}=0.19 ,~ P=0.665 ^{2} $\\
>> ~~~~Stomach~Ache&45.161\%~{\scriptsize~(14)}&68.421\%~{\scriptsize~(13)}&$ \chi^{2}_{1}=2.57 ,~ P=0.109 ^{2} $\\
>> ~~~~Muscle~Ache&54.839\%~{\scriptsize~(17)}&26.316\%~{\scriptsize~(~5)}&$ \chi^{2}_{1}=3.89 ,~ P=0.049 ^{2} $\\
>> ~~~~Headache&51.613\%~{\scriptsize~(16)}&36.842\%~{\scriptsize~(~7)}&$ \chi^{2}_{1}=1.03 ,~ P=0.309 ^{2} $\\
>> \hline
>> \end{tabular}
>>
>> \end{center}
>>
>> \noindent {\scriptsize $a$\ }{$b$\ }{\scriptsize $c$\ } represent the
>> lower
>> quartile $a$, the median $b$, and the upper quartile $c$\ for continuous
>> variables.\\
>> Numbers after percents are frequencies.\\
>>  \indent Tests used: $^{1}$Wilcoxon test; $^{2}$Pearson test
>> \end{table}
>>
>> I would like to understand:
>>  - why does the 'pctdig' option affect both percentages and quantiles in the
>> 'print' command? (pctdig: 'number of digits to the right of the decimal
>> place for *printing percentages*. The default is zero, so percents will be
>> rounded to the nearest percent');
>>  - how can I change the number of decimals for continuous variables in the
>> 'latex' command?
>>
>
> I hope to address these 2 questions soon.


Thank you very much.

>
>
> Frank
>
>>  - why doesn't the 'eps' option affect all p-values?
>>
>> I've just discovered these wonderful functions. Thank you to Frank Harrell
>> for this package.
>>
>> DH
>>
>>
> latex.summary.formula.reverse <-
>  function(object, title=first.word(deparse(substitute(object))),
>           digits, prn = any(n!=N), pctdig=0,
>           npct=c('numerator','both','denominator','none'),
>           npct.size='scriptsize', Nsize='scriptsize',
>           exclude1=TRUE,  vnames=c("labels","names"), prUnits=TRUE,
>           middle.bold=FALSE, outer.size="scriptsize",
>           caption, rowlabel="",
>           insert.bottom=TRUE, dcolumn=FALSE, formatArgs=NULL, round=NULL,
>           prtest=c('P','stat','df','name'), prmsd=FALSE, msdsize=NULL,
>           long=dotchart, pdig=3, eps=.001, auxCol=NULL, dotchart=FALSE, ...)
> {
>  x      <- object
>  npct   <- match.arg(npct)
>  vnames <- match.arg(vnames)
>  if(is.logical(prtest) && !prtest)
>    prtest <- 'none'
>
>  stats  <- x$stats
>  nv     <- length(stats)
>  cstats <- lab <- character(0)
>  nn     <- integer(0)
>  type   <- x$type
>  n      <- x$n
>  N      <- x$N
>  nams   <- names(stats)
>  labels <- x$labels
>  Units  <- x$units
>  nw     <- if(lg <- length(x$group.freq)) lg
>            else 1  #23Nov98
>  gnames <- names(x$group.freq)
>  test   <- x$testresults
>  if(!length(test))
>    prtest <- 'none'
>
>  gt1.test <-
>    if(all(prtest=='none'))
>      FALSE
>    else
>      length(unique(sapply(test,function(a)a$testname))) > 1
>
>  if(!missing(digits)) {   #.Options$digits <- digits 6Aug00
>    oldopt <- options(digits=digits)
>    on.exit(options(oldopt))
>  }
>
>  if(missing(caption))
>    caption <- paste("Descriptive Statistics",
>                     if(length(x$group.label))
>                       paste(" by",x$group.label)
>                     else
>                       paste("  $(N=",x$N,")$",sep=""), sep="")
>
>  bld <- if(middle.bold) '\\bf '
>         else ''
>
>  cstats <- NULL
>  testUsed <- auxc <- character(0)
>
>  for(i in 1:nv) {
>    if(length(auxCol))
>      auxc <- c(auxc, auxCol[[1]][i])
>
>    nn <- c(nn, n[i])   ## 12aug02
>    nam <- if(vnames=="names") nams[i]
>           else labels[i]
>
>    if(prUnits && nchar(Units[i]) > 0)
>      nam <- paste(nam, '~\\hfill\\tiny{',translate(Units[i],'*',' '),'}',sep='')
>
>    tr  <- if(length(test) && all(prtest!='none')) test[[nams[i]]]
>           else NULL
>
>    if(length(test) && all(prtest!='none'))
>      testUsed <- unique(c(testUsed, tr$testname))
>
>    if(type[i]==1 || type[i]==3) {
>      cs <- formatCats(stats[[i]], nam, tr, type[i],
>                       if(length(x$group.freq)) x$group.freq else x$n[i],
>                       npct, pctdig, exclude1, long, prtest,
>                       latex=TRUE, testUsed=testUsed,
>                       npct.size=npct.size,
>                       pdig=pdig, eps=eps,
>                       footnoteTest=gt1.test, dotchart=dotchart)
>      nn <- c(nn, rep(NA, nrow(cs)-1))
>    } else cs <- formatCons(stats[[i]], nam, tr, x$group.freq, prmsd,
>                            prtest=prtest, formatArgs=formatArgs, round=round,
>                            latex=TRUE, testUsed=testUsed,
>                            middle.bold=middle.bold,
>                            outer.size=outer.size, msdsize=msdsize,
>                            pdig=pdig, eps=eps, footnoteTest=gt1.test)
>
>    cstats <- rbind(cstats, cs)
>    if(length(auxc) && nrow(cstats) > 1)
>      auxc <- c(auxc, rep(NA, nrow(cs)-1))
>  }
>
>  lab <- dimnames(cstats)[[1]]
>  gl <- names(x$group.freq)
>  ##gl <- if(length(gl)) paste(gl, " $(N=",x$group.freq,")$",sep="") else " "
>  ## Thanks: Eran Bellin <[EMAIL PROTECTED]>   3Aug01
>  if(!length(gl))
>    gl <- " "
>
>  lab <- sedit(lab,c(" ","&"),c("~","\\&"))  #was format(lab) 21Jan99
>  lab <- latexTranslate(lab, greek=.R.)
>  gl  <- latexTranslate(gl, greek=.R.)
>  ## if(any(gl != " ")) gl <- paste(gl, " $(N=",x$group.freq,")$",sep="") # 3Aug01
>  ## Added any( ) 26Mar02  21jan03
>  extracolheads <-
>    if(any(gl != " "))
>      c(if(prn)'', paste('$N=',x$group.freq,'$',sep=''))
>    else NULL # 21jan03
>
>  if(length(test) && !all(prtest=='none')) {
>    gl <- c(gl,
>            if(length(prtest)==1 && prtest!='stat')
>              if(prtest=='P') 'P-value'
>              else prtest
>            else 'Test Statistic')
>
>    if(length(extracolheads)) extracolheads <- c(extracolheads,'') # 21jan03
>  }
>
>  dimnames(cstats) <- list(NULL,gl)
>  ## was dimnames(cstats) <- list(lab, gl) 12aug02
>  cstats <- data.frame(cstats, check.names=FALSE)
>
>  ## Added row.names=lab below 10jul02 - S+ was dropping dimnames[[1]]
>  ##attr(cstats,'row.names') <- lab  12aug02
>  col.just <- rep("c",length(gl))
>  if(dcolumn && all(prtest!='none') &&
>     gl[length(gl)] %in% c('P-value','Test Statistic'))
>    col.just[length(col.just)] <- '.'
>
>  if(prn) {
>    cstats <- data.frame(N=nn, cstats, check.names=FALSE)
>    col.just <- c("r",col.just)
>  }
>
>  if(!insert.bottom)
>    legend <- NULL
>  else {
>    legend <- paste(if(any(type==2)) {
>                      paste("\\noindent {\\",outer.size," $a$\\ }{",bld,"$b$\\ }{\\",
>                            outer.size," $c$\\ } represent the lower quartile $a$, the median $b$, and the upper quartile $c$\\ for continuous variables.",
>                            if(prmsd) '~~$x\\pm s$ represents $\\bar{X}\\pm 1$ SD.'
>                            else '',
>                            '\\\\', sep="")
>                    } else NULL,
>                    if(prn) '$N$\\ is the number of non--missing values.\\\\',
>                    if(any(type==1) && npct=='numerator')
>                      'Numbers after percents are frequencies.\\\\',
>                    sep="\n")
>    legend <- NULL
>    if(any(type==2)) {
>      legend <- paste("\\noindent {\\", outer.size, " $a$\\ }{", bld,
>                      "$b$\\ }{\\", outer.size,
>                      " $c$\\ } represent the lower quartile $a$, the median $b$, and the upper quartile $c$\\ for continuous variables.",
>                      if(prmsd) '~~$x\\pm s$ represents $\\bar{X}\\pm 1$ SD.'
>                      else '',
>                      '\\\\\n', sep="")
>    }
>
>    if(prn) {
>      legend <- paste(legend,
>                      '$N$\\ is the number of non--missing values.\\\\\n',
>                      sep='')
>    }
>
>    if(any(type==1) && npct=='numerator') {
>      legend <- paste(legend,
>                      'Numbers after percents are frequencies.\\\\\n',
>                      sep='')
>    }
>
>    if(length(testUsed))
>      legend <-paste(legend,
>                     if(length(testUsed)==1)'\\noindent Test used:'
>                     else '\\indent Tests used:',
>                     if(length(testUsed)==1) paste(testUsed,'test')
>                     else
>                       paste(paste('$^{',1:length(testUsed),'}$',testUsed,
>                                   ' test',sep=''),collapse='; '))
>
>    ## added rowname=lab 12aug02  added '\n\n' 4mar03 for ctable=T
>  }
>
>  if(length(auxc)) {
>    if(length(auxc) != nrow(cstats))
>      stop(paste('length of auxCol (',length(auxCol[[1]]),
>                 ') is not equal to number of variables in table (',
>                 nv,').', sep=''))
>    auxcc <- format(auxc)
>    auxcc[is.na(auxc)] <- ''
>    cstats <- cbind(auxcc, cstats)
>    nax <- names(auxCol)
>    heads <- get2rowHeads(nax)
>    names(cstats)[1] <- heads[[1]]
>    if(length(col.just)) col.just <- c('r', col.just)
>    if(length(extracolheads)) extracolheads <- c(heads[2], extracolheads)
>  }
>  resp <- latex.default(cstats, title=title, caption=caption, rowlabel=rowlabel,
>                        col.just=col.just, numeric.dollar=FALSE,
>                        insert.bottom=legend,  rowname=lab, dcolumn=dcolumn,
>                        extracolheads=extracolheads, extracolsize=Nsize,
>                        ...)
>
>  if(dotchart)
>    resp$style <- unique(c(resp$style, 'calc', 'epic', 'color'))
>
>  resp
> }
>
>
> --
> Frank E Harrell Jr   Professor and Chair           School of Medicine
>                     Department of Biostatistics   Vanderbilt University
>
