Re: [R] Package Hmisc, functions summary.formula() and latex(), options pdig, pctdig, eps and prmsd

Frank E Harrell Jr Fri, 25 Jul 2008 05:26:30 -0700

David Hajage wrote:

Hello R users,


I have several problems with the functions summary.formula and latex in
package Hmisc. Here an example :

####
library(Hmisc)
sex <- factor(sample(c("m","f", "?"), 50, rep=TRUE))
age <- rnorm(50, 50, 5)
treatment <- factor(sample(c("Drug","Placebo"), 50, rep=TRUE))
symp <- c('Headache','Stomach Ache','Hangnail',
          'Muscle Ache','Depressed')
symptom1 <- sample(symp, 50,TRUE)
symptom2 <- sample(symp, 50,TRUE)
symptom3 <- sample(symp, 50,TRUE)
Symptoms <- mChoice(symptom1, symptom2, symptom3, label='Primary Symptoms')

f <- summary(treatment ~ age + sex + Symptoms, method="reverse", test=TRUE)
print(f, digits = 5, pdig = 2, pctdig = 3, eps = 0.5, prmsd = F)
print(f, digits = 5, pdig = 2, pctdig = 3, eps = 0.5, prmsd = T)
latex(f, long = T, pdig = 2, pctdig = 3, eps = 0.5, prmsd = F, file = "")
###

Here the problems :
 - The first print(f, ...) doesn't replace all p-value <0.5 by "P<0.5" :

+----------------------------+-------------------------+-------------------------+-----------------------------+
|                            |Drug
|Placebo                  |  Test                       |
|                            |(N=31)
|(N=19)                   |Statistic                    |
+----------------------------+-------------------------+-------------------------+-----------------------------+
|age                         |     45.926/48.750/54.019|
47.344/50.728/53.696|    F=0.9 d.f.=1,48 P<0.5    |
+----------------------------+-------------------------+-------------------------+-----------------------------+
|sex : ?                     |         25.806% ( 8)    |         26.316% (
5)    | Chi-square=0.3 d.f.=2 P=0.86|
+----------------------------+-------------------------+-------------------------+-----------------------------+
|    f                       |         38.710% (12)    |         31.579% (
6)    |                             |
+----------------------------+-------------------------+-------------------------+-----------------------------+
|    m                       |         35.484% (11)    |         42.105% (
8)    |                             |
+----------------------------+-------------------------+-------------------------+-----------------------------+
|Primary Symptoms : Depressed|         41.935% (13)    |         63.158%
(12)    | Chi-square=2.12 d.f.=1 P<0.5|
+----------------------------+-------------------------+-------------------------+-----------------------------+
|    Hangnail                |         48.387% (15)    |         42.105% (
8)    |Chi-square=0.19 d.f.=1 P=0.67|
+----------------------------+-------------------------+-------------------------+-----------------------------+
|    Stomach Ache            |         45.161% (14)    |         68.421%
(13)    | Chi-square=2.57 d.f.=1 P<0.5|
+----------------------------+-------------------------+-------------------------+-----------------------------+
|    Muscle Ache             |         54.839% (17)    |         26.316% (
5)    | Chi-square=3.89 d.f.=1 P<0.5|
+----------------------------+-------------------------+-------------------------+-----------------------------+
|    Headache                |         51.613% (16)    |         36.842% (
7)    | Chi-square=1.03 d.f.=1 P<0.5|
+----------------------------+-------------------------+-------------------------+-----------------------------+

I did not see an instance of P < 0.5 that was not replaced by "P<0.5".This is an unusual cutoff, and you are printing many more digits ofprecision than offered by the data.


 - The second print(f, ..., prmsd = T) has the same problem for p-value.
There is also 4 decimals in the first line instead of 3 :


Try modifying the digits argument.


+----------------------------+------------------------------------------+------------------------------------------+------------------------------------+
|                            |Drug
|Placebo                                   |
Test                              |
|                            |(N=31)
|(N=19)
|Statistic                           |
+----------------------------+------------------------------------------+------------------------------------------+------------------------------------+
|age                         |45.9263/48.7501/54.0194  49.1817+/-
5.2751|47.3436/50.7276/53.6963  51.1313+/- 5.5868|           F=0.9 d.f.=1,48
P<0.5    |
+----------------------------+------------------------------------------+------------------------------------------+------------------------------------+
|sex : ?                     |               25.806% ( 8)
|               26.316% ( 5)               |       Chi-square=0.3 d.f.=2
P=0.86 |
+----------------------------+------------------------------------------+------------------------------------------+------------------------------------+
|    f                       |               38.710% (12)
|               31.579% ( 6)
|                                    |
+----------------------------+------------------------------------------+------------------------------------------+------------------------------------+
|    m                       |               35.484% (11)
|               42.105% ( 8)
|                                    |
+----------------------------+------------------------------------------+------------------------------------------+------------------------------------+
|Primary Symptoms : Depressed|               41.935% (13)
|               63.158% (12)               |       Chi-square=2.12 d.f.=1
P<0.5 |
+----------------------------+------------------------------------------+------------------------------------------+------------------------------------+
|    Hangnail                |               48.387% (15)
|               42.105% ( 8)               |       Chi-square=0.19 d.f.=1
P=0.67|
+----------------------------+------------------------------------------+------------------------------------------+------------------------------------+
|    Stomach Ache            |               45.161% (14)
|               68.421% (13)               |       Chi-square=2.57 d.f.=1
P<0.5 |
+----------------------------+------------------------------------------+------------------------------------------+------------------------------------+
|    Muscle Ache             |               54.839% (17)
|               26.316% ( 5)               |       Chi-square=3.89 d.f.=1
P<0.5 |
+----------------------------+------------------------------------------+------------------------------------------+------------------------------------+
|    Headache                |               51.613% (16)
|               36.842% ( 7)               |       Chi-square=1.03 d.f.=1
P<0.5 |
+----------------------------+------------------------------------------+------------------------------------------+------------------------------------+

 - In the latex(f, ...), only the first-line p-value is replaced by "< 0.5",
and the quantiles for age have too much decimal :


This is a bug.  An override source is below, to use until the next release.


% latex.default(cstats, title = title, caption = caption, rowlabel =
rowlabel,      col.just = col.just, numeric.dollar = FALSE, insert.bottom =
legend,      rowname = lab, dcolumn = dcolumn, extracolheads =
extracolheads,      extracolsize = Nsize, ...)
%
\begin{table}[!tbp]
 \caption{Descriptive Statistics by treatment\label{f}}
 \begin{center}
 \begin{tabular}{lccc}\hline\hline
\multicolumn{1}{l}{}&
\multicolumn{1}{c}{Drug}&
\multicolumn{1}{c}{Placebo}&
\multicolumn{1}{c}{Test Statistic}
\\   &\multicolumn{1}{c}{{\scriptsize
$N=31$}}&\multicolumn{1}{c}{{\scriptsize $N=19$}}&\\ \hline
age&{\scriptsize 45.92627049~}{48.75014585 }{\scriptsize 54.01942542}
&{\scriptsize 47.34358486~}{50.72757132 }{\scriptsize 53.69634575} &$
F_{1,48}=0.9 ,~ P<0.5 ^{1} $\\
sex&&&$ \chi^{2}_{2}=0.3 ,~ P=0.859 ^{2} $\\
~~~~?&25.806\%~{\scriptsize~(~8)}&26.316\%~{\scriptsize~(~5)}&\\
~~~~f&38.710\%~{\scriptsize~(12)}&31.579\%~{\scriptsize~(~6)}&\\
~~~~m&35.484\%~{\scriptsize~(11)}&42.105\%~{\scriptsize~(~8)}&\\
Primary~Symptoms&&&\\
~~~~Depressed&41.935\%~{\scriptsize~(13)}&63.158\%~{\scriptsize~(12)}&$
\chi^{2}_{1}=2.12 ,~ P=0.145 ^{2} $\\
~~~~Hangnail&48.387\%~{\scriptsize~(15)}&42.105\%~{\scriptsize~(~8)}&$
\chi^{2}_{1}=0.19 ,~ P=0.665 ^{2} $\\
~~~~Stomach~Ache&45.161\%~{\scriptsize~(14)}&68.421\%~{\scriptsize~(13)}&$
\chi^{2}_{1}=2.57 ,~ P=0.109 ^{2} $\\
~~~~Muscle~Ache&54.839\%~{\scriptsize~(17)}&26.316\%~{\scriptsize~(~5)}&$
\chi^{2}_{1}=3.89 ,~ P=0.049 ^{2} $\\
~~~~Headache&51.613\%~{\scriptsize~(16)}&36.842\%~{\scriptsize~(~7)}&$
\chi^{2}_{1}=1.03 ,~ P=0.309 ^{2} $\\
\hline
\end{tabular}

\end{center}

\noindent {\scriptsize $a$\ }{$b$\ }{\scriptsize $c$\ } represent the lower
quartile $a$, the median $b$, and the upper quartile $c$\ for continuous
variables.\\
Numbers after percents are frequencies.\\
 \indent Tests used: $^{1}$Wilcoxon test; $^{2}$Pearson test
\end{table}

I would like to understand :
 - why does option 'pctdig' affect both percentages and quantiles in the
'print' command (pctdig : 'number of digits to the right of the decimal
place for *printing percentages*. The default is zero, so percents will be
rounded to the nearest percent');
 - how can I change the number of decimal for continuous variables in the
'latex' command ?


I hope to address these 2 questions soon.

Frank

 - why doesn't option 'eps' affect all p-value ?

I've just discover these wonderfull functions. Thank you to Frank Harrell
for this package.

DH


latex.summary.formula.reverse <-
  function(object, title=first.word(deparse(substitute(object))),
           digits, prn = any(n!=N), pctdig=0,
           npct=c('numerator','both','denominator','none'),
           npct.size='scriptsize', Nsize='scriptsize',
           exclude1=TRUE,  vnames=c("labels","names"), prUnits=TRUE,
           middle.bold=FALSE, outer.size="scriptsize",
           caption, rowlabel="",
           insert.bottom=TRUE, dcolumn=FALSE, formatArgs=NULL, round=NULL,
           prtest=c('P','stat','df','name'), prmsd=FALSE, msdsize=NULL,

long=dotchart, pdig=3, eps=.001, auxCol=NULL,dotchart=FALSE, ...)

{
  x      <- object
  npct   <- match.arg(npct)
  vnames <- match.arg(vnames)
  if(is.logical(prtest) && !prtest)
    prtest <- 'none'

  stats  <- x$stats
  nv     <- length(stats)
  cstats <- lab <- character(0)
  nn     <- integer(0)
  type   <- x$type
  n      <- x$n
  N      <- x$N
  nams   <- names(stats)
  labels <- x$labels
  Units  <- x$units
  nw     <- if(lg <- length(x$group.freq)) lg
            else 1  #23Nov98
  gnames <- names(x$group.freq)
  test   <- x$testresults
  if(!length(test))
    prtest <- 'none'

  gt1.test <-
    if(all(prtest=='none'))
      FALSE
    else
      length(unique(sapply(test,function(a)a$testname))) > 1

  if(!missing(digits)) {   #.Options$digits <- digits 6Aug00
    oldopt <- options(digits=digits)
    on.exit(options(oldopt))
  }

  if(missing(caption))
    caption <- paste("Descriptive Statistics",
                     if(length(x$group.label))
                       paste(" by",x$group.label)
                     else
                       paste("  $(N=",x$N,")$",sep=""), sep="")

  bld <- if(middle.bold) '\\bf '
         else ''

  cstats <- NULL
  testUsed <- auxc <- character(0)

  for(i in 1:nv) {
    if(length(auxCol))
      auxc <- c(auxc, auxCol[[1]][i])

    nn <- c(nn, n[i])   ## 12aug02
    nam <- if(vnames=="names") nams[i]
           else labels[i]

    if(prUnits && nchar(Units[i]) > 0)

nam <- paste(nam, '~\\hfill\\tiny{',translate(Units[i],'*',''),'}',sep='')


    tr  <- if(length(test) && all(prtest!='none')) test[[nams[i]]]
           else NULL

    if(length(test) && all(prtest!='none'))
      testUsed <- unique(c(testUsed, tr$testname))

    if(type[i]==1 || type[i]==3) {
      cs <- formatCats(stats[[i]], nam, tr, type[i],
                       if(length(x$group.freq)) x$group.freq else x$n[i],
                       npct, pctdig, exclude1, long, prtest,
                       latex=TRUE, testUsed=testUsed,
                       npct.size=npct.size,
                       pdig=pdig, eps=eps,
                       footnoteTest=gt1.test, dotchart=dotchart)
      nn <- c(nn, rep(NA, nrow(cs)-1))
    } else cs <- formatCons(stats[[i]], nam, tr, x$group.freq, prmsd,

prtest=prtest, formatArgs=formatArgs,round=round,

                            latex=TRUE, testUsed=testUsed,
                            middle.bold=middle.bold,
                            outer.size=outer.size, msdsize=msdsize,
                            pdig=pdig, eps=eps, footnoteTest=gt1.test)

    cstats <- rbind(cstats, cs)
    if(length(auxc) && nrow(cstats) > 1)
      auxc <- c(auxc, rep(NA, nrow(cs)-1))
  }

  lab <- dimnames(cstats)[[1]]
  gl <- names(x$group.freq)

##gl <- if(length(gl)) paste(gl, " $(N=",x$group.freq,")$",sep="")else " "

  ## Thanks: Eran Bellin <[EMAIL PROTECTED]>   3Aug01
  if(!length(gl))
    gl <- " "

  lab <- sedit(lab,c(" ","&"),c("~","\\&"))  #was format(lab) 21Jan99
  lab <- latexTranslate(lab, greek=.R.)
  gl  <- latexTranslate(gl, greek=.R.)

## if(any(gl != " ")) gl <- paste(gl, "$(N=",x$group.freq,")$",sep="") # 3Aug01

  ## Added any( ) 26Mar02  21jan03
  extracolheads <-
    if(any(gl != " "))
      c(if(prn)'', paste('$N=',x$group.freq,'$',sep=''))
    else NULL # 21jan03

  if(length(test) && !all(prtest=='none')) {
    gl <- c(gl,
            if(length(prtest)==1 && prtest!='stat')
              if(prtest=='P') 'P-value'
              else prtest
            else 'Test Statistic')

if(length(extracolheads)) extracolheads <- c(extracolheads,'') #21jan03

  }

  dimnames(cstats) <- list(NULL,gl)
  ## was dimnames(cstats) <- list(lab, gl) 12aug02
  cstats <- data.frame(cstats, check.names=FALSE)

  ## Added row.names=lab below 10jul02 - S+ was dropping dimnames[[1]]
  ##attr(cstats,'row.names') <- lab  12aug02
  col.just <- rep("c",length(gl))
  if(dcolumn && all(prtest!='none') &&
     gl[length(gl)] %in% c('P-value','Test Statistic'))
    col.just[length(col.just)] <- '.'

  if(prn) {
    cstats <- data.frame(N=nn, cstats, check.names=FALSE)
    col.just <- c("r",col.just)
  }

  if(!insert.bottom)
    legend <- NULL
  else {
    legend <- paste(if(any(type==2)) {

paste("\\noindent {\\",outer.size," $a$\\}{",bld,"$b$\\ }{\\",outer.size," $c$\\ } represent the lowerquartile $a$, the median $b$, and the upper quartile $c$\\ forcontinuous variables.",if(prmsd) '~~$x\\pm s$ represents$\\bar{X}\\pm 1$ SD.'

                            else '',
                            '\\\\', sep="")
                    } else NULL,

if(prn) '$N$\\ is the number of non--missingvalues.\\\\',

                    if(any(type==1) && npct=='numerator')
                      'Numbers after percents are frequencies.\\\\',
                    sep="\n")
    legend <- NULL
    if(any(type==2)) {
      legend <- paste("\\noindent {\\", outer.size, " $a$\\ }{", bld,
                      "$b$\\ }{\\", outer.size,

" $c$\\ } represent the lower quartile $a$, themedian $b$, and the upper quartile $c$\\ for continuous variables.",if(prmsd) '~~$x\\pm s$ represents $\\bar{X}\\pm1$ SD.'

                      else '',
                      '\\\\\n', sep="")
    }

    if(prn) {
      legend <- paste(legend,
                      '$N$\\ is the number of non--missing values.\\\\\n',
                      sep='')
    }

    if(any(type==1) && npct=='numerator') {
      legend <- paste(legend,
                      'Numbers after percents are frequencies.\\\\\n',
                      sep='')
    }

    if(length(testUsed))
      legend <-paste(legend,
                     if(length(testUsed)==1)'\\noindent Test used:'
                     else '\\indent Tests used:',
                     if(length(testUsed)==1) paste(testUsed,'test')
                     else
                       paste(paste('$^{',1:length(testUsed),'}$',testUsed,
                                   ' test',sep=''),collapse='; '))

    ## added rowname=lab 12aug02  added '\n\n' 4mar03 for ctable=T
  }

  if(length(auxc)) {
    if(length(auxc) != nrow(cstats))
      stop(paste('length of auxCol (',length(auxCol[[1]]),
                 ') is not equal to number or variables in table (',
                 nv,').', sep=''))
    auxcc <- format(auxc)
    auxcc[is.na(auxc)] <- ''
    cstats <- cbind(auxcc, cstats)
    nax <- names(auxCol)
    heads <- get2rowHeads(nax)
    names(cstats)[1] <- heads[[1]]
    if(length(col.just)) col.just <- c('r', col.just)
    if(length(extracolheads)) extracolheads <- c(heads[2], extracolheads)
  }

resp <- latex.default(cstats, title=title, caption=caption,rowlabel=rowlabel,

                        col.just=col.just, numeric.dollar=FALSE,

insert.bottom=legend, rowname=lab,dcolumn=dcolumn,

                        extracolheads=extracolheads, extracolsize=Nsize,
                        ...)

  if(dotchart)
    resp$style <- unique(c(resp$style, 'calc', 'epic', 'color'))

  resp
}


--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Package Hmisc, functions summary.formula() and latex(), options pdig, pctdig, eps and prmsd

Reply via email to