Thank you very much Martin. Below is a patch implementing that.
Two newbie questions:
- Should I add row.names = NULL, optional = FALSE to match the arguments of the generic? (This is not the case for e.g. as.data.frame.table, but I thought it was needed: https://cloud.r-project.org/doc/manuals/r-devel/R-exts.html#Generic-functions-and-methods)
- Shouldn't we use match.fun(transFUN)?

diff --git a/src/library/stats/R/lm.R b/src/library/stats/R/lm.R
index 13a458797b..2ce6b16f6e 100644
--- a/src/library/stats/R/lm.R
+++ b/src/library/stats/R/lm.R
@@ -982,3 +982,18 @@ labels.lm <- function(object, ...)
     asgn <- object$assign[qr.lm(object)$pivot[1L:object$rank]]
     tl[unique(asgn)]
 }
+
+as.data.frame.lm <- function(x, ..., level = 0.95, transFUN = NULL)
+{
+    cf <- x |> summary() |> coef()
+    ci <- confint(x, level = level)
+    if(!is.null(transFUN)) {
+        stopifnot(is.function(transFUN))
+        cf[, "Estimate"] <- transFUN(cf[, "Estimate"])
+        ci <- transFUN(ci)
+    }
+    df <- data.frame(row.names(cf), cf, ci, row.names = NULL)
+    names(df) <- c("term", "estimate", "std.error", "statistic", "p.value",
+                   "conf.low", "conf.high")
+    df
+}
diff --git a/src/library/stats/man/lm.Rd b/src/library/stats/man/lm.Rd
index ff05afabff..b54373dff4 100644
--- a/src/library/stats/man/lm.Rd
+++ b/src/library/stats/man/lm.Rd
@@ -21,6 +21,8 @@ lm(formula, data, subset, weights, na.action,
    singular.ok = TRUE, contrasts = NULL, offset, \dots)
 
 \S3method{print}{lm}(x, digits = max(3L, getOption("digits") - 3L), \dots)
+
+\S3method{as.data.frame}{lm}(x, ..., level = 0.95, transFUN = NULL)
 }
 \arguments{
   \item{formula}{an object of class \code{"\link{formula}"} (or one that
@@ -81,6 +83,10 @@ lm(formula, data, subset, weights, na.action,
   \item{digits}{the number of \emph{significant} digits to be
     passed to \code{\link{format}(\link{coef}(x), .)} when
     \I{\code{\link{print}()}ing}.}
+  %% as.data.frame.lm():
+  \item{level}{the confidence level required.}
+  \item{transFUN}{a function to transform \code{estimate}, \code{conf.low} and
+    \code{conf.high}.}
 }
 \details{
   Models for \code{lm} are specified symbolically.  A typical model has
@@ -168,6 +174,10 @@ lm(formula, data, subset, weights, na.action,
   \code{effects} and (unless not requested) \code{qr} relating to the linear
   fit, for use by extractor functions such as \code{summary} and
   \code{\link{effects}}.
+
+  \code{as.data.frame} returns a data frame with statistics as provided by
+  \code{coef(summary(.))} and confidence intervals for model
+  estimates.
 }
 \section{Using time series}{
   Considerable care is needed when using \code{lm} with time series.

From: Martin Maechler [mailto:maech...@stat.math.ethz.ch]
Sent: Friday, 17 January 2025 17:04
To: SOEIRO Thomas
Cc: r-devel@r-project.org
Subject: Re: [Rd] as.data.frame() methods for model objects

>>>>> SOEIRO Thomas via R-devel
>>>>>     on Fri, 17 Jan 2025 14:19:31 +0000 writes:

> Following Duncan Murdoch's off-list comments (thanks again!), here is a more
> complete/flexible version:
>
> as.data.frame.lm <- function(x, ..., level = 0.95, exp = FALSE) {
>   cf <- x |> summary() |> stats::coef()
>   ci <- stats::confint(x, level = level)
>   if (exp) {
>     cf[, "Estimate"] <- exp(cf[, "Estimate"])
>     ci <- exp(ci)
>   }
>   df <- data.frame(row.names(cf), cf, ci, row.names = NULL)
>   names(df) <- c("term", "estimate", "std.error", "statistic", "p.value",
>                  "conf.low", "conf.high")
>   df
> }

Indeed, using level is much better already.

Instead of exp = FALSE, I'd use transFUN = NULL and then

    if(!is.null(transFUN)) {
        stopifnot(is.function(transFUN))
        cf[, "Estimate"] <- transFUN(cf[, "Estimate"])
        ci <- transFUN(ci)
    }

I'd want "inverse-logit" (*) in some cases, but also different things for different link functions; hence just exp = T/F is not enough.

Martin

--
*) "inverse-logit" is simply R's plogis() function; quite a few people have been re-inventing it, also in their packages ...
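On Thomas's two questions (matching the generic's arguments, and match.fun(transFUN)), here is a minimal sketch of one possible variant. It is not the patch itself and not an agreed R-core API. It assumes that accepting row.names and optional (and simply ignoring them) is an acceptable way to carry the arguments of the as.data.frame() generic, and that resolving transFUN through match.fun() is desirable so that a name such as "exp" or "plogis" works as well as a function object. The name as_df_lm_sketch is hypothetical, chosen so as not to mask a real method. The last line only checks Martin's footnote that plogis() is the inverse logit.

```r
## Hypothetical sketch (not the posted patch): a variant of as.data.frame.lm()
## that (1) carries the arguments of the as.data.frame() generic and
## (2) resolves transFUN with match.fun().
## Assumption: 'row.names' and 'optional' are accepted for consistency with
## the generic but are ignored in this sketch.
as_df_lm_sketch <- function(x, row.names = NULL, optional = FALSE, ...,
                            level = 0.95, transFUN = NULL)
{
    cf <- coef(summary(x))            # Estimate, Std. Error, statistic, p-value
    ci <- confint(x, level = level)   # lower and upper confidence bounds
    if(!is.null(transFUN)) {
        ## match.fun() returns a function unchanged and looks up a name such
        ## as "exp" or "plogis"; it signals an error for anything else.
        transFUN <- match.fun(transFUN)
        cf[, "Estimate"] <- transFUN(cf[, "Estimate"])
        ci <- transFUN(ci)
    }
    ## rownames() rather than row.names() avoids confusion with the
    ## (ignored) 'row.names' argument of this function.
    df <- data.frame(rownames(cf), cf, ci, row.names = NULL)
    names(df) <- c("term", "estimate", "std.error", "statistic", "p.value",
                   "conf.low", "conf.high")
    df
}

## Usage: a function or its name both work for transFUN
fit <- lm(breaks ~ wool + tension, data = warpbreaks)
as_df_lm_sketch(fit)
as_df_lm_sketch(fit, transFUN = "exp")

## plogis() is the inverse logit, 1 / (1 + exp(-x))
all.equal(plogis(0.3), 1 / (1 + exp(-0.3)))
```

Because match.fun() already signals an error for anything that is neither a function nor a function name, the explicit stopifnot(is.function(transFUN)) becomes redundant in this variant; whether a character name should be accepted at all remains a design choice.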
>
> > lm(breaks ~ wool + tension, warpbreaks) |> as.data.frame()
>          term   estimate std.error statistic      p.value  conf.low  conf.high
> 1 (Intercept)  39.277778  3.161783 12.422667 6.681866e-17  32.92715 45.6284061
> 2       woolB  -5.777778  3.161783 -1.827380 7.361367e-02 -12.12841  0.5728505
> 3    tensionM -10.000000  3.872378 -2.582393 1.278683e-02 -17.77790 -2.2221006
> 4    tensionH -14.722222  3.872378 -3.801856 3.913842e-04 -22.50012 -6.9443228
>
> > glm(breaks < 20 ~ wool + tension, data = warpbreaks) |> as.data.frame(exp = TRUE)
> Waiting for profiling to be done...
>          term estimate std.error statistic    p.value  conf.low conf.high
> 1 (Intercept) 1.076887 0.1226144 0.6041221 0.54849393 0.8468381  1.369429
> 2       woolB 1.076887 0.1226144 0.6041221 0.54849393 0.8468381  1.369429
> 3    tensionM 1.248849 0.1501714 1.4797909 0.14520270 0.9304302  1.676239
> 4    tensionH 1.395612 0.1501714 2.2196863 0.03100435 1.0397735  1.873229
>
> Thank you.
>
> Best regards,
> Thomas
>
>
> -----Original Message-----
> From: SOEIRO Thomas
> Sent: Thursday, 16 January 2025 14:36
> To: r-devel@r-project.org
> Subject: as.data.frame() methods for model objects
>
> Hello all,
>
> Would there be any interest in adding as.data.frame() methods for model
> objects?
> Of course there are packages (e.g. broom), but I think providing methods would
> be more discoverable (and the patch would be small).
> It is really useful for exporting model results or for plotting.
>
> e.g.:
>
> as.data.frame.lm <- function(x) { # could get other arguments, e.g. exp = TRUE/FALSE to exponentiate estimate, conf.low, conf.high
>   cf <- x |> summary() |> stats::coef()
>   ci <- stats::confint(x)
>   data.frame(
>     term = row.names(cf),
>     estimate = cf[, "Estimate"],
>     p.value = cf[, 4], # magic number because name changes between lm() and glm(*, family = *)
>     conf.low = ci[, "2.5 %"],
>     conf.high = ci[, "97.5 %"],
>     row.names = NULL
>   )
> }
>
> > lm(breaks ~ wool + tension, warpbreaks) |> as.data.frame()
>          term   estimate      p.value  conf.low  conf.high
> 1 (Intercept)  39.277778 6.681866e-17  32.92715 45.6284061
> 2       woolB  -5.777778 7.361367e-02 -12.12841  0.5728505
> 3    tensionM -10.000000 1.278683e-02 -17.77790 -2.2221006
> 4    tensionH -14.722222 3.913842e-04 -22.50012 -6.9443228
>
> > glm(breaks < 20 ~ wool + tension, data = warpbreaks) |> as.data.frame()
> Waiting for profiling to be done...
>          term   estimate    p.value    conf.low conf.high
> 1 (Intercept) 0.07407407 0.54849393 -0.16624575 0.3143939
> 2       woolB 0.07407407 0.54849393 -0.16624575 0.3143939
> 3    tensionM 0.22222222 0.14520270 -0.07210825 0.5165527
> 4    tensionH 0.33333333 0.03100435  0.03900286 0.6276638
>
> Thank you.
>
> Best regards,
> Thomas

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel