Re: [Rd] Statistical mode

Arni Magnusson Fri, 27 May 2011 08:50:55 -0700

Thank you, Kevin, for the feedback.

1. The mode is not so interesting for continuous data. I would much rather use something like density().

Absolutely. The help page for statmode() says it is for discrete data, and points to density() for continuous data.

2. Both the iris and barley data sets are balanced (each factor level appears equally often), and the current output from the statmode function is misleading by only showing one level.

Try statmode(iris,TRUE). It points out that petal lengths 1.4 and 1.5 are equally common in the data. I decided to make all=FALSE the default behavior, but I'd be equally happy with all=TRUE as the default.
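
For illustration, the tie in petal length can also be checked with base R alone. This is a minimal sketch that does not call statmode(); the variable names tab and modes are arbitrary.

```r
# Count each distinct petal length in the built-in iris data
tab <- table(iris$Petal.Length)

# Keep every value whose count equals the maximum, so ties are all reported
modes <- names(tab)[as.vector(tab) == max(tab)]
modes
```

Note that the values come back as character strings, because they are taken from the names of the table.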

As for the barley data, statmode(barley,TRUE) is just the honest answer. The yield is continuous, so the discrete mode is not of interest, and the factor levels are all equally common as you point out.

3. I think the describe() function in the Hmisc package is much more useful and informative, even for introductory stat classes. I always use describe() after importing data into R.

The describe() function is a verbose summary, usually of a data frame. The statmode() function is the discrete mode, usually of a vector. Importantly, describe(faithful$waiting) points out the mean, median and range, but not the mode.

---

Allow me to include two more valid comments, from Sarah Goslee and David Winsemius, respectively:

4. The 'modeest' package does this and more, see for example mfv().

I think core R should come with a basic function to get the mode of a discrete vector. One option would be to lift mfv() into the 'stats' package, but something like statmode() could also cover factors and strings. Might as well provide all=TRUE/FALSE functionality, too, and retain integers as integers.
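
To make the idea concrete, here is a rough sketch of what such a basic function might look like. It is not the statmode() source: the name discrete_mode is hypothetical, and dropping NA values and returning the first-encountered mode when all=FALSE are assumptions made only for this example. Empty input is not handled.

```r
# Hypothetical sketch, not the statmode() source: discrete mode of a vector
discrete_mode <- function(x, all = FALSE) {
  x <- x[!is.na(x)]                    # drop missing values (an assumption)
  ux <- unique(x)                      # distinct values, original type kept
  counts <- tabulate(match(x, ux))     # frequency of each distinct value
  modes <- ux[counts == max(counts)]   # every value tied for the top count
  if (all) modes else modes[1]         # first-encountered mode by default
}

discrete_mode(c(3L, 1L, 3L, 2L))             # integer input, integer result
discrete_mode(c("a", "b", "b", "a"), TRUE)   # both tied strings
discrete_mode(factor(c("x", "y", "y")))      # factor input, factor result
```

Working from unique() and match() rather than table() is what keeps integers, strings and factors in their original type.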

It's common to find rudimentary basic functionality in the 'stats' package, and dedicated packages for more details; time series models and robust statistics come to mind. The 'modeest' package is impressive indeed.

5. Isn't this just table(Vec)[which.max(table(Vec))]?

Yes it is, only less cumbersome. Much like sd(Vec) is less cumbersome than sqrt(var(Vec)). Moreover, I find it confusing to see the count as well,


  table(volcano)[which.max(table(volcano))]
  # 110
  # 177

although this can be debated. Finally, I think the examples

  statmode(mtcars)
  statmode(mtcars, TRUE)

demonstrate practical functionality beyond table(Vec)[which.max(table(Vec))].
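
As a small base R illustration of why the table() idiom is cumbersome for a single vector (variable names here are arbitrary): the result is a named count, the modal value is only available as a character name, and which.max() keeps only the first maximum, so any ties are silently dropped.

```r
tab <- table(volcano)
top <- tab[which.max(tab)]   # named count: value as the name, frequency as the element

names(top)                   # the modal value, but as a character string
as.numeric(names(top))       # converting back to a number is left to the caller
```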

The mean, median, and mode are often mentioned together as fundamental descriptive statistics, and I just find it odd that statmode() is not already in core R. Sure, we could get by without the sd() function in core R, but why should we?


All the best,

Arni

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
