Hi All,

    This is less an "I need help" question than a request for an update:
has 'R' made any progress on getting the core functions to retain the
classes of their inputs?  I will use sapply / lapply to demonstrate,
because that is one of the cases that most egregiously impacts me & my
work and keeps pushing me away from 'R' and into other numerical tools
(such as NumPy in Python), but this behavior is ubiquitous throughout 'R'.

    Let's say I have a class that is supported in principle, but is not
one of the core "numeric" or "character" classes (or, to some degree,
"factor").  Many of the basic functions will convert my desired class into
either numeric or character, so that the answer I get back is gibberish.

E.g.:

## create a small array of time differences
test = as.difftime(c(1, 1, 8, 0.25, 8, 1.25), units = "days")
class(test)          ## this will return the proper class, "difftime"
class(test[1])       ## this will also return the proper class, "difftime"
sapply(test, class)  ## this will return *numerics* for all of the classes.  Ack!!
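
    To make the class loss concrete, here is a minimal sketch that
assumes only base R.  It applies an element-wise function with sapply,
checks the class of the result, and then re-attaches the "difftime" class
by hand.  The variable names (doubled, elem_classes, restored) are
illustrative only, and re-wrapping the result is one possible workaround,
not a recommended or official fix.

```r
test <- as.difftime(c(1, 1, 8, 0.25, 8, 1.25), units = "days")

# sapply() simplifies its result to a bare atomic vector, so the
# "difftime" class and the units attribute are missing from the result.
doubled <- sapply(test, function(x) x * 2)
class(doubled)   # "numeric"

# Iterating over positions instead of values keeps each element's class,
# because subsetting a difftime with `[` returns a difftime.
elem_classes <- vapply(seq_along(test),
                       function(i) class(test[i]),
                       character(1))

# Workaround (an assumption, not an official fix): re-attach the class
# and the original units after the vectorized call.
restored <- as.difftime(as.numeric(doubled), units = units(test))
class(restored)  # "difftime"
```

The cost of this approach is that the caller has to know in advance which
class to restore, which is exactly the bookkeeping I am complaining about.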

    In the example above the impact might seem small, but the
implications are *huge*.  It means that I am, in effect, not allowed to
use *any* of the vectorized functions in 'R', which avoid explicit loops
and thereby speed up processing enormously.  Many can sympathize that
'R' is ridiculously slow with "for" loops compared to other languages.
That's theoretically OK: a good statistician or data analyst should be
able to work comfortably with matrices and vectors.  However, *'R' cannot
work comfortably* with matrices or vectors *unless* they hold the numeric
or character classes.  Many classes suffer from the problem I just
described, although I only used "difftime" in the example.  Factors seem
a bit more "comfortable" and can be handled most of the time, but not as
well as numerics, and at times a function working on a factor returns
the factor's underlying numeric codes instead of the original factor.
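
    One case of the factor problem, sketched with base R only: ifelse
builds its result from the logical test vector, so factor values come
back as their integer codes.  The data and the names f, codes and rebuilt
are made up for illustration.

```r
f <- factor(c("low", "high", "low"), levels = c("low", "high"))

# ifelse() fills a copy of the logical test vector with the selected
# values, so the factor's labels are lost and only its codes survive.
codes <- ifelse(c(TRUE, TRUE, TRUE), f, f)
is.factor(codes)   # FALSE

# Indexing the levels by the codes recovers the labels; factor() then
# rebuilds the original class by hand.
rebuilt <- factor(levels(f)[codes], levels = levels(f))
```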

    Is there any progress toward guaranteeing that all core functions
either (a) ideally, return exactly the classes, and hierarchy of classes,
that they received (e.g., a list of data frames with difftimes & dates &
characters would return a list of data frames with difftimes & dates &
characters), or (b) barring that, at least stop with a clear error
explaining that sapply, for example, cannot vectorize over the class being
used?  From a stability perspective, returning incorrect answers is far
worse than returning an error.
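
    Option (b) could look roughly like the sketch below.  sapply_strict
is a hypothetical helper, not part of base R or any package I know of, and
it only makes sense for element-wise transforms that are expected to keep
the class of their input.

```r
# Hypothetical helper, not part of base R: a wrapper around sapply() that
# stops with an error when the result has lost the class of a classed input.
sapply_strict <- function(x, fun, ...) {
  out <- sapply(x, fun, ...)
  if (is.object(x) && !all(class(x) %in% class(out))) {
    stop("sapply() returned '", paste(class(out), collapse = "/"),
         "' for input of class '", paste(class(x), collapse = "/"), "'",
         call. = FALSE)
  }
  out
}

test <- as.difftime(c(1, 1, 8, 0.25, 8, 1.25), units = "days")

# The difftime input triggers the error; capture its message for inspection.
msg <- tryCatch(sapply_strict(test, function(x) x * 2),
                error = function(e) conditionMessage(e))

# Plain numeric input carries no class attribute, so it passes through.
plain <- sapply_strict(c(1, 2, 3), function(x) x * 2)
```

A wrapper like this trades silent class loss for a loud failure, which is
the behavior I would rather see from the core functions themselves.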

    This is, by far, the largest Achilles' heel of 'R'.  Personally, as my
career advances and I work on more technical things, I am finding that I
have to leave 'R' by the wayside and use other languages for robust
numerical calculations and programming.  This saddens me, because there are
so many wonderful packages developed by the community.  The example above
came up because I am using the "forecast" package to great effect in
predicting how long our product cycle time will be.  However, I spend so
much of my time fighting these class & typing bugs in 'R' (and we have to
start recognizing that they are bugs, otherwise they may never get
resolved) that many of the productivity gains from all the wonderful
computational packages are entirely offset by the time I spend fighting
poor class handling.

                                     Thanks & Regards!
                                              Mike

---
XKCD <http://www.xkcd.com>

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.