Hi,

I totally agree that having foo(x) <- foo(x) behave like a no-op
is a must. This is something I try to be careful about when I design
my own objects and their getters and setters.

Just wanted to mention, though, that there is a notorious violation of
this:

  x <- list(3:-1, NULL)
  x[[2]] <- x[[2]]
  x
  # [[1]]
  # [1]  3  2  1  0 -1
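
The second element disappears because assigning NULL with [[<- deletes a
list element. Below is a minimal sketch of that round trip next to a
variant that keeps the NULL by assigning a one-element list with [<-
(the variable y is only there for illustration):

```r
# Round trip from the message: element 2 is NULL, and assigning NULL
# via [[<- removes the element instead of storing it.
x <- list(3:-1, NULL)
x[[2]] <- x[[2]]
length(x)        # 1

# Assigning a one-element list via [<- stores the NULL instead.
y <- list(3:-1, NULL)
y[2] <- list(y[[2]])
length(y)        # 2
is.null(y[[2]])  # TRUE
```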

Of course, the existence of a precedent doesn't mean the factor API
shouldn't be improved.

Cheers,
H.


On 09/27/2016 12:33 PM, Dr. Jens Oehlschlägel wrote:
# A couple of years ago
# I helped make R's character NA handling more consistent
# Today I report an issue with R's factor NA handling
# The core problem is that
# levels(g) <- levels(g)
# can change the levels of g
# more details below
# Kind regards
# Jens Oehlschlägel

# Say I have an NA element in a vector or list

x <- c("a","b",NA)

# then using split() it gets lost

split(x, x)

# as it is (somewhat) when converting to a default factor

table(as.factor(x))

# for table the workaround is

table(as.factor(x), exclude=NULL)

# but for split we need

f <- factor(x, exclude=NULL)

split(x, f)

# conclusion: we MUST use an NA level

# so far so good

g <- f
levels(g)

# but re-assigning the levels changes them

levels(g) <- levels(g)
levels(g)

# which I consider a severe problem.
# Yes, I read the help page of levels<-
# about removing levels by assigning NAs to them
# but that implies: we MUST NOT use an NA level

# If a language suggests
# that we MUST and we MUST NOT use an NA level
# the language has limited usefulness
# (and a user who depends on the language
# is put into a DOUBLE BIND)
# SUGGESTION: ensure the above assignment does not change levels
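
One way to check whether a given R installation shows this behaviour is to
wrap the round trip in a small test. This is only a sketch:
levels_roundtrip_ok is a hypothetical helper, not part of base R, and its
result for a factor with an NA level may depend on the R version in use.

```r
x <- c("a", "b", NA)
f <- factor(x, exclude = NULL)   # NA is kept as a level

# Hypothetical helper: TRUE if re-assigning a factor's own levels
# leaves the levels unchanged. Only the local copy of g is modified.
levels_roundtrip_ok <- function(g) {
  before <- levels(g)
  levels(g) <- levels(g)
  identical(levels(g), before)
}

levels_roundtrip_ok(f)           # factor with an NA level: the case at issue
levels_roundtrip_ok(factor(x))   # no NA level, so nothing can be dropped
```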

# trying to apply the levels of f to new data also fails

g <- factor(x, levels=levels(f))
g

# and giving both arguments even stops

h <- factor(x, levels=levels(f), labels=levels(f))

# I do understand that exclude= meaningfully has effect
# if levels= are to be determined automatically, but
# SUGGESTION: with explicit levels= exclude= should be ignored.
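
Until something along those lines is adopted, passing exclude=NULL together
with levels= appears to be a way to reuse the levels of f on new data. A
minimal sketch, where g2 is just an illustrative name:

```r
x <- c("a", "b", NA)
f <- factor(x, exclude = NULL)

# Explicit levels plus exclude = NULL: the NA level is not filtered out,
# so the NA element of x is coded as a regular level.
g2 <- factor(x, levels = levels(f), exclude = NULL)
levels(g2)
as.integer(g2)
```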

# SUGGESTION: give split(x, y, exclude=NA) an exclude= argument,
# which when set to NULL will prevent dropping NA levels
# when coercing y to factor
# (it still remains open what should have priority
# if y is a factor with an NA-level and exclude=NA)

table(f, exclude=NA)

# here existing levels win over exclude=
# which is consistent with my suggestion for factor(, levels=, exclude=)

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
