Re: [Rd] Subscripting issues unrelated to [Subscripting fails if name of element is "" (PR#8161)]

Prof Brian Ripley Fri, 07 Oct 2005 10:38:52 -0700

Jens,

This is a completely separate issue. In indexing, character NA matches the name "NA". That was a bug, but it is nothing to do with the subject line or PR#8161, and for the record let's keep this separate. The `critical point' is not to build a theory around misunderstandings of several unrelated examples.


You say

get("$")(lx, as.character(NA))

goes wrong. Now the documentation has x$name for 'name' a symbol or acharacter string, and you have passed an _expression_ and got anappropriate error message,


Error in get("$")(lx, as.character(NA)) : invalid subscript type

If you don't see that, please review the section in R-lang or the Blue Book. Equally

 get("$")(lx, as.character("a"))

Error in get("$")(lx, as.character("a")) : invalid subscript type

so it is nothing to do with NA or "" or the subject line here.
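
To make the distinction concrete: `$` takes its second argument unevaluated, so a call is rejected, whereas `[[` evaluates its index first. A minimal sketch, assuming a small list built only for illustration (it is not the lx from the quoted message):

```r
# Illustrative list; the element names are made up for this sketch.
lx <- list(a = 3, b = 4)

# `$` with a symbol or a literal character string works.
via_symbol <- lx$a
via_string <- lx$"a"

# `$` does not evaluate its second argument, so passing a call fails.
res <- tryCatch(get("$")(lx, as.character("a")),
                error = function(e) e)
failed <- inherits(res, "error")

# For a computed name, `[[` evaluates the index before looking it up.
nm <- as.character("a")
via_computed <- lx[[nm]]
```

So when the name is only known at run time, `[[` is the natural operator to reach for.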

On the other hand

substitute(lx$y, list(y=as.character(NA)))


is lx$NA, using a name (and no longer a character NA).
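
The same substitute() mechanism can build a `$` call from a name that is computed at run time. A small sketch with an ordinary name, where lx and nm are illustrative values rather than anything from the report:

```r
# Illustrative list and a name held in a variable.
lx <- list(a = 3, b = 4)
nm <- "a"

# Replace the placeholder y with the symbol `a`, giving the call lx$a.
cl <- substitute(lx$y, list(y = as.name(nm)))

# Evaluating the constructed call extracts the element.
val <- eval(cl)
```

Converting with as.name() makes it explicit that `$` ends up seeing a symbol.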

Brian


On Fri, 7 Oct 2005, "Jens Oehlschlägel" wrote:

Dear Brian,

Thanks for picking this up.
I think the critical point is that it is not a single isolated bug and it
would be a major effort to get this stuff consistent, because it (and its
implications) seems to be spread all over the code. The commendable
efforts to properly sort out "NA" vs. as.character(NA) have not been fully
successful yet and "" is a similar issue. Please consider the following,
sorry for the length:


# ERROR 1

# I agree that c() disallows "" and NA names
# it makes sense discouraging users from using such names

c(as.character(NA)=1)

Error: syntax error in "c(as.character(NA)="

c("NA"=2, "a"=3)

NA  a
2  3

c(""=4)

Error: attempt to use zero-length variable name

# however, "NA" must be expected as a legal name, e.g. when importing data
# and in your example specifying "no-name" in fact results in a "" name

names(c(a=1, 2))

[1] "a" ""


# My interpretation is that the user specifies a mixture of elements with
# and without names, and therefore the no-names must be coerced to ""
# names, and in principle that's completely fine

# each element of a character vector is either as.character(NA) OR "NA" OR
# "" or another positive length string
# (which is complicated enough)
# formally the names are an attribute (character vector) of an object and
# can be manipulated as such
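
The states a name can be in, as listed above, can be told apart with is.na() and a comparison against "". A minimal sketch; name_state is a hypothetical helper introduced here for illustration, not a function from R:

```r
# Hypothetical helper: label each name as a missing value, the empty
# string, or an ordinary string (which includes the literal "NA").
name_state <- function(nms) {
  ifelse(is.na(nms), "missing",
         ifelse(nms == "", "empty", "string"))
}

# An unnamed element mixed with named ones gets the empty name.
mixed <- name_state(names(c(a = 1, 2)))

# The four cases discussed in this message.
four <- name_state(c(NA, "NA", "a", ""))
```

Checking is.na() first matters, because comparing a missing name with == yields NA rather than FALSE.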

x <- 1:4
names(x) <- c(NA, "NA", "a", "")
names(x)

[1] NA   "NA" "a"  ""

# and in principle all of those can be properly distinguished
x[match(names(x), names(x))]

<NA>   NA    a
  1    2    3    4
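
The match()-based selection above can be wrapped in a small function. This is only a sketch: select_by_name is a hypothetical helper, and it relies on match() comparing NA with NA and "" with "" as exact values and returning the first position for duplicated names.

```r
# Hypothetical helper: select elements by exact name via match().
select_by_name <- function(x, nms) {
  x[match(nms, names(x))]
}

x <- 1:4
names(x) <- c(NA, "NA", "a", "")

# Ask for the literal "NA", the empty name, and the missing name.
picked <- select_by_name(x, c("NA", "", NA))

# The same helper applies to a list built from the vector.
lx <- as.list(x)
picked_list <- select_by_name(lx, "NA")
```

A name that is not present makes match() return NA, so the result then contains a missing element rather than raising an error.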


# introducing a fifth non-name state that sometimes equals "" and sometimes
# not, introduces inconsistency into the language
# e.g. the fact that elements can be selected by their name but not by their
# non-name
# Thus currently selecting by names is a mess from a consistency perspective

x[names(x)]

<NA> <NA>    a <NA>
  1    1    3   NA

# in the following subscripting with "" works, but not with "NA"

for (i in names(x))

+ print(x[[i]])
[1] 1
[1] 1
[1] 3
[1] 4


# ERROR 1a: If failing on "NA" is not a bug, I switch from programming to
# Kafka

x["NA"]

<NA>
  1

# ERROR 1b: clearly wrong

x[["NA"]]

[1] 1

# ERROR 1c: and from my humble understanding failing on "" is a bug as well

x[""]

<NA>
 NA

# whereas interestingly this is correct

x[[""]]

[1] 4


# I think it is obvious how to remove these inconsistencies
# (as long as we do not disallow "" in names altogether,
#  which is almost impossible, since every user can legally set the names
#  vector in a variety of ways)

# these are not easy, but perfectly fine

x[as.character(NA)]

<NA>
  1

x[as.integer(NA)]

<NA>
 NA

# and these are the really debatable, difficult ones

x[NA]

<NA> <NA> <NA> <NA>
 NA   NA   NA   NA

x[as.logical(NA)]

<NA> <NA> <NA> <NA>
 NA   NA   NA   NA
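
The difference between the integer and logical cases above comes down to the type of the index: a bare NA is logical and is recycled to the length of x, while an integer NA selects a single missing element. A short sketch on an unnamed vector, used here only for illustration:

```r
x <- 1:4

# A bare NA is a logical value.
na_type <- typeof(NA)            # "logical"

# A logical index is recycled along x, giving one NA per element.
n_logical <- length(x[NA])

# An integer NA is a single (unknown) position.
n_integer <- length(x[NA_integer_])
```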



## ERROR 2+3: the above inconsistencies generalize to lists

lx <- as.list(x)

lx

$"NA"         (ERROR 2a)
[1] 1

$"NA"
[1] 2

$a
[1] 3

[[4]]           (ERROR 2b)
[1] 4

# and should read

lx

$NA             (  or $as.character(NA) for clarity and warning )
[1] 1

$"NA"
[1] 2

$a
[1] 3

$""
[1] 4


# Note that - except for printing - match works perfectly in

lx[match(names(lx), names(lx))]

$"NA"
[1] 1

$"NA"
[1] 2

$a
[1] 3

[[4]]
[1] 4

# and also in

for (i in match(names(lx), names(lx)))

+ print(lx[[i]])
[1] 1
[1] 2
[1] 3
[1] 4


# Of course I consider the following behaviour as inconsistent

lx[names(lx)]

$"NA"
[1] 1

$"NA"
[1] 1           (ERROR 3a)

$a
[1] 3

$"NA"
NULL            (ERROR 3b)


# using [[ the second one fails

for (i in names(lx))

+ print(lx[[i]])
[1] 1
[1] 1           (ERROR 3c)
[1] 3
[1] 4           (interestingly correct)


# finally note that this works

eval(substitute(lx$y, list(y=as.character(NA))))

# but not this

get("$")(lx, as.character(NA))

Error in get("$")(lx, as.character(NA)) : invalid subscript type

# and both go wrong with "NA"


--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

