Hi Tibor,
I'll try again. Your problem has nothing to do with factors and everything
to do with binding a vector to a data frame. A vector can hold only one
class, and a column in a data frame is a vector, so it too can hold only one
class. If you want to add a new row of data to your existing data frame, bind
a new one-row data frame instead of a vector.
df <- data.frame(
P = factor(c("mittels", "mit", "mittels", "ueber", "mit", "mit")),
ANSWER = factor(c(rep("PP>OBJ", 4), rep("OBJ>PP", 2))),
RT = round(runif(6, 7000, 16000), 0)
)
dq <- data.frame(
P = factor("in"),
ANSWER = factor("V>N"),
RT = round(runif(1, 7000, 16000), 0)
)
df2 <- rbind(df, dq)
df2
With this approach R adds the new level to each factor variable and the
numeric value to the numeric variable.
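One way to confirm this is to inspect the levels and the column class after the bind. The following is a minimal sketch, not part of the original example: it uses a cut-down two-column data frame with fixed RT values, and the names df_small, new_row and combined are made up for illustration.

```r
# Cut-down version of the data frame: one factor column, one numeric column.
df_small <- data.frame(
  P  = factor(c("mittels", "mit", "mittels", "ueber")),
  RT = c(11157, 13719, 14388, 14527)
)

# The new row is itself a one-row data frame, so each column keeps its own class.
new_row <- data.frame(P = factor("in"), RT = 9000)

combined <- rbind(df_small, new_row)

levels(combined$P)   # the new level "in" is appended to the existing levels
class(combined$RT)   # RT is still numeric
```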
Keeping in mind that a vector can only be of one class will save you many
debugging hours later on.
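A short sketch of that rule, using c() on made-up values (the variable names are only for illustration): whenever the elements differ in type, the whole vector is promoted to the most general type among them.

```r
# Each call mixes two types; c() promotes everything to the more general one.
lgl_int <- c(TRUE, 1L)      # logical + integer   -> integer
int_dbl <- c(1L, 2.5)       # integer + double    -> double ("numeric")
dbl_chr <- c(2.5, "in")     # double  + character -> character

class(lgl_int)
class(int_dbl)
class(dbl_chr)

# The row from the original question: one number among strings becomes a string.
mixed <- c("in", "V>N", 12345)
class(mixed)
```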
Tim
-----Original Message-----
From: Sarah Goslee <[email protected]>
Sent: Tuesday, September 20, 2022 9:02 AM
To: [email protected]
Cc: Ebert,Timothy Aaron <[email protected]>; [email protected]
Subject: Re: [R] Question concerning side effects of treating invalid factor
levels
Hi Tibor,
No, you are misunderstanding the source of the problem. It has nothing to do
with factors.
Instead, it has to do with the inability of a vector to hold more than one
class.
You are using rbind() to add a new row to your data frame, but that row is a
vector built with c(), so every element in it is coerced to character. That's
what forces your numeric column to become character: you're adding a
character value to it.
> c("in", "V>N", round(runif(1, 7000, 16000), 0))
[1] "in" "V>N" "15709"
It has nothing whatsoever to do with factors or factor levels, and would occur
just the same if you were adding it to a data frame with character columns.
If you want to mix types, you cannot use a vector.
> c2 <- data.frame(P = "in", ANSWER = "V>N", RT = round(runif(1, 7000, 16000), 0))
> str(rbind(df, c2))
'data.frame': 7 obs. of 3 variables:
$ P : Factor w/ 4 levels "mit","mittels",..: 2 1 2 3 1 1 4
$ ANSWER: Factor w/ 3 levels "OBJ>PP","PP>OBJ",..: 2 2 2 2 1 1 3
$ RT : num 10867 14808 11600 15881 8984 ...
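To see the two behaviours side by side, here is a minimal sketch under the same assumptions as the thread (R 4.x defaults). It uses a cut-down two-column data frame with fixed values, and the names df_small, bad and good are made up for illustration. The warning about the invalid factor level is suppressed only to keep the output short.

```r
df_small <- data.frame(
  P  = factor(c("mittels", "mit", "mittels")),
  RT = c(11157, 13719, 14388)
)

# Binding an atomic vector: c("in", 9000) is already character before rbind()
# sees it, so RT is turned into character and "in" becomes NA in the factor.
bad <- suppressWarnings(rbind(df_small, c("in", 9000)))

# Binding a one-row data frame: each column keeps its own type.
good <- rbind(df_small, data.frame(P = "in", RT = 9000))

sapply(bad, class)
sapply(good, class)
```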
Sarah
On Tue, Sep 20, 2022 at 8:45 AM Tibor Kiss via R-help <[email protected]>
wrote:
>
> Hi,
>
> this is a misunderstanding of my question. I wasn't worried about invalid
> factor levels that produce NA. My question was why a column changes its
> class, which I took to be a side effect. If you build a vector containing one
> character string, the class of the whole vector becomes _chr_. After this
> vector has been added as a row, we have an NA in each of the two factor
> columns, and a character string in RT, which is responsible for turning the
> numeric column into a character column (see ?c, where you find: "The
> output type is determined from the highest type of the components in the
> hierarchy NULL < raw < logical < integer < double < complex < character <
> list < expression.").
>
>
> Best
>
>
> Tibor
>
>
>
> > On 19.09.2022 at 13:59, Ebert,Timothy Aaron <[email protected]> wrote:
> >
> > In your example code, the variable remains a class factor, and all entries
> > are valid. The variables will behave as expected given the factor levels in
> > the original dataframe.
> >
> > (At least on my system, R 4.2 in RStudio on Windows) R returns a couple of
> > warning messages telling me that I did something wrong.
> > What you get is NA, meaning "not available", i.e. a missing
> > value. You gave the system an invalid factor level, so it was entered as
> > missing. If you get data that has a new factor level, you need to tell R to
> > expect a new factor level first.
> >
> > levels(f1) <- c(levels(f1),"New Level")
> > levels(f1) <- c(levels(f1),c("NL1","NL2"))
> >
> >
> > Tim
> > -----Original Message-----
> > From: R-help <[email protected]> On Behalf Of Tibor Kiss
> > via R-help
> > Sent: Monday, September 19, 2022 6:11 AM
> > To: [email protected]
> > Subject: [R] Question concerning side effects of treating invalid
> > factor levels
> >
> > Dear List members,
> >
> > I have tried several times now to find out about a side effect of
> > treating invalid factor levels, but did not find an answer. Various answers
> > on stackexchange etc. produce the behavior that puzzles me without even
> > mentioning it.
> > So I am asking the list (apologies if this has been treated in the past).
> >
> > If you add an invalid factor level to a column in a data frame, this has
> > the side effect of turning a numerical column into a column with character
> > strings. Here is a simple example:
> >
> >> df <- data.frame(
> > P = factor(c("mittels", "mit", "mittels", "ueber", "mit", "mit")),
> > ANSWER = factor(c(rep("PP>OBJ", 4), rep("OBJ>PP", 2))),
> > RT = round(runif(6, 7000, 16000), 0))
> >
> >> str(df)
> > 'data.frame': 6 obs. of 3 variables:
> > $ P : Factor w/ 3 levels "mit","mittels",..: 2 1 2 3 1 1
> > $ ANSWER: Factor w/ 2 levels "OBJ>PP","PP>OBJ": 2 2 2 2 1 1
> > $ RT : num 11157 13719 14388 14527 14686 ...
> >
> >> df <- rbind(df, c("in", "V>N", round(runif(1, 7000, 16000), 0)))
> >
> >> str(df)
> > 'data.frame': 7 obs. of 3 variables:
> > $ P : Factor w/ 3 levels "mit","mittels",..: 2 1 2 3 1 1 NA
> > $ ANSWER: Factor w/ 2 levels "OBJ>PP","PP>OBJ": 2 2 2 2 1 1 NA
> > $ RT : chr "11478" "15819" "8305" "8852" ...
> >
> > You see that RT has changed from _num_ to _chr_ as a side effect of adding
> > the invalid factor level as NA. I would appreciate understanding what the
> > purpose of the type coercion is.
> >
> > Thanks in advance
> >
> >
> > Tibor
--
Sarah Goslee (she/her)
http://www.numberwright.com/
______________________________________________
[email protected] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.