> scs2<-data.frame(lapply(scs2, factor))

Calling data.frame() on the output of lapply() can change the column names and will drop attributes that the input data.frame may have had. I prefer to modify the original data.frame instead of making a new one from scratch, to avoid these problems.
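A minimal sketch of the difference, using a small hypothetical stand-in for scs2 (the data frame `survey`, its column "Q1/1" and its "Source" attribute are invented for illustration). It copies one feature of the posted scs2 data: non-default row names. Those are an attribute too, and data.frame() discards them because lapply() returns a plain list.

```r
# Hypothetical stand-in for scs2: a non-syntactic column name, non-default
# row names (the posted scs2 has row names 78, 46, 80, ...) and an extra
# attribute. All names and values here are made up for illustration.
survey <- data.frame(
  "Q1/1" = c("important", "very important", "important"),
  Q3 = c("yes", "yes", "yes"),
  row.names = c(78L, 46L, 80L),
  check.names = FALSE,
  stringsAsFactors = FALSE
)
attr(survey, "Source") <- "hypothetical survey export"

# Building a new data.frame from lapply()'s plain list: names are re-checked
# with make.names(), and row names and extra attributes are not carried over.
rebuilt <- data.frame(lapply(survey, factor))
names(rebuilt)            # "Q1.1" "Q3"
rownames(rebuilt)         # "1" "2" "3"
attr(rebuilt, "Source")   # NULL

# Assigning into survey[] replaces only the column contents.
inplace <- survey
inplace[] <- lapply(inplace, factor)
names(inplace)            # "Q1/1" "Q3"
rownames(inplace)         # "78" "46" "80"
attr(inplace, "Source")   # "hypothetical survey export"
```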
Also, calling factor() on a factor will drop any unused levels, which you may not want to do. Calling as.factor() will not. Compare the following three methods

f1 <- function (dataFrame) {
    dataFrame[] <- lapply(dataFrame, factor)
    dataFrame
}
f2 <- function (dataFrame) {
    dataFrame[] <- lapply(dataFrame, as.factor)
    dataFrame
}
f3 <- function (dataFrame) {
    data.frame(lapply(dataFrame, factor))
}

on the following data.frame

x <- data.frame(stringsAsFactors=FALSE, check.names=FALSE,
    "No/Yes" = factor(c("Yes","Yes","Yes"), levels=c("No","Yes")),
    "Size" = ordered(c("Small","Large","Medium"), levels=c("Small","Medium","Large")),
    "Name" = c("Adam","Bill","Chuck"))
attr(x, "Date") <- as.POSIXlt("2013-02-21")

> str(x)
'data.frame': 3 obs. of 3 variables:
 $ No/Yes: Factor w/ 2 levels "No","Yes": 2 2 2
 $ Size  : Ord.factor w/ 3 levels "Small"<"Medium"<..: 1 3 2
 $ Name  : chr "Adam" "Bill" "Chuck"
 - attr(*, "Date")= POSIXlt, format: "2013-02-21"
> str(f1(x)) # drops unused levels
'data.frame': 3 obs. of 3 variables:
 $ No/Yes: Factor w/ 1 level "Yes": 1 1 1
 $ Size  : Ord.factor w/ 3 levels "Small"<"Medium"<..: 1 3 2
 $ Name  : Factor w/ 3 levels "Adam","Bill",..: 1 2 3
 - attr(*, "Date")= POSIXlt, format: "2013-02-21"
> str(f2(x))
'data.frame': 3 obs. of 3 variables:
 $ No/Yes: Factor w/ 2 levels "No","Yes": 2 2 2
 $ Size  : Ord.factor w/ 3 levels "Small"<"Medium"<..: 1 3 2
 $ Name  : Factor w/ 3 levels "Adam","Bill",..: 1 2 3
 - attr(*, "Date")= POSIXlt, format: "2013-02-21"
> str(f3(x)) # mangles column names, drops unused levels, drops Date attribute
'data.frame': 3 obs. of 3 variables:
 $ No.Yes: Factor w/ 1 level "Yes": 1 1 1
 $ Size  : Ord.factor w/ 3 levels "Small"<"Medium"<..: 1 3 2
 $ Name  : Factor w/ 3 levels "Adam","Bill",..: 1 2 3

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf
> Of Mark Lamias
> Sent: Wednesday, February 20, 2013 6:51 PM
> To: Daniel Lopez; R help (r-help@r-project.org)
> Subject: Re: [R] Having trouble converting a dataframe of character vectors
> to factors
>
> How about this?
>
> scs2<-data.frame(lapply(scs2, factor))
>
> ________________________________
> From: "Lopez, Dan" <lopez...@llnl.gov>
> To: "R help (r-help@r-project.org)" <r-help@r-project.org>
> Sent: Wednesday, February 20, 2013 7:09 PM
> Subject: [R] Having trouble converting a dataframe of character vectors to
> factors
>
> R Experts,
>
> I have a dataframe made up of character vectors--these are results from survey
> questions. I need to convert them to factors.
>
> I tried the following which did not work:
> scs2<-sapply(scs2,as.factor)
> also this didn't work:
> scs2<-sapply(scs2,function(x) as.factor(x))
>
> After doing either of above I end up with
> >str(scs2)
>  chr [1:10, 1:10] "very important" "very important" "very important" "very important" ...
>  - attr(*, "dimnames")=List of 2
>   ..$ : NULL
>   ..$ : chr [1:10] "Q1_1" "Q1_2" "Q1_3" "Q1_4" ...
> >class(scs2)
> "matrix"
>
> But when I do it one at a time it works:
> scs2$Q1_1<-as.factor(scs2$Q1_1)
> scs2$Q1_2<- as.factor(scs2$Q1_2)
>
> What am I doing wrong? How do I accomplish this with sapply or similar
> function?
>
> Data for reproducibility:
>
> scs2<-structure(list(Q1_1 = c("very important", "very important", "very important",
> "very important", "very important", "very important", "very important",
> "somewhat important", "important", "very important"), Q1_2 = c("important",
> "somewhat important", "very important", "important", "important",
> "very important", "somewhat important", "somewhat important",
> "very important", "very important"), Q1_3 = c("very important",
> "important", "very important", "very important", "important",
> "very important", "very important", "somewhat important", "not important",
> "important"), Q1_4 = c("very important", "important", "very important",
> "very important", "important", "important", "important", "very important",
> "somewhat important", "important"), Q1_5 = c("very important",
> "not important", "important", "very important", "not important",
> "important", "somewhat important", "important", "somewhat important",
> "not important"), Q1_6 = c("very important", "not important",
> "important", "very important", "somewhat important", "very important",
> "very important", "very important", "important", "important"),
> Q1_7 = c("very important", "somewhat important", "important",
> "somewhat important", "important", "important", "very important",
> "very important", "somewhat important", "not important"),
> Q2 = c("Somewhat", "Very Much", "Somewhat", "Very Much",
> "Very Much", "Very Much", "Very Much", "Very Much", "Very Much",
> "Very Much"), Q3 = c("yes", "yes", "yes", "yes", "yes", "yes",
> "yes", "yes", "yes", "yes"), Q4 = c("None", "None", "None",
> "None", "Confirmed Field of Study", "Confirmed Field of Study",
> "Confirmed Field of Study", "None", "None", "None")), .Names = c("Q1_1",
> "Q1_2", "Q1_3", "Q1_4", "Q1_5", "Q1_6", "Q1_7", "Q2", "Q3", "Q4"
> ), row.names = c(78L, 46L, 80L, 196L, 188L, 197L, 39L, 195L,
> 172L, 110L), class = "data.frame")
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
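As for why the sapply() attempts in the original question returned a character matrix: sapply() tries to simplify its result, and a list of equal-length results is simplified into a matrix; on the way the factors are converted back to their character labels. The sketch below uses a three-row excerpt of columns Q2 and Q3 from the posted scs2 data (an assumption made only to keep it short) and then applies the same in-place assignment as f2 above.

```r
# Three-row excerpt of columns Q2 and Q3 from the posted scs2 data.
scs2 <- data.frame(
  Q2 = c("Somewhat", "Very Much", "Somewhat"),
  Q3 = c("yes", "yes", "yes"),
  stringsAsFactors = FALSE
)

# sapply() simplifies the list of equal-length factors into a matrix,
# and the factors are turned back into character strings along the way.
bad <- sapply(scs2, as.factor)
is.matrix(bad)     # TRUE
is.character(bad)  # TRUE

# lapply() always returns a list; assigning it into scs2[] replaces the
# columns while keeping the data.frame, its names and its row names.
scs2[] <- lapply(scs2, as.factor)
is.data.frame(scs2)  # TRUE
levels(scs2$Q2)      # "Somewhat" "Very Much"
```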