On Aug 21, 2009, at 3:41 PM, Alexander Shenkin wrote:
Thanks, everyone, for your replies, both on- and off-list. I should
clarify, since I left out some important information. My original
dataframe has some numeric columns, which get changed to character by
gsub when I replace spaces with NAs.
If you used the replacement form is.na(x) <- ..., that would not happen
to a true _numeric_ vector (but, of course, a numeric vector in a
data.frame could not have spaces, so you are probably not using precise
terminology). You would be well advised to include the actual code
rather than applying loose terminology subject to your and our
misinterpretation.
?is.na
I am guessing that you were using read.table() on the original data,
in which case you should look at the colClasses parameter.
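A minimal sketch of the read-time approach David points to, assuming the data come from a tab-separated text file. The inline sample (columns tag, h1, note) is hypothetical and is read with read.table(text = ...) only so the snippet runs on its own. strip.white and na.strings are additions not mentioned in the thread: together they turn blank or space-only cells into NA, while colClasses fixes each column's type as the file is read, so no gsub pass or type reset should be needed afterwards.

```r
# Hypothetical sample: tab-separated, with one space-only cell and two
# empty/whitespace cells. With a real file, pass its path instead of text=.
raw <- "tag\th1\tnote
121AL\t3.5\tok
122AL\t \t
123AL\t4.1\t  "

dat <- read.table(
  text        = raw,
  header      = TRUE,
  sep         = "\t",
  strip.white = TRUE,                 # trim spaces so " " becomes ""
  na.strings  = c("NA", ""),          # treat empty cells as NA
  colClasses  = c("character", "numeric", "character")
)

str(dat)
```

Because the types are declared up front, h1 stays numeric even though one of its cells was a space.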
--
David Winsemius
Thus, in going back to a dataframe, those (now character) columns get
converted to factors. I recently added stringsAsFactors = FALSE so that
they stay character, which makes things a bit easier. I wrote the
column-type reset function below, but it feels kludgey, so I was
wondering if there was some other way to specify how as.data.frame
should handle the columns.
str(final_dataf)
'data.frame': 1127 obs. of 43 variables:
$ block : Factor w/ 1 level "2": 1 1 1 1 1 1 1 1 1 1 ...
$ treatment : Factor w/ 4 levels "I","M","N","T": 1 1 1 1 1 1 1 1 1 1 ...
$ transect : Factor w/ 1 level "4": 1 1 1 1 1 1 1 1 1 1 ...
$ tag : chr NA "121AL" "122AL" "123AL" ...
...
$ h1 : num NA NA NA NA NA NA NA NA NA NA ...
...
reset_col_types <- function(df, col_types) {
  # Reset column types in a data frame. col_types is a character vector
  # with one class name per column, built from lapply(df, class).
  coerce_fun <- list(
    "character" = as.character,
    "factor"    = as.factor,
    "numeric"   = as.numeric,
    "integer"   = as.integer,
    "POSIXct"   = as.POSIXct,
    "logical"   = as.logical)
  for (i in seq_along(df)) {
    # apply the coerce function matching this column's original class
    df[[i]] <- coerce_fun[[ col_types[i] ]](df[[i]])
  }
  return(df)
}
col_types = lapply(final_dataf, class)
# POSIXct columns have class c("POSIXct", "POSIXt"); keep the more
# specific (first) class so it matches a name in coerce_fun
col_types = lapply(col_types, function(x) x[1])
names(col_types) = NULL
col_types = unlist(col_types)
final_dataf = as.data.frame(lapply(final_dataf, function(x)
  gsub('^\\s*$', NA, x)), stringsAsFactors = FALSE)
final_dataf = reset_col_types(final_dataf, col_types)
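A possibly simpler route, sketched under the assumption that only character and factor columns can hold the blank strings: assigning into final_dataf[] with lapply() replaces the columns in place, so the data frame is never rebuilt by as.data.frame() and untouched columns keep their classes. The helper blank_to_na() and the small stand-in final_dataf below are hypothetical illustrations, not code from the thread.

```r
# Hypothetical stand-in for the real final_dataf.
final_dataf <- data.frame(
  treatment = factor(c("I", " ", "M")),
  tag       = c("", "121AL", "122AL"),
  h1        = c(NA, 2.5, 3),
  when      = as.POSIXct("2009-08-21 10:00:00", tz = "UTC") + c(0, 3600, 7200),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn empty / whitespace-only strings into NA, touching
# only character and factor columns; other columns are returned unchanged.
blank_to_na <- function(x) {
  if (is.character(x)) {
    x[grepl("^\\s*$", x)] <- NA
  } else if (is.factor(x)) {
    # Setting a level to NA drops that level; its values become NA.
    levels(x)[grepl("^\\s*$", levels(x))] <- NA
  }
  x
}

# df[] <- lapply(...) writes the results back into the existing data frame,
# so no as.data.frame() round trip (and no stringsAsFactors choice) occurs.
final_dataf[] <- lapply(final_dataf, blank_to_na)
str(final_dataf)
```

Since numeric and POSIXct columns pass through unchanged, the col_types bookkeeping and reset_col_types() would not be needed with this approach.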
Thanks,
Allie
On 8/21/2009 10:54 AM, Steve Lianoglou wrote:
Hi Allie,
On Aug 21, 2009, at 11:47 AM, Alexander Shenkin wrote:
Hello all,
I have a list which I'd like to convert to a data frame, while
maintaining control of the columns' data types (akin to the colClasses
argument in read.table). My numeric columns, for example, are getting
converted to factors by as.data.frame. Is there a way to do this, or
will I have to do as I am doing right now: allow as.data.frame to coerce
column types as it sees fit, and then convert them back manually?
This doesn't sound right ... are there characters buried in your
numeric columns somewhere that might be causing this?
I'm pretty sure this shouldn't happen, and a small test case here goes
along with my intuition:
R> a <- list(a=1:10, b=rnorm(10), c=LETTERS[1:10])
R> df <- as.data.frame(a)
R> sapply(df, is.factor)
a b c
FALSE FALSE TRUE
Can you check to see if your data's wonky somehow?
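Steve's test can be made independent of the session default by passing stringsAsFactors explicitly, which as.data.frame() accepts for lists; note that the default changed to FALSE in R 4.0.0, so on current R the c column above already comes back as character. The second part is a hypothetical illustration of Steve's hypothesis: a column that looks numeric but was read as text (here with a space-only entry) stays character until it is coerced, and as.numeric() maps the blank entry to NA.

```r
a <- list(a = 1:10, b = rnorm(10), c = LETTERS[1:10])

# Ask for character columns explicitly instead of relying on the default.
df_chr <- as.data.frame(a, stringsAsFactors = FALSE)
col_classes <- sapply(df_chr, class)
print(col_classes)  # a: "integer", b: "numeric", c: "character"

# Hypothetical "numeric" column that arrived as text with a blank entry.
h1_txt <- c("3.5", " ", "4.1")
h1_num <- as.numeric(h1_txt)  # the blank string becomes NA
print(h1_num)
```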
-steve
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.