You can see what the offending strings are with
  > with(waterchem, levels(SC)[is.na(as.numeric(levels(SC)))])
  [1] "-" "+"
  Warning message:
  In eval(expr, envir, enclos) : NAs introduced by coercion
but it may be easiest to use the colClasses argument to read.table
to force that column to be numeric (with NAs for strings that
could not be interpreted as numbers).
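
As a rough sketch of two ways to get a numeric SC column, here is a small example. The inline data is a made-up miniature of the pipe-delimited file (only four columns, with "-" and "+" planted in SC), and the names txt, a, b, bad and SC_num are mine, not from the original post. As far as I know, read.table can stop with a scan() error if a column forced to "numeric" contains a string such as "-", so this sketch either declares those strings as missing with na.strings or reads the column as character and converts it afterwards.

```r
# Hypothetical miniature of the file; the real data lives in 'wqR.txt'.
txt <- "site|sampdate|'pH'|'SC'
'D-1'|'2007-12-12'|7.800|630.000
'D-1'|'2008-03-15'|7.940|-
'D-2'|'2008-03-15'|7.100|+
'D-2'|'2008-06-20'|NA|633.000"

# Option A: tell read.table that "-" and "+" mean missing.
# na.strings matches whole fields only, so 'D-1' in site is untouched.
a <- read.table(text = txt, header = TRUE, sep = "|",
                na.strings = c("NA", "-", "+"))
str(a$SC)

# Option B: read SC as character (named colClasses), then convert.
b <- read.table(text = txt, header = TRUE, sep = "|",
                colClasses = c(SC = "character"))

# Convert, silencing the "NAs introduced by coercion" warning.
SC_num <- suppressWarnings(as.numeric(b$SC))

# Strings that were present in the file but failed to convert.
bad <- unique(b$SC[is.na(SC_num) & !is.na(b$SC)])
print(bad)

b$SC <- SC_num
```

Option A is shorter once the offending strings are known; Option B is the safer first pass because it also reports which strings caused the trouble.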

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
> Behalf Of Rich Shepard
> Sent: Tuesday, November 29, 2011 11:19 AM
> To: r-help@r-project.org
> Subject: [R] Why Numeric Values Become Factors in Data Frame
> 
>    I have a data frame with 1 factor, one date, and 37 numeric values:
> str(waterchem)
> 'data.frame': 3525 obs. of  39 variables:
>   $ site      : Factor w/ 64 levels "D-1","D-2","D-3",..: 1 1 1 1 1 ...
>   $ sampdate  : Date, format: "2007-12-12" "2008-03-15" ...
>   $ CO3       : num  1 1 6.7 1 1 1 1 1 1 1 ...
>   $ HCO3      : num  231 228 118 246 157 208 338 285 260 240 ...
>   $ Ca        : num  100 88.4 63.4 123 78.2 103 265 213 178 166 ...
>   $ DO        : num  4.96 9.91 4.32 2.58 1.81 5.09 3.98 5.46 1.9 2.52 ...
>   ...
>   $ SC        : Factor w/ 841 levels "1.090","10.000",..: 635 638 363
> 
>    All the numeric categories are read in as numbers except for some of those
> in column 'SC'. I have been looking in the source file for a couple of hours
> trying to learn why values such as 1.090 and 10.000 are seen as characters
> rather than numbers. I've not seen the reason.
> 
>    The source file is 860K and looks like this:
> 
> site|sampdate|'Ag'|'Al'|'CO3'|'HCO3'|'Alk-Tot'|'As'|'Ba'|'Be'|'Bi'|'Ca'|'Cd'|'Cl'|'Co'|'Cr'|'Cu'|'DO'|'Fe'|'Hg'|'K'|'Mg'|'Mn'|'Mo'|'Na'|'NH4'|'NO3-NO2'|'Oil-grease'|'Pb'|'pH'|'Sb'|'SC'|'Se'|'SO4'|'Sr'|'TDS'|'Tl'|'V'|'Zn'
> 'D-1'|'2007-12-12'|0.000|0.106|1.000|231.000|231.000|0.011|0.000|0.002|0.000|100.000|0.000|1.430|0.000|0.006|0.024|4.960|4.110|NA|0.000|9.560|0.035|0.000|0.970|0.010|0.293|NA|0.025|7.800|0.001|630.000|0.001|65.800|0.000|320.000|0.001|0.000|11.400
> 'D-1'|'2008-03-15'|0.000|0.080|1.000|228.000|228.000|0.001|0.000|0.002|0.000|88.400|0.000|1.340|0.000|0.006|0.014|9.910|0.309|0.000|0.000|9.150|0.047|0.000|0.820|0.224|0.020|NA|0.025|7.940|0.001|633.000|0.001|75.400|0.000|300.000|0.001|0.000|12.400
> 
>    The R command used to create the data frame is:
>          waterchem <- read.table('wqR.txt', header = TRUE, sep = '|')
> 
>    Pointers on how to determine why this one variable has some values read as
> characters rather than as numerics are needed.
> 
> Rich
> 
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
