On Thu, Aug 21, 2008 at 04:20:57PM +0100, Williams, Robin wrote:
> Hi Dan,
> Thanks for the reply, yes, I am using read.csv on the attached file.

OK, so how about using the colClasses argument? Your problem is that some
malfunctioning software has inserted the value "#VALUE!" into some of your
supposedly numeric cells. So deal with that with the na.strings argument.
Like I said, when reading in data, it's worth spending a minute looking at
the documentation for read.table/read.csv rather than spending an hour
messing about with the results of not doing so.

> Southwest <- read.csv("southwest.csv",
>     colClasses=c("character",rep("numeric",10), "character"),
>     na.strings="#VALUE!")
> str(Southwest)
'data.frame':   1530 obs. of  12 variables:
 $ date      : chr  "5/1/1997" "5/2/1997" "5/3/1997" "5/4/1997" ...
 $ maxtemp   : num  18.8 21.8 16.6 14.9 14.2 9.3 9.9 12.7 12.8 13.2 ...
 $ mintemp   : num  7.7 9.8 11 12.2 11.3 4.5 2.1 5.7 6.7 7.3 ...
 $ pressure  : num  1028 1023 1015 1001 989 ...
 $ humid     : num  59 44 83 80 87 57 64 83 70 69 ...
 $ wind      : num  8.4 11.1 8.2 17.4 13.8 16.2 11.1 14.9 12.7 16.6 ...
 $ rain      : num  0 0 6 1 3.3 2.6 4.3 6 3.2 1.6 ...
 $ index     : num  1 2 3 4 5 6 7 8 9 10 ...
 $ admissions: num  5.00 4.72 5.16 3.67 3.62 ...
 $ detrended : num  4.79 4.47 5.30 3.91 3.51 ...
 $ detrended2: num  4.79 4.47 5.30 3.91 3.51 ...
 $ d.o.w.    : chr  "Thu" "Fri" "Sat" "Sun" ...

NB you could coerce those dates to a date class rather than character, but
I'll leave that up to you. str() is your friend.

Dan

> However, when I do
> southwest <- data.frame(read.csv("southwest.csv"))

read.csv returns a data frame; no need to wrap it in data.frame()

> names(southwest)
> the output is the column headings (i.e. the variables), and looking at
> the data I only get the numbers, I assume the column headings haven't
> become confused with the data.
> I.e. if I just do
> southwest$pressure
> The output is correct, i.e. the values contained in the pressure column.
>
> Apologies for my repeated question, but I'm somewhat confused on this
> one and my lack of experience with R isn't helping matters.
> I don't even
> understand why R is interpreting these figures as factors in the first
> place, doesn't this imply that any similar data would be interpreted as
> factors?
> Thanks for any further help.
> Robin Williams
> Met Office summer intern - Health Forecasting
> [EMAIL PROTECTED]
>
> -----Original Message-----
> From: Dan Davison [mailto:[EMAIL PROTECTED]
> Sent: Thursday, August 21, 2008 4:11 PM
> To: Williams, Robin
> Cc: r-help@r-project.org
> Subject: Re: [R] Very confused with class
>
> Hi Robin,
>
> You haven't said where you're getting the data from. But if the answer
> is that you're using read.table, read.csv or similar to read the data
> into R, then I advise you to go back to that stage and get it right from
> the outset. It's very, very common to see people who are relatively new
> to R splattering their code with calls to as.numeric, just because they
> haven't read the data in properly in the first place. It's also common
> in those who aren't new to R... So e.g. if you are using read.table,
> then use the colClasses argument to specify the classes of your columns,
> and use str() on the result until you're happy with the data frame
> produced.
>
> It's not entirely clear why you would have ended up with factors if your
> data are numeric. That often happens when people mix characters with
> numbers. Perhaps you have mixed the header row up with the data?
>
> Anyway, what you are seeing are the integer encodings of the factors.
> E.g.
>
> > f <- factor(11:20)
> > str(f)
>  Factor w/ 10 levels "11","12","13",..: 1 2 3 4 5 6 7 8 9 10
> > as.numeric(f)
>  [1]  1  2  3  4  5  6  7  8  9 10
>
> But don't mess with them. Just make sure that things which shouldn't be
> factors never become factors.
>
> Dan
>
> On Thu, Aug 21, 2008 at 03:40:58PM +0100, Williams, Robin wrote:
> > Hi all,
> > I am very confused with class.
> > I am looking at some weather data which I want to use as explanatory
> > variables in an lm. R has treated these variables as factors (i.e.
> > with different levels), whereas I want them treated as discretely
> > measured continuous variables. So I need to reassign the class of
> > these variables, right?
> > Indeed, doing
> > class(southwest$pressure)
> > (pressure being air pressure), I get
> > #> factor.
> > Now what class should I use to reassign them so that my model
> > fitting process goes as I want it to? I have obviously done something
> > wrong. I did southwest$pressure <- as(southwest$pressure,"numeric")
> > numeric seeming like a reasonable class to assign to this variable.
> > However, doing some summary stats like
> > mean(southwest$pressure)
> > #> 341,
> > max(southwest$pressure)
> > #> 761,
> > which is clearly nonsense, as my maximum value is around 1040.
> > Something similar has happened to maxtemp (maximum temperature), which
> > I also reassigned from a factor to class numeric, which now apparently
> > has a maximum value of 147!
> > Clearly it must be the reassignment of class that has caused these
> > problems, as summary stats on the data before I reassigned the classes
> > were fine. What is wrong with the class numeric? Reading the numeric
> > help page didn't reveal anything to me. Can someone suggest the
> > correct class?
> > Many thanks for any help.
> > Robin Williams
> > Met Office summer intern - Health Forecasting
> > [EMAIL PROTECTED]
> >
> >
> > [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> --
> http://www.stats.ox.ac.uk/~davison

--
http://www.stats.ox.ac.uk/~davison
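
For anyone landing on this thread later, here is a minimal sketch of the whole issue in one place. The three-row CSV is made up for illustration (only the column names are borrowed from the str() output above), and stringsAsFactors = TRUE is set explicitly as an assumption to reproduce the 2008-era default, since R 4.0.0 and later no longer turn strings into factors when reading data.

```r
# Hypothetical sample standing in for southwest.csv (values invented)
csv_text <- "date,pressure,maxtemp
5/1/1997,1028,18.8
5/2/1997,#VALUE!,21.8
5/3/1997,1015,16.6"

# Naive read: the stray "#VALUE!" stops pressure being parsed as numeric,
# so with stringsAsFactors = TRUE the column becomes a factor.
bad <- read.csv(text = csv_text, stringsAsFactors = TRUE)
class(bad$pressure)

# as.numeric() on a factor returns the integer level codes,
# not the original readings. This is the "nonsense" maximum.
as.numeric(bad$pressure)

# If you are already stuck with a factor, convert via the labels instead.
# "#VALUE!" cannot be parsed, so it becomes NA (warning suppressed here).
recovered <- suppressWarnings(as.numeric(as.character(bad$pressure)))

# Better: get it right at read time, as suggested in the thread.
good <- read.csv(text = csv_text,
                 colClasses = c("character", "numeric", "numeric"),
                 na.strings = "#VALUE!")
str(good)
max(good$pressure, na.rm = TRUE)

# Optional: coerce the dates, assuming month/day/year order.
good$date <- as.Date(good$date, format = "%m/%d/%Y")
```

The %m/%d/%Y format is inferred from the sample output, where 5/1/1997 is paired with "Thu" (1 May 1997 was a Thursday); check it against your own file before relying on it.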