Hello William, Ivan and Jim

I appreciate your replies.

I did suppress the factors using stringsAsFactors=FALSE, and in that way
was able to make some more progress on getting a sense of the data set,
so thanks for that suggestion. I had previously overlooked it.

Also thanks William, I never understood what those thick line segments
were - now I do. That was about the best I had managed by that point,
and still without the names on the x axis.

Unfortunately, using William's suggestion of 'with' gave me errors:

> with(MHP.def, {plot(as.integer(MHP.def$Names),cH.E, axes=FALSE, xlab='Area') axis(side=2) axis(side=1, at=seq_along(levels(MHP.def$Names)), lab=levels(MHP.def$Names))})
Error: unexpected symbol in "with(MHP.def, {plot(as.integer(MHP.def$Names), MHP.def$cH.E, axes=FALSE, xlab='Area') axis"

This may have something to do with the period between cH and E, or
perhaps with using $ to access data from a column?

I have now installed ggplot2 and, with the help of the graphics
cookbook, will see if I can make some headway that way, at least for
now. I think William's suggestion about learning to work with factors is
fundamentally sound and something I will need to get my head around. For
now, though, I think I'll stick to exploring ggplot2 so that I can
visualise this data set more easily.

Thanks again.

Best
Sun

On 11/12/14 16:06, William Dunlap wrote:
> Here is a reproducible example
>    > d <- read.csv(text="Name,Age\nBob,2\nXavier,25\nAdam,1")
>    > str(d)
>    'data.frame': 3 obs. of 2 variables:
>     $ Name: Factor w/ 3 levels "Adam","Bob","Xavier": 2 3 1
>     $ Age : int 2 25 1
>
> Do you get something similar? If not, show us what you have (you
> could trim it down to a few columns).
>
> Let's try some plots.
>    > plot(d$Age)
> This shows a plot of d$Age (on y axis) vs "Index", where Index is
> 1:length(d$Age). The points are at (1,2), (2,25), and (3,1). You gave
> plot() no information about what should be on the x axis so it gave
> you the index numbers.
>
> Now asking for d$Name on the x axis and d$Age on the y.
>    > plot(d$Name, d$Age)
> This put the names, in alphabetical order on the x axis. The y axis
> ranges from about 0 to 25 and neither axis is labelled. There are
> thick horizontal line segments where you expect the points to
> be. These are degenerate boxplots - when you ask to plot a
> 'factor' variable on the x axis and numbers on the y you get such
> a plot.
>
> Some folks suggested you avoid factors by adding stringsAsFactors=FALSE
> (or as.is=TRUE) to your call to read.csv. Let's try that
>    > d2 <- read.csv(stringsAsFactors=FALSE,
>         text="Name,Age\nBob,2\nXavier,25\nAdam,1")
>    > plot(d2$Name, d2$Age)
>    Error in plot.window(...) : need finite 'xlim' values
>    In addition: Warning messages:
>    1: In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion
>    2: In min(x) : no non-missing arguments to min; returning Inf
>    3: In max(x) : no non-missing arguments to max; returning -Inf
> You get no plot at all.
>
> You can get closer to what I think you want with
>    with(d, {
>       plot(as.integer(Name), Age, axes=FALSE, xlab="Name")
>       axis(side=2) # draw the usual y axis
>       axis(side=1, at=seq_along(levels(Name)), lab=levels(Name))
>    })
> If you want the names in a different order on the x axis, then reconstruct
> the factor object d$Name with a different order of levels. E.g.,
>    d$Name <- factor(d$Name, levels=c("Xavier", "Bob", "Adam"))
> and replot.
>
> There are various plotting packages, e.g., ggplot2, that can make this
> sort of thing easier, but I think the recommendation not to use factors
> is wrong. You do need to learn how to use them to your advantage.
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Thu, Dec 11, 2014 at 5:00 AM, Sun Shine <phaedr...@gmail.com> wrote:
>
> > Hello
> >
> > I am struggling with data frames and would appreciate some help
> > please.
> >
> > I have a data set of 13 observations and 80 variables.
> > The first column is the names of different political area
> > boundaries (e.g. MHad, LBNW, etc), the first row is a vector of
> > variable names concerning various census data (e.g. age.T,
> > hse.Unk, etc.). The first cell [1,1] is blank.
> >
> > I have loaded this via read.csv('path.to/data.set.csv'), and now
> > want to run some analyses on this data frame. If I want to get a
> > list of the names of the political areas (i.e. the first column),
> > the result is a vector of numbers which appear to correlate with
> > the factors, but I don't get the text names, just the
> > corresponding number. So, if I want to plot something basic, like
> > the area that uses the most gas for central heating, for example:
> >
> >    > plot(data.set$ch.Gas)
> >
> > The result is the y-axis gives the gas usage for the areas, but
> > the x-axis gives only the numbers of the areas, not the names of
> > the areas (which is preferred).
> >
> > So, two questions:
> >
> > (1) have I set up my csv file correctly to be read as a data frame
> > as the first row of all of the remaining columns with the values
> > for that political area in the corresponding row in the column
> > with the specific variable name? So far, looking through tutorials
> > and books seems to suggest yes, but at this point I'm no longer sure.
> >
> > (2) How can I access the names of the political areas when
> > plotting so that these are given on the x-axis instead of the numbers?
> >
> > Thanks for any help.
> >
> > Cheers
> > Sun
> >
> > ______________________________________________
> > R-help@r-project.org mailing list --
> > To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
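
A note on the "unexpected symbol" error reported in this message: it is probably not caused by the period in cH.E or by the use of $, since both are ordinary R syntax. The error text stops at the word axis, directly after the closing parenthesis of the first plot(...) call, which suggests that the three calls inside { } were entered on one line with nothing separating them. R needs a newline or a semicolon between statements. What follows is only a minimal sketch under stated assumptions: it uses the toy data frame d from William's example rather than MHP.def (which is not shown in this thread), sets stringsAsFactors=TRUE explicitly because newer R versions no longer convert strings to factors by default, and calls pdf(NULL) only so that the sketch runs without opening a plot window.

```r
# Toy data from William's example. stringsAsFactors = TRUE is spelled out
# because R 4.0.0 and later default to FALSE.
d <- read.csv(text = "Name,Age\nBob,2\nXavier,25\nAdam,1",
              stringsAsFactors = TRUE)

# Two calls run together on one line inside { } do not parse.
# parse() is used here so the syntax error can be caught and inspected.
bad <- "with(d, {plot(as.integer(Name), Age) axis(side=2)})"
parse_result <- tryCatch(parse(text = bad), error = function(e) e)
inherits(parse_result, "error")  # the failure is a syntax error, not a data problem

# The same kind of calls, one statement per line (semicolons would also work).
pdf(NULL)  # assumption: null device, so nothing is drawn on screen
with(d, {
  plot(as.integer(Name), Age, axes = FALSE, xlab = "Name")
  axis(side = 2)                       # the usual y axis
  axis(side = 1,                       # x axis labelled with the factor levels
       at = seq_along(levels(Name)),
       labels = levels(Name))
})
dev.off()
```

Inside with(), the columns can be referred to by bare name (Name, Age), so a prefix such as MHP.def$ is not needed there, although it should be harmless.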