Hi Simon,

The function below should do it or at least get you started...

getPlotData <- function (datalist, response, times) {
  qdata <- sapply(datalist[times],
    function(df) {
      irow <- grepl(response, df$Response)
      df[irow, 2:5]
    }
  )

  # qdata is a matrix with rows Q1:Q4 and cols for times;
  # we turn it into a two col matrix with col 1 = time index
  # and col 2 = value
  time.index <- seq(4 * ncol(qdata))
  out <- cbind(time.index, as.numeric(qdata))
  rownames(out) <- paste(time.index, rownames(qdata), sep=".")
  colnames(out) <- c("time", response)
  out
}

#Example, get data for times 10:15 where Response contains "Economy"
x <- getPlotData(r, "Economy", 10:15)

Michael

On 11 October 2010 03:35, Simon Kiss <sjk...@gmail.com> wrote:
> Hello all,
>
> I changed the subject line of the e-mail, because the question I'm posing
> now is different than the first one. I hope that this is proper etiquette.
> However, the original chain is included below.
>
> I've incorporated bits of both Ethan and Brian's code into the script below,
> but there's one aspect I can't get my head around. I'm totally new to
> programming with control structures. The reproducible code below creates a
> list containing 19 data frames, one each for the "Most Important Problem"
> survey data for Canada.
>
> What I'd like at this stage is a loop where I can search through all the data
> frames for rows containing the search term and then bind the rows together in
> a plotable (sp?) format.
>
> At the bottom of the code below, you'll find my first attempt to make use of
> a search string and to put it into a plotable format. It only partially
> works. I can only get the numbers for one year, where I'd like to be able to
> get a string of numbers for several years. But, on the upside, grep appears to
> do the trick in terms of selecting rows.
>
> Can any one suggest a solution?
> Yours truly,
> Simon Kiss
>
> #This is the reproducible code to set-up all the data frames
> require("XML")
> library(XML)
> #This gets the data from the web and lists them
> mylist <- paste ("http://www.queensu.ca/cora/_trends/mip_",
>                  c(1987:2001,2003:2006), ".htm", sep="")
> alltables <- lapply(mylist, readHTMLTable)
>
> #convert to dataframes
> r<-lapply(alltables, function(x) {as.data.frame(x)} )
>
> #This is just some house-cleaning; structuring all the tables so they are
> #uniform
> r[[1]][3]<-r[[1]][2]
> r[[1]][2]<-c(" ")
> r[[2]][4]<-r[[2]][2]
> r[[2]][5]<-r[[2]][3]
> r[[2]][2:3]<-c(" ")
> r[[3]][4:5]<-r[[3]][3:4]
> r[[3]][3]<-c(" ")
>
> #This loop deletes some superfluous columns and rows, turns the first column
> #in to character strings and the data into numeric
> for (i in 1:19) {
>   n.rows<-dim(r[[i]])[1]
>   r[[i]] <- r[[i]][15:n.rows-3, 1:5]
>   n.rows<-dim(r[[i]])[1]
>   row.names(r[[i]]) <-NULL
>   names(r[[i]]) <- c("Response", "Q1", "Q2", "Q3", "Q4")
>
>   r[[i]][, 1]<-as.character(r[[i]][,1])
>   #r[[i]][,2:5]<-as.numeric(as.character(r[[i]][,2:5]))
>   r[[i]][, 2:5]<-lapply(r[[i]][, 2:5], function(x)
>     {as.numeric(as.character(x))})
>   #n.rows<-dim(r[[i]])[1]
>   #r[[i]]<-r[[i]][9
> }
>
> #This code is my first attempt at introducing a search string, getting the
> #rows, binding and plotting;
> economy<-r[[10]][grep('Economy', r[[10]][,1]),]
> economy_2<-r[[11]][grep('Economy', r[[11]][,1]),]
> test<-cbind(economy, economy_2)
> plot(as.numeric(test), type='l')
>
> #here's another attempt I'm trying....
> economy<-data.frame
> for (i in 15:19) {
>   economy[i,] <-r[[i]][grep('Economy', r[[i]][,1]), ]
> }
>
> Begin forwarded message:
>
>> From: Simon Kiss <sjk...@gmail.com>
>> Date: October 7, 2010 4:59:46 PM EDT
>> To: Simon Kiss <simonjk...@yahoo.ca>
>> Subject: Fwd: [R] Converting scraped data
>>
>> Begin forwarded message:
>>
>>> From: Ethan Brown <ethancbr...@gmail.com>
>>> Date: October 6, 2010 4:22:41 PM GMT-04:00
>>> To: Simon Kiss <sjk...@gmail.com>
>>> Cc: r-help@r-project.org
>>> Subject: Re: [R] Converting scraped data
>>>
>>> Hi Simon,
>>>
>>> You'll notice the "test" data.frame has a whole mix of characters in
>>> the columns you're interested in, including a "-" for missing values, and
>>> that the columns you're interested in are in fact factors.
>>>
>>> as.numeric(factor) returns the integer code of each level, not the value
>>> of the level. (See ?levels and ?factor)--that's why it's giving you those
>>> irrelevant integers. I always end up using something like this handy
>>> code snippet to deal with the situation:
>>>
>>> unfactor <- function(factors)
>>> # From http://psychlab2.ucr.edu/rwiki/index.php/R_Code_Snippets#unfactor
>>> # Transform a factor back into its factor names
>>> {
>>>   return(levels(factors)[factors])
>>> }
>>>
>>> Then, to get your data to where you want it, I'd do this:
>>>
>>> require(XML)
>>> theurl <- "http://www.queensu.ca/cora/_trends/mip_2006.htm"
>>> tables <- readHTMLTable(theurl)
>>> n.rows <- unlist(lapply(tables, function(t) dim(t)[1]))
>>> class(tables)
>>> test<-data.frame(tables, stringsAsFactors=FALSE)
>>>
>>>
>>> result <- test[11:42, 1:5] #Extract the actual data we want
>>> names(result) <- c("Response", "Q1", "Q2","Q3","Q4")
>>> for(i in 2:5) {
>>>   # Convert factor columns to numeric
>>>   result[,i] <- as.numeric(unfactor(result[,i]))
>>> }
>>> result
>>>
>>> From here you should be able to plot or do whatever else you want.
>>>
>>> Hope this helps,
>>> Ethan Brown
>>>
>>>
>>> On Wed, Oct 6, 2010 at 9:52 AM, Simon Kiss <sjk...@gmail.com> wrote:
>>>> Dear Colleagues,
>>>> I used this code to scrape data from the URL contained within. This code
>>>> should be reproducible.
>>>>
>>>> require("XML")
>>>> library(XML)
>>>> theurl <- "http://www.queensu.ca/cora/_trends/mip_2006.htm"
>>>> tables <- readHTMLTable(theurl)
>>>> n.rows <- unlist(lapply(tables, function(t) dim(t)[1]))
>>>> class(tables)
>>>> test<-data.frame(tables, stringsAsFactors=FALSE)
>>>> test[16,c(2:5)]
>>>> as.numeric(test[16,c(2:5)])
>>>> quartz()
>>>> plot(c(1:4), test[15, c(2:5)])
>>>>
>>>> Calling the values from the row of interest using test[16, c(2:5)]
>>>> brings them up as represented on the screen, but plotting them or
>>>> coercing them to numeric changes the values in a way that doesn't make
>>>> sense to me. My intuition is that there is something going on with the
>>>> way the characters are coded or classed when they're scraped into R.
>>>> I've looked around the help files for converting from character to
>>>> numeric but can't find a solution.
>>>>
>>>> I also tried this:
>>>>
>>>> as.numeric(as.character(test[16,c(2:5)])) and that also changed the
>>>> values from what they originally were.
>>>>
>>>> I'm grateful for any suggestions.
>>>> Yours, Simon Kiss
>>>>
>>>>
>>>>
>>>> *********************************
>>>> Simon J. Kiss, PhD
>>>> Assistant Professor, Wilfrid Laurier University
>>>> 73 George Street
>>>> Brantford, Ontario, Canada
>>>> N3T 2C9
>>>> Cell: +1 519 761 7606
>>>>
>>>> ______________________________________________
>>>> R-help@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
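
For anyone who wants to try the "find the matching row in each data frame, then string the quarters together" idea from this thread without scraping the website, here is a minimal sketch on made-up data. The two data frames and all their numbers are invented for illustration, and extractQuarters is a hypothetical variant written for this example (it uses vapply and unlist and keeps only the first matching row); it is not the getPlotData function posted at the top of the thread.

```r
# Made-up stand-ins for two of the yearly tables (all values are invented).
fake <- list(
  data.frame(Response = c("Economy", "Health care"),
             Q1 = c(10, 20), Q2 = c(11, 21), Q3 = c(12, 22), Q4 = c(13, 23),
             stringsAsFactors = FALSE),
  data.frame(Response = c("Health care", "The Economy"),
             Q1 = c(30, 14), Q2 = c(31, 15), Q3 = c(32, 16), Q4 = c(33, 17),
             stringsAsFactors = FALSE)
)

# Hypothetical helper: take the first row whose Response matches the search
# term in each selected data frame, and stack Q1..Q4 in time order.
extractQuarters <- function(datalist, response, times) {
  qdata <- vapply(datalist[times], function(df) {
    irow <- grep(response, df$Response)[1]  # first matching row (NA if none)
    unlist(df[irow, 2:5])                   # named numeric vector Q1..Q4
  }, numeric(4))
  # qdata has one column per selected data frame; as.numeric() reads it
  # column by column, which is already chronological order.
  data.frame(time = seq_along(qdata), value = as.numeric(qdata))
}

econ <- extractQuarters(fake, "Economy", 1:2)
print(econ)
```

The result can then be passed to something like plot(econ$time, econ$value, type = "l"). Taking only the first match is a simplification; if a search term can hit several rows in one table, the rows would need to be combined or chosen explicitly.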