I am trying to extract the “OS Vendor” and “OS Name” fields from the following online text file:
http://spec.org/jEnterprise2010/results/res2013q3/jEnterprise2010-20130904-00045.txt

My goal is to extract these two attributes from all the text files listed on the page linked below and put them in a data frame like this:

OS Vendor            OS Name
Oracle Corporation   Oracle Solaris 11.1 64-bit SRU 10.5

Text files link: https://www.spec.org/jEnterprise2010/results/jEnterprise2010.html

I got a list of all the text files from the HTML page. I am trying to write a function that picks one link at a time from getlinks, extracts the attributes, and puts them in a data frame. I do not know how to read the files from the getlinks object that contains the links. I tried converting getlinks to a data frame via as.data.frame(getlinks), but that got rid of the quotes that I need in order to read them one by one. Also, once I have the attributes, how do I put them side by side in the data frame format?

###code#########
install.packages(c("RCurl", "XML"))
library(bitops)
library(RCurl)
library(XML)

# Parse the results index page, silencing HTML parse errors
webpage <- htmlParse("http://spec.org/jEnterprise2010/results/jEnterprise2010.html",
                     error = function(...) {}, useInternalNodes = TRUE)
# All href attributes on the page, then only those ending in ".txt"
links <- xpathSApply(webpage, "//a/@href")
getlinks <- links[grep("\\.txt$", links)]

######### function to read a text file and extract the attributes ##########
readfiles <- function(x) {
  a <- readLines(x)
  sm <- "Java EE AppServer & Database Server HW (SUT hardware)"
  s <- grep(sm, a, fixed = TRUE)
  # The first line after the header that starts with a non-space character ends the section
  e <- grep("^\\S", a[-(1:s)])[1]
  vendor <- grep("OS Vendor", a[(s+1):(s+e-1)], fixed = TRUE, value = TRUE)[1]
  name <- grep("OS Name", a[(s+1):(s+e-1)], fixed = TRUE, value = TRUE)[1]
  # Return both lines; an R function returns only the value of its last expression
  c(vendor, name)
}

######### For a single file I was able to extract the attributes #########
txt1 <- readLines("http://spec.org/jEnterprise2010/results/res2013q3/jEnterprise2010-20130904-00045.txt")
# Get the OS Vendor and OS Name
sm <- "Java EE AppServer & Database Server HW (SUT hardware)"
s <- grep(sm, txt1, fixed = TRUE)
e <- grep("^\\S", txt1[-(1:s)])[1]
grep("OS Vendor", txt1[(s+1):(s+e-1)], fixed = TRUE, value = TRUE)[1]
grep("OS Name", txt1[(s+1):(s+e-1)], fixed = TRUE, value = TRUE)[1]

I will appreciate any help! Thanks.

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
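One possible approach is sketched below under a few assumptions; it is not a definitive implementation. getlinks is already an ordinary character vector. The quotes R shows when printing it are only display formatting, so there is no need to convert it to a data frame: lapply() can pass each element to a function as a plain string. Two likely snags are that the hrefs on the index page are relative (for example "res2013q3/...txt"), so they need the results directory prepended, and that each file's two values need to come back as a one-row data frame that do.call(rbind, ...) can stack. The helper names extract_os_info(), txt_links() and collect_os_info() are hypothetical. The sketch also assumes that field lines look like "    OS Vendor    Oracle Corporation" (label, optional colon, whitespace, value) and that relative hrefs are relative to https://www.spec.org/jEnterprise2010/results/.

```r
# Hypothetical helpers sketching one way to build the OS Vendor / OS Name table.
# Assumption: field lines in the SUT hardware section look like
# "    OS Vendor      Oracle Corporation" (label, optional colon, whitespace, value).

# Extract OS Vendor and OS Name from the lines of one result file.
extract_os_info <- function(lines,
                            header = "Java EE AppServer & Database Server HW (SUT hardware)") {
  start <- grep(header, lines, fixed = TRUE)[1]
  if (is.na(start)) {
    # Header not found: return a row of NAs so rbind() still works
    return(data.frame(OS_Vendor = NA_character_, OS_Name = NA_character_,
                      stringsAsFactors = FALSE))
  }
  rest <- lines[-seq_len(start)]
  # The section ends just before the first line that starts with a non-space character
  end <- grep("^\\S", rest)[1]
  # seq_len() avoids the (s+1):(s+e-1) pitfall, which counts down when e == 1
  section <- if (is.na(end)) rest else rest[seq_len(end - 1)]

  # Return the text after a label, or NA if the label is absent
  field <- function(label) {
    hit <- grep(paste0("^\\s*", label), section, value = TRUE)[1]
    if (is.na(hit)) return(NA_character_)
    trimws(sub(paste0("^\\s*", label, "\\s*:?"), "", hit))
  }
  data.frame(OS_Vendor = field("OS Vendor"), OS_Name = field("OS Name"),
             stringsAsFactors = FALSE)
}

# Keep only .txt links, drop duplicates, and turn relative hrefs into absolute URLs.
# Assumption: relative hrefs are relative to base_url.
txt_links <- function(hrefs, base_url) {
  txt <- unique(hrefs[grepl("\\.txt$", hrefs)])
  relative <- !grepl("^https?://", txt)
  txt[relative] <- paste0(base_url, txt[relative])
  txt
}

# Read every result file linked from the index page and stack one row per file.
# Needs the XML package and network access; nothing is downloaded until it is called.
collect_os_info <- function(
    index_url = "https://www.spec.org/jEnterprise2010/results/jEnterprise2010.html",
    base_url = "https://www.spec.org/jEnterprise2010/results/") {
  html <- paste(readLines(index_url, warn = FALSE), collapse = "\n")
  doc <- XML::htmlParse(html, asText = TRUE, error = function(...) {})
  # as.character() drops the "href" names returned by xpathSApply()
  urls <- txt_links(as.character(XML::xpathSApply(doc, "//a/@href")), base_url)
  rows <- lapply(urls, function(u) {
    # A file that cannot be read yields a row of NAs instead of stopping the loop
    lines <- tryCatch(readLines(u, warn = FALSE), error = function(e) character(0))
    data.frame(File = u, extract_os_info(lines), stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}
```

With the XML package installed and network access, something like os_info <- collect_os_info() followed by head(os_info) should give one row per result file, with File, OS_Vendor and OS_Name columns side by side. Keeping the parsing in extract_os_info() separate from the downloading makes it easy to check against a file you have already read with readLines(), such as txt1 above.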