Use an XPath query:

web.pg <- htmlTreeParse(file=web.pg, ignoreBlanks=TRUE, useInternalNodes = TRUE)

# Job title
xpathApply(web.pg, "//span[@class='normal']//b", xmlValue)

On Wed, Oct 26, 2011 at 9:36 PM, Moser, Gary <[email protected]> wrote:
> Greetings,
>
> I am trying to get all of the text from a web page as if I "selected
> all" on the page, pasted into a text file, and then read in the text
> file with read.csv().
>
> # this is the actual page I'm trying to acquire text from:
> web.pg <- readLines("http://www.airweb.org/?page=574")
>
> # then parsed in hopes of an easier structure to work with:
> web.pg <- htmlTreeParse(file=web.pg, ignoreBlanks=TRUE)
>
> Now I have a lovely html tree, but don't know the best way to get just
> the text components (job descriptions, job titles, etc...) as they
> appear on the web site. I'd like to do a little text mining and make a
> wordcloud using the text. Can anybody suggest a method to achieve this
> result?
>
> Thank you,
>
> Gary R. Moser
> Institutional Research Analyst
> Heald College
> p <- 415.808.1533
> f <- 415.808.1598
> [email protected] <mailto:[email protected]>
>
> Disclaimer: This communication may contain Heald College confidential and
> proprietary data. This message is intended only for the personal and
> confidential use of the designated recipients named above. If you are not the
> intended recipient of this message you are hereby notified that any review,
> dissemination, distribution or copying of this message is strictly
> prohibited. In addition, if you have received this message in error, please
> advise the sender by reply email and delete the message.
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
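
For the "select all" part of the question, one possible approach is to pull every text node under <body> and drop anything inside <script> or <style>. The following is only a minimal sketch: the inline HTML string is a hypothetical stand-in for the downloaded page (the real markup at airweb.org may well differ), and the object names doc, titles, txt, words and freq are made up for illustration. The job-title XPath is the one from the reply above; the text-node XPath is an assumption about what counts as "visible" text.

```r
library(XML)

# Hypothetical stand-in for the downloaded page; the real markup may differ.
html <- paste(
  "<html><head><style>b { color: red; }</style></head><body>",
  "<span class='normal'><b>Research Analyst</b></span>",
  "<p>Analyzes enrollment data.</p>",
  "<script>var x = 1;</script>",
  "</body></html>",
  sep = "\n"
)

# Parse the string itself (asText = TRUE) into an internal document,
# which is what the XPath functions need.
doc <- htmlTreeParse(html, asText = TRUE, useInternalNodes = TRUE)

# Job titles, using the same XPath as in the reply above.
titles <- xpathSApply(doc, "//span[@class='normal']//b", xmlValue)

# Every text node under <body>, skipping <script> and <style> content.
txt <- xpathSApply(
  doc,
  "//body//text()[not(ancestor::script) and not(ancestor::style)]",
  xmlValue
)

# Trim whitespace and drop nodes that were only whitespace.
txt <- gsub("^\\s+|\\s+$", "", txt)
txt <- txt[nzchar(txt)]

# Rough word frequencies, e.g. as input for a word cloud.
words <- tolower(unlist(strsplit(txt, "[^[:alpha:]]+")))
words <- words[nzchar(words)]
freq <- sort(table(words), decreasing = TRUE)
```

With the real page, doc would instead come from the htmlTreeParse() call in the reply, and freq (names plus counts) could then be handed to whichever word cloud function is used.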

