Greetings,
I am trying to get all of the text from a web page, as if I had selected all of the page in a browser, pasted it into a text file, and then read that file in with read.csv().

# this is the actual page I'm trying to acquire text from:
web.pg <- readLines("http://www.airweb.org/?page=574")

# then parsed in hopes of an easier structure to work with
# (readLines() returns one element per line, so collapse them and pass asText = TRUE):
library(XML)
web.pg <- htmlTreeParse(paste(web.pg, collapse = "\n"), asText = TRUE, ignoreBlanks = TRUE)

Now I have a lovely HTML tree, but I don't know the best way to get just the text components (job descriptions, job titles, etc.) as they appear on the web site. I'd like to do a little text mining and make a word cloud from the text.

Can anybody suggest a method to achieve this result?

Thank you,

Gary R. Moser
Institutional Research Analyst
Heald College
p <- 415.808.1533
f <- 415.808.1598
gary_mo...@heald.edu
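One possible approach, offered as a sketch under stated assumptions rather than a definitive answer: instead of the R-level tree from htmlTreeParse(), parse the page with the XML package's htmlParse() and pull every text node under <body> with an XPath query through xpathSApply(), skipping <script>, <style> and <noscript> content. The helper names html_visible_text() and word_counts() are hypothetical, the tokenizer is deliberately crude, and the output only approximates what "Select All" in a browser would copy (text in hidden elements and navigation menus is still included). Because libxml2 may not fetch https URLs itself, the helper also accepts the output of readLines() when asText = TRUE.

```r
library(XML)

# Hypothetical helper: return the visible text fragments of an HTML page,
# approximating what a browser's "Select All" would copy.
# `src` is a URL/file path, or (with asText = TRUE) HTML source such as the
# character vector returned by readLines().
html_visible_text <- function(src, asText = FALSE) {
  if (asText) src <- paste(src, collapse = "\n")  # one string for the parser
  doc <- htmlParse(src, asText = asText)
  # Every text node under <body>, excluding script/style/noscript content
  xpath <- paste0("//body//text()[not(ancestor::script)",
                  " and not(ancestor::style)",
                  " and not(ancestor::noscript)]")
  txt <- xpathSApply(doc, xpath, xmlValue)
  # Normalise runs of whitespace and drop fragments that were only whitespace
  txt <- trimws(gsub("[[:space:]]+", " ", txt))
  txt[nzchar(txt)]
}

# Hypothetical helper: crude word frequencies (lower-cased, letters only,
# words shorter than `min_chars` dropped), most frequent first
word_counts <- function(txt, min_chars = 3) {
  words <- unlist(strsplit(tolower(txt), "[^[:alpha:]]+"))
  words <- words[nchar(words) >= min_chars]
  sort(table(words), decreasing = TRUE)
}

# Example on a small inline page, so no network access is needed
page <- c("<html><head><title>Jobs</title>",
          "<style>p { color: red; }</style>",
          "<script>var x = 1;</script></head>",
          "<body><h1>Job Postings</h1>",
          "<p>Research Analyst, Institutional Research</p>",
          "<p>Senior   Research\n Analyst</p></body></html>")
txt <- html_visible_text(page, asText = TRUE)
print(txt)
wc <- word_counts(txt)
print(head(wc))
```

With the real page, something like txt <- html_visible_text(readLines("http://www.airweb.org/?page=574"), asText = TRUE) should return the text fragments. If the wordcloud package is installed, wordcloud::wordcloud(names(wc), as.vector(wc)) would then draw the cloud, though common words ("the", "and") will probably need filtering out first.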