Greetings,

I am trying to get all of the text from a web page, as if I had "selected
all" on the page, pasted it into a text file, and then read that text
file in with read.csv().

library(XML)

# this is the actual page I'm trying to acquire text from:

web.pg <- readLines("http://www.airweb.org/?page=574")

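# then parsed in hopes of an easier structure to work with
# (web.pg now holds the page's lines, so collapse them and set asText = TRUE):

web.pg <- htmlTreeParse(file = paste(web.pg, collapse = "\n"), asText = TRUE,
                        ignoreBlanks = TRUE)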
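
One way to get only the visible text out of a parsed page, sketched under the assumption that the XML package is installed: parse with useInternalNodes = TRUE so the document can be queried with XPath, then collect the text nodes under <body> while skipping anything inside <script> or <style>. The small inline page is a made-up stand-in so the example runs offline; with the real page you would pass the collapsed readLines() output (or the URL) instead.

```r
library(XML)  # assumes the XML package (source of htmlTreeParse) is installed

# Made-up stand-in for the downloaded page so this runs without a network
page_html <- paste(
  "<html><head><title>Jobs</title>",
  "<style>p { color: red; }</style></head>",
  "<body><h1>Research Analyst</h1>",
  "<p>Analyze enrollment data.</p>",
  "<script>var x = 1;</script>",
  "<p>Report to the dean.</p></body></html>",
  sep = "\n"
)

# useInternalNodes = TRUE gives an internal document that XPath can query
doc <- htmlTreeParse(page_html, asText = TRUE, useInternalNodes = TRUE)

# Text nodes anywhere under <body>, excluding script and style contents
page_text <- xpathSApply(
  doc,
  "//body//text()[not(ancestor::script) and not(ancestor::style)]",
  xmlValue
)

# Trim surrounding whitespace and drop nodes that were only layout whitespace
page_text <- trimws(page_text)
page_text <- page_text[nzchar(page_text)]

print(page_text)
```

The XPath filter is what separates this from a plain dump of every text node: without it, JavaScript and CSS source would end up in the word counts.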

 

Now I have a lovely HTML tree, but I don't know the best way to get just
the text components (job descriptions, job titles, etc.) as they
appear on the website. I'd like to do a little text mining and make a
word cloud from the text. Can anybody suggest a method to achieve this
result?
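
For the word cloud, the extracted text mostly needs to become a table of word counts. A minimal base-R sketch, assuming the text is already in a character vector; the two sample strings and the short stop-word list are invented for illustration. The last step calls wordcloud() from the CRAN wordcloud package, and only if that package happens to be installed.

```r
# Hypothetical input: in practice, the text vector pulled from the parsed page
page_text <- c("Institutional Research Analyst",
               "Analyze institutional data and report research findings.")

# Lower-case everything and split on any run of non-letters
words <- unlist(strsplit(tolower(page_text), "[^a-z]+"))
words <- words[nzchar(words)]

# Drop a small, illustrative stop-word list (not exhaustive)
stop_words <- c("and", "the", "of", "to", "a", "in", "for")
words <- words[!words %in% stop_words]

# Count and sort: this named table of frequencies is what a word cloud needs
freq <- sort(table(words), decreasing = TRUE)
print(freq)

# Draw the cloud only if the wordcloud package is available
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(names(freq), as.vector(freq), min.freq = 1)
}
```

A real stop-word list would need to be much longer; splitting on non-letters also breaks contractions and hyphenated words apart, which may or may not matter for a word cloud.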

 

Thank you,

Gary R. Moser
Institutional Research Analyst
Heald College
p <- 415.808.1533
f <- 415.808.1598
gary_mo...@heald.edu

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.