Use an XPath query:

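library(XML)
# web.pg holds the lines returned by readLines() below; asText = TRUE parses them as HTML source
web.pg <- htmlTreeParse(file = web.pg, asText = TRUE, ignoreBlanks = TRUE, useInternalNodes = TRUE)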

# Job titles: text of each <b> inside <span class='normal'> (returned as a list)
xpathApply(web.pg, "//span[@class='normal']//b", xmlValue)
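
To see what that query picks out, here is a small self-contained sketch run on a hypothetical snippet that mimics the structure the query assumes (each job title in a <b> inside <span class="normal">); the real page's markup is assumed, not checked. It uses xpathSApply(), which simplifies the result to a character vector instead of the list that xpathApply() returns:

```r
library(XML)

# Hypothetical markup standing in for the job listings page (structure assumed)
html <- '<html><body>
  <span class="normal"><b>Research Analyst</b><br/>Compile enrollment reports.</span>
  <span class="normal"><b>Data Coordinator</b><br/>Maintain student records.</span>
  <span class="other"><b>Not a job title</b></span>
</body></html>'

# asText = TRUE: the argument is HTML source, not a file name or URL
doc <- htmlParse(html, asText = TRUE)

# Only <b> elements under spans whose class is exactly "normal" are matched
titles <- xpathSApply(doc, "//span[@class='normal']//b", xmlValue)
print(titles)
# [1] "Research Analyst" "Data Coordinator"
```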

On Wed, Oct 26, 2011 at 9:36 PM, Moser, Gary <[email protected]> wrote:
> Greetings,
>
> I am trying to get all of the text from a web page as if I had "selected
> all" on the page, pasted it into a text file, and then read in the text
> file with read.csv().
>
> # this is the actual page I'm trying to acquire text from:
> web.pg <- readLines("http://www.airweb.org/?page=574")
>
> # then parsed in hopes of an easier structure to work with:
> web.pg <- htmlTreeParse(file=web.pg, ignoreBlanks=TRUE)
>
> Now I have a lovely HTML tree, but I don't know the best way to get just
> the text components (job descriptions, job titles, etc.) as they
> appear on the web site. I'd like to do a little text mining and make a
> word cloud from the text. Can anybody suggest a method to achieve this
> result?
>
> Thank you,
>
> Gary R. Moser
> Institutional Research Analyst
> Heald College
> p <- 415.808.1533
> f <- 415.808.1598
> [email protected] <mailto:[email protected]>
>
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
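
To get all of the visible text rather than only the job titles (for text mining or a word cloud), one option is to select every text node under <body> while skipping <script> and <style> contents, which is roughly what "select all, copy" captures. A minimal sketch, run on a small hypothetical page rather than the real one; the crude split on non-letters is an assumption for illustration, not a proper tokenizer:

```r
library(XML)

# Hypothetical page; for the real one, parse the readLines() result with asText = TRUE
html <- '<html><head><title>Jobs</title>
<style>body { color: black; }</style></head>
<body><h1>Job Postings</h1>
<p>Research analyst needed for institutional research.</p>
<script>var x = 1;</script>
<p>Data analyst supports research reports.</p></body></html>'

doc <- htmlParse(html, asText = TRUE)

# Every text node inside <body>, excluding script and style contents
txt <- xpathSApply(doc,
  "//body//text()[not(ancestor::script) and not(ancestor::style)]",
  xmlValue)
txt <- trimws(txt)
txt <- txt[nzchar(txt)]  # drop whitespace-only nodes

# Crude word count: lower-case, split on anything that is not a letter
words <- tolower(unlist(strsplit(txt, "[^A-Za-z]+")))
words <- words[nzchar(words)]
freq <- sort(table(words), decreasing = TRUE)
print(head(freq))
```

Stop words such as "for" are not removed here. The counts could then be handed to a word-cloud function, for example wordcloud::wordcloud(names(freq), as.vector(freq)) from the wordcloud package.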



-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" W
