If you only need to grab the text, lynx does this conveniently.  This
example is for Windows, but it's nearly the same on other platforms:

> out <- shell("lynx.bat --dump --nolist http://www.google.com", intern = TRUE)
> head(out)
[1] ""
[2] "   Web Images Videos Maps News Books Gmail more »"
[3] "   iGoogle | Search settings | Sign in"
[4] "   "
[5] "                                   Google"
[6] "                                      "
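
For a list of URLs, the same call can be wrapped in a small function and
applied to each one. This is only a rough sketch, not a tested solution:
the names lynx_args, page_text and pages_text are made up for illustration,
and it assumes a lynx executable can be found on the PATH (on Windows the
command may need to be "lynx.bat", as in the example above). It uses
system2() rather than shell() so that it is not tied to Windows.

```r
# Build the argument vector for lynx: dump the rendered text, omit the link list.
lynx_args <- function(url) {
  c("--dump", "--nolist", shQuote(url))
}

# Return the rendered text of one page as a character vector of lines.
# Assumption: `lynx` names an executable on the PATH (e.g. "lynx.bat" on Windows).
page_text <- function(url, lynx = "lynx") {
  if (!nzchar(Sys.which(lynx))) {
    stop("cannot find '", lynx, "' on the PATH")
  }
  system2(lynx, lynx_args(url), stdout = TRUE)
}

# Fetch several pages; the result is a list with one element per URL.
pages_text <- function(urls, lynx = "lynx") {
  out <- lapply(urls, page_text, lynx = lynx)
  names(out) <- urls
  out
}
```

Something like pages_text(c("http://www.google.com")) should then give a
list whose first element resembles `out` above, assuming lynx is installed
and the page can be reached.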

On Thu, Dec 3, 2009 at 5:29 PM, Michael Conklin <
michael.conk...@markettools.com> wrote:

> I would like to be able to submit a list of URLs of various webpages and
> extract the "content" i.e. not the mark-up of those pages. I can find plenty
> of examples in the XML library of extracting links from pages but I cannot
> seem to find a way to extract the text.  Any help would be greatly
> appreciated - I will not know the structure of the URLs I would submit in
> advance.  Any suggestions on where to look would be greatly appreciated.
>
> Mike
>
> W. Michael Conklin
> Chief Methodologist
>
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
