Re: [R] Scrap java scripts and styles from an html document

2011-04-07 Thread Mike Marchywka
> Date: Thu, 7 Apr 2011 04:15:50 -0700
> From: antuj...@gmail.com
> To: r-help@r-project.org
> Subject: Re: [R] Scrap java scripts and styles from an html document
>
> Hi,
>
> I am working on developing a web crawler. Co

Re: [R] Scrap java scripts and styles from an html document

2011-04-07 Thread antujsrv
Hi, I am working on developing a web crawler. Removing JavaScript and styles is part of cleaning the HTML document. What I want is a cleaned HTML document with only the HTML tags and textual information, so that I can figure out the pattern of the web page. This is being done to extra

Re: [R] Scrap java scripts and styles from an html document

2011-03-29 Thread Duncan Temple Lang
On 3/28/11 11:38 PM, antujsrv wrote:
> Hi,
>
> I am working on developing a web crawler in R and I needed some help with
> regard to removal of javascripts and style sheets from the html document of
> a web page.
>
> i tried using the xml package, hence the function xpathApply
> library(XML)
>

[R] Scrap java scripts and styles from an html document

2011-03-29 Thread antujsrv
Hi, I am working on developing a web crawler in R and I need some help with removing JavaScript and style sheets from the HTML document of a web page. I tried using the XML package, hence the function xpathApply:
library(XML)
txt = xpathApply(html,"//body//text()[not(ancestor::scrip
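
The cleaning step asked about above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the approach given in the replies: the sample page string and the variable names (page, doc, cleaned, txt) are made up here, and it removes the script and style nodes from the parsed tree instead of filtering them out in the XPath expression.

```r
library(XML)

# Hypothetical sample page; a crawler would pass the fetched HTML here instead.
page <- "<html><head><style>p { color: red; }</style></head>
<body><p>Hello</p><script>var x = 1;</script><p>World</p></body></html>"

# Parse the string as HTML (asText = TRUE: 'page' is content, not a file name).
doc <- htmlParse(page, asText = TRUE)

# Remove every <script> and <style> element from the parsed tree.
removeNodes(getNodeSet(doc, "//script | //style"))

# Serialize the cleaned tree: HTML tags and textual content only.
cleaned <- saveXML(doc)

# Collect the remaining text nodes under <body>, dropping whitespace-only ones.
txt <- xpathSApply(doc, "//body//text()", xmlValue)
txt <- txt[nzchar(trimws(txt))]
```

Removing the nodes first keeps the later XPath query simple, and the same cleaned tree can then be used both for the tag structure (cleaned) and for the text (txt).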