Hi,

The R4X package can help you. (I have wrapped your td elements in a single tr so that the string parses as one XML fragment.)

> x <- xml( "<tr><td><a href='2005-01.html'>2005-01</a></td><td><a
+ href='2006-01.html'>2006-01</a></td><td><a
+ href='2007-01.html'>2007-01</a></td><td><a
+ href='2008-01.html'>2008-01</a></td><td><a
+ href='2009-01.html'>2009-01</a></td></tr>" )

> x["td/a/#"]
       td        td        td        td        td
"2005-01" "2006-01" "2007-01" "2008-01" "2009-01"
> x["td/a/@href"]
            td             td             td             td             td
"2005-01.html" "2006-01.html" "2007-01.html" "2008-01.html" "2009-01.html"

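If you would rather stay in base R, something along these lines should also work. This is only a sketch: it assumes every link is written as href='...' with single quotes, as in your example, and the names html, m and hrefs are just ones I picked for illustration.

```r
# The HTML fragment from the question, pasted together as one string
html <- paste0(
  "<td><a href='2005-01.html'>2005-01</a></td>",
  "<td><a href='2006-01.html'>2006-01</a></td>",
  "<td><a href='2007-01.html'>2007-01</a></td>",
  "<td><a href='2008-01.html'>2008-01</a></td>",
  "<td><a href='2009-01.html'>2009-01</a></td>"
)

# Find every href='...' attribute (assumes single-quoted values)
m <- regmatches(html, gregexpr("href='[^']*'", html))[[1]]

# Strip the leading href=' and the trailing quote, keeping only the file name
hrefs <- gsub("^href='|'$", "", m)
hrefs
```

A single gsub call on the whole string can only rewrite it in place, so pulling the matches out first with gregexpr and regmatches is usually easier. For anything less regular than this example, a real parser such as R4X above is the safer choice.
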
Romain

On 09/23/2009 02:29 PM, Rene wrote:

Dear All,

Can someone please show me how to extract a certain part of a long HTML
string?

e.g.



"<td><a href='2005-01.html'>2005-01</a></td><td><a
href='2006-01.html'>2006-01</a></td><td><a
href='2007-01.html'>2007-01</a></td><td><a
href='2008-01.html'>2008-01</a></td><td><a
href='2009-01.html'>2009-01</a></td>"



How can I extract only the strings "2005-01.html", "2006-01.html",
"2007-01.html", "2008-01.html", "2009-01.html" from the HTML above? I
tried the gsub function, but could not get it to work.



Please guide me on this.



Thanks a lot.

Rene.

--
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr
|- http://tr.im/ztCu : RGG #158:161: examples of package IDPmisc
|- http://tr.im/yw8E : New R package : sos
`- http://tr.im/y8y0 : search the graph gallery from R
