Hi,

The R4X package can help you. (I have wrapped your td's into one tr.)
> x <- xml( "<tr><td><a href='2005-01.html'>2005-01</a></td><td><a href='2006-01.html'>2006-01</a></td><td><a href='2007-01.html'>2007-01</a></td><td><a href='2008-01.html'>2008-01</a></td><td><a href='2009-01.html'>2009-01</a></td></tr>" )
> x["td/a/#"]
       td        td        td        td        td
"2005-01" "2006-01" "2007-01" "2008-01" "2009-01"
> x["td/a/@href"]
            td             td             td             td             td
"2005-01.html" "2006-01.html" "2007-01.html" "2008-01.html" "2009-01.html"

Romain

On 09/23/2009 02:29 PM, Rene wrote:
Dear All,

Can someone please guide me on how to extract a certain part of a long HTML string? E.g.

"<td><a href='2005-01.html'>2005-01</a></td><td><a href='2006-01.html'>2006-01</a></td><td><a href='2007-01.html'>2007-01</a></td><td><a href='2008-01.html'>2008-01</a></td><td><a href='2009-01.html'>2009-01</a></td>"

How can I get only the text "2005-01.html", "2006-01.html", "2007-01.html", "2008-01.html", "2009-01.html" from the above HTML code? I have tried to use the gsub function, but it did not work.

Please guide me on this. Thanks a lot.

Rene.
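On the gsub attempt: for a fragment this regular, a regular expression can also do the job in base R, although it is more brittle than a real parser on messier HTML. Here is a minimal sketch, assuming every href value is wrapped in single quotes as in the example. The variable names (html, attrs, hrefs) are only illustrative.

```r
# The HTML fragment from the question, as a single string
html <- "<td><a href='2005-01.html'>2005-01</a></td><td><a href='2006-01.html'>2006-01</a></td><td><a href='2007-01.html'>2007-01</a></td><td><a href='2008-01.html'>2008-01</a></td><td><a href='2009-01.html'>2009-01</a></td>"

# Locate every href='...' attribute; [^']* stops at the closing quote
m <- gregexpr("href='[^']*'", html)
attrs <- regmatches(html, m)[[1]]

# Strip the leading href=' and the trailing ' so only the value remains
hrefs <- gsub("^href='|'$", "", attrs)
hrefs
```

gsub alone replaces text inside the one long string, so it cannot split it into five values. Pulling the matches out first with gregexpr and regmatches gives one element per link.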
--
Romain Francois
Professional R Enthusiast
http://romainfrancois.blog.free.fr

R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help