Humphrey - Any "correct" method requires you to specify _uniquely_ what you are looking for. If the bookmark keyword is necessary and unique, it appears you have a working solution. If not, what else were you trying to accomplish?
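One detail worth checking in the query quoted below: in XPath, "//a/href[@rel='bookmark']" looks for child elements named href inside each <a>, not for the href attribute. If the goal is the href of every <a> whose rel is "bookmark", the predicate goes on the element and the attribute is read afterwards. A minimal sketch, assuming the page looks like the three anchors in the question (the base.html string and the link texts here are made-up stand-ins, not the real page):

```r
library(XML)

# Hypothetical stand-in for the downloaded page source (base.html in the question).
# The three anchors are copied from the question; the link texts are invented.
base.html <- paste0(
  '<html><body>',
  '<a href="http://cos.name/2015/05/the-data-wisdom-for-data-science/" rel="bookmark">Post</a>',
  '<a href="http://blog.shakirm.com/2015/03/a-statistical-view-of-deep-learning-ii-auto-encoders-and-free-energy/" target="_blank">Other</a>',
  '<a href="http://cos.name/2015/05/the-data-wisdom-for-data-science/#more-10947" class="more-link">More</a>',
  '</body></html>'
)

# Parse the string as HTML content rather than as a file name or URL
doc <- htmlParse(base.html, asText = TRUE)

# Select <a> elements with rel="bookmark", then read the href attribute of each
links <- xpathSApply(doc, "//a[@rel='bookmark']", xmlGetAttr, "href")

# Drop repeats in case the same bookmark link appears more than once on a page
links <- unique(links)
print(links)
```

The same expression written as "//a[@rel='bookmark']/@href" should also work as the xpQuery argument of getHTMLLinks(). Whether rel="bookmark" really identifies the links you want on every page is something only you can confirm.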
Cheers,
Boris

On Jun 16, 2015, at 9:01 AM, Humphrey Zhao <[email protected]> wrote:

> Dear Sir/Madam:
>
> Thank you for your attention to my question. I have downloaded the source
> code of some web pages with RCurl, and I am trying to extract the URLs from
> them. In these web pages, many nodes contain the same URL, such as the
> following:
>
> <a href=\"http://cos.name/2015/05/the-data-wisdom-for-data-science/\"
> rel=\"bookmark\">
>
> <a
> href=\"http://blog.shakirm.com/2015/03/a-statistical-view-of-deep-learning-ii-auto-encoders-and-free-energy/\"
> target=\"_blank\">
>
> <a
> href=\"http://cos.name/2015/05/the-data-wisdom-for-data-science/#more-10947\"
> class=\"more-link\">
>
> I want to accurately choose the URL I need (the "href" in the first one). I
> tried many ways, and the most accurate is the following:
>
> library(XML)
>
> #links<-getHTMLLinks(base.html, xpQuery = "//a/@href")
>
> links<-getHTMLLinks(base.html, xpQuery = c("//a/href[@rel='bookmark']"))
>
> However, I still believe that there is a correct method to do this very well,
> but I could not find it. I wonder if you could give me some advice on solving
> this problem. I would be most grateful if you could reply at your
> earliest convenience. Looking forward to hearing from you. Thank you very
> much.
>
> Sincerely yours
>
> Humphrey Zhao
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [email protected] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

