Here is another approach:
> thePrices<-
+ c("id=\"p0\">$69.95</div>", "id=\"p1\">$44.95</div>", "id=\"p2\">$69.95</div>",
+ "id=\"p3\">$59.95</div>", "id=\"p4\">$69.95</div>", "id=\"p5\">$79.95</div>",
+ "id=\"p6\">$89.95</div>", "id=\"p7\">$59.95</div>", "id=\"p8\">$59.95</div>",
+ "id=\"p9\">$79.95</div>", "id=\"p10\">$79.95</div>", "id=\"p11\">$89.95</div>",
+ "id=\"p12\">$89.95</div>", "id=\"p13\">$79.95</div>", "id=\"p14\">$89.95</div>",
+ "id=\"p15\">$79.95</div>", "id=\"p16\">$39.95</div>", "id=\"p17\">$59.95</div>",
+ "id=\"p18\">$69.95</div>", "id=\"p19\">$83.95</div>", "id=\"p20\">$73.95</div>",
+ "id=\"p21\">$83.95</div>", "id=\"p22\">$93.95</div>", "id=\"p23\">$87.95</div>",
+ "id=\"p24\">$91.95</div>", "id=\"p25\">$99.95</div>", "id=\"p26\">$61.95</div>\""
+ )
> require(gsubfn)
> as.numeric(gsubfn(".*>.([0-9.]+).*", "\\1", thePrices))
 [1] 69.95 44.95 69.95 59.95 69.95 79.95 89.95 59.95 59.95 79.95 79.95 89.95 89.95
[14] 79.95 89.95 79.95 39.95 59.95 69.95 83.95 73.95 83.95 93.95 87.95 91.95 99.95
[27] 61.95

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

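For readers who do not have gsubfn installed, the same extraction can probably be done in base R. The following is only a sketch and not part of the reply above: the vector x is a hypothetical three-element subset of thePrices (it includes the last element with its stray trailing quote), and prices and prices2 are names made up for the illustration. sub(), regexpr() and regmatches() are base R functions.

```r
# Hypothetical subset of thePrices, used only for illustration
x <- c("id=\"p0\">$69.95</div>",
       "id=\"p1\">$44.95</div>",
       "id=\"p26\">$61.95</div>\"")

# Same pattern as in the gsubfn call: consume everything up to a '>',
# skip one character (the '$'), capture the run of digits and dots,
# and discard the remainder of the string.
prices <- as.numeric(sub(".*>.([0-9.]+).*", "\\1", x))

# Alternative sketch: extract the first "digits.digits" run directly.
# The digits in the id (p0, p1, p26) are not followed by a dot,
# so they are not matched.
prices2 <- as.numeric(regmatches(x, regexpr("[0-9]+\\.[0-9]+", x)))

print(prices)
print(prices2)
```

The escaped quotes do not get in the way of either pattern: \" in the R source is a single " character in the string, and neither regex refers to it.
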
On Wed, Jan 29, 2014 at 9:29 AM, Keith S Weintraub <kw1...@gmail.com> wrote:
> Folks,
>
> I got the following prices by scraping a web page just for my own edification:
>
> thePrices<-
> c("id=\"p0\">$69.95</div>", "id=\"p1\">$44.95</div>", "id=\"p2\">$69.95</div>",
> "id=\"p3\">$59.95</div>", "id=\"p4\">$69.95</div>", "id=\"p5\">$79.95</div>",
> "id=\"p6\">$89.95</div>", "id=\"p7\">$59.95</div>", "id=\"p8\">$59.95</div>",
> "id=\"p9\">$79.95</div>", "id=\"p10\">$79.95</div>", "id=\"p11\">$89.95</div>",
> "id=\"p12\">$89.95</div>", "id=\"p13\">$79.95</div>", "id=\"p14\">$89.95</div>",
> "id=\"p15\">$79.95</div>", "id=\"p16\">$39.95</div>", "id=\"p17\">$59.95</div>",
> "id=\"p18\">$69.95</div>", "id=\"p19\">$83.95</div>", "id=\"p20\">$73.95</div>",
> "id=\"p21\">$83.95</div>", "id=\"p22\">$93.95</div>", "id=\"p23\">$87.95</div>",
> "id=\"p24\">$91.95</div>", "id=\"p25\">$99.95</div>", "id=\"p26\">$61.95</div>\""
> )
>
> Using lapply and strsplit (twice) unlist etc. I was able to get the result I
> wanted (the prices as numbers e.g. 59.95) but I am sure that there is a much
> better way that someone might be able to point out for me.
>
> Note that I tried various regexes which didn't work.
>
> Is part of the difficulty that the strings in thePrices have multiple \"'s in
> them?
>
> Thanks for your time,
> Best,
> KW
>
> --
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.