I am not sure whether the approaches below are any better.

sub(".*>(.*)<.*", "\\1", thePrices)
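The sub() call above keeps the leading "$", so the result is still character. Since the question asks for the prices as numbers, one possible variation is to capture only the digits and decimal point and then call as.numeric(). This is a minimal sketch rather than a tested solution: the pattern is my own assumption about the data, and samplePrices is a hypothetical three-element stand-in for thePrices.

```r
# Hypothetical shortened sample in the same shape as thePrices
samplePrices <- c("id=\"p0\">$69.95</div>",
                  "id=\"p1\">$44.95</div>",
                  "id=\"p26\">$61.95</div>\"")

# Capture the digits and decimal point that follow the literal "$"
priceText <- sub(".*\\$([0-9.]+).*", "\\1", samplePrices)

# Convert the captured text to numeric
prices <- as.numeric(priceText)
```

Note that sub() returns its input unchanged when the pattern does not match, so as.numeric() would give NA (with a warning) for any string that has no "$" followed by digits.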
Alternatively, find the positions of the first "$" and the first "<" and take the substring between them:

sapply(thePrices, function(x) {
  s <- gregexpr(pattern = '\\$', x)[[1]][1]  # position of the first "$"
  e <- gregexpr(pattern = '<', x)[[1]][1]    # position of the first "<"
  substr(x, s, e - 1)                        # from "$" up to, not including, "<"
})

On Wed, Jan 29, 2014 at 9:29 AM, Keith S Weintraub <kw1...@gmail.com> wrote:

> Folks,
>
> I got the following prices by scraping a web page just for my own
> edification:
>
> thePrices<-
> c("id=\"p0\">$69.95</div>", "id=\"p1\">$44.95</div>",
> "id=\"p2\">$69.95</div>",
> "id=\"p3\">$59.95</div>", "id=\"p4\">$69.95</div>",
> "id=\"p5\">$79.95</div>",
> "id=\"p6\">$89.95</div>", "id=\"p7\">$59.95</div>",
> "id=\"p8\">$59.95</div>",
> "id=\"p9\">$79.95</div>", "id=\"p10\">$79.95</div>",
> "id=\"p11\">$89.95</div>",
> "id=\"p12\">$89.95</div>", "id=\"p13\">$79.95</div>",
> "id=\"p14\">$89.95</div>",
> "id=\"p15\">$79.95</div>", "id=\"p16\">$39.95</div>",
> "id=\"p17\">$59.95</div>",
> "id=\"p18\">$69.95</div>", "id=\"p19\">$83.95</div>",
> "id=\"p20\">$73.95</div>",
> "id=\"p21\">$83.95</div>", "id=\"p22\">$93.95</div>",
> "id=\"p23\">$87.95</div>",
> "id=\"p24\">$91.95</div>", "id=\"p25\">$99.95</div>",
> "id=\"p26\">$61.95</div>\""
> )
>
> Using lapply and strsplit (twice) unlist etc. I was able to get the result
> I wanted (the prices as numbers e.g. 59.95) but I am sure that there is a
> much better way that someone might be able to point out for me.
>
> Note that I tried various regexes which didn't work.
>
> Is part of the difficulty that the strings in thePrices have multiple \"'s
> in them?
>
> Thanks for your time,
> Best,
> KW
>
> --
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.