Here is another approach:

> thePrices<-
+     c("id=\"p0\">$69.95</div>", "id=\"p1\">$44.95</div>", "id=\"p2\">$69.95</div>",
+       "id=\"p3\">$59.95</div>", "id=\"p4\">$69.95</div>", "id=\"p5\">$79.95</div>",
+       "id=\"p6\">$89.95</div>", "id=\"p7\">$59.95</div>", "id=\"p8\">$59.95</div>",
+       "id=\"p9\">$79.95</div>", "id=\"p10\">$79.95</div>", "id=\"p11\">$89.95</div>",
+       "id=\"p12\">$89.95</div>", "id=\"p13\">$79.95</div>", "id=\"p14\">$89.95</div>",
+       "id=\"p15\">$79.95</div>", "id=\"p16\">$39.95</div>", "id=\"p17\">$59.95</div>",
+       "id=\"p18\">$69.95</div>", "id=\"p19\">$83.95</div>", "id=\"p20\">$73.95</div>",
+       "id=\"p21\">$83.95</div>", "id=\"p22\">$93.95</div>", "id=\"p23\">$87.95</div>",
+       "id=\"p24\">$91.95</div>", "id=\"p25\">$99.95</div>", "id=\"p26\">$61.95</div>\""
+     )
> require(gsubfn)
> as.numeric(gsubfn(".*>.([0-9.]+).*", "\\1", thePrices))
 [1] 69.95 44.95 69.95 59.95 69.95 79.95 89.95 59.95 59.95 79.95 79.95 89.95 89.95
[14] 79.95 89.95 79.95 39.95 59.95 69.95 83.95 73.95 83.95 93.95 87.95 91.95 99.95
[27] 61.95
>
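
For comparison, here is a minimal base R sketch, assuming the same string layout as above (a `>` and then a `$` immediately before each price, and no thousands separators such as "$1,299.95"). `sub()` accepts the same pattern and `\\1` backreference, so the gsubfn package is not strictly required for this particular job; the `regmatches()`/`regexpr()` variant simply grabs the first run of digits containing a decimal point. On the question about the `\"` sequences: they are only R's way of writing a double quote inside a double-quoted string. The strings themselves contain a plain `"`, which has no special meaning in a regular expression. `samplePrices` is a hypothetical three-element subset of the vector above, named differently so it does not overwrite `thePrices`.

```r
# Hypothetical subset of the scraped vector (first two and last elements)
samplePrices <- c("id=\"p0\">$69.95</div>", "id=\"p1\">$44.95</div>",
                  "id=\"p26\">$61.95</div>\"")

# Approach 1: base R sub() with the same pattern used with gsubfn above.
# ".*>." skips up to the ">" and the "$" after it, "([0-9.]+)" captures
# the price, and the trailing ".*" discards the rest of the string.
prices1 <- as.numeric(sub(".*>.([0-9.]+).*", "\\1", samplePrices))

# Approach 2: take the first "digits.digits" run in each string.
# The ids (p0, p1, ...) contain no decimal point, so the first match is the price.
prices2 <- as.numeric(regmatches(samplePrices,
                                 regexpr("[0-9]+\\.[0-9]+", samplePrices)))

print(prices1)  # [1] 69.95 44.95 61.95
print(prices2)  # [1] 69.95 44.95 61.95
```

Both patterns lean on the assumed layout; a price written with a comma separator would be truncated by either one.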

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


On Wed, Jan 29, 2014 at 9:29 AM, Keith S Weintraub <kw1...@gmail.com> wrote:
> Folks,
>
> I got the following prices by scraping a web page just for my own edification:
>
> thePrices<-
> c("id=\"p0\">$69.95</div>", "id=\"p1\">$44.95</div>", "id=\"p2\">$69.95</div>",
> "id=\"p3\">$59.95</div>", "id=\"p4\">$69.95</div>", "id=\"p5\">$79.95</div>",
> "id=\"p6\">$89.95</div>", "id=\"p7\">$59.95</div>", "id=\"p8\">$59.95</div>",
> "id=\"p9\">$79.95</div>", "id=\"p10\">$79.95</div>", "id=\"p11\">$89.95</div>",
> "id=\"p12\">$89.95</div>", "id=\"p13\">$79.95</div>", "id=\"p14\">$89.95</div>",
> "id=\"p15\">$79.95</div>", "id=\"p16\">$39.95</div>", "id=\"p17\">$59.95</div>",
> "id=\"p18\">$69.95</div>", "id=\"p19\">$83.95</div>", "id=\"p20\">$73.95</div>",
> "id=\"p21\">$83.95</div>", "id=\"p22\">$93.95</div>", "id=\"p23\">$87.95</div>",
> "id=\"p24\">$91.95</div>", "id=\"p25\">$99.95</div>", "id=\"p26\">$61.95</div>\""
> )
>
> Using lapply, strsplit (twice), unlist, etc., I was able to get the result I
> wanted (the prices as numbers, e.g. 59.95), but I am sure there is a much
> better way that someone could point out to me.
>
> Note that I tried various regexes that didn't work.
>
> Is part of the difficulty that the strings in thePrices have multiple \"'s in 
> them?
>
> Thanks for your time,
> Best,
> KW
>
> --
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
