I am trying to read some data off the Zillow site. I'm new to XML, HTML, parsing, and the XML package. I've been able to load the web page I'm interested in with the following code, but I'm not sure of the next step to get the information I want into R:
    library(XML)
    url <- "http://www.zillow.com/homes/511 W Lafayette St, Norristown, PA_rb"
    doc <- htmlTreeParse(url, isURL = TRUE)
    doc

I'd like to pull the following information into R:

  - the href of the home-details link: /homedetails/236-Arundel-Ave-Horsham-PA-19044/9933810_zpid/#{scid=hdp-site-map-bubble-address}
  - the Zestimate / price: $239,000
  - Beds: 3
  - Baths: 1.0
  - Sqft: 1630

I noticed that all of this information is in "doc". The section of doc that contains it is shown below. How do I extract these values and get them into R in the general case, where the address in url changes?

    LatLong.createFromDegrees(40.187567, -75.125861),
    "<div class=\"map-bubble property-bubble\">
      <div class=\"search-result\">
        <div class=\"plisting\">
          <div id=\"bubble-photoex-up\" class=\"photoex hide\">
            <div class=\"photoex-photos\"> </div>
            <div class=\"mapsViews hide\"> </div>
          </div>
          <div id=\"property-zpid\" class=\"hide\">9933810</div>
          <div id=\"property-home-info\">
            <div id=\"pinfo-block\" class=\"property-info\">
              <div class=\"adr\">
                \"/homedetails/236-Arundel-Ave-Horsham-PA-19044/9933810_zpid/#{scid=hdp-site-map-bubble-address}\"
                236 Arundel Ave, Horsham, PA
              </div>
              <ul class=\"value-info\">
                <li class=\"type-allHomes\">
                  Zestimate<sup>®</sup>: $239,000
                  \"#\"
                  <div id=\"zest-tip-bubble_toggleArea\" class=\"tooltip hide\">
                    Close
                    <dl>
                      <dt>Zestimate</dt>
                      <dd>
                        A <strong>Zestimate®</strong> home valuation is Zillow's estimated market value. It is not an appraisal. Use it as a starting point to determine a home's value.
                        <a href=\"/wikipages/What-is-a-Zestimate/\" href=\"#\">Learn more
                      </dd>
                    </dl>
                  </div>
                </li>
                <li class=\"secondary monthly-payment\">
                  Mortgage payment: $963/mo
                  <ul class=\"carrot view-rates-aftertext\">
                    <li> \"/mortgage-rates/#{scid=mor-site-mapbubrates}\" See rates </li>
                  </ul>
                </li>
              </ul>
              <ul class=\"attributes\">
                <li class=\"prop-cola\">Beds: 3<br /> Baths: 1.0</li>
                <li class=\"prop-colb\">Sqft: 1,630<br /> Lot: 21,745</li>
              </ul>
            </div>
            <ul class=\"has-photo actions clearfix\">
              <li class=\"hinfo ztsa\"> \"/homedetails/236-Arundel-Ave-Horsham-PA-19044/9933810_zpid/#{scid=hdp-site-map-bubble-details}\" Details </li>
              <li class=\"mapHome ztsa\" zpid=\"9933810\"> \"#\" Views </li>
              <li class=\"faves ztsa\">
                <a onclick=\"trackLink(this, 'Save', { 'events': 'event18', 'eVar4': 'Map Bubble' }); return favoriteManager.addFavorite(9933810, favoriteManager.doneSaving(this), event, true);\" class=\"not-saved\" rel=\"nofollow\">Save
              </li>
            </ul>
          </div>
          Close
          <div id=\"bubble-photoex-down\" class=\"photoex hide\">
            <div class=\"photoex-photos\"> </div>
            <div class=\"mapsViews hide\"> </div>
          </div>
        </div>
      </div>
      <div class=\"bubble-beak\"> </div>
    </div>"
    )

--
View this message in context: http://r.789695.n4.nabble.com/newbie-xml-parsing-question-tp3558067p3558067.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
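One possible approach to the question, offered as a minimal sketch rather than a tested scraper. It rests on several assumptions. First, htmlTreeParse() by default builds an R-level tree, whereas htmlParse() (the same as htmlTreeParse(..., useInternalNodes = TRUE)) returns a C-level document that XPath functions such as xpathSApply() can query. Second, on the live page the map-bubble HTML sits inside a JavaScript string (hence the \" escapes), so it would first have to be pulled out of the <script> text. The sketch starts from that already-extracted string. Third, it assumes the archive stripped the <a href=...> tags, so the real markup has an <a> element inside <div class="adr">. The helpers zillow_url() and parse_bubble() are hypothetical, and the sample string is a trimmed, hand-made version of the excerpt in the question, not real Zillow output. Zillow's markup may also differ from what is shown.

```r
library(XML)

## Hypothetical helper: build the search URL for an address, following the
## pattern of the URL in the question. URLencode() turns spaces into %20.
zillow_url <- function(address) {
  paste0("http://www.zillow.com/homes/", URLencode(address), "_rb")
}

## Hypothetical helper: parse one map-bubble HTML fragment into a one-row
## data frame. Assumes the fragment was already extracted from the page's
## JavaScript and that the href is on an <a> inside <div class="adr">.
parse_bubble <- function(bubble) {
  bubble <- gsub('\\"', '"', bubble, fixed = TRUE)  # undo the JavaScript \" escapes
  doc <- htmlParse(bubble, asText = TRUE)           # internal doc, usable with XPath

  ## All text under the nodes matching an XPath expression, as one string
  text_of <- function(xpath) {
    paste(xpathSApply(doc, xpath, xmlValue), collapse = " ")
  }
  ## First capture group of a regex as a number (commas removed), or NA
  first_number <- function(x, pattern) {
    m <- regmatches(x, regexec(pattern, x))[[1]]
    if (length(m) < 2) NA_real_ else as.numeric(gsub(",", "", m[2]))
  }

  href  <- xpathSApply(doc, "//div[@class='adr']/a", xmlGetAttr, "href")
  value <- text_of("//li[@class='type-allHomes']")
  cola  <- text_of("//li[@class='prop-cola']")   # "Beds: 3 Baths: 1.0"
  colb  <- text_of("//li[@class='prop-colb']")   # "Sqft: 1,630 Lot: 21,745"

  data.frame(
    href      = if (length(href)) href[[1]] else NA_character_,
    zestimate = first_number(value, "\\$([0-9,]+)"),
    beds      = first_number(cola, "Beds:[[:space:]]*([0-9.]+)"),
    baths     = first_number(cola, "Baths:[[:space:]]*([0-9.]+)"),
    sqft      = first_number(colb, "Sqft:[[:space:]]*([0-9,]+)"),
    stringsAsFactors = FALSE
  )
}

## Hand-made sample in the escaped form seen in the question (not real output)
bubble <- paste0(
  '<div class=\\"map-bubble property-bubble\\">',
  '<div class=\\"adr\\"><a href=\\"/homedetails/236-Arundel-Ave-Horsham-PA-19044/',
  '9933810_zpid/#{scid=hdp-site-map-bubble-address}\\">',
  '236 Arundel Ave, Horsham, PA</a></div>',
  '<ul class=\\"value-info\\"><li class=\\"type-allHomes\\">Zestimate: $239,000</li></ul>',
  '<ul class=\\"attributes\\">',
  '<li class=\\"prop-cola\\">Beds: 3<br /> Baths: 1.0</li>',
  '<li class=\\"prop-colb\\">Sqft: 1,630<br /> Lot: 21,745</li>',
  '</ul></div>'
)

row <- parse_bubble(bubble)
print(row)
zillow_url("511 W Lafayette St, Norristown, PA")
```

Because each bubble becomes a one-row data frame, several addresses can be combined with do.call(rbind, lapply(bubbles, parse_bubble)). The regular expressions only handle the "Label: value" text as it appears in the excerpt and would need adjusting if the page wording differs.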