Hi, R-Help members, I'm doing some web scraping. This time I need the image URLs of the products on an e-commerce site. I can get the nodes where the URLs are, but to extract each URL I need one additional step:
"src" vs "data-original": in the page source, some URLs are in the "src" attribute, while others are in the "data-original" attribute.

How can I write a loop or an apply function that does the following?

if the node element contains "data-original", do:
    ... %>% html_attr("data-original")
else do:
    ... %>% html_attr("src")

The result should be a vector with the URLs.

My code:

1.- I can get the nodes for the images:

##########################################################
# This results in an "XMLNodeSet" object
library(rvest)
PCs <- html("http://www.linio.cl/computacion/pc-escritorio/") %>%
  html_nodes(".product-item-img") %>%
  html_nodes("img")
###########################################################

# for the attr "data-original"
PCs2 <- html("http://www.linio.cl/computacion/pc-escritorio/") %>%
  html_nodes(".product-item-img") %>%
  html_nodes("img") %>%
  html_attr("data-original")

This gives the URLs from the "data-original" attr, and NA where the attr is not present.

# for the attr "src"
PCs3 <- html("http://www.linio.cl/computacion/pc-escritorio/") %>%
  html_nodes(".product-item-img") %>%
  html_nodes("img") %>%
  html_attr("src")

This gives the content of the "src" attr. However, for some products the URL I need is in the "data-original" attr, and not here.

#### combination throwing NAs as result #####
PCs4 <- html("http://www.linio.cl/computacion/pc-escritorio/") %>%
  html_nodes(".product-item-img") %>%
  html_nodes("img") %>%
  html_attr("data-original|src")
################################################

I've also tried something like this:

lapply(PCs, function(e) {
  if ("data-original" %in% i) {
    print("ok")
  }
})

but I get this:

Error in match(x, table, nomatch = 0L) :
  'match' requires vector arguments

Thanks.
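One possible way to get the single vector described above, offered only as a sketch: html_attr() takes one literal attribute name (so "data-original|src" is looked up as an attribute with exactly that name), but both attributes can be extracted separately and merged element by element, taking "src" wherever "data-original" is NA. The inline HTML and the example.com URLs below are hypothetical stand-ins for the real page so that the snippet runs offline, and read_html() assumes a version of rvest that provides it in place of html().

```r
library(rvest)

# Hypothetical stand-in for the product listing page (not the real site markup).
page <- read_html(paste0(
  '<html><body>',
  '<div class="product-item-img">',
  '<img src="placeholder.gif" data-original="http://example.com/a.jpg"></div>',
  '<div class="product-item-img">',
  '<img src="http://example.com/b.jpg"></div>',
  '</body></html>'
))

imgs <- page %>%
  html_nodes(".product-item-img") %>%
  html_nodes("img")

# One value per node; NA where the node lacks the attribute.
data_original <- html_attr(imgs, "data-original")
src <- html_attr(imgs, "src")

# Prefer "data-original"; fall back to "src" where it is missing.
urls <- ifelse(is.na(data_original), src, data_original)
print(urls)
```

Because html_attr() is already vectorised over the node set, no explicit loop or lapply() is needed for this step; the same ifelse() line should apply to the PCs2 and PCs3 vectors from the code above, assuming both come from the same nodes in the same order.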