Sometimes I need to pull data from the web and organize it into a data frame, and I waste a lot of time doing it manually. I've been trying to figure out how to streamline this process. I've tried a few R scraping approaches but couldn't get them to work, and I suspect there's an easier way. Can anyone help me out?
Fictional example: here's a webpage with countries listed by continent: https://simple.wikipedia.org/wiki/List_of_countries_by_continents. Each country name is also a link that leads to that country's own page (e.g. https://simple.wikipedia.org/wiki/Angola).

As a final result, I would like a data frame with one observation (row) per country listed and 4 variables (columns): ID = country name, Continent = the continent it belongs to, Language = official language (from the country's own page), and Population = most recent population count (from the country's own page).

The main issue I'm trying to figure out is handling several webpages: would it be possible to scrape the country names from the first link as a list, together with the links to their pages, and then write a function that runs a scraping command on each of those links to collect the specific data I'm looking for? A rough sketch of what I have in mind follows.
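This is only a minimal sketch using rvest and base R, assuming the approach described above works; the CSS selector and the XPath expressions are placeholders that would need to be adapted after inspecting the actual page structure (e.g. with the browser's developer tools). The Continent column would come from grouping countries under their continent headings on the list page, which needs its own selector, so I've left it out here.

library(rvest)

base_url  <- "https://simple.wikipedia.org"
list_page <- read_html(paste0(base_url, "/wiki/List_of_countries_by_continents"))

## Step 1: pull the country names and their links from the list page.
## NOTE: "li a" is a placeholder selector -- the real page needs a more
## specific one that matches only the country entries.
links <- html_elements(list_page, "li a")
countries <- data.frame(
  ID   = html_text2(links),
  link = paste0(base_url, html_attr(links, "href")),
  stringsAsFactors = FALSE
)

## Step 2: a function that visits one country page and extracts the
## fields of interest. The XPath assumes the values sit in an infobox
## row next to a matching header cell; illustrative, not verified.
scrape_country <- function(url) {
  page <- read_html(url)
  language <- html_text2(html_element(
    page, xpath = "//tr[th[contains(., 'Official language')]]/td"))
  population <- html_text2(html_element(
    page, xpath = "//tr[th[contains(., 'Population')]]/td"))
  Sys.sleep(1)  # be polite to the server between requests
  data.frame(Language = language, Population = population,
             stringsAsFactors = FALSE)
}

## Step 3: run the function over every link and bind the results
## row-wise onto the country list.
details <- do.call(rbind, lapply(countries$link, scrape_country))
result  <- cbind(countries["ID"], details)

Is this the right general shape, or is there a cleaner way to chain the two scraping steps?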