Hi, this question is a little off topic, but I thought that since so many
people on this list are experts in this field, someone may know.

I'm experimenting with my own semantic-based search engine, but I want to
test it with a large corpus of web pages. Ideally I would like a list of
the top 10M or top 100M URLs in the world, ranked by PageRank.

Short of using Nutch to crawl the entire web and compute the PageRank
myself, are there any other ways? What other resources might be available
for me to get this (smaller) corpus of top web pages?
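
To be concrete about what I'd do with such a list: my plan is roughly the
sketch below (Python). It assumes a hypothetical plain-text file
top_urls.txt with one URL per line, best-ranked first, and just fetches
each page and saves the raw HTML for my engine to index.

import hashlib
import os
import urllib.request

def fetch_corpus(url_file, out_dir="corpus"):
    """Fetch each URL in url_file and save the raw HTML under out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    with open(url_file) as f:
        for line in f:
            url = line.strip()
            if not url:
                continue
            try:
                with urllib.request.urlopen(url, timeout=10) as resp:
                    html = resp.read()
            except Exception:
                continue  # skip dead, slow, or misbehaving hosts
            # Hash the URL for a collision-free, filesystem-safe filename.
            name = hashlib.sha1(url.encode()).hexdigest()
            with open(os.path.join(out_dir, name + ".html"), "wb") as out:
                out.write(html)

if __name__ == "__main__":
    fetch_corpus("top_urls.txt")  # top_urls.txt is a hypothetical input file

So really all I'm missing is that ranked URL list itself.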

Thanks, Ian
