Re: getting a list of top page-ranked webpages

2010-09-17 Thread Dennis Gearon
) Dennis Gearon Signature Warning EARTH has a Right To Life, otherwise we all die. Read 'Hot, Flat, and Crowded' Laugh at http://www.yert.com/film.php --- On Fri, 9/17/10, Ian Upright wrote: > From: Ian Upright > Subject: Re: getting a list of top page-ranke

Re: getting a list of top page-ranked webpages

2010-09-17 Thread Ian Upright
On Thu, 16 Sep 2010 15:31:02 -0700, you wrote:
>The public terabyte dataset project would be a good match for what you
>need.
>
>http://bixolabs.com/datasets/public-terabyte-dataset-project/
>
>Of course, that means we have to actually finish the crawl & finalize
>the Avro format we use for th

Re: getting a list of top page-ranked webpages

2010-09-17 Thread Ian Upright
On Fri, 17 Sep 2010 04:46:44 -0700 (PDT), kenf_nc wrote:
>A slightly different route to take, but one that should help test/refine a
>semantic parser is Wikipedia. They make available their entire corpus, or
>any subset you define. The whole thing is like 14 terabytes, but you can get
>smaller se

Re: getting a list of top page-ranked webpages

2010-09-17 Thread kenf_nc
A slightly different route to take, but one that should help test/refine a semantic parser is Wikipedia. They make available their entire corpus, or any subset you define. The whole thing is like 14 terabytes, but you can get smaller sets.

Re: getting a list of top page-ranked webpages

2010-09-16 Thread Dennis Gearon
Flat, and Crowded' Laugh at http://www.yert.com/film.php --- On Thu, 9/16/10, Dennis Gearon wrote: > From: Dennis Gearon > Subject: Re: getting a list of top page-ranked webpages > To: solr-user@lucene.apache.org, i...@upright.net > Date: Thursday, September 16, 2010, 1

Re: getting a list of top page-ranked webpages

2010-09-16 Thread Dennis Gearon
There's a great web page somewhere that shows the popularity as the subway map of Tokyo. And, most popular in the world, per dominant culture in each country, per religious majority, per language culture . . . Dennis Gearon Signature Warning EARTH has a Right To Life, otherw

Re: getting a list of top page-ranked webpages

2010-09-16 Thread Ken Krugler
Hi Ian,

On Sep 16, 2010, at 2:44pm, Ian Upright wrote:

Hi, this question is a little off topic, but I thought since so many people on this list are probably experts in this field, someone may know. I'm experimenting with my own semantic-based search engine, but I want to test it with a large co