Re: getting a list of top page-ranked webpages

2010-09-17 Thread Dennis Gearon
Dennis Gearon
Signature Warning
EARTH has a Right To Life, otherwise we all die.
Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php
--- On Fri, 9/17/10, Ian Upright wrote:
> From: Ian Upright
> Subject: Re: getting a list of top page-ranke

Re: getting a list of top page-ranked webpages

2010-09-17 Thread Ian Upright
On Thu, 16 Sep 2010 15:31:02 -0700, you wrote:
>The public terabyte dataset project would be a good match for what you
>need.
>
>http://bixolabs.com/datasets/public-terabyte-dataset-project/
>
>Of course, that means we have to actually finish the crawl & finalize
>the Avro format we use for th

Re: getting a list of top page-ranked webpages

2010-09-17 Thread Ian Upright
On Fri, 17 Sep 2010 04:46:44 -0700 (PDT), kenf_nc wrote:
>A slightly different route to take, but one that should help test/refine a
>semantic parser is wikipedia. They make available their entire corpus, or
>any subset you define. The whole thing is like 14 terabytes, but you can get
>smaller se

Re: getting a list of top page-ranked webpages

2010-09-17 Thread kenf_nc

Re: getting a list of top page-ranked webpages

2010-09-16 Thread Dennis Gearon

Re: getting a list of top page-ranked webpages

2010-09-16 Thread Dennis Gearon

Re: getting a list of top page-ranked webpages

2010-09-16 Thread Ken Krugler
Hi Ian,
On Sep 16, 2010, at 2:44pm, Ian Upright wrote:
Hi, this question is a little off topic, but I thought since so many people on this are probably experts in this field, someone may know. I'm experimenting with my own semantic-based search engine, but I want to test it with a large co

getting a list of top page-ranked webpages

2010-09-16 Thread Ian Upright
Hi, this question is a little off topic, but I thought since so many people on this list are probably experts in this field, someone may know. I'm experimenting with my own semantic-based search engine, but I want to test it with a large corpus of web pages. Ideally I would like to have a list of the