Dennis Gearon
Signature Warning
EARTH has a Right To Life,
otherwise we all die.
Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php
--- On Fri, 9/17/10, Ian Upright wrote:
> From: Ian Upright
> Subject: Re: getting a list of top page-ranke
On Thu, 16 Sep 2010 15:31:02 -0700, you wrote:
>The public terabyte dataset project would be a good match for what you
>need.
>
>http://bixolabs.com/datasets/public-terabyte-dataset-project/
>
>Of course, that means we have to actually finish the crawl & finalize
>the Avro format we use for th
On Fri, 17 Sep 2010 04:46:44 -0700 (PDT), kenf_nc
wrote:
>A slightly different route to take, but one that should help test/refine a
>semantic parser is Wikipedia. They make available their entire corpus, or
>any subset you define. The whole thing is like 14 terabytes, but you can get
>smaller se
.n3.nabble.com/getting-a-list-of-top-page-ranked-webpages-tp1515311p1516649.html
Sent from the Solr - User mailing list archive at Nabble.com.
--- On Thu, 9/16/10, Dennis Gearon wrote:
> From: Dennis Gearon
> Subject: Re: getting a list of top page-ranked webpages
> To: solr-user@lucene.apache.org, i...@upright.net
> Date: Thursday, September 16, 2010, 1
--- On Thu, 9/16/10, Ian Upright wrote:
> From: Ian Upright
> Subject: getting a list of top page-ranked webpages
> To: solr-user@lucene.apache.org
> Date: Thursday, September 1
Hi Ian,
On Sep 16, 2010, at 2:44pm, Ian Upright wrote:
Hi, this question is a little off topic, but I thought since so many people
on this list are probably experts in this field, someone may know.
I'm experimenting with my own semantic-based search engine, but I want to
test it with a large co
Hi, this question is a little off topic, but I thought since so many people
on this list are probably experts in this field, someone may know.
I'm experimenting with my own semantic-based search engine, but I want to
test it with a large corpus of web pages. Ideally I would like to have a
list of the