That's pretty good stuff to know, thanks everybody. For my application, it's pretty hard to crawl pages and then universally assign the desired fields from the text that comes back.
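For illustration, here is a minimal sketch of the kind of per-site field extraction I'm talking about, assuming the Python requests and BeautifulSoup libraries and made-up selectors for an imaginary site:

    # Hypothetical per-site rules: each site needs its own selectors, which is
    # exactly why doing this universally is hard.
    import requests
    from bs4 import BeautifulSoup

    FIELD_SELECTORS = {
        "www.example.com": {"title": "h1", "price": ".price", "summary": "#summary"},
    }

    def extract_fields(url):
        """Fetch a page and pull the desired fields out with site-specific rules."""
        host = url.split("/")[2]
        rules = FIELD_SELECTORS.get(host, {})
        soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
        doc = {"url": url}
        for field, selector in rules.items():
            node = soup.select_one(selector)
            if node is not None:
                doc[field] = node.get_text(strip=True)
        return doc

The resulting doc dict is what would get posted to Solr; the hard part is that something like FIELD_SELECTORS has to be hand-written and maintained for every site you crawl.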
However, I would WELCOME someone with that expertise into the company when it gets funded, to prove me wrong :-)

Dennis Gearon

Signature Warning
----------------
EARTH has a Right To Life, otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php

--- On Fri, 9/17/10, Ian Upright <i...@upright.net> wrote:

> From: Ian Upright <i...@upright.net>
> Subject: Re: getting a list of top page-ranked webpages
> To: solr-user@lucene.apache.org
> Date: Friday, September 17, 2010, 10:50 AM
>
> On Fri, 17 Sep 2010 04:46:44 -0700 (PDT), kenf_nc <ken.fos...@realestate.com> wrote:
>
> > A slightly different route to take, but one that should help test/refine a
> > semantic parser is wikipedia. They make available their entire corpus, or
> > any subset you define. The whole thing is like 14 terabytes, but you can
> > get smaller sets.
>
> Actually, I do heavy analysis of the entire wikipedia, plus 1m top webpages
> from Alexa, and all of dmoz URLs, in order to build the semantic engine in
> the first place. However, an outside corpus is required to test its quality
> outside of this space.
>
> Cheers, Ian
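For anyone who wants to try the Wikipedia route kenf_nc describes, here is a minimal sketch of streaming articles out of a dump subset, assuming a locally downloaded pages-articles .xml.bz2 file and only the Python standard library; the dump filename and XML namespace below are examples and vary by dump version:

    import bz2
    import xml.etree.ElementTree as ET

    # Namespace used by recent pages-articles dumps; check your dump's <mediawiki> tag.
    NS = "{http://www.mediawiki.org/xml/export-0.10/}"

    def iter_articles(dump_path):
        """Stream (title, wikitext) pairs out of a bz2-compressed MediaWiki XML dump."""
        with bz2.open(dump_path, "rb") as f:
            for _, elem in ET.iterparse(f):
                if elem.tag == NS + "page":
                    title = elem.findtext(NS + "title")
                    text = elem.findtext(NS + "revision/" + NS + "text") or ""
                    yield title, text
                    elem.clear()  # keep memory flat while streaming

    if __name__ == "__main__":
        # Example filename; any pages-articles subset from dumps.wikimedia.org works.
        for title, text in iter_articles("enwiki-latest-pages-articles1.xml.bz2"):
            # This is where the semantic parser under test would run on `text`.
            print(title, len(text))

Streaming with iterparse and clearing each page element keeps the whole thing in constant memory, so even a large subset can be walked on an ordinary workstation.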