Hi George,
Thank you for your kind words about Lucene in Action. :)

I wouldn't compare Solr and Nutch; they are really made for different things.
I was suggesting Nutch instead of Heritrix, not instead of Solr.  The
Solr+Nutch patch is in JIRA, and there is a fresh patch in there... still warm,
try it out.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: George Everitt <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Thursday, November 22, 2007 10:58:08 PM
Subject: Re: Heritrix and Solr

Otis:

There are many reasons I prefer Solr to Nutch:

1. I actually tried to do some of the crawling with Nutch, but found  
the crawling options less flexible than I would have liked.
2. I prefer the Solr approach in general.  I have a long background in
Verity and Autonomy search, and Solr is a bit closer to them than Nutch.
3. I really like the schema support in Solr.
4. I really really like the facets/parametric search in Solr.
5. I really really really like the REST interface in Solr.
6. Finally, and not to put too fine a point on it, Hadoop frightens
the bejeebers out of me.  I've skimmed some of the papers and it looks
like a lot of study before I will fully understand it.  I'm not saying
I'm stupid and lazy, but if the map-reduce algorithm fits, I'll wear
it.  Plus, I'm trying to get a mental handle on Jeff Hawkins' HTM and
its application to the real world.   It all makes my cerebral cortex
itchy.

Thanks for the suggestion, though.   I'll probably revisit Nutch
if Heritrix lets me down.  I had no luck getting the Nutch crawler
Solr patch to work, either.   Sadly, I'm the David Lee Roth of Java
programmers - I may think that I'm hard-core, but I'm not, really. And
my groupies are getting a bit saggy.

BTW - add my voice to the paeans of praise for Lucene in Action.   You
and Erik did a bang-up job, and I surely appreciate all the feedback
you give on this forum, especially over the past few months as I feel
my way through Solr and Lucene.



On Nov 22, 2007, at 10:10 PM, Otis Gospodnetic wrote:

> The answer to that question, Norberto, would depend on versions.
>
> George: why not just use straight Nutch and forget about Heritrix?
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
> From: Norberto Meijome <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Cc: [EMAIL PROTECTED]
> Sent: Thursday, November 22, 2007 5:54:32 PM
> Subject: Re: Heritrix and Solr
>
> On Thu, 22 Nov 2007 10:41:41 -0500
> George Everitt <[EMAIL PROTECTED]> wrote:
>
>> After a lot of googling, I came across Heritrix, which seems to be the
>> most robust well supported open source crawler out there.   Heritrix
>> has an integration with Nutch (NutchWax), but not with Solr.   I'm
>> wondering if anybody can share any experience using Heritrix with Solr.
>
> out on a limb here... both Nutch and SOLR use Lucene for the actual
> indexing / searching. Would the indexes generated with Nutch be
> compatible / readable with SOLR?
>
> _________________________
> {Beto|Norberto|Numard} Meijome
>
> "Why do you sit there looking like an envelope without any address on
> it?"
>  Mark Twain
>
> I speak for myself, not my employer. Contents may be hot. Slippery when
> wet. Reading disclaimers makes you go blind. Writing them is worse.
> You have been Warned.
>