Subject: Re: Heritrix and Solr
On Thu, 22 Nov 2007 19:10:46 -0800 (PST)
Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> The answer to that question, Norberto, would depend on versions.
Otis, would that relate to which underlying version of Lucene is being used in
either Solr or Nutch?
_
{Beto|Norberto|Nu
warm,
try it out.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
----- Original Message -----
From: George Everitt <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Thursday, November 22, 2007 10:58:08 PM
Subject: Re: Heritrix and Solr
Otis:
There are many reaso
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
----- Original Message -----
From: Norberto Meijome <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Cc: [EMAIL PROTECTED]
Sent: Thursday, November 22, 2007 5:54:32 PM
Subject: Re: Heritrix and Solr
On Thu, 22 Nov 2007 10:41:41 -0500
George Everitt <[EMAIL PROTECTED]> wrote:
> After a lot of googling, I came across Heritrix, which seems to be the
> most robust well supported open source crawler out there. Heritrix
> has an integration with Nutch (NutchWax), but not with Solr. I'm
>
I have a similar requirement: I need to move to a good crawler.
Currently I am using a custom crawler of my own to crawl some
public domains, and I use Lucene to index all downloaded pages. After a lot
of research I came across JSpider with Lucene.
Also I was looki
I am interested in this too. Any ideas?
A. Banji Oyebisi
Choicegen, LLC.
Email: [EMAIL PROTECTED]
Web URL: http://www.choicegen.com
Choicegen... Helping you make better choices!
I'm looking for a web crawler to use with Solr. The objective is to
crawl about a dozen public web sites regarding a specific topic.
After a lot of googling, I came across Heritrix, which seems to be the
most robust well supported open source crawler out there. Heritrix
has an integratio