Mitch,

If you use Nutch+Solr then you wouldn't *index* the fetched content with Nutch.
Solr doesn't know anything about OPIC, but I suppose you can feed the OPIC 
score computed by Nutch into a Solr field and use it during scoring, if you 
want, say with a function query.
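
A rough sketch of what such a request could look like, assuming the Nutch score has been indexed into a numeric Solr field. The field names `opic_score`, `title` and `content` are hypothetical placeholders, not names taken from any particular schema; `defType`, `qf`, `bf` and `fl` are standard dismax/common query parameters. The code only builds the query string and does not contact a server.

```python
from urllib.parse import urlencode


def boost_params(user_query, score_field="opic_score"):
    """Build dismax parameters that add a function-query boost.

    score_field is a hypothetical numeric field holding the
    score that Nutch computed for each document.
    """
    return {
        "q": user_query,
        "defType": "dismax",
        # Hypothetical text fields to search in.
        "qf": "title content",
        # Additive boost: dampen the raw score with log, and add 1
        # so that a score of 0 does not produce log(0).
        "bf": "log(sum(%s,1))" % score_field,
        # Return the final relevance score with each hit.
        "fl": "*,score",
    }


def to_query_string(params):
    """URL-encode the parameters for a /select request."""
    return urlencode(params)


if __name__ == "__main__":
    print(to_query_string(boost_params("open source crawler")))
```

The resulting string would be appended to the select URL of whichever core holds the crawled pages. Whether an additive boost like this or a multiplicative one fits better depends on how the scores are distributed.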

Yes, ES has built-in support for sharding and replication.  It also makes it 
easy to implement custom scoring, which may work for OPIC here.
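
On the hashing question: the common idea behind distributed indexing is to hash each document's id and take the result modulo the number of shards, so every client routes a given document to the same shard without coordination. The sketch below is generic and is not taken from ES or Solr code; `shard_for`, the use of CRC32 and the example ids are all assumptions made for illustration.

```python
import zlib


def shard_for(doc_id, num_shards):
    """Pick a shard for a document id with a stable hash.

    CRC32 is used only because it gives the same value in every
    process; a real system may use a different hash function.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def group_by_shard(doc_ids, num_shards):
    """Group document ids by the shard they would be sent to."""
    groups = {shard: [] for shard in range(num_shards)}
    for doc_id in doc_ids:
        groups[shard_for(doc_id, num_shards)].append(doc_id)
    return groups


if __name__ == "__main__":
    # Hypothetical document ids (URLs, as a crawler would produce).
    urls = ["http://example.com/page%d" % i for i in range(10)]
    for shard, ids in group_by_shard(urls, 3).items():
        print(shard, len(ids))
```

Note that with plain modulo routing, changing the number of shards moves most documents to a different shard, which is one reason the shard count is usually fixed up front.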


Yes, ask questions here. :)

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



----- Original Message ----
> From: MitchK <mitc...@web.de>
> To: solr-user@lucene.apache.org
> Sent: Thu, June 17, 2010 1:52:32 AM
> Subject: RE: Re: Re: Solr and Nutch/Droids - to use or not to use?
> 
> 
> Good morning!
>
> Great feedback from you all. This really helped a lot to get an impression
> of what is possible and what is not.
>
> What interests me now are some detail questions.
>
> Let's assume Solr is able to work on its own with distributed indexing,
> so that the client does not need to know anything about shards etc.
>
> What is interesting to me is:
>
> I.
> The scoring - Nutch uses special scoring implementations like the
> OPIC algorithm. Can Solr use such improvements or do I need to reimplement
> them for Solr?

> II.
> The indexing.
> At the moment it really sounds like Nutch would index the whole stuff and
> afterwards Solr does the job again.
> Regarding indexing, it would make sense if Nutch computes things like the
> document boost (I am not sure, but I think the results of the OPIC algorithm
> are added to each document as a boost) and sends an indexing request to
> Solr afterwards.
> However, if Nutch indexes the page's content and Solr does it, too - I would
> waste some time, no?
> Is this the case or did I misunderstand something here?

> III.
> I am no Java expert.
> However, in a few months I will start to study computer science at a
> university. Maybe I will find some literature to learn more about
> distributed software and how hashing needs to work to make distributed
> indexing work.
> Maybe then I can help to implement this feature in Solr.
> On the other hand, not much is known about Solr's distributed search concept
> and which classes are responsible for that - but such things one could ask
> on the mailing list, no?
>
> As far as I know, Elastic Search already supports distributed indexing.
> Maybe one can reuse the responsible implementation for Solr.


> Btw:
> I think a great benefit of using Solr + Nutch would be to extend the search.
> I could create several Solr cores for different kinds of search - one for
> picture search, one for video search etc. *and* with the help of Nutch I can
> index some of the needed content in special directories. So Solr does not
> need to care about indexing a picture - Nutch already does the job.
> 

> Kind regards,
> - Mitch
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-and-Nutch-Droids-to-use-or-not-to-use-tp900069p901943.html
> Sent from the Solr - User mailing list archive at Nabble.com.
