Mitch,

If you use Nutch+Solr, you wouldn't *index* the fetched content with Nutch; Nutch fetches and parses the pages, then hands the documents to Solr for indexing, so nothing is indexed twice. Solr doesn't know anything about OPIC, but I suppose you can feed the OPIC score computed by Nutch into a Solr field and use it during scoring if you want, for example with a function query.
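To make the function query idea concrete, here is a rough sketch of what such a request could look like. It only builds the request URL for a dismax query with a boost function (`bf`). The field names `opic_score` and `content`, the host and port, and the `log(sum(...,1))` damping are all assumptions for illustration, not something Nutch or Solr sets up for you; adjust them to your own schema.

```python
from urllib.parse import urlencode

# Hypothetical values: adjust to your own Solr instance and schema.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"
OPIC_FIELD = "opic_score"  # assumed numeric field filled from Nutch's OPIC score


def build_boosted_query(user_query, text_field="content", opic_field=OPIC_FIELD):
    """Build a dismax request URL that adds an OPIC-based boost to the text score."""
    params = {
        "q": user_query,
        "defType": "dismax",
        "qf": text_field,
        # Boost function: dampen the stored score with log(); adding 1 keeps
        # the argument positive when the stored score is 0.
        "bf": f"log(sum({opic_field},1))",
        "fl": "*,score",
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_boosted_query("apache nutch"))
```

With dismax, the value of `bf` is added to the relevance score, so how much the OPIC value should weigh against the text match is something you would have to tune.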
Yes, ES has built-in support for sharding and replication. It also makes it easy to implement custom scoring, which may work for OPIC here.

Yes, ask questions here. :)

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

----- Original Message ----
> From: MitchK <mitc...@web.de>
> To: solr-user@lucene.apache.org
> Sent: Thu, June 17, 2010 1:52:32 AM
> Subject: RE: Re: Re: Solr and Nutch/Droids - to use or not to use?
>
> Good morning!
>
> Great feedback from you all. This really helped a lot to get an
> impression of what is possible and what is not.
>
> What is interesting to me are some detail questions.
>
> Let's assume Solr could handle distributed indexing on its own, so
> that the client does not need to know anything about shards etc.
>
> What is interesting to me is:
>
> I. The scoring - Nutch uses special scoring implementations like the
> OPIC algorithm. Can Solr use such improvements, or do I need to
> reimplement them for Solr?
>
> II. The indexing. At the moment it really sounds like Nutch would
> index the whole stuff and afterwards Solr does the job again.
> Regarding indexing, it would make sense if Nutch computes things like
> the document boost (I am not sure, but I think the results of the
> OPIC algorithm were added to each document as a boost) and sends an
> indexing request to Solr afterwards. However, if Nutch indexes the
> page's content and Solr does it, too - I would waste some time, no?
> Is this the case, or did I misunderstand something here?
>
> III. I am no Java expert. However, in a few months I will start to
> study computer science at a university. Maybe I will find some
> literature to learn more about distributed software and how hashing
> needs to work to make distributed indexing possible. Maybe then I can
> help to implement this feature in Solr.
> On the other hand, not much is known about Solr's distributed search
> concept and which classes are responsible for it - but such things
> one could ask on the mailing list, no?
>
> As far as I know, Elastic Search already supports distributed
> indexing. Maybe one can reuse the responsible implementation for
> Solr.
>
> Btw: I think a great benefit of using Solr + Nutch would be to extend
> the search. I could create several Solr cores for different kinds of
> search - one for picture search, one for video search etc. - *and*
> with the help of Nutch I can index some of the needed content in
> special directories. So Solr does not need to care about indexing a
> picture - Nutch already does the job.
>
> Kind regards,
> - Mitch
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-and-Nutch-Droids-to-use-or-not-to-use-tp900069p901943.html
> Sent from the Solr - User mailing list archive at Nabble.com.