I just downloaded Solr to try out; it seems like it will replace
a ton of code I've written. I saw a few posts about the
FederatedSearch and skimmed the ideas at
http://wiki.apache.org/solr/FederatedSearch. The project I am
working on has several Lucene indexes 20-40GB in size spread among a
few machines. I've also run into problems figuring out how to
work with Lucene in a distributed fashion, though all of my
difficulties were in indexing; searching with a MultiSearcher and
a few custom classes on top of the hits was not that difficult.
Indexing involved using a SQL database as a master db, so you
could find documents by their unique ID, and a JMS server to
distribute additions, deletions, and updates to each of the
indexing servers. I eventually replaced the JMS server with
something custom I wrote that is much more lightweight and less
prone to bogging down.
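In case it helps to see the shape of that setup, here is a rough
sketch of the kind of lightweight replacement I mean: a distributor
fans add/delete/update operations out to one queue per indexing
server, and each server drains its own queue. The names here
(UpdateDistributor, UpdateOp) are purely illustrative, not from
Solr or from my actual code:

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative sketch only: a minimal in-process stand-in for the
// JMS server, fanning index updates out to each indexing server.
public class UpdateDistributor {
    public enum Kind { ADD, DELETE, UPDATE }

    public static final class UpdateOp {
        public final Kind kind;
        public final String uniqueId;   // key into the master SQL db
        public UpdateOp(Kind kind, String uniqueId) {
            this.kind = kind;
            this.uniqueId = uniqueId;
        }
    }

    // one queue per indexing server, so a slow or down server
    // can't block updates headed for the others
    private final List<BlockingQueue<UpdateOp>> serverQueues =
        new CopyOnWriteArrayList<>();

    // each indexing server registers once and drains its own queue
    public BlockingQueue<UpdateOp> registerServer() {
        BlockingQueue<UpdateOp> q = new LinkedBlockingQueue<>();
        serverQueues.add(q);
        return q;
    }

    // fan the operation out to every registered indexing server;
    // offer() always succeeds on an unbounded LinkedBlockingQueue
    public void publish(UpdateOp op) {
        for (BlockingQueue<UpdateOp> q : serverQueues) {
            q.offer(op);
        }
    }
}
```

The real thing obviously also needs persistence and redelivery for
servers that are down, which is where JMS earns its weight.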
I'd be curious whether Yonik is still on the list and whether he
or anyone else has any new ideas for federated searching.
I'm also interested in this. For me, I don't need sorted output,
faceted browsing, or alternative output formats - so something along
the lines of the "Merge XML responses w/o Schema" proposal would be
just fine.
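To make concrete what I mean by a schema-less merge, here's
roughly the kind of thing I have in mind: the merger just
concatenates the <doc> elements from each sub-searcher's XML into
one combined response, without interpreting any fields. The
<response>/<doc> element names below are placeholders I picked for
illustration, not the actual Solr response format:

```java
import java.io.StringReader;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch of "merge XML responses w/o schema": collect every <doc>
// from each sub-searcher response into a single <response>, with
// no knowledge of what fields the docs contain.
public class XmlResponseMerger {
    public static Document merge(List<String> responses) throws Exception {
        DocumentBuilder db =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document merged = db.newDocument();
        Element root = merged.createElement("response");
        merged.appendChild(root);
        for (String xml : responses) {
            Document sub = db.parse(new InputSource(new StringReader(xml)));
            NodeList docs = sub.getElementsByTagName("doc");
            for (int i = 0; i < docs.getLength(); i++) {
                // importNode deep-copies the <doc> into the merged tree
                root.appendChild(merged.importNode(docs.item(i), true));
            }
        }
        return merged;
    }
}
```

Since I don't need sorted output, simple concatenation like this
is enough; a merger that had to interleave by score would need to
peek into the docs.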
Open issues:
1. How much better (if at all) would it be to use Hadoop RPC
(versus HTTP) to call the sub-searchers? I'm assuming it has
better performance, and there might be fewer connectivity issues,
but then you aren't leveraging the work being done on embedded
Jetty, for example. Anybody have data points on relative
performance?
2. Is there one master schema on the "main" search server that could
get distributed to the remote searchers, or would that be part of a
snappuller-ish update mechanism?
Thanks,
-- Ken
--
Ken Krugler
Krugle, Inc.
+1 530-210-6378
"Find Code, Find Answers"