Jack L wrote:
This is a very interesting discussion. I have a few questions after
reading Tim's and Venkatesh's emails:
To Tim:
1. Is there any reason you don't want to use HTTP? Since Solr already
has an HTTP interface, I suppose HTTP is the simplest way for the
merger/search broker to communicate with the Solr servers. Hadoop and
Ice would both require some additional work - that is, if you are
using Solr and not Lucene directly.
2. "Do you broadcast to the slaves as to who owns a document?"
Do the searchers need to know who has what document?
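For what it's worth, here is a minimal sketch of what that HTTP approach could look like: a broker fans a query out to several Solr servers and merges the hits by score. The shard URLs, the `wt=json`/`fl=*,score` parameters, and the merge-by-score rule are my assumptions, not anyone's actual setup:

```python
# A hypothetical search broker: fan one query out to several Solr
# servers over plain HTTP and merge the hit lists by score.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SHARDS = ["http://solr1:8983/solr", "http://solr2:8983/solr"]  # made-up hosts

def merge(result_lists, rows=10):
    """Combine per-shard hit lists and keep the top `rows` by score."""
    merged = [doc for docs in result_lists for doc in docs]
    merged.sort(key=lambda d: d.get("score", 0.0), reverse=True)
    return merged[:rows]

def search(query, rows=10):
    """Query every shard, then merge (happy path only, no failover)."""
    params = urlencode({"q": query, "rows": rows, "fl": "*,score", "wt": "json"})
    results = []
    for base in SHARDS:
        with urlopen(f"{base}/select?{params}") as resp:
            results.append(json.load(resp)["response"]["docs"])
    return merge(results, rows)
```

The point is just that the merger only needs HTTP and JSON parsing; it doesn't need to know which shard owns which document.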
To Venkatesh:
1. I suppose Solr is OK handling 20 million documents - I hope I'm
right, because that's what I'm planning on doing :) Was it storage
capacity that led you to use multiple Solr servers?
An open question: what's the best way to manage server addition?
- If hash value-based partitioning is used, re-indexing all the
documents will be needed whenever a server is added.
- Otherwise, a database seems to be required to track the documents.
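To make the re-indexing cost concrete, here is a small sketch (the hash function and document IDs are made up) of how many documents change owners when a hash-mod-N layout grows from 3 to 4 servers:

```python
# Why hash partitioning makes server addition painful: with
# shard = hash(id) % N, bumping N from 3 to 4 reassigns most
# documents, so nearly the whole collection must be re-indexed.
import zlib

def shard(doc_id, num_servers):
    # crc32 is a stable stand-in for whatever hash the partitioner uses
    return zlib.crc32(doc_id.encode()) % num_servers

doc_ids = [f"doc-{i}" for i in range(1000)]
moved = sum(1 for d in doc_ids if shard(d, 3) != shard(d, 4))
# Roughly three quarters of the documents land on a different server.
```

This movement is what schemes like consistent hashing were designed to limit; the alternative, as noted above, is a database that tracks where each document lives.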
Jack,
My big stumbling blocks were with indexing more so than searching. I
did put together an RMI-based system to search multiple Lucene
servers, and the searchers don't need to know where everything is.
However, with indexing, at some point something needs to know where to
send a document for updating, or whom to tell to delete a document -
whether that is the server that does the processing or some sort of
broker. The processing machines could do the DB lookup and talk to
Solr over HTTP, no problem, and this is part of what I am considering
doing. However, I have some extra code on the indexing machines to
handle DB updates etc., though I might find a way to move this
elsewhere in the system so I can have a pretty much pure Solr server
with just a few custom items (like my own Similarity or QueryParser).
I suppose the DB could be moved from SQL to Lucene in the future as well.
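One way to picture that broker/DB role: a lookup table that remembers which server owns each document, so adds, updates, and deletes can all be routed without the searchers caring about the layout. This is only an illustrative sketch (the class, the least-loaded placement rule, and the in-memory dict standing in for the DB are all my inventions):

```python
# Hypothetical indexing broker: an owner table (standing in for the DB)
# records which server holds each document, so updates go back to the
# same server and deletes can be routed to the right place.
class IndexBroker:
    def __init__(self, servers):
        self.servers = list(servers)
        self.owner = {}  # doc_id -> server; this is the "DB" in the email

    def route_add(self, doc_id):
        # New documents go to the least-loaded server; known documents
        # go back to their current owner so the update replaces in place.
        if doc_id not in self.owner:
            counts = {s: 0 for s in self.servers}
            for s in self.owner.values():
                counts[s] += 1
            self.owner[doc_id] = min(self.servers, key=lambda s: counts[s])
        return self.owner[doc_id]

    def route_delete(self, doc_id):
        # Returns the server to notify, or None if we never had the doc.
        return self.owner.pop(doc_id, None)
```

Whether the table lives in SQL, in Lucene itself, or in the broker's memory is then an implementation detail hidden from both the indexers and the searchers.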