Jack L wrote:
This is a very interesting discussion. I have a few questions after
reading Tim's and Venkatesh's emails:

To Tim:
1. Is there any reason you don't want to use HTTP? Since Solr has
   an HTTP interface already, I suppose using HTTP is the simplest
   way to talk to the Solr servers from the merger/search broker
   (a rough sketch follows these questions). Hadoop and Ice would
   both require some additional work, that is, if you are using
   Solr and not Lucene directly.

2. "Do you broadcast to the slaves as to who owns a document?"
   Do the searchers need to know who has what document?
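
As an illustration of the HTTP route, here is a minimal sketch of a
broker that fans a query out to several Solr servers over plain HTTP
and collects the raw responses from each one's standard /select
handler. The shard hostnames are hypothetical, and a real broker
would still need to parse each response and re-sort the merged hits
by score:

    import java.io.*;
    import java.net.*;
    import java.util.*;
    import java.util.concurrent.*;

    public class HttpSearchBroker {
        // Hypothetical shard locations; adjust to your deployment.
        private static final String[] SHARDS = {
            "http://solr1:8983/solr/select",
            "http://solr2:8983/solr/select"
        };

        // Send the query to every shard in parallel and return the
        // raw bodies; merging/re-sorting by score is left out here.
        public static List<String> search(final String query) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(SHARDS.length);
            List<Future<String>> futures = new ArrayList<Future<String>>();
            for (final String shard : SHARDS) {
                futures.add(pool.submit(new Callable<String>() {
                    public String call() throws Exception {
                        URL url = new URL(shard + "?q="
                            + URLEncoder.encode(query, "UTF-8"));
                        BufferedReader in = new BufferedReader(
                            new InputStreamReader(url.openStream(), "UTF-8"));
                        StringBuilder body = new StringBuilder();
                        for (String line; (line = in.readLine()) != null; )
                            body.append(line);
                        in.close();
                        return body.toString();
                    }
                }));
            }
            List<String> responses = new ArrayList<String>();
            for (Future<String> f : futures)
                responses.add(f.get());  // blocks until each shard answers
            pool.shutdown();
            return responses;
        }
    }
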
To Venkatesh:
1. I suppose Solr is OK handling 20 million documents - I hope I'm
   right because that's what I'm planning on doing :) Is it because
   of storage capacity that you chose to use multiple Solr servers?

An open question: what's the best way to manage server addition?
- If hash value-based partitioning is used, re-indexing all the
  documents will be needed (see the toy example below).
- Otherwise, a database seems to be required to track which server
  owns each document.
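
A toy example of the trade-off in the first option: with modulo
hashing, the owning server is a pure function of the document id, so
no lookup table is needed, but changing the server count reassigns
most documents. That is why adding a server forces a re-index under
this scheme. (The class and method names here are just for
illustration.)

    public class HashPartitioner {
        // Owning server is a pure function of the id: no lookup table.
        public static int ownerOf(String docId, int numServers) {
            // Mask off the sign bit; Math.abs overflows on MIN_VALUE.
            return (docId.hashCode() & 0x7fffffff) % numServers;
        }

        public static void main(String[] args) {
            String doc = "doc-12345";
            System.out.println("3 servers: " + ownerOf(doc, 3));
            System.out.println("4 servers: " + ownerOf(doc, 4));
            // The owner usually changes when numServers changes, so
            // existing documents must be re-indexed onto new owners.
        }
    }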


Jack,

My big stumbling blocks were with indexing more so than searching. I did put together an RMI-based system to search multiple Lucene servers, and the searchers don't need to know where everything is. With indexing, however, at some point something needs to know where to send the documents for updating, or whom to tell to delete a document, whether that is the server that does the processing or some sort of broker.

The processing machines could do the DB lookup and talk to Solr over HTTP no problem, and this is part of what I am considering doing. However, I have some extra code on the indexing machines to handle DB updates etc., though I might find a way to move this elsewhere in the system so I can have pretty much a pure Solr server with just a few custom items (like my own Similarity or QueryParser).
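
For what it's worth, here is a rough sketch of what such a broker
might look like: a DB table maps each document id to the Solr server
that owns it, and deletes are posted to that server's standard XML
/update handler over HTTP. The table name, column names, and error
handling are invented for illustration:

    import java.io.*;
    import java.net.*;
    import java.sql.*;

    public class IndexBroker {
        private final Connection db;

        public IndexBroker(Connection db) { this.db = db; }

        // Look up which Solr server owns this document. Assumes a
        // doc_owner(doc_id, solr_url) table; both names are made up.
        private String ownerUrl(String docId) throws SQLException {
            PreparedStatement ps = db.prepareStatement(
                "SELECT solr_url FROM doc_owner WHERE doc_id = ?");
            ps.setString(1, docId);
            ResultSet rs = ps.executeQuery();
            String url = rs.next() ? rs.getString(1) : null;
            ps.close();
            return url;
        }

        // Tell the owning server to delete the document. A separate
        // <commit/> post is needed before the delete becomes visible.
        public void delete(String docId) throws Exception {
            String xml = "<delete><id>" + docId + "</id></delete>";
            post(ownerUrl(docId) + "/update", xml);
        }

        private static void post(String target, String xml) throws Exception {
            HttpURLConnection conn =
                (HttpURLConnection) new URL(target).openConnection();
            conn.setDoOutput(true);
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
            OutputStream out = conn.getOutputStream();
            out.write(xml.getBytes("UTF-8"));
            out.close();
            conn.getResponseCode();  // drain the response to finish the call
            conn.disconnect();
        }
    }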

I suppose the DB could be moved from SQL to Lucene in the future as well.
