Hi, I have a huge Lucene index, which I'd like to split between machines ("Grid").
E.g. say I have a chain of book-stores in different countries, and I'm aiming for the following:

- Each country has its own index file, on its own machine (e.g. books from Japan are indexed on machine "japan1")
- Most users search only within their own country (e.g. search only the "japan1" index)
- But sometimes they might ask to search the entire chain (all countries), meaning some sort of "map/reduce" (= collect data from all countries)

The main challenge is the "entire chain" search, especially if I want reasonable ranking.

After some investigation (and great help from the Hibernate Search forum), I've seen the following suggestions:

1) Implement a Lucene Directory that transparently spreads across several machines. I'm not sure how the search would work: can I ask each index for *relevant* data only? Or would I need to maintain one huge combined file, allowing "random access" for the Searcher?

2) Run an IndexReader on each machine. I'm told each reader can report its relevant term frequencies, and based on that I can fetch the relevant results from each machine. Apparently the ranking won't be perfect (for the overall result), but bearable.

Now, I'm not familiar with Lucene internals, and would really appreciate your views on it:

- Any good articles on Lucene "Gridding"?
- Any idea whether approach #1 makes any sense? (IMHO it's not very sensible if I need to merge everything into a single huge file.)
- Any good implementations (of either approach)? So far I've found Hibernate Search 4 and Solandra.

Thanks very much.
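
P.S. To make approach #2 concrete, here is a rough sketch of how I currently picture it. It is plain Python, not Lucene's actual API: the `Shard` class, the `search_chain` function, the machine names and the simplified TF-IDF formula are all my own assumptions for illustration. The idea is two round trips: first each country reports its document count and document frequencies for the query terms, then each country scores its own books with the combined (global) statistics, and the coordinator merges the per-country top results.

```python
import heapq
import math
from collections import Counter


class Shard:
    """Hypothetical stand-in for one country's index (doc id -> list of tokens)."""

    def __init__(self, name, docs):
        self.name = name
        self.docs = docs

    def stats(self, terms):
        """Phase 1: report local doc count and per-term document frequencies."""
        df = Counter()
        for tokens in self.docs.values():
            present = set(tokens)
            for term in terms:
                if term in present:
                    df[term] += 1
        return len(self.docs), df

    def search(self, terms, idf, k):
        """Phase 2: score local docs with the *global* idf, return the local top k."""
        hits = []
        for doc_id, tokens in self.docs.items():
            tf = Counter(tokens)
            # Simplified TF-IDF (assumption): sqrt(term frequency) * global idf.
            score = sum(math.sqrt(tf[t]) * idf[t] for t in terms if tf[t])
            if score > 0:
                hits.append((score, self.name, doc_id))
        return heapq.nlargest(k, hits)


def search_chain(shards, terms, k=10):
    """Coordinator: gather statistics, then merge each shard's top hits."""
    total_docs = 0
    total_df = Counter()
    for shard in shards:
        n, df = shard.stats(terms)
        total_docs += n
        total_df.update(df)
    if total_docs == 0:
        return []
    # Global idf, so a term's weight is the same on every machine.
    idf = {t: 1.0 + math.log(total_docs / (total_df[t] + 1)) for t in terms}
    per_shard = [shard.search(terms, idf, k) for shard in shards]
    return heapq.nlargest(k, (hit for hits in per_shard for hit in hits))


# Made-up sample data: two countries, a handful of book titles each.
japan = Shard("japan1", {"j1": ["lucene", "in", "action"],
                         "j2": ["cooking", "basics"]})
france = Shard("france1", {"f1": ["lucene", "lucene", "cookbook"],
                           "f2": ["travel", "guide"],
                           "f3": ["wine", "guide"]})

for score, machine, doc_id in search_chain([japan, france], ["lucene"], k=2):
    print(machine, doc_id, round(score, 3))
```

If each machine used only its own statistics instead, the same term would get a different weight in each country, so scores from "japan1" and "france1" would not be directly comparable when merged. Is this roughly what people mean by approach #2?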