We are trying to convert a Lucene-based search solution to a
Solr/Lucene-based solution.  The problem we have is that we currently have
our data split into many indexes and Solr expects things to be in a single
index unless you're sharding.  In addition to this, our indexes wouldn't
work well using the distributed search functionality in Solr because the
documents are not evenly or randomly distributed.  We are currently using
Lucene's MultiSearcher to search over subsets of these indexes.
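
To give a rough idea, our current setup looks something like the sketch below (index paths and the field name are just placeholders, and this assumes the pre-3.1 MultiSearcher API):

import java.io.File;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.Searchable;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class SubsetSearch {
    public static void main(String[] args) throws Exception {
        // One searcher per physical index; the subset we combine varies per request.
        IndexSearcher products = new IndexSearcher(FSDirectory.open(new File("/indexes/products")));
        IndexSearcher reviews  = new IndexSearcher(FSDirectory.open(new File("/indexes/reviews")));

        // MultiSearcher merges hits across whichever subset of indexes we pick.
        MultiSearcher searcher = new MultiSearcher(new Searchable[] { products, reviews });
        TopDocs hits = searcher.search(new TermQuery(new Term("title", "lucene")), 10);
        System.out.println("total hits: " + hits.totalHits);

        searcher.close();
    }
}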

I know this has been brought up a number of times in previous posts and the
typical response is that the best thing to do is to convert everything into
a single index.  One of the major reasons for having the indexes split up
the way we do is because different types of data need to be indexed at
different intervals.  You may need one index to be updated every 20 minutes,
while another only needs to be updated once a week.  If we move to a single index, then
we will constantly be warming and replacing searchers for the entire
dataset, and will essentially render the searcher caches useless.  If we
were able to have multiple indexes, they would each have a searcher and
updates would be isolated to a subset of the data.
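
To make that concrete, with separate indexes only the searcher over the frequently updated index needs to be refreshed; roughly along these lines (just a sketch in plain Lucene, the class and method names are invented for illustration):

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;

// Sketch: refresh only the searcher whose index just changed, leaving the
// searchers (and warmed caches) over the other indexes untouched.
public class SearcherHolder {
    private IndexReader reader;
    private IndexSearcher searcher;

    public SearcherHolder(IndexReader reader) {
        this.reader = reader;
        this.searcher = new IndexSearcher(reader);
    }

    public synchronized IndexSearcher refreshIfChanged() throws Exception {
        IndexReader newReader = reader.reopen();  // cheap no-op if nothing changed
        if (newReader != reader) {
            searcher.close();
            reader.close();   // the old reader must be closed after reopen()
            reader = newReader;
            searcher = new IndexSearcher(reader);
        }
        return searcher;
    }
}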

The other problem is that we will likely need to shard this large single
index, and there isn't a clean way to shard randomly and evenly across all of
the data.  We would, however, like to shard a single data type.  If we could
use multiple indexes, we would likely also be sharding a small subset of
them.

Thanks in advance,

Ben
