Thanks Erick.  The problem with multiple cores is that the documents are scored 
independently in each core.  I would like to be able to search across both 
cores and have the scores 'normalized' in a way that's similar to what Lucene's 
MultiSearcher would do.  As far as I understand, multiple cores would likely 
result in seriously skewed scores in my case since the documents are not 
distributed evenly or randomly.  I could have one core/index with 20 million 
docs and another with 200.
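
To make the skew concrete, here is a rough Java sketch (not Solr or Lucene code). It assumes Lucene's classic DefaultSimilarity idf formula, idf = 1 + ln(numDocs / (docFreq + 1)). The docFreq values are hypothetical and chosen only for illustration. It compares the idf each core computes on its own with the idf you get by summing docFreq and document counts across cores, which is roughly what MultiSearcher does:

```java
// Hypothetical illustration of per-core vs. aggregated ("global") idf.
// Assumes the classic Lucene DefaultSimilarity idf formula; document
// counts mirror the 20 million / 200 example, docFreqs are made up.
public class IdfSkew {

    // Classic Lucene idf: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    public static void main(String[] args) {
        long bigDocs = 20_000_000L;
        long bigDf = 2_000_000L;   // hypothetical: term in 10% of the big core
        long smallDocs = 200L;
        long smallDf = 1L;         // hypothetical: term in 1 of 200 small-core docs

        // Each core scoring independently uses only its own statistics.
        double bigLocal = idf(bigDf, bigDocs);
        double smallLocal = idf(smallDf, smallDocs);

        // MultiSearcher-style: aggregate docFreq and doc counts first.
        double global = idf(bigDf + smallDf, bigDocs + smallDocs);

        System.out.printf("big core idf   = %.3f%n", bigLocal);
        System.out.printf("small core idf = %.3f%n", smallLocal);
        System.out.printf("global idf     = %.3f%n", global);
    }
}
```

With these numbers a match in the small core gets an idf of about 5.6 when that core scores alone, versus about 3.3 with the aggregated statistics. The small core's hits would therefore be inflated relative to the big core's.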

I've poked around in the code and this feature doesn't seem to exist.  I would 
be happy with finding a decent place to try to add it.  I'm not sure if there 
is a clean place for it.

Ben

On Oct 20, 2010, at 8:36 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> It seems to me that multiple cores are along the lines you
> need, a single instance of Solr that can search across multiple
> sub-indexes that do not necessarily share schemas, and are
> independently maintainable.
> 
> This might be a good place to start: http://wiki.apache.org/solr/CoreAdmin
> 
> HTH
> Erick
> 
> On Wed, Oct 20, 2010 at 3:23 PM, ben boggess <ben.bogg...@gmail.com> wrote:
> 
>> We are trying to convert a Lucene-based search solution to a
>> Solr/Lucene-based solution.  The problem we have is that we currently have
>> our data split into many indexes and Solr expects things to be in a single
>> index unless you're sharding.  In addition to this, our indexes wouldn't
>> work well using the distributed search functionality in Solr because the
>> documents are not evenly or randomly distributed.  We are currently using
>> Lucene's MultiSearcher to search over subsets of these indexes.
>> 
>> I know this has been brought up a number of times in previous posts and the
>> typical response is that the best thing to do is to convert everything into
>> a single index.  One of the major reasons for having the indexes split up
>> the way we do is because different types of data need to be indexed at
>> different intervals.  You may need one index to be updated every 20 minutes
>> and another to be updated only once a week.  If we move to a single index, then
>> we will constantly be warming and replacing searchers for the entire
>> dataset, and will essentially render the searcher caches useless.  If we
>> were able to have multiple indexes, they would each have a searcher and
>> updates would be isolated to a subset of the data.
>> 
>> The other problem is that we will likely need to shard this large single
>> index, and there isn't a clean way to shard randomly and evenly across the
>> data.  We would, however, like to shard a single data type.  If we could
>> use multiple indexes, we would likely also be sharding a small subset of
>> them.
>> 
>> Thanks in advance,
>> 
>> Ben
>> 
