Thanks Erick. The problem with multiple cores is that the documents are scored independently in each core. I would like to be able to search across both cores and have the scores 'normalized' in a way that's similar to what Lucene's MultiSearcher would do. As far as I understand, multiple cores would likely result in seriously skewed scores in my case, since the documents are not distributed evenly or randomly. I could have one core/index with 20 million docs and another with 200.
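To make the skew concrete, here is a rough sketch. It assumes Lucene's classic DefaultSimilarity idf formula, idf = 1 + ln(numDocs / (docFreq + 1)), and uses made-up term statistics for one big core and one small core. It compares idf computed separately in each core with idf computed from statistics summed across both, which approximates what MultiSearcher does (it ignores deletions and the other scoring factors).

```python
import math


def classic_idf(doc_freq: int, num_docs: int) -> float:
    """Classic Lucene idf: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical statistics for a single query term in two very unevenly sized cores.
cores = {
    "big": {"num_docs": 20_000_000, "doc_freq": 10},
    "small": {"num_docs": 200, "doc_freq": 150},
}

# Independent cores: each core weights the term using only its own statistics.
per_core_idf = {
    name: classic_idf(stats["doc_freq"], stats["num_docs"])
    for name, stats in cores.items()
}

# MultiSearcher-style: sum docFreq and document counts across all sub-indexes
# first, so every document is weighted against the same corpus-wide statistics.
total_docs = sum(stats["num_docs"] for stats in cores.values())
total_df = sum(stats["doc_freq"] for stats in cores.values())
global_idf = classic_idf(total_df, total_docs)

for name, idf in per_core_idf.items():
    print(f"{name:>5}: per-core idf = {idf:.2f}, global idf = {global_idf:.2f}")
```

With these invented numbers, a match in the small core gets an idf of about 1.28 when that core is scored on its own, versus about 12.73 with pooled statistics. A match in the big core gets about 15.41. Scores from the two cores therefore aren't comparable unless the statistics are shared.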
I've poked around in the code and this feature doesn't seem to exist. I would be happy to find a decent place to try to add it, but I'm not sure there is a clean place for it.

Ben

On Oct 20, 2010, at 8:36 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> It seems to me that multiple cores are along the lines you
> need: a single instance of Solr that can search across multiple
> sub-indexes that do not necessarily share schemas and are
> independently maintainable...
>
> This might be a good place to start: http://wiki.apache.org/solr/CoreAdmin
>
> HTH
> Erick
>
> On Wed, Oct 20, 2010 at 3:23 PM, ben boggess <ben.bogg...@gmail.com> wrote:
>
>> We are trying to convert a Lucene-based search solution to a
>> Solr/Lucene-based solution. The problem we have is that we currently have
>> our data split into many indexes, and Solr expects things to be in a single
>> index unless you're sharding. In addition to this, our indexes wouldn't
>> work well using the distributed search functionality in Solr because the
>> documents are not evenly or randomly distributed. We are currently using
>> Lucene's MultiSearcher to search over subsets of these indexes.
>>
>> I know this has been brought up a number of times in previous posts, and the
>> typical response is that the best thing to do is to convert everything into
>> a single index. One of the major reasons for splitting the indexes up
>> the way we do is that different types of data need to be indexed at
>> different intervals. One index may need to be updated every 20 minutes,
>> while another is only updated every week. If we move to a single index, then
>> we will constantly be warming and replacing searchers for the entire
>> dataset, which will essentially render the searcher caches useless. If we
>> were able to have multiple indexes, they would each have a searcher, and
>> updates would be isolated to a subset of the data.
>>
>> The other problem is that we will likely need to shard this large single
>> index, and there isn't a clean way to shard randomly and evenly across the
>> data. We would, however, like to shard a single data type. If we could
>> use multiple indexes, we would likely also be sharding a small subset of
>> them.
>>
>> Thanks in advance,
>>
>> Ben