We are trying to convert a Lucene-based search solution to a Solr/Lucene-based solution. The problem is that our data is currently split across many indexes, while Solr expects everything to be in a single index unless you're sharding. In addition, our indexes wouldn't work well with Solr's distributed search functionality because the documents are not evenly or randomly distributed across them. We are currently using Lucene's MultiSearcher to search over subsets of these indexes.
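For context, here is roughly what our current search path looks like (a minimal sketch, assuming Lucene 3.x; the index paths, field name, and query string are just placeholders):

import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searchable;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class SubsetSearch {
    public static void main(String[] args) throws Exception {
        // One IndexSearcher per physical index; we only open the subset
        // that is relevant to the request (paths here are made up).
        IndexSearcher news = new IndexSearcher(FSDirectory.open(new File("/indexes/news")));
        IndexSearcher products = new IndexSearcher(FSDirectory.open(new File("/indexes/products")));

        // MultiSearcher merges hits from the chosen subset of indexes.
        MultiSearcher searcher = new MultiSearcher(new Searchable[] { news, products });

        Query q = new QueryParser(Version.LUCENE_30, "body",
                new StandardAnalyzer(Version.LUCENE_30)).parse("some query");
        TopDocs hits = searcher.search(q, 10);
        System.out.println("total hits: " + hits.totalHits);

        searcher.close();
    }
}

Each request only touches the indexes it needs, and a commit to one index doesn't invalidate the searchers over the others.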
I know this has been brought up a number of times in previous posts, and the typical response is that the best thing to do is to convert everything into a single index. One of the major reasons for having the indexes split up the way we do is that different types of data need to be indexed at different intervals: one index might need to be updated every 20 minutes while another is only updated once a week. If we move to a single index, we will constantly be warming and replacing searchers for the entire dataset, which essentially renders the searcher caches useless. If we were able to have multiple indexes, each would have its own searcher and updates would be isolated to a subset of the data.

The other problem is that we will likely need to shard this large single index, and there isn't a clean way to shard randomly and evenly across all of the data. We would, however, like to shard a single data type. If we could use multiple indexes, we would likely also shard a small subset of them.

Thanks in advance,

Ben