multiple local indexes

Brent Palmer Tue, 28 Sep 2010 15:05:04 -0700

In our application, we need to be able to search across multiple local indexes. We need this not so much for performance reasons, but because of the particular needs of our project. The indexes, while sharing the same schema, can be very different in terms of size and distribution of documents. By that I mean that some indexes may have a lot more documents about some topic while others will have more documents about other topics. We also want to be able to add documents to the individual indexes. I can provide more detail about our project if necessary. Thus, the Distributed Search feature with shards in different cores seems to be an obvious solution, except for the limitation of distributed idf.

First, I want to make sure my understanding of the distributed idf limitation is correct: If your documents are spread across your shards evenly, then the distribution of terms across the individual shards can be assumed to be even enough not to matter. If, as in our case, the shards are not very uniform, then this limitation is magnified. Simplistic as that is, do I have the basic idea?
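To make the concern concrete, here is a minimal sketch of how far per-shard idf can drift from a global idf. It assumes the idf formula used by Lucene's DefaultSimilarity, 1 + ln(numDocs / (docFreq + 1)). The class name IdfSkew, the shard sizes, and the document frequencies are made up for illustration; they are not measurements from our indexes.

```java
public class IdfSkew {

    // idf in the style of Lucene's DefaultSimilarity:
    // 1 + ln(numDocs / (docFreq + 1)).
    public static double idf(long numDocs, long docFreq) {
        return 1.0 + Math.log((double) numDocs / (double) (docFreq + 1));
    }

    public static void main(String[] args) {
        // Hypothetical small shard: the term is common (half the documents).
        long smallDocs = 1_000;
        long smallDf = 500;

        // Hypothetical large shard: the same term is rare.
        long largeDocs = 100_000;
        long largeDf = 50;

        // Each shard scores with only its own statistics.
        double idfSmall = idf(smallDocs, smallDf);
        double idfLarge = idf(largeDocs, largeDf);

        // What a single combined index would use instead.
        double idfGlobal = idf(smallDocs + largeDocs, smallDf + largeDf);

        System.out.printf("small shard idf: %.3f%n", idfSmall);
        System.out.printf("large shard idf: %.3f%n", idfLarge);
        System.out.printf("global idf:      %.3f%n", idfGlobal);
    }
}
```

With numbers like these, the same term gets a much lower idf on the small shard than on the large one, and the global value falls in between. So when results are merged, a document's score depends partly on which shard it happens to live in.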

We have hacked together something that allows us to read from multiple indexes, but it isn't really a long-term solution. It's just sort of shoe-horned in there. Here are some notes from the programmer who worked on this:

Two custom files: EgranaryIndexReaderFactory.java and EgranaryIndexReader.java

  EgranaryIndexReader.java

No real work is done here. This class extends lucene.index.MultiReader and overrides the directory() and getVersion() methods inherited from IndexReader. These methods don't make sense for a MultiReader, as they only return a single value. However, Solr expects Readers to have these methods. directory() was overridden to return a call to directory() on the first reader in the subreader list. The same was done for getVersion(). This hack makes any use of these methods by Solr somewhat pointless.

  EgranaryIndexReaderFactory.java
  Overrides the newReader(Directory indexDir, boolean readOnly) method

The expected behavior of this method is to construct a Reader from the index at indexDir. However, this method ignores indexDir and reads a list of indexDirs from the solrconfig.xml file. These indices are used to create a list of lucene.index.IndexReader instances. This list is then used to create the EgranaryIndexReader.

So the second question is: Does anybody have other ideas about how we might solve this problem? Is distributed search still our best bet?


Thanks for your thoughts!
Brent
