+2 cents:

At 2:43 PM +0530 1/9/07, Mekin Maheshwari wrote:
>In general I felt that smaller indexes with different requirements
>might be more flexible than 1 large index (Would a  3G index
>considered large ?). eg. backing up the index, deploying a fresh
>index, etc. But Solr does address most of these.

3 GB indexes are not at all unreasonable -- I have a Lucene-based (soon-to-be 
Solr-based) app which uses 5 indexes, the biggest of which is 3.8 GB.  The 
five indexes together total 6.7 GB.

>The assumption could be baseless now & I should probably consider
>having 1 index for all categories.

An important thing to note is that Lucene does not store information in a grid 
as RDBMSs do; it stores only the fields which are explicitly defined for each 
Document. So if some class of Documents has a set of class-specific fields, 
there is no storage penalty for the Documents outside that class, which don't 
have them.  And Lucene's querying mechanism is very efficient at dealing with 
sparse values in the index, so the query-time penalty is slight.
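
To make the sparse-storage point concrete, here is a minimal sketch in Python. 
It is a toy model only, not Lucene's actual data structures: the documents, 
field names, and the build_index helper are all made up for illustration. The 
point it shows is that a field present on only some documents creates index 
entries only for those documents.

```python
from collections import defaultdict

# Toy model of sparse per-document fields (NOT Lucene's real on-disk format).
# All field names and values below are hypothetical.
docs = [
    {"id": "1", "category": "book", "title": "first book", "isbn": "0000000000"},
    {"id": "2", "category": "camera", "title": "some camera", "megapixels": "7"},
    {"id": "3", "category": "book", "title": "second book"},
]


def build_index(documents):
    """Map (field, value) -> set of doc ids.

    A document that lacks a field contributes nothing for that field,
    so there are no empty "cells" as there would be in a table row.
    """
    index = defaultdict(set)
    for doc in documents:
        for field, value in doc.items():
            index[(field, value)].add(doc["id"])
    return index


index = build_index(docs)

# Only document 1 has an "isbn" field, so only one entry exists for it.
isbn_entries = [key for key in index if key[0] == "isbn"]

# A query on a shared field still spans every category.
book_ids = index[("category", "book")]
```

Here isbn_entries holds a single key and book_ids holds the ids of both book 
documents; the camera document costs nothing for the fields it does not have.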

As Hoss pointed out, Solr's wildcard-field specification makes it very simple 
to take advantage of Lucene's sparse storage: Solr will tell Lucene to index 
and/or store any field matching one of the wildcard patterns, and the Request 
Handlers will allow * as a field name, which returns all stored fields in the 
resulting documents.
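
The idea behind wildcard field patterns can be sketched roughly as follows, 
again in Python as a toy model. This is not Solr's implementation or 
configuration syntax: the patterns, type names, and the resolve_type and 
select_fields helpers are assumptions made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical wildcard patterns mapping field-name suffixes to field types.
DYNAMIC_FIELDS = [
    ("*_s", "string"),
    ("*_i", "integer"),
    ("*_t", "text"),
]


def resolve_type(field_name):
    """Return the type of the first pattern matching field_name, else None."""
    for pattern, field_type in DYNAMIC_FIELDS:
        if fnmatchcase(field_name, pattern):
            return field_type
    return None


def select_fields(stored_doc, field_list):
    """Return the requested stored fields; "*" means all of them."""
    if field_list == "*":
        return dict(stored_doc)
    wanted = set(field_list.split(","))
    return {name: value for name, value in stored_doc.items() if name in wanted}


stored = {"id": "1", "isbn_s": "0000000000", "pages_i": "250"}
all_fields = select_fields(stored, "*")
id_only = select_fields(stored, "id")
```

With this scheme a new class-specific field such as isbn_s needs no schema 
change, since it is picked up by the *_s pattern, and asking for * returns 
whatever stored fields a given document happens to have.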

So while there may still be some issues needing to be worked out with a single 
index in your specific case, it is probably much simpler than integrating hits 
from multiple indexes.

- J.J.
