+2 cents:

At 2:43 PM +0530 1/9/07, Mekin Maheshwari wrote:
>In general I felt that smaller indexes with different requirements
>might be more flexible than 1 large index (Would a 3G index
>considered large ?). eg. backing up the index, deploying a fresh
>index, etc. But Solr does address most of these.
3 GB indexes are not at all unreasonable -- I have a Lucene-based (soon-to-be SOLR-based) app which uses 5 indexes, the biggest of which is 3.8 GB. The combined index is 6.7 GB.

>The assumption could be baseless now & I should probably consider
>having 1 index for all categories.

An important thing to note is that Lucene does not store information in a grid as RDBMSs do; it stores only the fields which are explicitly defined for each Document. So if some class of Documents has a set of class-specific fields, there is no storage penalty for the Documents outside that class, which don't have them. And Lucene's querying mechanism is very efficient at dealing with sparse values in the index, so the query-time penalty is slight.

As Hoss pointed out, SOLR's wildcard-field specification makes it very simple to take advantage of Lucene's sparse storage: SOLR will tell Lucene to index and/or store any field matching one of the wildcard patterns, and the Request Handlers will accept * as a field name, which returns all stored fields in the resulting documents.

So while there may still be some issues needing to be worked out with a single index in your specific case, it is probably much simpler than integrating hits from multiple indexes.

- J.J.
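
To make the sparse-storage point above concrete, here is a toy Python sketch. It is not Lucene's or SOLR's actual data structure or API; the field names, the `*_s` / `*_i` patterns, and the helper functions are all made up for illustration. It only shows the idea that a field matching a wildcard pattern gets index entries for the documents that have it, and nothing at all for the documents that don't.

```python
from collections import defaultdict
from fnmatch import fnmatchcase

# Hypothetical wildcard patterns, in the spirit of a wildcard-field
# specification. These are assumptions, not taken from any real schema.
DYNAMIC_PATTERNS = ["*_s", "*_i"]


def accepted_fields(doc, declared, patterns=DYNAMIC_PATTERNS):
    """Keep only fields that are declared or match a wildcard pattern."""
    return {
        name: value
        for name, value in doc.items()
        if name in declared or any(fnmatchcase(name, p) for p in patterns)
    }


def build_index(docs, declared):
    """Toy inverted index mapping (field, term) to a set of document ids."""
    postings = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        for field, value in accepted_fields(doc, declared).items():
            for term in str(value).lower().split():
                postings[(field, term)].add(doc_id)
    return postings


# Two "classes" of documents sharing one index; each class has its own
# class-specific field that the other class simply does not mention.
docs = [
    {"id": "1", "category": "books", "author_s": "Tolkien"},
    {"id": "2", "category": "cameras", "megapixels_i": 10},
    {"id": "3", "category": "books", "author_s": "Herbert"},
]
index = build_index(docs, declared={"id", "category"})

# The camera document (doc id 1) never appears under any author_s entry:
# there is no empty "cell" reserved for it, as there would be in a grid.
for (field, term), doc_ids in sorted(index.items()):
    print(field, term, sorted(doc_ids))
```

The only entries created are for (field, term) pairs that actually occur, which is the property that makes a single index with many class-specific fields cheap in this toy model.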