Michael, I'm not sure that objectType should be tied to which index something is stored in. If Solr does evolve multiple index support, one use case would be partitioning data based on factors other than objectType (documentType).

It would seem more flexible for clients (the direct updater or querier of Solr) to identify which index should be used. Of course each index could have its own schema, but it shouldn't be mandatory... it seems like a new index should be able to be created on-the-fly somehow, perhaps using an existing index as a template.

On 4/12/06, Bryzek.Michael <[EMAIL PROTECTED]> wrote:
> We did rough tests and found that creating multiple indexes performed
> better at run time, especially as the logic to determine what results
> should be presented to which customer became more complex.

I would expect searching a small index to be somewhat faster than searching a large index with the small one embedded in it. How much faster, though? Is it really worth the effort to separate things out? When you did the benchmarks, did you make sure to discount the first queries (because of first-use norm and FieldCache loading)? All of that can be done in the background...

I'm not arguing against extending Solr to support multiple indices, but wondering if you could start using it as-is until such support is well hashed out. It seems like you could, since this looks like an issue of performance (an optimization) and not functionality, right?

Another easy optimization you might be able to make external to Solr is to segment your site data into different Solr collections (on different boxes). This assumes that search traffic is naturally partitioned by siteId (but I may be misunderstanding).

> a) Minimize the number of instances of SOLR. If I have 3 web
>    applications, each with 12 database tables to index, I don't want
>    to run 36 JVMs. I think introducing an objectType would address
>    this.

Another possible option is to run multiple Solr instances (webapps) per appserver... I recall someone else going after this solution.

> b) Optimize retrieval when I have some knowledge that I can use to
>    define partitions of data. This may actually be more appropriate
>    for Lucene itself, but I see SOLR pretty well positioned to
>    address it. One approach is to introduce a "partitionField" that
>    SOLR would use to figure out if a new index is required. For each
>    unique value of the partitionField, we create a separate physical
>    index. If the query does NOT contain a term for the
>    partitionField, we use a multi reader to search across all
>    indexes. If the query DOES contain the term, we only search
>    across those partitions.

While that approach might be better without caching, it might be worse with caching... it really depends on the nature of the index and the queries. It would really complicate Solr's caching, though, since a cache item would only be valid for certain combinations of sub-indices.
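For concreteness, here's a rough Lucene-level sketch of the routing you're describing (the class and the partition map are hypothetical, not anything in Solr or Lucene today): when the query carries no partitionField term, search everything through a MultiReader; when it does, narrow to just those readers.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;

/** Hypothetical router: one physical Lucene index per partitionField value. */
public class PartitionedSearcher {
  private final Map<String, IndexReader> partitions; // partition value -> its index

  public PartitionedSearcher(Map<String, IndexReader> partitions) {
    this.partitions = partitions;
  }

  /** partitionValues == null means the query had no partitionField term. */
  public TopDocs search(Query query, List<String> partitionValues, int n) throws IOException {
    List<IndexReader> readers = new ArrayList<IndexReader>();
    if (partitionValues == null || partitionValues.isEmpty()) {
      readers.addAll(partitions.values());   // no partition term: search all indexes
    } else {
      for (String v : partitionValues) {     // partition term(s): search only those
        IndexReader r = partitions.get(v);
        if (r != null) readers.add(r);
      }
    }
    IndexReader all = new MultiReader(readers.toArray(new IndexReader[0]));
    return new IndexSearcher(all).search(query, n);
  }
}

The cache-key problem is visible right there: anything cached against the composite reader is only valid for that exact combination of sub-indices.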
> We have tried using cached bitsets to implement this sort of
> approach, but have found that when we have one large document set
> partitioned into much smaller sets (e.g. 1-10% of the total
> document space), creating separate indexes gives us a much higher
> boost in performance.

I assume this was with Lucene and not Solr? Solr has better/faster filter representations... (and if I ever get around to finishing it, a faster BitSet implementation too).
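To give a feel for why the representation matters (just a sketch of the idea, not Solr's actual classes): when a filter matches only 1-10% of documents, keeping the matching doc ids in a small sorted array makes intersection cost proportional to the small set, while scanning a plain BitSet costs time proportional to the whole index.

import java.util.BitSet;

/** Sketch of a small-set filter: cost scales with the filter size, not the index size. */
public class SmallDocSet {
  private final int[] docs; // sorted ids of the small fraction of docs this filter matches

  public SmallDocSet(int[] sortedDocIds) {
    this.docs = sortedDocIds;
  }

  /** Intersect with another filter: O(|this|) lookups, regardless of total index size. */
  public int intersectionSize(BitSet other) {
    int count = 0;
    for (int doc : docs) {
      if (other.get(doc)) count++;
    }
    return count;
  }
}

That's roughly why a well-chosen filter representation can close much of the gap you measured without physically separating the indexes.

-Yonik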