Thanks, Jack, for the alternatives. The first is interesting but has the downside of requiring multiple queries to get the full matching docs. The second is interesting and very simple, but it isn't modular, and field boosting is difficult to configure when document types have overlapping field names and need different boosts for the same field.
I'd still like to know about the viability of my original approach, though.

Chris

On Tue, Jun 25, 2013 at 3:19 PM, Jack Krupansky <j...@basetechnology.com> wrote:

> One simple scenario to consider: N+1 collections - one collection per
> document type with detailed fields for that document type, and one common
> collection that indexes a subset of the fields. The main user query would
> be an edismax over the common fields in that "main" collection. You can
> then display summary results from the common collection. You can also then
> support "drill down" into the type-specific collection based on a "type"
> field for each document in the main collection.
>
> Or, sure, you actually CAN index multiple document types in the same
> collection - add all the fields to one schema - there is no time or space
> penalty if most of the fields are empty for most documents.
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Chris Toomey
> Sent: Tuesday, June 25, 2013 6:08 PM
> To: solr-user@lucene.apache.org
> Subject: Querying multiple collections in SolrCloud
>
>
> Hi, I'm investigating using SolrCloud for querying documents of different
> but similar/related types, and have read through the docs on the wiki and
> done many searches in these archives, but still have some questions. Thanks
> in advance for your help.
>
> Setup:
> * Say that I have N distinct types of documents and I want to do queries
> that return the best matches regardless of document type. I.e., something
> akin to a Google search where I'd like to get the best matches from the
> web, news, images, and maps.
>
> * Our main use case is supporting simple user-entered searches, which would
> just contain terms / phrases and wouldn't specify fields.
>
> * The document types will not all have the same fields, though there may be
> some overlap in the fields.
>
> * We plan to use a separate collection for each document type, and to use
> the eDisMax query parser. Each collection would have a document-specific
> schema configuration with appropriate defaults for query fields and boosts,
> etc.
>
> Questions:
> * Would the above setup qualify as "multiple compatible collections", such
> that we could search all N collections with a single SolrCloud query, as in
> the example query
> "http://localhost:8983/solr/collection1/select?q=apple%20pie&collection=c1,c2,...,cN"?
> Again, we're not querying against specific fields.
>
> * How does SolrCloud combine the query results from multiple collections?
> Does it re-sort the combined result set, or does it just return the
> concatenation of the (unmerged) results from each of the collections?
>
> * Does SolrCloud impose any restrictions on querying multiple, sharded
> collections? I know it supports querying say all 3 shards of a single
> collection, so want to make sure it would also support say all Nx3 shards
> of N collections.
>
> * When SolrCloud queries multiple shards/collections, it queries them
> concurrently vs. serially, correct?
>
> thanks much,
> Chris
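
For reference, a minimal sketch of how the example query in the original question could be assembled programmatically. It only builds the URL in the same form as that example (a `q` parameter plus a comma-separated `collection` parameter); it does not contact Solr and says nothing about whether the collections count as "compatible". The function name `multi_collection_query_url` and the collection names `c1`, `c2`, `c3` are hypothetical placeholders.

```python
from urllib.parse import quote, urlencode


def multi_collection_query_url(base_url, entry_collection, collections, user_query):
    """Build a select URL that names several collections in one request.

    base_url, entry_collection and the collection names are illustrative
    assumptions; only the q/collection parameter shape comes from the
    example query in the thread.
    """
    params = {
        "q": user_query,
        "collection": ",".join(collections),
    }
    # quote_via=quote encodes spaces as %20 (not '+'), and safe="," keeps the
    # comma-separated collection list readable, matching the thread's example.
    query_string = urlencode(params, safe=",", quote_via=quote)
    return f"{base_url}/{entry_collection}/select?{query_string}"


url = multi_collection_query_url(
    "http://localhost:8983/solr",  # host and port as in the thread's example
    "collection1",
    ["c1", "c2", "c3"],            # hypothetical collection names
    "apple pie",
)
print(url)
```

Because no fields are named in `q`, each collection's own default query fields and boosts would apply, which is the setup described in the question.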