One simple scenario to consider: N+1 collections - one collection per
document type with detailed fields for that document type, and one common
collection that indexes a subset of the fields. The main user query would be
an edismax over the common fields in that "main" collection. You can then
display summary results from the common collection. You can also then
support "drill down" into the type-specific collection based on a "type"
field for each document in the main collection.
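As a rough sketch of that N+1 flow (not a tested setup): the snippet below only builds the two request URLs, one edismax query against the common collection and one lookup in the type-specific collection. The collection name "main", the field names (id, type, title, body), and the idea that each type-specific collection is named after the value of the "type" field are all assumptions for illustration; defType, qf, fl and rows are standard Solr request parameters.

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # same host/port as the example URL in this thread


def main_query_url(user_text, common_fields, rows=10):
    """Build an edismax query over the common fields of the 'main' collection.

    'main' and the field names passed in are hypothetical.
    """
    params = {
        "q": user_text,                  # raw user-entered terms/phrases
        "defType": "edismax",            # use the eDisMax query parser
        "qf": " ".join(common_fields),   # fields to search, space-separated
        "fl": "id,type,title",           # summary fields (assumed names)
        "rows": rows,
    }
    return f"{SOLR}/main/select?{urlencode(params)}"


def drill_down_url(doc):
    """Build a lookup in the type-specific collection for one summary hit.

    Assumes the collection is named after the document's 'type' value and
    that ids contain no double quotes (those would need escaping).
    """
    params = {"q": f'id:"{doc["id"]}"'}
    return f"{SOLR}/{doc['type']}/select?{urlencode(params)}"
```

For example, main_query_url("apple pie", ["title", "body"]) gives the summary query, and drill_down_url({"id": "42", "type": "news"}) gives the follow-up request against a hypothetical "news" collection.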
Or, sure, you actually CAN index multiple document types in the same
collection - add all the fields to one schema - there is no time or space
penalty if most of the fields are empty for most documents.
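A minimal sketch of the single-collection variant, again just building the request URL: one edismax query over the shared schema, optionally narrowed to one document type with a filter query. The collection name "all_docs" and the "type" field are assumed names, not something from your setup; fq is the standard Solr filter-query parameter.

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"


def single_collection_url(user_text, query_fields, doc_type=None):
    """Build an edismax query against one collection holding all types.

    'all_docs' and the 'type' field are hypothetical names.
    """
    params = {
        "q": user_text,
        "defType": "edismax",
        "qf": " ".join(query_fields),
    }
    if doc_type is not None:
        # Restrict results to one document type without affecting scoring.
        params["fq"] = f"type:{doc_type}"
    return f"{SOLR}/all_docs/select?{urlencode(params)}"
```

Leaving doc_type unset searches across every type at once; passing doc_type="news" adds fq=type:news for the drill-down case.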
-- Jack Krupansky
-----Original Message-----
From: Chris Toomey
Sent: Tuesday, June 25, 2013 6:08 PM
To: solr-user@lucene.apache.org
Subject: Querying multiple collections in SolrCloud
Hi, I'm investigating using SolrCloud for querying documents of different
but similar/related types, and have read through docs. on the wiki and done
many searches in these archives, but still have some questions. Thanks in
advance for your help.
Setup:
* Say that I have N distinct types of documents and I want to do queries
that return the best matches regardless of document type. I.e., something
akin to a Google search where I'd like to get the best matches from the
web, news, images, and maps.
* Our main use case is supporting simple user-entered searches, which would
just contain terms / phrases and wouldn't specify fields.
* The document types will not all have the same fields, though there may be
some overlap in the fields.
* We plan to use a separate collection for each document type, and to use
the eDisMax query parser. Each collection would have a document-specific
schema configuration with appropriate defaults for query fields and boosts,
etc.
Questions:
* Would the above setup qualify as "multiple compatible collections", such
that we could search all N collections with a single SolrCloud query, as in
the example query "
http://localhost:8983/solr/collection1/select?q=apple%20pie&collection=c1,c2,...,cN"?
Again, we're not querying against specific fields.
* How does SolrCloud combine the query results from multiple collections?
Does it re-sort the combined result set, or does it just return the
concatenation of the (unmerged) results from each of the collections?
* Does SolrCloud impose any restrictions on querying multiple, sharded
collections? I know it supports querying say all 3 shards of a single
collection, so want to make sure it would also support say all Nx3 shards
of N collections.
* When SolrCloud queries multiple shards/collections, it queries them
concurrently vs. serially, correct?
thanks much,
Chris