I'm trying to figure out how to structure this query.

I have two types of documents: items and sources.  Previously, they were all in 
the same collection.  I'm now testing a cluster with separate collections.

The items collection has 38,034,895,527 documents, and the sources collection 
has 417,618,443 documents.

In the existing cluster, all of the documents are in one collection.  It is a 
Solr cluster running version 6.0.1, with 100 shards and a replication factor of 1.

The following query works as expected:

q=type:source&fq={!join from=source_id to=source_id}item_category:abc&rows=0&stats=true&stats.field={!tag=pv1 count=true}source_id&facet=true&facet.pivot={!stats=pv1}source_factory&facet.sort=index&facet.limit=-1
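
For anyone who wants to reproduce the request from a script, here is a minimal 
sketch of the query above as a parameter map, URL-encoded with the Python 
standard library.  The helper name is only illustrative, and the sketch does not 
assume any particular host or collection name; it only builds the query string.

```python
from urllib.parse import urlencode, parse_qs


def build_single_collection_params(item_category):
    """Return the request parameters of the query above as a dict.

    The function name is hypothetical; the parameter names and values are
    copied from the query in this post.
    """
    return {
        "q": "type:source",
        # Same-collection join from item documents to source documents.
        "fq": "{!join from=source_id to=source_id}item_category:" + item_category,
        "rows": "0",
        "stats": "true",
        "stats.field": "{!tag=pv1 count=true}source_id",
        "facet": "true",
        "facet.pivot": "{!stats=pv1}source_factory",
        "facet.sort": "index",
        "facet.limit": "-1",
    }


params = build_single_collection_params("abc")
# urlencode percent-escapes the local-params braces and turns spaces into '+'.
query_string = urlencode(params)
print(query_string)
```
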

In the source documents, source_id identifies the source itself.  In the item 
documents, source_id identifies the one source document that the item belongs 
to.  There is a 1:many relationship between sources and items.

The above query gets the sources that are associated with items that have 
item_category "abc", and then facets on the sources' source_factory field.
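
To make the intended result concrete, here is a toy in-memory sketch of what 
the join plus facet is meant to compute.  The sample documents and the factory 
values are made up for illustration, and this is plain Python rather than 
anything Solr does internally.

```python
from collections import Counter

# Made-up sample data: three sources and four items (1:many via source_id).
sources = [
    {"source_id": "s1", "source_factory": "factory_a"},
    {"source_id": "s2", "source_factory": "factory_a"},
    {"source_id": "s3", "source_factory": "factory_b"},
]
items = [
    {"source_id": "s1", "item_category": "abc"},
    {"source_id": "s1", "item_category": "abc"},
    {"source_id": "s2", "item_category": "xyz"},
    {"source_id": "s3", "item_category": "abc"},
]


def facet_sources_by_factory(sources, items, category):
    """Count sources per source_factory, keeping only sources that have
    at least one item in the given category."""
    # "Join" step: collect the source_ids referenced by matching items.
    matching_ids = {i["source_id"] for i in items if i["item_category"] == category}
    # Filter the sources by those ids; each source is counted once,
    # no matter how many of its items matched.
    joined = [s for s in sources if s["source_id"] in matching_ids]
    # "Facet" step: count the surviving sources by factory.
    return Counter(s["source_factory"] for s in joined)


counts = facet_sources_by_factory(sources, items, "abc")
print(dict(counts))
```

Source s1 has two matching items but is counted once, which is the behavior I 
need from the real query.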


Now, I'm testing a separate cluster that has the same data, but organized into 
two collections: items and sources.

In order to do the same query, I have to use a cross-collection join, which 
requires the FROM collection to be unsharded.  However, in this case, the FROM 
collection is the items collection, which due to its size cannot be unsharded.
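
For reference, this is a sketch of the filter I would expect to send to the 
sources collection if the restriction did not apply.  It assumes the join query 
parser's fromIndex parameter is what names the other collection; the helper 
name is only illustrative.

```python
def cross_collection_join_fq(from_collection, category):
    """Build the fq for a join whose FROM side lives in another collection.

    Assumption: fromIndex names the FROM collection, and the request itself
    is sent to the sources collection.
    """
    return (
        "{!join from=source_id to=source_id fromIndex=%s}item_category:%s"
        % (from_collection, category)
    )


fq = cross_collection_join_fq("items", "abc")
print(fq)
```
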

I'm hoping there's an easy way to restructure my data / query to accomplish the 
faceting I need.

The data set is static, so it can be re-indexed and reconfigured as needed.  
The cluster is also not under any load yet.
