Have you looked at the streaming functionality (Streaming Expressions
and Parallel SQL in particular)? While it has some restrictions, it
easily handles cross-collection joins. It's generally intended for
analytic-type queries, but at your scale that may be what you need.
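
As a rough, untested sketch of what I mean: the Python below only builds
an innerJoin streaming expression over your two collections and the URL
you'd send it to. The collection and field names come from your message;
the helper names, the base URL and the use of the /export handler (which
needs docValues on the exported fields) are my assumptions, so check
them against the docs for your version. It stops at the join; counting
by source_factory would be a further step on top of this.

```python
from urllib.parse import urlencode


def build_join_expr(category):
    """Return a streaming expression joining sources to matching items.

    Both inner streams are sorted on source_id, which innerJoin requires.
    unique() collapses the many items per source down to one tuple.
    """
    items = (
        'search(items, q="item_category:{c}", fl="source_id", '
        'sort="source_id asc", qt="/export")'
    ).format(c=category)
    sources = (
        'search(sources, q="*:*", fl="source_id,source_factory", '
        'sort="source_id asc", qt="/export")'
    )
    return 'innerJoin({s}, unique({i}, over="source_id"), on="source_id")'.format(
        s=sources, i=items
    )


def stream_url(base_url, expr):
    """Return the /stream handler URL carrying the expression (hypothetical host)."""
    return base_url.rstrip("/") + "/sources/stream?" + urlencode({"expr": expr})


if __name__ == "__main__":
    expr = build_join_expr("abc")
    print(expr)
    print(stream_url("http://localhost:8983/solr", expr))
```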

At that scale, denormalizing the data doesn't seem feasible.

Best,
Erick

On Sat, Dec 9, 2017 at 6:02 PM,  <ch...@yeeplusplus.com> wrote:
>
>
> I'm trying to figure out how to structure this query.
>
> I have two types of documents: items and sources.  Previously, they were all 
> in the same collection.  I'm now testing a cluster with separate collections.
>
> The items collection has 38,034,895,527 documents, and the sources collection 
> has 417,618,443 documents.
>
> I have all of the documents in the same collection in a Solr cluster running 
> version 6.0.1, with 100 shards and replication factor 1.
>
> The following query works as expected:
>
> q=type:source&fq={!join from=source_id 
> to=source_id}item_category:abc&rows=0&stats=true&stats.field={!tag=pv1 
> count=true}source_id&facet=true&facet.pivot={!stats=pv1}source_factory&facet.sort=index&facet.limit=-1
>
> In the source documents, the source_id identifies the source.  In the items 
> documents, the source_id identifies the unique source document related to it. 
>  There is a 1:many relationship between sources and items.
>
> The above query gets the sources that are associated with items that have 
> item_category "abc", and then facets on the sources' source_factory field.
>
>
> Now, I'm testing a separate cluster that has the same data, but organized 
> into two collections: items and sources.
>
> In order to do the same query, I have to use a cross-collection join, which 
> requires the FROM collection to be unsharded.  However, in this case, the 
> FROM collection is the items collection, which due to its size cannot be 
> unsharded.
>
> I'm hoping there's an easy way to restructure my data / query to accomplish 
> the faceting I need.
>
> The data set is static so can be re-indexed and reconfigured as needed.  It's 
> also not under any load yet.
>
