[jira] [Commented] (SOLR-14467) inconsistent server errors combining relatedness() with allBuckets:true

Michael Gibney (Jira) Fri, 22 May 2020 09:56:47 -0700


    [ 
https://issues.apache.org/jira/browse/SOLR-14467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17114234#comment-17114234
 ]


Michael Gibney commented on SOLR-14467:
---------------------------------------

This sounds good to me.

One general question: when {{SlotContext.isAllBucket()==true}}, what would be 
returned by {{SlotContext.getSlotQuery()}}? I agree that using the base 
DocSet/domain for the entire facet as the slotContext in SpecialSlotAcc would 
be misleading. (I had also wondered about using {{field:[* TO *]}}, which would 
be slightly different, but equally misleading for all the same reasons you 
identified). So if there's no single slotQuery that _wouldn't_ be misleading in 
the allBuckets case, the options could be:
 # treat {{isAllBucket()==true}} as mutually exclusive with {{getSlotQuery()}}, 
and have the latter either return {{null}} or throw an 
{{IllegalStateException}}?
 # allow SlotContext to wrap the original (i.e. not allBuckets) slot and 
slotContext, e.g.:
{code:java}
public static final class SlotContext {
  private final boolean isAllBuckets;
  private final Query slotQuery;
  private int originalSlot = -1;
  private IntFunction<SlotContext> originalContext;
  /** constructs a normal instance */
  public SlotContext(Query slotQuery) {
    this.slotQuery = slotQuery;
    this.isAllBuckets = false;
  }
  /** constructs an allBuckets instance */
  public SlotContext() {
    this.slotQuery = null;
    this.isAllBuckets = true;
  }
  public Query getSlotQuery() {
    return isAllBuckets ? originalContext.apply(originalSlot).getSlotQuery() : slotQuery;
  }
  public boolean isAllBuckets() {
    return isAllBuckets;
  }
  /** provides access to the original slot, if desired? */
  public int getOriginalSlot() {
    return originalSlot;
  }
  /** called by SpecialSlotAcc before passing to collectAcc/otherAccs */
  public void updateAllBuckets(int originalSlot, IntFunction<SlotContext> originalContext) {
    this.originalSlot = originalSlot;
    this.originalContext = originalContext;
  }
}
{code}
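
For comparison, option 1 could look roughly like the sketch below. This is only an illustration under stated assumptions: the class name {{ExclusiveSlotContext}} and its factory methods are hypothetical, and a generic type parameter {{Q}} stands in for Lucene's {{Query}} so that the snippet compiles on its own.

```java
import java.util.Objects;

/**
 * Hypothetical sketch of option 1: isAllBuckets() and getSlotQuery() are
 * mutually exclusive. Q is a stand-in for org.apache.lucene.search.Query.
 */
final class ExclusiveSlotContext<Q> {
  private final Q slotQuery;          // null only for the allBuckets instance
  private final boolean isAllBuckets;

  private ExclusiveSlotContext(Q slotQuery, boolean isAllBuckets) {
    this.slotQuery = slotQuery;
    this.isAllBuckets = isAllBuckets;
  }

  /** constructs a normal instance */
  static <Q> ExclusiveSlotContext<Q> forBucket(Q slotQuery) {
    return new ExclusiveSlotContext<>(Objects.requireNonNull(slotQuery), false);
  }

  /** constructs an allBuckets instance */
  static <Q> ExclusiveSlotContext<Q> forAllBuckets() {
    return new ExclusiveSlotContext<>(null, true);
  }

  boolean isAllBuckets() {
    return isAllBuckets;
  }

  /** fails fast rather than returning a misleading query for allBuckets */
  Q getSlotQuery() {
    if (isAllBuckets) {
      throw new IllegalStateException("no single slot query exists for the allBuckets slot");
    }
    return slotQuery;
  }
}
```

With this shape, any accumulator that calls {{getSlotQuery()}} has to check {{isAllBuckets()}} first; returning {{null}} instead of throwing would be the quieter variant of the same idea.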

FWIW I thought a little more about why I proceeded under the assumption that 
{{relatedness()}} would be meaningful for {{allBuckets}}. I actually _do_ think 
it could be relevant, but in a way that (upon further reflection) I think can 
only be practically calculated using sweep collection. And for that matter, 
considering the problems and awkwardness you identified with making 
{{RelatednessAgg}} directly aware of {{allBucketsSlot}} and "double-counting", 
it could (I think?) be best supported by extending the concept of "sweep 
collection" to cover "normal" {{allBuckets}} collection. I'm not exactly sure 
what that would look like, but in any event it seems clear that it would be a 
different issue, if there's even any interest in pursuing it.

To briefly expand on why I think {{relatedness()}} might be meaningful for 
{{allBuckets}}: at a high level, say you have 5 buckets returned out of 10 
total buckets, and each of the 5 returned is perfectly correlated 
(relatedness==1.0). Despite this, you have no way of knowing how "special" 
these buckets are in the overall context of the field you're faceting on ... 
perhaps all 10 bucket values are perfectly correlated; or perhaps the 5 buckets 
that weren't returned are perfectly _negatively_ correlated 
(relatedness==-1.0). I _think_ that calculating relatedness over allBuckets 
(with fgCount="sum(fgCount) over all buckets", bgCount="sum(bgCount) over all 
buckets", and fgSize and bgSize each multiplied by the total number of buckets) 
should give you a meaningful way of normalizing/contextualizing relatedness 
scores of individual buckets. But the non-sweep implementation of relatedness, 
being driven by the presence of values in the base domain, would be a bad fit 
for this, since it ignores all buckets not represented in the base domain 
(regardless of whether they might have values in the fgSet or bgSet that would 
be relevant to calculating a meaningful "allBuckets" relatedness score).
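
As a rough sketch of the aggregation described above (not of the relatedness score itself): the class {{AllBucketsInputs}} and its method are hypothetical names, not part of Solr, and the per-bucket arrays are assumed to cover every bucket of the field, not only the buckets returned.

```java
import java.util.Arrays;

/** Hypothetical holder for the four inputs of an "allBuckets" relatedness calculation. */
final class AllBucketsInputs {
  final long fgCount;
  final long bgCount;
  final long fgSize;
  final long bgSize;

  private AllBucketsInputs(long fgCount, long bgCount, long fgSize, long bgSize) {
    this.fgCount = fgCount;
    this.bgCount = bgCount;
    this.fgSize = fgSize;
    this.bgSize = bgSize;
  }

  /**
   * fgCounts[i] and bgCounts[i] are the foreground/background counts of bucket i;
   * fgSize and bgSize are the sizes of the foreground and background sets.
   */
  static AllBucketsInputs of(long[] fgCounts, long[] bgCounts, long fgSize, long bgSize) {
    if (fgCounts.length != bgCounts.length) {
      throw new IllegalArgumentException("fgCounts and bgCounts must cover the same buckets");
    }
    long numBuckets = fgCounts.length;
    return new AllBucketsInputs(
        Arrays.stream(fgCounts).sum(),   // sum(fgCount) over all buckets
        Arrays.stream(bgCounts).sum(),   // sum(bgCount) over all buckets
        fgSize * numBuckets,             // fgSize multiplied by the number of buckets
        bgSize * numBuckets);            // bgSize multiplied by the number of buckets
  }
}
```

The resulting four values would then be fed to whatever function already turns (fgCount, fgSize, bgCount, bgSize) into a relatedness score for a single bucket.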

> inconsistent server errors combining relatedness() with allBuckets:true
> -----------------------------------------------------------------------
>
>                 Key: SOLR-14467
>                 URL: https://issues.apache.org/jira/browse/SOLR-14467
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: Facet Module
>            Reporter: Chris M. Hostetter
>            Priority: Major
>         Attachments: SOLR-14467.patch, SOLR-14467_test.patch
>
>
> While working on randomized testing for SOLR-13132 I discovered a variety of 
> different ways that JSON Faceting's "allBuckets" option can fail when 
> combined with the "relatedness()" function.
> I haven't found a trivial way to manually reproduce this, but I have been able 
> to trigger the failures with a trivial patch to {{TestCloudJSONFacetSKG}} 
> which I will attach.
> Based on the nature of the failures it looks like it may have something to do 
> with multiple segments of different sizes, and/or resizing the SlotAccs?
> The relatedness() function doesn't have many (any?) existing tests in place 
> that leverage "allBuckets" so this is probably a bug that has always existed 
> -- it's possible it may be excessively cumbersome to fix and we might 
> need/want to just document that incompatibility and add some code to try and 
> detect if the user combines these options and if so fail with a 400 error?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org