[ 
https://issues.apache.org/jira/browse/SOLR-14467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris M. Hostetter updated SOLR-14467:
--------------------------------------
    Attachment: SOLR-14467.patch
                beast2.log.txt
                beast.log.txt
        Status: Open  (was: Open)

{quote}There's a nocommit marking an edge case problem for distributed merging. 
I think this happens for the case buckets are merged without any shard having 
initialized a bucket.
{quote}
Ah yeah ... that is an interesting edge case. (discussion of your suggested 
solution below)

Another interesting discrepancy in the output that I noticed when testing your 
patch is that in "single core" mode the "key" for {{relatedness()}} is 
completley missing from the {{allBuckets}} bucket while in distributed mode the 
key was coming back with a literal value of {{null}}. (It jumped out at me 
because i didn't understand why you had needed to add the {{notPresent(List)}} 
method to the test – and then realized it was to account for the possibility 
that either the key had no values, or the key had a {{null}} value).

I confirmed this is because of a logic discrepancy between how the per-shard 
(and single core) code paths dealing with {{SlotAcc.getValue()}} ignore result 
values of {{null}} (and don't add the key to the response) but the 
corresponding code path in {{FacetBucket.getMergedBucket()}} does not – we 
should be able to (safely) fix {{FacetBucket.getMergedBucket()}} to be 
consistent w/o impacting any other aggregations, since {{RelatednessAgg}} is 
the only one that can produce a "non-null" slot value on a shard (to return the 
fg/bg sizes for buckets that don't exist on the shard) but may still want a 
final "null" merged value (for allBuckets).

(stats like "sum" always return a value at the shard level, even if there are 
no docs in the bucket, so they always produce a merged value; while stats like 
"min" don't return anything at the shard level if no docs are in the bucket, so 
a merger is never initialized)

Which circles back to your point about merging in the situation where there are 
no buckets: we also have to consider the "how allBuckets behaves when there are 
no buckets" in the _non_ cloud case, where no merging happens – based on 
testing, some (not sure if all?) processors evidently do't do any collection on 
the the {{allBuckets}} slot at all in this situation – meaning that with the 
{{implied}} concept you introduced for the " {{// since we haven't been told 
about any docs for this slot, use a slot w/no counts}} " code path of 
{{getValue(int)}}, we have to ensure that (in the non-shard situation) a bucket 
that is implied externalizes as {{null}}.

So here's an updated patch with some additional improvements/fixes...
 * rolled the test patch and the code patch into one
 * beefed up javadocs for the SlotContext API methods
 * tweaked ref-guide wording to make it clear the {{relatedness()}} key won't 
be in the {{allBuckets}} bucket at all
 * tweaked TestCloudJSONFacetSKG to require that the key must be explicitly 
missing from allBuckets
 * added explicit testing of both basic allBuckets usage as well as the 
"allBuckets when there are no regular buckets" situation to to TestJsonFacets & 
TestJsonFacetRefinement
 * fixed {{FacetBucket.getMergedBucket()}} to ignore null (merged) values from 
stats
 * fixed {{BucketData.externalize()}} to consider {{implied}} flag for the 
non-shard case and return null
 * added some nocommits with reminders to revisit & consider if/how {{implied}} 
should affect plumbing methods like hashCode, equals, compareTo, etc...
 * incorporate your suggested change to {{BucketData.getMergedResult()}} to 
assume that entirely implied (merged data must be allBuckets

...back to your idea: i think you're correct, and this change should be safe. 
I'm trying to beast it now and wnat to think about it more ... BUT ... I think 
even if/once we confirm this is a "correct" fix, there might be a simpler 
solution?

Since we definitely need the {{implied}} concept to deal with the "never got a 
SlotContext so we don't know if it's the allBucket" situation, we may not need 
the special {{ALL_BUCKETS_SPECIAL_BUCKET_DATA}} object – I need to think it 
through a little more, but the gist of waht i have in mind is...
 * make {{implied}} non final
 * set {{implied=false}} anytime we {{incCounts()}}
 * if we know a bucket is {{allBuckets}} then never call {{incCounts()}} on it 
(already true)
 * if we are merging in a bucket from a shard that said it was {{implied:true}} 
then make sure we don't call {{incCounts()}} when merging it
 ** we should even be able to prevent the counts from being returned for an 
implied shard?
 ** (and assert that the counts are 0 when externalizing both {{isShard}} and 
{{implied}} ?)
 * for non-shard requests, externalizing an {{implied}} should always just be 
{{null}}

I think that would let us remove a lot of the special casing of {{allBuckets}} 
in terms of merging? .. like i said, I need to think it through more – i don't 
wnat to try and simplify/refactor any of this until test beasting seems solid.
----
Unfortunately with these new tests beasting can fail pretty easily ... we 
appear to have uncovered yet another (seemingly?) unrelated problem with 
allBuckets + refine in some processors that prevents beasting of SKG tests from 
getting very far.

I've attached 2 beast logs showing examples ... I'm still trying to wrap my 
head around this, i _think_ it's specific to using allBuckets on a nested 
subfacet? ... i'm going to open a new jira to track this (and try to repro in a 
simple test) but i'm not sure how far down the rabbit hole i care to go about 
this .. for the purpose of this jira, and testing {{relatedness()}}, we may 
want to just restrict the randomized testing of 'allBuckets' to only top level 
facets – since we don't actually care about allBuckets! (other then proving 
{{relatedness()}} doesn't break in that case ... if the processor can't handle 
refining allBuckets that's out of hte scope of what we're trying to test here.)

> inconsistent server errors combining relatedness() with allBuckets:true
> -----------------------------------------------------------------------
>
>                 Key: SOLR-14467
>                 URL: https://issues.apache.org/jira/browse/SOLR-14467
>             Project: Solr
>          Issue Type: Bug
>          Components: Facet Module
>            Reporter: Chris M. Hostetter
>            Priority: Major
>         Attachments: SOLR-14467.patch, SOLR-14467.patch, SOLR-14467.patch, 
> SOLR-14467_test.patch, SOLR-14467_test.patch, beast.log.txt, beast2.log.txt
>
>
> While working on randomized testing for SOLR-13132 i discovered a variety of 
> different ways that JSON Faceting's "allBuckets" option can fail when 
> combined with the "relatedness()" function.
> I haven't found a trivial way to manual reproduce this, but i have been able 
> to trigger the failures with a trivial patch to {{TestCloudJSONFacetSKG}} 
> which i will attach.
> Based on the nature of the failures it looks like it may have something to do 
> with multiple segments of different sizes, and or resizing the SlotAccs ?
> The relatedness() function doesn't have much (any?) existing tests in place 
> that leverage "allBuckets" so this is probably a bug that has always existed 
> -- it's possible it may be excessively cumbersome to fix and we might 
> nee/wnat to just document that incompatibility and add some code to try and 
> detect if the user combines these options and if so fail with a 400 error?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to