[ https://issues.apache.org/jira/browse/SOLR-14467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris M. Hostetter updated SOLR-14467: -------------------------------------- Attachment: SOLR-14467.patch beast2.log.txt beast.log.txt Status: Open (was: Open) {quote}There's a nocommit marking an edge case problem for distributed merging. I think this happens for the case buckets are merged without any shard having initialized a bucket. {quote} Ah yeah ... that is an interesting edge case. (discussion of your suggested solution below) Another interesting discrepancy in the output that I noticed when testing your patch is that in "single core" mode the "key" for {{relatedness()}} is completley missing from the {{allBuckets}} bucket while in distributed mode the key was coming back with a literal value of {{null}}. (It jumped out at me because i didn't understand why you had needed to add the {{notPresent(List)}} method to the test – and then realized it was to account for the possibility that either the key had no values, or the key had a {{null}} value). I confirmed this is because of a logic discrepancy between how the per-shard (and single core) code paths dealing with {{SlotAcc.getValue()}} ignore result values of {{null}} (and don't add the key to the response) but the corresponding code path in {{FacetBucket.getMergedBucket()}} does not – we should be able to (safely) fix {{FacetBucket.getMergedBucket()}} to be consistent w/o impacting any other aggregations, since {{RelatednessAgg}} is the only one that can produce a "non-null" slot value on a shard (to return the fg/bg sizes for buckets that don't exist on the shard) but may still want a final "null" merged value (for allBuckets). (stats like "sum" always return a value at the shard level, even if there are no docs in the bucket, so they always produce a merged value; while stats like "min" don't return anything at the shard level if no docs are in the bucket, so a merger is never initialized) Which circles back to your point about merging in the situation where there are no buckets: we also have to consider the "how allBuckets behaves when there are no buckets" in the _non_ cloud case, where no merging happens – based on testing, some (not sure if all?) processors evidently do't do any collection on the the {{allBuckets}} slot at all in this situation – meaning that with the {{implied}} concept you introduced for the " {{// since we haven't been told about any docs for this slot, use a slot w/no counts}} " code path of {{getValue(int)}}, we have to ensure that (in the non-shard situation) a bucket that is implied externalizes as {{null}}. So here's an updated patch with some additional improvements/fixes... * rolled the test patch and the code patch into one * beefed up javadocs for the SlotContext API methods * tweaked ref-guide wording to make it clear the {{relatedness()}} key won't be in the {{allBuckets}} bucket at all * tweaked TestCloudJSONFacetSKG to require that the key must be explicitly missing from allBuckets * added explicit testing of both basic allBuckets usage as well as the "allBuckets when there are no regular buckets" situation to to TestJsonFacets & TestJsonFacetRefinement * fixed {{FacetBucket.getMergedBucket()}} to ignore null (merged) values from stats * fixed {{BucketData.externalize()}} to consider {{implied}} flag for the non-shard case and return null * added some nocommits with reminders to revisit & consider if/how {{implied}} should affect plumbing methods like hashCode, equals, compareTo, etc... * incorporate your suggested change to {{BucketData.getMergedResult()}} to assume that entirely implied (merged data must be allBuckets ...back to your idea: i think you're correct, and this change should be safe. I'm trying to beast it now and wnat to think about it more ... BUT ... I think even if/once we confirm this is a "correct" fix, there might be a simpler solution? Since we definitely need the {{implied}} concept to deal with the "never got a SlotContext so we don't know if it's the allBucket" situation, we may not need the special {{ALL_BUCKETS_SPECIAL_BUCKET_DATA}} object – I need to think it through a little more, but the gist of waht i have in mind is... * make {{implied}} non final * set {{implied=false}} anytime we {{incCounts()}} * if we know a bucket is {{allBuckets}} then never call {{incCounts()}} on it (already true) * if we are merging in a bucket from a shard that said it was {{implied:true}} then make sure we don't call {{incCounts()}} when merging it ** we should even be able to prevent the counts from being returned for an implied shard? ** (and assert that the counts are 0 when externalizing both {{isShard}} and {{implied}} ?) * for non-shard requests, externalizing an {{implied}} should always just be {{null}} I think that would let us remove a lot of the special casing of {{allBuckets}} in terms of merging? .. like i said, I need to think it through more – i don't wnat to try and simplify/refactor any of this until test beasting seems solid. ---- Unfortunately with these new tests beasting can fail pretty easily ... we appear to have uncovered yet another (seemingly?) unrelated problem with allBuckets + refine in some processors that prevents beasting of SKG tests from getting very far. I've attached 2 beast logs showing examples ... I'm still trying to wrap my head around this, i _think_ it's specific to using allBuckets on a nested subfacet? ... i'm going to open a new jira to track this (and try to repro in a simple test) but i'm not sure how far down the rabbit hole i care to go about this .. for the purpose of this jira, and testing {{relatedness()}}, we may want to just restrict the randomized testing of 'allBuckets' to only top level facets – since we don't actually care about allBuckets! (other then proving {{relatedness()}} doesn't break in that case ... if the processor can't handle refining allBuckets that's out of hte scope of what we're trying to test here.) > inconsistent server errors combining relatedness() with allBuckets:true > ----------------------------------------------------------------------- > > Key: SOLR-14467 > URL: https://issues.apache.org/jira/browse/SOLR-14467 > Project: Solr > Issue Type: Bug > Components: Facet Module > Reporter: Chris M. Hostetter > Priority: Major > Attachments: SOLR-14467.patch, SOLR-14467.patch, SOLR-14467.patch, > SOLR-14467_test.patch, SOLR-14467_test.patch, beast.log.txt, beast2.log.txt > > > While working on randomized testing for SOLR-13132 i discovered a variety of > different ways that JSON Faceting's "allBuckets" option can fail when > combined with the "relatedness()" function. > I haven't found a trivial way to manual reproduce this, but i have been able > to trigger the failures with a trivial patch to {{TestCloudJSONFacetSKG}} > which i will attach. > Based on the nature of the failures it looks like it may have something to do > with multiple segments of different sizes, and or resizing the SlotAccs ? > The relatedness() function doesn't have much (any?) existing tests in place > that leverage "allBuckets" so this is probably a bug that has always existed > -- it's possible it may be excessively cumbersome to fix and we might > nee/wnat to just document that incompatibility and add some code to try and > detect if the user combines these options and if so fail with a 400 error? -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org