[ 
https://issues.apache.org/jira/browse/SOLR-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17127197#comment-17127197
 ] 

Chris M. Hostetter commented on SOLR-13132:
-------------------------------------------

Hey Michael - a couple of questions/thoughts...
----
Re: GIT:61befab60696dc4267ab9c96e36bc266c93a2fc3

This is definitely awkward and tricky, and not at all something i'd really 
considered in SOLR-14467 when advocating for the introduction of 
{{SlotContext.isAllBucket()}}.

I want to think about this some more... at the moment 2 straw man ideas occur 
to me, but they feel like baby-bear/papa-bear solutions (too little / too 
much)...
 * (baby-bear) - disable sweeping if allBuckets is used.
 * (papa-bear) - add a SlotContext param to {{CountSlotAcc.incrementCount()}} 
and rework {{SKGSlotAcc.registerSweepingAccs()}} use custom {{CountSlotAcc}} 
instances that ~ "skip & track" the allBuckets slot in some way that 
SweepSKGSlotAcc can later ignore? maybe?

As much as a dislike the sort of hackish solution you put in place with the 
nocommits, it's simple and by comparison both of those "cleaner" ideas seem too 
heavy handed, in either direction.

Although ... An idea that just occured to me as i type this: would it be viable 
to have {{SKGSlotAcc.registerSweepingAccs()}} actually return the new 
{{SweepSKGSlotAcc}} instance it registers, and modify the {{collect}} methods 
of that class to be No-Ops except for noting & tracking the allBucketsSlot?

(although as i say that i, again, feel like it may be a heavy handed change to 
what you have for such a niche special circumstance)
----
Re: a6b1c60e61563535d7ba67c17d74f2bada6f80a2

What motivated this change?

I don't think this is a good idea, largely because of the answer to the 
question you asked in your commit message...
{quote}what was the point of the "if countAcc == null"?
{quote}
...this is explained in the javadocs of the methods you modified (and in the 
class level javadocs)
{quote}By default, this class assumes subclasses can support sweeping 
collection unless subclasses initialize <code>countAcc</code> directly in their 
constructors.
{quote}
{quote}This impl first initializes <code>countAcc</code> as a \{@link 
SweepingCountSlotAcc} if null.
{quote}
{{Although on apache/master it's un-explained in this particular 
class's/method's javadocs, the general convention of why processors use the " 
if (null == countAcc)}}{{{ countAcc = .. } }}" pattern is touched on in a 
comment in FacetFieldProcessor:
{code:java}
    // allow a custom count acc to be used
    if (countAcc == null) {
      countAcc = new SlotAcc.CountSlotArrAcc(fcontext, slotCount);
{code}
...the fact that FacetFieldProcessor doesn't hard code an assumption about how 
countAcc works is one of the reasons the current SweepingCountSlotAcc approach 
works in FacetFieldProcessorByArray.

I don't know if it's a good idea to *force* (future) subclasses to use sweeping 
by wrapping whatever "original" {{countAcc}} they init in a 
SweepingCountSlotAcc ... if you think there's a reason why it might make sense 
for a future subclass to use a SweepingCountSlotAcc that wraps an arbitrary 
CountSlotAcc then i'm all ears, but I'd rather the design _allow_ subclasses to 
override FacetFieldProcessorByArray's behavior by letting them init a 
SweepingCountSlotAcc that wraps a CountSlotAcc directly, rather then by 
_forcing_ the subclass to provide a CountSlotAcc that "gets wrapped" by their 
superclass.
----
I pushed a small change to replace TestCloudJSONFacetSKGSweep with equivilent 
improvements to the TestCloudJSONFacetSKGEquiv from upstream in the hopes of 
doing some heavy test beasting of all the json.facet tests on your branch – but 
almost immediately ran into a failure from TestCloudJSONFacetSKG...
{noformat}
   [junit4]   2> 5919 ERROR (qtp1172942992-67) [n:127.0.0.1:41317_solr 
c:org.apache.solr.search.facet.TestCloudJSONFacetSKG_collection s:shard2 
r:core_node4 
x:org.apache.solr.search.facet.TestCloudJSONFacetSKG_collection_shard2_replica_n2
 ] o.a.s.s.HttpSolrCall null:java.lang.ArrayIndexOutOfBoundsException: Index 8 
out of bounds for length 1
   [junit4]   2>        at 
org.apache.solr.search.facet.SlotAcc$CountSlotArrAcc.incrementCount(SlotAcc.java:841)
   [junit4]   2>        at 
org.apache.solr.search.facet.UnInvertedField.collectDocsGeneric(UnInvertedField.java:462)
   [junit4]   2>        at 
org.apache.solr.search.facet.UnInvertedField.collectDocs(UnInvertedField.java:437)
   [junit4]   2>        at 
org.apache.solr.search.facet.FacetFieldProcessorByArrayUIF.collectDocs(FacetFieldProcessorByArrayUIF.java:70)
   [junit4]   2>        at 
org.apache.solr.search.facet.FacetFieldProcessorByArray.calcFacets(FacetFieldProcessorByArray.java:121)
   [junit4]   2>        at 
org.apache.solr.search.facet.FacetFieldProcessorByArray.process(FacetFieldProcessorByArray.java:89)
   [junit4]   2>        at 
org.apache.solr.search.facet.FacetRequest.process(FacetRequest.java:410)
   [junit4]   2>        at 
org.apache.solr.search.facet.FacetProcessor.processSubs(FacetProcessor.java:477)
   [junit4]   2>        at 
org.apache.solr.search.facet.FacetProcessor.fillBucket(FacetProcessor.java:433)
   [junit4]   2>        at 
org.apache.solr.search.facet.FacetFieldProcessor.refineBucket(FacetFieldProcessor.java:1001)
   [junit4]   2>        at 
org.apache.solr.search.facet.FacetFieldProcessor.refineFacets(FacetFieldProcessor.java:966)
   [junit4]   2>        at 
org.apache.solr.search.facet.FacetFieldProcessorByArray.calcFacets(FacetFieldProcessorByArray.java:97)
   [junit4]   2>        at 
org.apache.solr.search.facet.FacetFieldProcessorByArray.process(FacetFieldProcessorByArray.java:89)
   [junit4]   2>        at 
org.apache.solr.search.facet.FacetRequest.process(FacetRequest.java:410)
   [junit4]   2>        at 
org.apache.solr.search.facet.FacetProcessor.processSubs(FacetProcessor.java:477)
   [junit4]   2>        at 
org.apache.solr.search.facet.FacetProcessor.fillBucket(FacetProcessor.java:433)
   [junit4]   2>        at 
org.apache.solr.search.facet.FacetQueryProcessor.process(FacetQuery.java:65)
   [junit4]   2>        at 
org.apache.solr.search.facet.FacetRequest.process(FacetRequest.java:410)
   [junit4]   2>        at 
org.apache.solr.search.facet.FacetModule.process(FacetModule.java:150)
   [junit4]   2>        at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:331)
   [junit4]   2>        at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:210)
...
   [junit4]   2> NOTE: reproduce with: ant test  
-Dtestcase=TestCloudJSONFacetSKG -Dtests.method=testRandom 
-Dtests.seed=AF0A69795F7EC65 -Dtests.slow=true -Dtests.badapples=true 
-Dtests.locale=fi-FI -Dtests.timezone=America/Atikokan -Dtests.asserts=true 
-Dtests.file.encoding=UTF-8
   [junit4] ERROR   0.43s | TestCloudJSONFacetSKG.testRandom <<<
   [junit4]    > Throwable #1: java.lang.RuntimeException: init query failed: 
{main(q=(field_12_multi_ss:58+OR+field_9_multi_ss:57+OR+field_4_multi_sds:29+OR+field_12_multi_ss:60+OR+field_2_multi_sdsS:49+OR+field_13_multi_sds:47)&json.facet={"processEmpty":true,"facet_1":{"facet":{"processEmpty":true,"skg":"relatedness($fore,$back)","facet_2":{"facet":{"processEmpty":true,"skg":"relatedness($fore,$back)"},"limit":25,"overrequest":0,"type":"terms","field":"field_14_multi_idsS","refine":true,"domain":{"query":"*:*"}},"facet_3":{"facet":{"processEmpty":true,"skg":"relatedness($fore,$back)"},"limit":27,"overrequest":-1,"allBuckets":true,"perSeg":true,"type":"terms","field":"field_6_multi_ss","refine":true,"domain":{"query":"*:*"}}},"sort":"index+desc","limit":62,"overrequest":919339441,"allBuckets":true,"type":"terms","field":"field_11_multi_sdsS","refine":true,"domain":{"query":"*:*"}}}),extra(rows=0&fore=(field_14_multi_sdsS:64+OR+field_0_multi_ss:12+OR+field_2_multi_sdsS:50+OR+field_2_multi_sdsS:20+OR+field_5_multi_sdsS:42+OR+field_2_multi_sdsS:36+OR+field_9_multi_ss:19)&back=(field_2_multi_sdsS:49+OR+field_9_multi_ss:16+OR+field_10_multi_sds:28+OR+field_0_multi_ss:49+OR+field_11_multi_sdsS:29))}:
 Error from server at 
https://127.0.0.1:41317/solr/org.apache.solr.search.facet.TestCloudJSONFacetSKG_collection:
 Error from server at null: Index 8 out of bounds for length 1
   [junit4]    >        at 
__randomizedtesting.SeedInfo.seed([AF0A69795F7EC65:78BC839824975A16]:0)
   [junit4]    >        at 
org.apache.solr.search.facet.TestCloudJSONFacetSKG.assertFacetSKGsAreCorrect(TestCloudJSONFacetSKG.java:382)
   [junit4]    >        at 
org.apache.solr.search.facet.TestCloudJSONFacetSKG.testRandom(TestCloudJSONFacetSKG.java:319)

{noformat}
...I haven't dug into this yet, but it's the same "refining partial sub-facet 
when using allBuckets" code path that you fixed in SOLR-14520, and this seed 
does *not* reproduce on upstream master (nor did i see anything like it when 
beasting json.faceting tests before committing SOLR-14520) suggesting that this 
is something related to sweeping.

Allthough ... even when i tried manually hacking {{DEFAULT_SWEEP_COLLECTION = 
false;}} locally it still reproduced? ... so it seems like it must be something 
else on the branch?

Ohhh... actual: i think this is related of the 
a6b1c60e61563535d7ba67c17d74f2bada6f80a2 change i asked about above? ... HA! 
... yeah: the refinement code you fixed in SOLR-14520 now looks like...
{code:java}
        // count is irrelevant, but hardcoded in collect(...), so 
intercept/mask normal counts.
        // Set here to prevent createAccs(...) from creating a 1-slot countAcc 
that will fail with AIOOBE
        countAcc = SlotAcc.DEV_NULL_SLOT_ACC;
        createAccs(nDocs, 1);
{code}
...but on this branch you modified {{createAccs()}} to always wrap the 
{{countAcc}} in a {{SweepingCountSlotAcc}} – which does not ignore all the 
slots it's asked to collect.

....

I briefly experimented with revering a6b1c60e61563535d7ba67c17d74f2bada6f80a2 
locally, but that just changed the nature of the failure to a class cast 
exception since methods like {{FacetFieldProcessorByArrayUIF.collectDocs()}} 
have hard coded casts like: 
{{registerSweepingAccIfSupportedByCollectAcc((SweepingCountSlotAcc) 
countAcc);}} ... which makes me think either we need {{SweepingCountSlotAcc}} 
to be smarter about the possibility it's wrapping something like 
{{DEV_NULL_SLOT_ACC}} or we need to change the API of 
{{registerSweepingAccIfSupportedByCollectAcc(...)}} to take in {{CountSlotAcc}} 
and do an instanceof check internally (i had initially assumed we wouldn't need 
it since only subclasses that support sweeping would need/want to call that 
method, but here's an example of a subclass that normally wants to use 
sweeping, except for the code paths where it's just refining some specific 
buckets.

WDYT?

> Improve JSON "terms" facet performance when sorted by relatedness 
> ------------------------------------------------------------------
>
>                 Key: SOLR-13132
>                 URL: https://issues.apache.org/jira/browse/SOLR-13132
>             Project: Solr
>          Issue Type: Improvement
>          Components: Facet Module
>    Affects Versions: 7.4, master (9.0)
>            Reporter: Michael Gibney
>            Priority: Major
>         Attachments: SOLR-13132-with-cache-01.patch, 
> SOLR-13132-with-cache.patch, SOLR-13132.patch, SOLR-13132_testSweep.patch
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> When sorting buckets by {{relatedness}}, JSON "terms" facet must calculate 
> {{relatedness}} for every term. 
> The current implementation uses a standard uninverted approach (either 
> {{docValues}} or {{UnInvertedField}}) to get facet counts over the domain 
> base docSet, and then uses that initial pass as a pre-filter for a 
> second-pass, inverted approach of fetching docSets for each relevant term 
> (i.e., {{count > minCount}}?) and calculating intersection size of those sets 
> with the domain base docSet.
> Over high-cardinality fields, the overhead of per-term docSet creation and 
> set intersection operations increases request latency to the point where 
> relatedness sort may not be usable in practice (for my use case, even after 
> applying the patch for SOLR-13108, for a field with ~220k unique terms per 
> core, QTime for high-cardinality domain docSets were, e.g.: cardinality 
> 1816684=9000ms, cardinality 5032902=18000ms).
> The attached patch brings the above example QTimes down to a manageable 
> ~300ms and ~250ms respectively. The approach calculates uninverted facet 
> counts over domain base, foreground, and background docSets in parallel in a 
> single pass. This allows us to take advantage of the efficiencies built into 
> the standard uninverted {{FacetFieldProcessorByArray[DV|UIF]}}), and avoids 
> the per-term docSet creation and set intersection overhead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to