[ 
https://issues.apache.org/jira/browse/SOLR-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099437#comment-17099437
 ] 

Chris M. Hostetter commented on SOLR-13132:
-------------------------------------------

[~mgibney]- beefing up the randomized testing of the code paths involved with 
MultiAcc has uncovered 2 bugs - i committed some test changes showing these, 
but they can also be reproduced fairly easily with {{bin/solr -e techproducts}} 
...

What both cases have in common is:
 * limit==-1 to trigger single pass collection
 * 1 or more "non-sweepable" stats are being collected in addition to 
relatedness (so MultiAcc can't be completely optimized away)

----
Way to trigger the first bug...
{noformat}
curl -sS -X POST http://localhost:8983/solr/techproducts/query -d 'rows=0&q=inStock:true
&back=*:*                                  
&fore=popularity:[10 TO *]
&json.facet={
  hobby : {
    type : terms,
    field : cat,
    limit : -1,
    facet : {
      min : "min(price)",
      skg : { type : func,
              func : "relatedness($fore,$back)",
              sweep_collection: false,
      }
    }  
  }
}'
{noformat}
...if sweeping is explicitly disabled, then the "skg" stat completely drops out 
of the results, probably related to {{MultiAcc}} making some assumptions about 
the {{SweepableAcc}} even when the call to {{foo.registerSweepingAccs(...)}} 
returned a non-null result?
{noformat}
{
  "responseHeader":{
    "status":0,
    "QTime":88,
    "params":{
      "q":"inStock:true\n",
      "json.facet":"{\n  hobby : {\n    type : terms,\n    field : cat,\n    limit : -1,\n    facet : {\n      min : \"min(price)\",\n      skg : { type : func,\n              func : \"relatedness($fore,$back)\",\n              sweep_collection: false,\n      }\n    }  \n  }\n}",
      "back":"*:*                                  \n",
      "rows":"0",
      "fore":"popularity:[10 TO *]\n"}},
  "response":{"numFound":17,"start":0,"docs":[]
  },
  "facets":{
    "count":17,
    "hobby":{
      "buckets":[{
          "val":"electronics",
          "count":8,
          "min":74.98999786376953},
        {
          "val":"currency",
          "count":4},
        {
          "val":"memory",
          "count":3,
          "min":74.98999786376953},
        {
          "val":"hard drive",
          "count":2,
          "min":92.0},

...
{noformat}
----
Way to trigger the second bug...
{noformat}
curl -sS -X POST http://localhost:8983/solr/techproducts/query -d 'rows=0&q=inStock:true
&back=*:*                                  
&fore=popularity:[10 TO *]
&json.facet={
  hobby : {
    type : terms,
    field : cat,
    limit : -1,
    facet : {
      skg : { type : func,
              func : "relatedness($fore,$back)",
              sweep_collection: true,
      },
      max : "max(price)",
      min : "min(price)"
    }  
  }
}'
{noformat}
...when there are multiple non-sweeping stats in the MultiAcc, we get an AIOOBE 
(it's possible the order of the stats matters in input, i didn't dig very 
deep)...
{noformat}
2020-05-05 00:25:21.371 ERROR (qtp1839168128-22) [   x:techproducts] o.a.s.s.HttpSolrCall null:java.lang.ArrayIndexOutOfBoundsException: arraycopy: last destination index 3 out of bounds for object array[2]
        at java.base/java.lang.System.arraycopy(Native Method)
        at org.apache.lucene.util.ArrayUtil.growExact(ArrayUtil.java:221)
        at org.apache.solr.search.facet.FacetFieldProcessor$MultiAcc.registerSweepingAccs(FacetFieldProcessor.java:777)
        at org.apache.solr.search.facet.FacetFieldProcessor.registerSweepingAccIfSupportedByCollectAcc(FacetFieldProcessor.java:797)
        at org.apache.solr.search.facet.FacetFieldProcessorByArrayUIF.collectDocs(FacetFieldProcessorByArrayUIF.java:68)
        at org.apache.solr.search.facet.FacetFieldProcessorByArray.calcFacets(FacetFieldProcessorByArray.java:112)
        at org.apache.solr.search.facet.FacetFieldProcessorByArray.process(FacetFieldProcessorByArray.java:62)
        at org.apache.solr.search.facet.FacetRequest.process(FacetRequest.java:416)
        at org.apache.solr.search.facet.FacetProcessor.processSubs(FacetProcessor.java:474)
        at org.apache.solr.search.facet.FacetProcessor.fillBucket(FacetProcessor.java:431)
        at org.apache.solr.search.facet.FacetQueryProcessor.process(FacetQuery.java:64)
        at org.apache.solr.search.facet.FacetRequest.process(FacetRequest.java:416)
        at org.apache.solr.search.facet.FacetModule.process(FacetModule.java:147)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:328)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:209)

{noformat}
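
A note on the exception text in the trace: "last destination index 3 out of bounds for object array[2]" is what {{System.arraycopy}} reports when 3 elements are copied into a 2-slot array. The sketch below is a standalone stand-in for {{ArrayUtil.growExact}}. It rests on an assumption about that method's behavior (allocate the requested length, then copy every source element) and is not the Lucene source. The class name, method names, and element values are hypothetical.

```java
import java.lang.reflect.Array;

public class GrowExactSketch {

    // Hypothetical stand-in for ArrayUtil.growExact (assumed behavior, not Lucene code):
    // allocate exactly newLength slots, then copy ALL elements of the source array.
    @SuppressWarnings("unchecked")
    public static <T> T[] growExact(T[] array, int newLength) {
        T[] copy = (T[]) Array.newInstance(array.getClass().getComponentType(), newLength);
        // Fails if newLength < array.length, because the copy overruns the destination.
        System.arraycopy(array, 0, copy, 0, array.length);
        return copy;
    }

    // Returns true if "growing" a 3-element array to length 2 throws.
    public static boolean shrinkThrows() {
        String[] threeSlots = {"a", "b", "c"}; // placeholder values only
        try {
            growExact(threeSlots, 2);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("shrinking throws: " + shrinkThrows());
        System.out.println("length after growing 2 -> 3: "
            + growExact(new String[] {"a", "b"}, 3).length);
    }
}
```

If that reading is right, the length passed to {{growExact}} at FacetFieldProcessor.java:777 would be smaller than the array being resized. That is a guess from the trace alone and has not been checked against the code.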

> Improve JSON "terms" facet performance when sorted by relatedness 
> ------------------------------------------------------------------
>
>                 Key: SOLR-13132
>                 URL: https://issues.apache.org/jira/browse/SOLR-13132
>             Project: Solr
>          Issue Type: Improvement
>          Components: Facet Module
>    Affects Versions: 7.4, master (9.0)
>            Reporter: Michael Gibney
>            Priority: Major
>         Attachments: SOLR-13132-with-cache-01.patch, 
> SOLR-13132-with-cache.patch, SOLR-13132.patch, SOLR-13132_testSweep.patch
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> When sorting buckets by {{relatedness}}, JSON "terms" facet must calculate 
> {{relatedness}} for every term. 
> The current implementation uses a standard uninverted approach (either 
> {{docValues}} or {{UnInvertedField}}) to get facet counts over the domain 
> base docSet, and then uses that initial pass as a pre-filter for a 
> second-pass, inverted approach of fetching docSets for each relevant term 
> (i.e., {{count > minCount}}?) and calculating intersection size of those sets 
> with the domain base docSet.
> Over high-cardinality fields, the overhead of per-term docSet creation and 
> set intersection operations increases request latency to the point where 
> relatedness sort may not be usable in practice (for my use case, even after 
> applying the patch for SOLR-13108, for a field with ~220k unique terms per 
> core, QTimes for high-cardinality domain docSets were, e.g.: cardinality 
> 1816684=9000ms, cardinality 5032902=18000ms).
> The attached patch brings the above example QTimes down to a manageable 
> ~300ms and ~250ms respectively. The approach calculates uninverted facet 
> counts over domain base, foreground, and background docSets in parallel in a 
> single pass. This allows us to take advantage of the efficiencies built into 
> the standard uninverted {{FacetFieldProcessorByArray[DV|UIF]}}, and avoids 
> the per-term docSet creation and set intersection overhead.
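
The single-pass idea in the description above can be illustrated with a toy in-memory model. This is only a sketch under simplifying assumptions (a single-valued field represented as a doc-to-ordinal array, doc sets as {{java.util.BitSet}}) and is not the patch's implementation. The class and method names are hypothetical.

```java
import java.util.BitSet;

public class SweepCountSketch {

    // Helper: build a BitSet containing the given doc ids.
    public static BitSet bits(int... docs) {
        BitSet set = new BitSet();
        for (int doc : docs) {
            set.set(doc);
        }
        return set;
    }

    // Walks the doc -> term-ordinal mapping once and accumulates per-term counts
    // for three doc sets at the same time.
    // termOrdByDoc[doc] is the term ordinal for that doc, or -1 if the doc has no value.
    // Result layout: counts[0] = base, counts[1] = foreground, counts[2] = background.
    public static int[][] sweepCounts(int[] termOrdByDoc, int numTerms,
                                      BitSet base, BitSet fore, BitSet back) {
        int[][] counts = new int[3][numTerms];
        for (int doc = 0; doc < termOrdByDoc.length; doc++) {
            int ord = termOrdByDoc[doc];
            if (ord < 0) {
                continue; // doc has no value in the field
            }
            if (base.get(doc)) counts[0][ord]++;
            if (fore.get(doc)) counts[1][ord]++;
            if (back.get(doc)) counts[2][ord]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] termOrdByDoc = {0, 1, 0, -1, 1, 0};
        int[][] counts = sweepCounts(termOrdByDoc, 2,
            bits(0, 1, 2), bits(0, 5), bits(0, 1, 2, 3, 4, 5));
        for (int ord = 0; ord < 2; ord++) {
            System.out.println("term " + ord + ": base=" + counts[0][ord]
                + " fore=" + counts[1][ord] + " back=" + counts[2][ord]);
        }
    }
}
```

The point of the sketch is that no per-term docSet is built and no set intersection is run. All three counts for every term fall out of one walk over the uninverted structure.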


