[ https://issues.apache.org/jira/browse/SOLR-15048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris M. Hostetter updated SOLR-15048:
--------------------------------------
    Attachment: SOLR-15048.patch
        Status: Open  (was: Open)

So, if we ignore nulls, the way collapse deals with "boosted" docs is that any 
boosted doc is explicitly collected (even if there are multiple boosted docs 
from the same group), and all other docs in the same group(s) as boosted docs 
are _not_ collected. ie: {{elevateIds=1,5}} causes docs 1 & 5 to be boosted & 
returned, even if they would not normally be group heads, and any other docs in 
the same group(s) as either of those docs will not be returned (even if they 
would normally be the group head).
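A minimal Java model of these non-null semantics may make them concrete. It is a hypothetical sketch, not Solr code: {{Doc}}, {{collapse(...)}} and the "highest score wins" group head rule are assumptions standing in for whichever group head selector is configured, and boosted docs are simply listed first in {{elevateIds}} order.

```java
import java.util.*;

// Hypothetical model of collapse + elevation for docs that all have a
// (non-null) group value. Doc and collapse(...) are illustrative, not Solr APIs.
public class CollapseBoostSketch {
    record Doc(int id, String group, float score) {}

    static List<Integer> collapse(List<Doc> docs, List<Integer> elevateIds) {
        Set<Integer> boosted = new HashSet<>(elevateIds);

        // Any group containing a boosted doc loses its normal group head
        Set<String> boostedGroups = new HashSet<>();
        for (Doc d : docs) {
            if (boosted.contains(d.id())) boostedGroups.add(d.group());
        }

        // Assumed group head rule: highest score among the remaining docs
        Map<String, Doc> heads = new HashMap<>();
        for (Doc d : docs) {
            if (boosted.contains(d.id()) || boostedGroups.contains(d.group())) continue;
            heads.merge(d.group(), d, (a, b) -> a.score() >= b.score() ? a : b);
        }

        // Every boosted doc is returned first, even several from one group
        List<Integer> result = new ArrayList<>();
        for (int id : elevateIds) {
            if (docs.stream().anyMatch(d -> d.id() == id)) result.add(id);
        }
        // ...followed by the surviving group heads in descending score order
        Comparator<Doc> byScore = Comparator.comparingDouble(Doc::score);
        heads.values().stream()
            .sorted(byScore.reversed())
            .forEach(d -> result.add(d.id()));
        return result;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
            new Doc(1, "a", 1.0f), new Doc(2, "a", 9.0f),
            new Doc(3, "b", 5.0f), new Doc(4, "b", 8.0f),
            new Doc(5, "a", 2.0f), new Doc(6, "c", 7.0f));
        // elevateIds=1,5: doc 2 would normally head group "a" but is suppressed
        System.out.println(collapse(docs, List.of(1, 5)));
    }
}
```

With {{elevateIds=1,5}} (both in group "a") the model returns {{[1, 5, 4, 6]}}: docs 1 & 5 come first, doc 2 is dropped even though it has the best score in group "a", and groups "b" and "c" keep their normal heads.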

I initially thought (from the code I looked at & some quick testing) that 
the "intent" when dealing with "null group id" docs was that boosting these 
docs would still cause all other "null group id" docs to behave as they would 
w/o boosting. ie: nullPolicy=collapse would still result in a "null" group 
using the "best" group head that wasn't already boosted; nullPolicy=expand 
would still result in all docs w/null group id being returned as if each were 
its own group.

In practice though, there is no rhyme or reason to how the current code 
behaves. Some group head selectors cause nullPolicy=collapse to completely 
eliminate the null group if any docs in it are boosted (like a normal group); 
others cause the group to still be returned (like I initially thought). And 
when nullPolicy=expand, some group head selector code causes (non-boosted) null 
group docs to actually be returned in the wrong order according to the top level 
sort (evidently due to a logic error, but I'm not sure how, since they still get 
returned with the correct score?)

*TL;DR: some of the nullPolicy+boosting logic is flat-out broken, and the logic 
that isn't broken is internally inconsistent.*
----
I would like to try and just ignore all the existing (lack of) "logic" and 
instead "fix" nullPolicy+boosting such that:
 * boosting null docs (continues) to work even when nullPolicy=ignore
 * boosting w/nullPolicy=collapse is fixed to treat the "null group" exactly 
the same as a "real" group: if any docs in it are boosted, the group is not 
returned
 * boosting w/nullPolicy=expand should be fixed to ensure all "null" docs are 
returned (as if they were each in their own group) in the correct order, 
w/correct scores, but any "boosted" null docs should come first (just like 
boosted regular docs)
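For comparison, a similarly hypothetical Java sketch of the proposed rules above (again not Solr's API: {{Doc}}, {{NullPolicy}} and {{collapse(...)}} are illustrative names, and the "best" group head is assumed to mean highest score). Under collapse the null group is tracked as an ordinary set/map key, so boosting any null doc suppresses it like a real group; under expand every non-boosted null doc competes in the top level score order on its own.

```java
import java.util.*;

// Hypothetical sketch of the *proposed* nullPolicy + boosting rules.
// Doc, NullPolicy and collapse(...) are illustrative names, not Solr APIs;
// a null group means the doc has no value in the collapse field.
public class NullPolicySketch {
    enum NullPolicy { IGNORE, COLLAPSE, EXPAND }

    record Doc(int id, String group, float score) {}

    static List<Integer> collapse(List<Doc> docs, List<Integer> elevateIds, NullPolicy policy) {
        Set<Integer> boosted = new HashSet<>(elevateIds);

        // HashSet/HashMap accept a null key, so the "null group" can be
        // tracked exactly like a real group (this matters for COLLAPSE)
        Set<String> boostedGroups = new HashSet<>();
        for (Doc d : docs) {
            if (boosted.contains(d.id())) boostedGroups.add(d.group());
        }

        Map<String, Doc> heads = new HashMap<>();
        List<Doc> expanded = new ArrayList<>(); // non-boosted null docs under EXPAND
        for (Doc d : docs) {
            if (boosted.contains(d.id())) continue; // boosted docs are added below
            if (d.group() == null && policy == NullPolicy.IGNORE) continue;
            if (d.group() == null && policy == NullPolicy.EXPAND) {
                expanded.add(d); // each null doc acts as its own group
                continue;
            }
            // Real groups (and the null group under COLLAPSE) vanish if boosted
            if (boostedGroups.contains(d.group())) continue;
            // Assumed "best" group head: highest score
            heads.merge(d.group(), d, (a, b) -> a.score() >= b.score() ? a : b);
        }

        // Boosted docs come first under every nullPolicy, including IGNORE
        List<Integer> result = new ArrayList<>();
        for (int id : elevateIds) {
            if (docs.stream().anyMatch(d -> d.id() == id)) result.add(id);
        }
        // Then group heads and expanded null docs, interleaved by score
        Comparator<Doc> byScore = Comparator.comparingDouble(Doc::score);
        List<Doc> rest = new ArrayList<>(heads.values());
        rest.addAll(expanded);
        rest.sort(byScore.reversed());
        for (Doc d : rest) result.add(d.id());
        return result;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
            new Doc(1, "a", 3.0f), new Doc(2, "a", 9.0f),
            new Doc(3, null, 8.0f), new Doc(4, null, 6.0f),
            new Doc(5, null, 2.0f), new Doc(6, "b", 7.0f));
        for (NullPolicy p : NullPolicy.values()) {
            System.out.println(p + " " + collapse(docs, List.of(5), p));
        }
    }
}
```

With docs 3, 4 & 5 in the null group and only doc 5 boosted, ignore and collapse both give {{[5, 2, 6]}} (under collapse the null group disappears, exactly like a boosted real group), while expand gives {{[5, 2, 3, 6, 4]}}, with the non-boosted null docs interleaved in score order after the boosted doc.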

...but before I can do that, I really need to get to the bottom of how/why 
nullPolicy=expand sometimes collects _non_-boosted "null" docs out of their 
expected order ... because I can't even figure out how that's possible given 
the way the DelegatingCollector API works (let alone how the collapse code is, 
erroneously, making it happen).

[~jbernste] can you please take a look at testSmallWTF in the latest patch and 
help me understand how the ValueSource and SortSpec related "collapseStrategy" 
impls are moving "doc id=4" out of its normal position in the results (in spite 
of its score)?

> collapse + query elevation behaves inconsistently w/ 'null group' docs 
> depending on group head selector
> ------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-15048
>                 URL: https://issues.apache.org/jira/browse/SOLR-15048
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Chris M. Hostetter
>            Assignee: Chris M. Hostetter
>            Priority: Major
>         Attachments: SOLR-15048.patch, SOLR-15048.patch
>
>
> while working on SOLR-15047, I realized I wasn't really clear on what the 
> _expected_ semantics were supposed to be when "boosting"
> docs that had null values in the collapse field.
> I expanded on my test from that Jira to demonstrate the logic I (thought) I 
> understood from the Ord based collector, but then discovered that depending 
> on the group head selector used (ie: OrdScoreCollector vs 
> OrdFieldValueCollector+OrdIntStrategy vs 
> OrdFieldValueCollector+OrdValueSourceStrategy, etc.) you get different 
> behavior: not just in what group head is selected, but even when the 
> behavior should be functionally equivalent, you can get different sets of 
> groups (even for simple string field collapsing, independent of the bugs in 
> numeric field collapsing).
>  
> I have not dug into WTF is happening here, but I'll attach my WIP test.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
