[ 
https://issues.apache.org/jira/browse/SOLR-14687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175086#comment-17175086
 ] 

Chris M. Hostetter edited comment on SOLR-14687 at 8/12/20, 12:32 AM:
----------------------------------------------------------------------

Besides the fact that Jira's WYSIWYG editor lied to me and munged up some of 
the formatting of "STAR:STAR" and "UNDERSCORE nest UNDERSCORE path UNDERSCORE" 
in many places, something else has been nagging at me that i felt like i was 
overlooking, and i finally figured out what it is: I hadn't really accounted for 
docs that _have_ a "nest path" but whose path doesn't have any common ancestors 
with the {{parentPath}} specified – ie: how would {{/a/b/c}} hierarchy 
docs mixed in an index with docs having a hierarchy of {{/x/y/z}} wind up 
affecting each other?

I *think* that what i described above would still mostly work for the "parent" 
parser – even if the "parent filter" generated by a {{parentPath="/a/b/c"}} as 
i described above didn't really "rule out" the other docs, because those docs 
still wouldn't match the "nest path with a prefix of /a/b/c" rule for the 
"children". It still wouldn't really be a "correct" "parents bit set filter" as 
the underlying code expects it to be, in terms of identifying all "non children" 
documents ... but I'm _pretty sure_ it would be broken for the "child" parser 
case, because some doc with an "/x" or "/x/y" path isn't going to be matched 
by the "parents filter bitset", so it might get swallowed up in the list of 
children.

The other thing that bugged me was the (mistaken & misguided) need to ' ... 
compute a list of all "prefix subpaths" ... ' – i'm not sure why i thought that 
was necessary, instead of just saying "must _NOT_ have a prefix of the 
specified path" – ie:
{code:java}
     GIVEN:    {!foo parentPath="/a/b/c"} ...

INSTEAD OF:    PARENT FILTER BITSET = ((*:* -_nest_path_:*) OR _nest_path_:(/a /a/b /a/b/c))

  JUST USE:    PARENT FILTER BITSET = (*:* -{!prefix f="_nest_path_" v="/a/b/c/"})
{code}
...which (IIUC) should solve both problems, by matching:
 * docs w/o any nest path
 * docs with a nest path that does NOT start with /a/b/c/
 ** which includes the immediate "/a/b/c" parents, as well as their ancestors, 
as well as any docs with completely orthogonal paths (like /x/y/z)

But of course: in the case of {{parentPath="/"}} this would still simply be 
"docs w/o a nest path"

That should work, right?
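
As a sanity check, the difference between the two filters can be sketched with a toy model in Python. Everything here is an assumption made for illustration: the function names, the sample block of nest paths, and its index order are made up, and {{children_of}} only approximates a block join by treating every doc between the previous "parent" bit and the matching parent as a child. It is not the actual Solr/Lucene code.

```python
from typing import List, Optional


def old_parent_filter(nest_path: Optional[str], parent_path: str) -> bool:
    """Old idea: docs w/o a nest path, OR a nest path equal to one of the
    "prefix subpaths" of parent_path (/a, /a/b, /a/b/c)."""
    if nest_path is None:
        return True
    parts = parent_path.strip("/").split("/")
    subpaths = {"/" + "/".join(parts[:i]) for i in range(1, len(parts) + 1)}
    return nest_path in subpaths


def new_parent_filter(nest_path: Optional[str], parent_path: str) -> bool:
    """New idea: every doc whose nest path does NOT start with "<parent_path>/"."""
    if nest_path is None:
        return True
    return not nest_path.startswith(parent_path.rstrip("/") + "/")


def children_of(parent_idx: int, parent_bits: List[bool]) -> List[int]:
    """Toy block join: walk backwards from the parent until the previous set bit;
    everything in between is treated as a child of that parent."""
    kids = []
    i = parent_idx - 1
    while i >= 0 and not parent_bits[i]:
        kids.append(i)
        i -= 1
    return kids[::-1]


# Hypothetical block: one root doc (no nest path) with an /x/y branch indexed
# ahead of an /a/b/c/d branch; descendants come before their ancestors.
BLOCK = ["/x/y", "/x", "/a/b/c/d", "/a/b/c", "/a/b", "/a", None]
PARENT_IDX = BLOCK.index("/a/b/c")

old_bits = [old_parent_filter(p, "/a/b/c") for p in BLOCK]
new_bits = [new_parent_filter(p, "/a/b/c") for p in BLOCK]

old_children = [BLOCK[i] for i in children_of(PARENT_IDX, old_bits)]
new_children = [BLOCK[i] for i in children_of(PARENT_IDX, new_bits)]

print("old filter children:", old_children)
print("new filter children:", new_children)
```

In this toy model the old filter leaves the "/x/y" and "/x" docs out of the bitset, so they are reported as children of the "/a/b/c" doc, while the new filter marks them as "non children" and only the "/a/b/c/d" doc is returned.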
----
I also think i made some mistakes/typos in my examples above in trying to 
articulate what the equivalent "old style" query would be, so let me restate all 
of the examples in full...
{noformat}
NEW:  q={!parent parentPath="/a/b/c"}c_title:son

OLD:  q=(+{!field f="_nest_path_" v="/a/b/c"} +{!parent which=$ff v=$vv})
     ff=(*:* -{!prefix f="_nest_path_" v="/a/b/c/"})
     vv=(+c_title:son +{!prefix f="_nest_path_" v="/a/b/c/"})
{noformat}
{noformat}
NEW:  q={!parent parentPath="/"}c_title:son

OLD:  q=(-_nest_path_:* +{!parent which=$ff v=$vv})
     ff=(*:* -_nest_path_:*) 
     vv=(+c_title:son +_nest_path_:*)
{noformat}
{noformat}
NEW:  q={!child parentPath="/a/b/c"}p_title:dad

OLD:  q={!child of=$ff v=$vv}
     ff=(*:* -{!prefix f="_nest_path_" v="/a/b/c/"})
     vv=(+p_title:dad +{!field f="_nest_path_" v="/a/b/c"})
{noformat}
{noformat}
NEW:  q={!child parentPath="/"}p_title:dad

OLD:  q={!child of=$ff v=$vv}
     ff=(*:* -_nest_path_:*) 
     vv=(+p_title:dad -_nest_path_:*)
{noformat}
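
The mapping from the proposed {{parentPath}} syntax to the "old style" params can also be sketched as a small Python helper. This is only a sketch under assumptions: the function names are hypothetical, the nested filters are written in the {{{!prefix ...}}} / {{{!field ...}}} local-params form, and the output is just the request params as strings (nothing here talks to Solr).

```python
from typing import Dict

NEST = "_nest_path_"


def parents_filter(parent_path: str) -> str:
    """The "all parents" filter: docs whose nest path does NOT start with
    "<parent_path>/" (for "/" that is simply: docs w/o a nest path)."""
    if parent_path == "/":
        return f"(*:* -{NEST}:*)"
    return f'(*:* -{{!prefix f="{NEST}" v="{parent_path}/"}})'


def old_style_parent(parent_path: str, child_query: str) -> Dict[str, str]:
    """Hypothetical expansion of {!parent parentPath=...}child_query."""
    if parent_path == "/":
        q = f"(-{NEST}:* +{{!parent which=$ff v=$vv}})"
        vv = f"(+{child_query} +{NEST}:*)"
    else:
        q = f'(+{{!field f="{NEST}" v="{parent_path}"}} +{{!parent which=$ff v=$vv}})'
        vv = f'(+{child_query} +{{!prefix f="{NEST}" v="{parent_path}/"}})'
    return {"q": q, "ff": parents_filter(parent_path), "vv": vv}


def old_style_child(parent_path: str, parent_query: str) -> Dict[str, str]:
    """Hypothetical expansion of {!child parentPath=...}parent_query."""
    if parent_path == "/":
        vv = f"(+{parent_query} -{NEST}:*)"
    else:
        vv = f'(+{parent_query} +{{!field f="{NEST}" v="{parent_path}"}})'
    return {"q": "{!child of=$ff v=$vv}", "ff": parents_filter(parent_path), "vv": vv}


if __name__ == "__main__":
    for params in (
        old_style_parent("/a/b/c", "c_title:son"),
        old_style_parent("/", "c_title:son"),
        old_style_child("/a/b/c", "p_title:dad"),
        old_style_child("/", "p_title:dad"),
    ):
        print(params)
```

The same {{parents_filter}} string is used for both {{which}} and {{of}}, which is the point of the simplification: only the {{vv}} clause differs between the "parent" and "child" cases.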
 

[~mkhl] - what do you think about this approach? do you see any flaws in the 
logic here? ... if the logic looks correct, I'd like to write it up as a "how to 
create a *safe* of/which local param when using nest path" doc tip for 
SOLR-14383 and move forward there as a documentation improvement, even if there 
are still feature/implementation/syntax concerns to discuss here as 
far as a "new feature" goes.

 *EDIT*: fixed brain fart / typo of + vs - in last example



> Make child/parent query parsers natively aware of _nest_path_
> -------------------------------------------------------------
>
>                 Key: SOLR-14687
>                 URL: https://issues.apache.org/jira/browse/SOLR-14687
>             Project: Solr
>          Issue Type: Sub-task
>            Reporter: Chris M. Hostetter
>            Priority: Major
>
> A long standing pain point of the parent/child QParsers is the "all parents" 
> bitmask/filter specified via the "which" and "of" params (respectively).
> This is particularly tricky/painful to "get right" when dealing with 
> multi-level nested documents...
>  * https://issues.apache.org/jira/browse/SOLR-14383?focusedCommentId=17166339&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17166339
>  * [https://lists.apache.org/thread.html/r7633a366dd76e7ce9d98e6b9f2a65da8af8240e846f789d938c8113f%40%3Csolr-user.lucene.apache.org%3E]
> ...and it's *really* hard to get right when the nested structure isn't 100% 
> consistent among all docs:
>  * collections that mix docs w/o children and docs that have children.
>  ** Ex: blog posts, some of which have child docs that are "comments", but 
> some don't
>  * when some "types" of documents can exist at multiple levels:
>  ** Ex: top level "product" documents, which may have 2 types of children: 
> "skus" and "manuals", but "skus" may also have their own sku-specific child 
> "manuals"
> BUT! ... now that we have some semi-native support for the {{_nest_path_}} 
> field, i think it may be possible to offer an "easier to use" variant syntax 
> of the parent/child QParsers that directly depends on these fields. This new 
> syntax should be optional – and purely syntactic sugar. "expert" users should 
> be able to do all the same things using the existing syntax (possibly more 
> efficiently depending on what invariants exist in their data model)



