[jira] [Commented] (LUCENE-9207) Don't build SpanQuery in QueryBuilder

Michael Gibney (Jira) Wed, 05 Feb 2020 06:08:06 -0800


    [ 
https://issues.apache.org/jira/browse/LUCENE-9207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17030680#comment-17030680
 ]


Michael Gibney commented on LUCENE-9207:
----------------------------------------

I think the special logic building SpanQueries for the slop=0 case was left in 
place by LUCENE-8531 because the resulting behavior is functionally identical 
to the MultiPhraseQuery approach, and SpanQueries for slop=0 are more efficient 
(potentially _vastly_ more efficient) than the exponential expansion that can 
result from MultiPhraseQuery over graph TokenStreams (e.g., for bigrams, 
synonyms, WordDelimiterGraphFilter (WDGF), etc.).
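
To make the expansion concrete, here is a rough sketch in plain Java. It is not 
Lucene code: the class and method names are hypothetical, and the token graph 
is simplified to a list of positions with independent alternatives at each one 
(real graph TokenStreams can have side paths spanning several positions). It 
compares the number of phrases in a one-phrase-per-path disjunction with the 
number of term clauses in a nested "alternatives at each position" form.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative model only; none of these names exist in Lucene.
class GraphPhraseExpansion {

    // Enumerate every path through the simplified graph: one phrase per path.
    // The result size is the product of the alternative counts per position.
    static List<String> expandPaths(List<List<String>> positions) {
        List<String> paths = new ArrayList<>();
        paths.add("");
        for (List<String> alternatives : positions) {
            List<String> next = new ArrayList<>();
            for (String prefix : paths) {
                for (String term : alternatives) {
                    next.add(prefix.isEmpty() ? term : prefix + " " + term);
                }
            }
            paths = next;
        }
        return paths;
    }

    // Count term clauses when each position is kept as a single OR of its
    // alternatives and the positions are chained in order. The result is the
    // sum of the alternative counts per position.
    static int nestedClauseCount(List<List<String>> positions) {
        int total = 0;
        for (List<String> alternatives : positions) {
            total += alternatives.size();
        }
        return total;
    }

    public static void main(String[] args) {
        // Ten positions with two alternatives each.
        List<List<String>> positions = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            positions.add(Arrays.asList("a" + i, "b" + i));
        }
        System.out.println(expandPaths(positions).size());  // 1024 phrases
        System.out.println(nestedClauseCount(positions));   // 20 term clauses
    }
}
```

Under this simplified model, ten positions with two alternatives each give 
2^10 = 1024 phrases in the per-path form against 20 term clauses in the nested 
form, which is the kind of gap at stake for heavily branching token streams.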

[~romseygeek], do you think the code simplification is worth the potential 
performance hit for the {{slop=0}} case? [~jim.ferenczi], [~sarowe], 
[~uschindler], I'm curious for your perspectives (having been involved in the 
discussion around LUCENE-8531). For heavily branching token streams (e.g., 
bigrams, certain tYpEs 0f 1nPuT to common WDGF configurations), the performance 
impact is substantial. I know of (and in fact personally know) many people who 
have been bitten by this in the form of SOLR-13336; but the underlying 
performance issue is not Solr-specific and is not directly addressed by the fix 
for SOLR-13336, which simply restores Lucene's maxBooleanClauses threshold for 
short-circuiting individual queries.

FWIW, I think LUCENE-7398 is a bit of a red herring here; I'm shooting from the 
hip a bit, but I'm 90% confident that the LUCENE-7398 issues don't affect the 
slop=0 case for _query_-time graph TokenStreams; and to the extent that they 
affect _index_-time graph TokenStreams, they affect SpanQueries and 
MultiPhraseQuery equally (that's a whole separate question!).

> Don't build SpanQuery in QueryBuilder
> -------------------------------------
>
>                 Key: LUCENE-9207
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9207
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Subtask of LUCENE-9204.  QueryBuilder currently has special logic for graph 
> phrase queries with no slop, constructing a spanquery that attempts to follow 
> all paths using a combination of OR and NEAR queries.  Given the known bugs 
> in this type of query (LUCENE-7398) and that we would like to move span 
> queries out of core in any case, we should remove this logic and just build a 
> disjunction of phrase queries, one phrase per path.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
