[ 
https://issues.apache.org/jira/browse/LUCENE-8682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17376769#comment-17376769
 ] 

Michael Gibney commented on LUCENE-8682:
----------------------------------------

Yes, good catch! In most cases this likely wouldn't be a problem, but WDGF can 
emit tokens with a PositionLengthAttribute {{>1}} -- formerly a static {{1}} in 
WDF -- which [would trigger the building 
of|https://github.com/apache/lucene/blob/baceb1690442c2cdd6164f1faa34d65b54786a04/lucene/core/src/java/org/apache/lucene/util/QueryBuilder.java#L328-L344]
 "graph phrases" in cases that under WDF would have built MultiPhraseQueries. 
The former has the potential for exponential expansion where the latter does 
not; in the case referred to on the Solr users list, the 
[solution/workaround|https://lists.apache.org/thread.html/ra25ec5cb8bdfcd3d911f4d17a301a43ccf4ee02bf42cea59d1b0b9c0%40%3Cusers.solr.apache.org%3E]
 involved explicitly setting {{enableGraphQueries=false}} on the fieldType.
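
To make the exponential-versus-linear contrast concrete, here is a rough toy 
model in plain Java. It does not use Lucene and is not QueryBuilder's actual 
algorithm; the class name, the method names and the "wi-fi" style split are all 
hypothetical. It assumes each split word yields two alternatives (the parts, 
plus one catenated token spanning both positions), enumerates one phrase per 
path through the graph, and compares that with the number of terms a flat, 
position-by-position query would hold.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model only: plain strings stand in for tokens; no Lucene classes are used. */
final class GraphPhraseExpansionSketch {

    /**
     * Enumerates every path through a sequence of branch points, where each
     * branch point offers several alternative token sequences (side paths).
     * The result size is the product of the alternative counts.
     */
    static List<List<String>> enumeratePaths(List<List<List<String>>> branchPoints) {
        List<List<String>> paths = new ArrayList<>();
        paths.add(new ArrayList<>()); // start with a single empty path
        for (List<List<String>> alternatives : branchPoints) {
            List<List<String>> extended = new ArrayList<>();
            for (List<String> prefix : paths) {
                for (List<String> alternative : alternatives) {
                    List<String> next = new ArrayList<>(prefix);
                    next.addAll(alternative);
                    extended.add(next);
                }
            }
            paths = extended;
        }
        return paths;
    }

    /** Counts the terms a flat per-position query would hold: a sum, not a product. */
    static int countFlatTerms(List<List<List<String>>> branchPoints) {
        int total = 0;
        for (List<List<String>> alternatives : branchPoints) {
            for (List<String> alternative : alternatives) {
                total += alternative.size();
            }
        }
        return total;
    }

    /** Builds n copies of a hypothetical split: [wi, fi] alongside the catenated [wifi]. */
    static List<List<List<String>>> repeatedSplit(int n) {
        List<List<List<String>>> branchPoints = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            branchPoints.add(Arrays.asList(
                Arrays.asList("wi", "fi"),
                Arrays.asList("wifi")));
        }
        return branchPoints;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 5, 10}) {
            List<List<List<String>>> input = repeatedSplit(n);
            System.out.println(n + " split words: " + enumeratePaths(input).size()
                + " paths vs " + countFlatTerms(input) + " flat terms");
        }
    }
}
```

Under these assumptions, ten split words in one phrase give 1024 enumerated 
paths against 30 flat terms, which is the shape of the blow-up described above.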

There may be other subtle differences in the order of tokens returned by the 
two filters (?), and the non-graph version of the phrase query has its own 
problems, I think (likely false negatives and false positives in some cases). 
But I doubt there's much we can do to address the more subtle differences. The 
potential performance impact of migrating configs from WDF to WDGF perhaps 
merits an upgrade note in CHANGES.txt mentioning the 
{{enableGraphQueries=false}} workaround.

It'd also be possible to add general documentation (javadocs, refguide), 
independent of mentioning WDF, about the potential graph phrase query expansion 
issues with heavily branching token streams, and the (however sub-optimal) 
workaround of explicitly disabling graph phrase queries. Though that's pretty 
edge-casey, and might end up raising more questions than it answers, for the 
majority of users!

> Remove WordDelimiterFilter
> --------------------------
>
>                 Key: LUCENE-8682
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8682
>             Project: Lucene - Core
>          Issue Type: Task
>    Affects Versions: main (9.0)
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>            Priority: Major
>         Attachments: LUCENE-8682.patch
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> WordDelimiterFilter was deprecated a while back.  We can remove it entirely 
> from the master branch.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
