[ https://issues.apache.org/jira/browse/LUCENE-8682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17376769#comment-17376769 ]
Michael Gibney commented on LUCENE-8682:
----------------------------------------

Yes, good catch! In most cases this likely wouldn't be a problem, but WDGF can emit a PositionLengthAttribute {{>1}} (in WDF it was a static {{1}}), which [would trigger the building of|https://github.com/apache/lucene/blob/baceb1690442c2cdd6164f1faa34d65b54786a04/lucene/core/src/java/org/apache/lucene/util/QueryBuilder.java#L328-L344] "graph phrases" in cases where WDF would have built MultiPhraseQueries. The former has the potential for exponential expansion where the latter does not; in the case referred to on the solr users list, the [solution/workaround|https://lists.apache.org/thread.html/ra25ec5cb8bdfcd3d911f4d17a301a43ccf4ee02bf42cea59d1b0b9c0%40%3Cusers.solr.apache.org%3E] involved explicitly setting {{enableGraphQueries=false}} on the fieldType.

There may be other subtle differences in the order of tokens returned by the two filters (?), and I think the non-graph version of the phrase query has its own problems (likely false negatives and false positives in some cases). But I doubt there's much we can do to address the more subtle differences.

The potential performance impact of migrating configs from WDF to WDGF perhaps merits an upgrade note in CHANGES.txt mentioning the {{enableGraphQueries=false}} workaround. It would also be possible to add general documentation (javadocs, refguide), independent of any mention of WDF, about the potential graph phrase query expansion issues with heavily branching token streams, and the (however sub-optimal) workaround of explicitly disabling graph phrase queries. That's pretty edge-casey though, and for the majority of users it might end up raising more questions than it answers!
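To make the "exponential expansion" point concrete, here is a rough back-of-the-envelope sketch. It is a toy model only: it calls no Lucene API, the class and method names ({{GraphPhraseGrowth}}, {{pathCount}}, {{flatTermCount}}) are made up for illustration, and it assumes every word in the phrase branches independently into the same number of alternatives, which real token streams need not do. It compares the number of distinct paths through such a graph with the number of terms a flat, position-based query would hold.

```java
/** Toy model only: this is not Lucene code and calls no Lucene API. */
public class GraphPhraseGrowth {

    /**
     * Number of start-to-end paths through a token graph in which each of
     * `words` words can be read in `branchesPerWord` independent ways
     * (for example, as one catenated token or as its split parts).
     */
    public static long pathCount(int words, int branchesPerWord) {
        long paths = 1;
        for (int i = 0; i < words; i++) {
            // Every path built so far can continue along any branch of the next word.
            paths = Math.multiplyExact(paths, (long) branchesPerWord);
        }
        return paths;
    }

    /**
     * Number of terms held by a flat, position-based query in which each word
     * contributes `termsPerWord` terms spread over its positions.
     */
    public static long flatTermCount(int words, int termsPerWord) {
        return Math.multiplyExact((long) words, (long) termsPerWord);
    }

    public static void main(String[] args) {
        // Assumed shape: 2 readings per word in the graph, 3 terms per word when flattened.
        for (int words = 5; words <= 20; words += 5) {
            System.out.printf("words=%d paths=%d flatTerms=%d%n",
                    words, pathCount(words, 2), flatTermCount(words, 3));
        }
    }
}
```

Under these assumptions a 20-word phrase has 1,048,576 paths but only 60 flat terms. How many of those paths a real query actually enumerates depends on the shape of the token graph and on how the query is built, so treat the numbers as an illustration of the growth rate rather than a prediction.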
> Remove WordDelimiterFilter
> --------------------------
>
> Key: LUCENE-8682
> URL: https://issues.apache.org/jira/browse/LUCENE-8682
> Project: Lucene - Core
> Issue Type: Task
> Affects Versions: main (9.0)
> Reporter: Alan Woodward
> Assignee: Alan Woodward
> Priority: Major
> Attachments: LUCENE-8682.patch
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> WordDelimiterFilter was deprecated a while back. We can remove it entirely
> from the master branch.