mfolnovic opened a new issue, #14030:
URL: https://github.com/apache/lucene/issues/14030

   ### Description
   
   Hello,
   
   I'm new to Lucene, so I apologize if this is expected behaviour.
   
   I've noticed that the current implementation of `PhraseQuery#rewrite` does not take into account fuzziness (`slop`) when transforming a single-term `PhraseQuery` into a `TermQuery`.
   
   This can be seen here: https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/PhraseQuery.java#L287
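
To make the reported behaviour concrete, here is a minimal sketch that does not depend on Lucene. `TermQ`, `PhraseQ` and `rewrite` are hypothetical stand-ins written for illustration, not Lucene's classes or its actual implementation. They only model the shortcut described above, where a one-term phrase collapses into a term query and the slop value is not carried over.

```java
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT Lucene classes.
final class SlopRewriteSketch {

  interface Q {}

  // Models a single-term query; it has no place to store a slop value.
  static final class TermQ implements Q {
    final String field;
    final String text;

    TermQ(String field, String text) {
      this.field = field;
      this.text = text;
    }

    @Override
    public String toString() {
      return field + ":" + text;
    }
  }

  // Models a phrase query with an ordered list of terms and a slop value.
  static final class PhraseQ implements Q {
    final String field;
    final List<String> terms;
    final int slop;

    PhraseQ(String field, int slop, String... terms) {
      this.field = field;
      this.slop = slop;
      this.terms = List.of(terms);
    }

    @Override
    public String toString() {
      String s = field + ":\"" + String.join(" ", terms) + "\"";
      return slop == 0 ? s : s + "~" + slop;
    }
  }

  // Assumed model of the shortcut: a one-term phrase becomes a term query,
  // so the slop is silently dropped. Multi-term phrases are left unchanged.
  static Q rewrite(Q q) {
    if (q instanceof PhraseQ) {
      PhraseQ p = (PhraseQ) q;
      if (p.terms.size() == 1) {
        return new TermQ(p.field, p.terms.get(0));
      }
    }
    return q;
  }

  public static void main(String[] args) {
    System.out.println(rewrite(new PhraseQ("A", 1, "12", "345"))); // A:"12 345"~1
    System.out.println(rewrite(new PhraseQ("A", 1, "12345")));     // A:12345
  }
}
```

In this model the second call loses the `~1`, which mirrors the last row of the table below.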
   
   It also shows up in these examples (the last one is what seems questionable to me):
   
   | Raw query (parameter of `parse`) | Parsed query (`toString` output) | Note |
   |--------|--------|--------|
   | `A:12345` | `A:12345` | return value is `TermQuery(A:12345)` |
   | `A:12345~1` | `A:12345~1` | return value is `FuzzyQuery(A:12345~1)` |
   | `A:"12 345"~1` | `A:"12 345"~1` | return value is `PhraseQuery(A:"12 345"~1)` |
   | `A:"12345"~1` | `A:12345` | return value is `TermQuery(A:12345)` |
   
   In the last example, notice that the fuzziness disappears. I see two alternatives that seem more correct than this:
   
   1. `PhraseQuery(A:"12345"~1)`
   2. `FuzzyQuery(A:12345~1)`
   
   As I understand it, the 1st option seems correct. For example, given a document `{ "A": "12345 6789" }`, the 1st option wouldn't match it while the 2nd option would, and the intention of the original query was not to match it (because it was written as a `PhraseQuery`).
   
   All examples were tested with `new StandardQueryParser(new StandardAnalyzer()).parse(query, "A")`, and the `toString()` output is pasted here.
   
   I'm open to contributing a fix for this (including unit tests). This also seems to be the case for `MultiPhraseQuery`, but I'm not 100% sure as I've never used it.
   
   Thank you for an amazing product.
   
   ### Version and environment details
   
   Lucene version: 10.0.0
   OS: Arch Linux
   JDK: 21


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


