[ https://issues.apache.org/jira/browse/LUCENE-9130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017895#comment-17017895 ]
Chen Zhixiang commented on LUCENE-9130:
---------------------------------------

From Lucene's SloppyPhraseMatcher.java:

    public boolean nextMatch() throws IOException {
      if (!positioned) {
        return false;
      }
      PhrasePositions pp = pq.pop();
      assert pp != null; // if the pq is not full, then positioned == false
      captureLead(pp);
      matchLength = end - pp.position;
      int next = pq.top().position;
      while (advancePP(pp)) {
        if (hasRpts && !advanceRpts(pp)) {
          break; // pps exhausted
        }
        if (pp.position > next) { // done minimizing current match-length
          pq.add(pp);
          if (matchLength <= slop) {
            return true;
          }
          pp = pq.pop();
          next = pq.top().position;
          assert pp != null; // if the pq is not full, then positioned == false
          matchLength = end - pp.position;
        } else {
          int matchLength2 = end - pp.position;
          if (matchLength2 < matchLength) {
            matchLength = matchLength2;
          }
        }
        captureLead(pp);
      }
      positioned = false;
      return matchLength <= slop;
    }

In my test, the loop condition while (advancePP(pp)) does not hold, so the loop body is skipped entirely. At that point matchLength = 3 and slop = 2, so the method returns false. I believe there is a bug here, but I cannot figure out the cause.

> Failed to match when create PhraseQuery with terms analyzed from long query text
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-9130
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9130
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 8.4
>            Reporter: Chen Zhixiang
>            Priority: Major
>         Attachments: LongTextFieldSearchTest.java
>
>
> When I use a long text (which is equal to the doc's StringField at indexing
> time) to build a PhraseQuery, I cannot match the document. But a BooleanQuery
> in MUST/AND mode succeeds.
>
> The long query text is an address string:
> "申长路988弄虹桥万科中心地下停车场LG2层2179-2184车位(锡虹路入,LG1层开到底下LG2)"
> The test case is attached.
> logs:
>
> 15:46:11.940 [main] INFO test.LongTextFieldSearchTest - indexed terms: 开, 层, 心, 弄, 万, 停车场, 地下, 科, 虹桥, 底下, 锡, 入, 2184, 中, 路, 到, 1, 2, 申, 2179, 车位, 988, 虹, lg, 长
> 15:46:11.956 [main] INFO test.LongTextFieldSearchTest - terms: 申, 长, 路, 988, 弄, 虹桥, 万, 科, 中, 心, 地下, 停车场, lg, 2, 层, 2179, 2184, 车位, 锡, 虹, 路, 入, lg, 1, 层, 开, 到, 底下, lg, 2
> 15:46:11.962 [main] INFO test.LongTextFieldSearchTest - query: +(+address:申 +address:长 +address:路 +address:988 +address:弄 +address:虹桥 +address:万 +address:科 +address:中 +address:心 +address:地下 +address:停车场 +address:lg +address:2 +address:层 +address:2179 +address:2184 +address:车位 +address:锡 +address:虹 +address:路 +address:入 +address:lg +address:1 +address:层 +address:开 +address:到 +address:底下 +address:lg +address:2)
> 15:46:11.988 [main] INFO test.LongTextFieldSearchTest - results.totalHits.value=1
> 15:46:12.181 [main] INFO test.LongTextFieldSearchTest - indexed terms: 开, 层, 心, 弄, 万, 停车场, 地下, 科, 虹桥, 底下, 锡, 入, 2184, 中, 路, 到, 1, 2, 申, 2179, 车位, 988, 虹, lg, 长
> 15:46:12.185 [main] INFO test.LongTextFieldSearchTest - terms: 申, 长, 路, 988, 弄, 虹桥, 万, 科, 中, 心, 地下, 停车场, lg, 2, 层, 2179, 2184, 车位, 锡, 虹, 路, 入, lg, 1, 层, 开, 到, 底下, lg, 2
> 15:46:12.188 [main] INFO test.LongTextFieldSearchTest - query: +address:"申 长 路 988 弄 虹桥 万 科 中 心 地下 停车场 lg 2 层 2179 2184 车位 锡 虹 路 入 lg 1 层 开 到 底下 lg 2"~2
> 15:46:12.210 [main] INFO test.LongTextFieldSearchTest - results.totalHits.value=0
> 15:46:12.214 [main] INFO test.LongTextFieldSearchTest - no matching phrase
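
The "terms:" log line lists 30 query terms while the "indexed terms:" line lists only 25 distinct ones, so some terms occur more than once in the phrase. That matters for reading the code quoted above, because the hasRpts / advanceRpts branch is only relevant when terms repeat. The following is a minimal, standalone sketch (plain Java, no Lucene dependency) that lists the repeated terms and their offsets in the query. The class name RepeatedTerms and its methods are hypothetical, and the offsets assume consecutive 0-based positions in the order logged; the real position increments produced by the analyzer are not shown in the logs. It does not establish that repeats are the cause of the failed match.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene or of the attached test case.
final class RepeatedTerms {
    // Query terms copied from the "terms:" log line of the issue.
    static final List<String> QUERY_TERMS = List.of(
        "申", "长", "路", "988", "弄", "虹桥", "万", "科", "中", "心",
        "地下", "停车场", "lg", "2", "层", "2179", "2184", "车位", "锡", "虹",
        "路", "入", "lg", "1", "层", "开", "到", "底下", "lg", "2");

    // Maps each term to the list of offsets at which it occurs.
    // Assumption: offsets are consecutive and 0-based, in logged order.
    static Map<String, List<Integer>> positionsByTerm(List<String> terms) {
        Map<String, List<Integer>> positions = new LinkedHashMap<>();
        for (int i = 0; i < terms.size(); i++) {
            positions.computeIfAbsent(terms.get(i), k -> new ArrayList<>()).add(i);
        }
        return positions;
    }

    // Keeps only the terms that occur more than once.
    static Map<String, List<Integer>> findRepeats(List<String> terms) {
        Map<String, List<Integer>> repeats = new LinkedHashMap<>();
        positionsByTerm(terms).forEach((term, offsets) -> {
            if (offsets.size() > 1) {
                repeats.put(term, offsets);
            }
        });
        return repeats;
    }

    public static void main(String[] args) {
        System.out.println("query terms:    " + QUERY_TERMS.size());
        System.out.println("distinct terms: " + positionsByTerm(QUERY_TERMS).size());
        findRepeats(QUERY_TERMS).forEach(
            (term, offsets) -> System.out.println(term + " -> " + offsets));
    }
}
```

Running main prints each repeated term with its offsets, which can be compared against the positions the analyzer actually assigns at index time.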