herley-shaori commented on code in PR #15825:
URL: https://github.com/apache/lucene/pull/15825#discussion_r2937973100
##########
lucene/analysis/common/src/java/org/apache/lucene/analysis/cjk/CJKBigramFilter.java:
##########
@@ -333,6 +360,10 @@ private void flushUnigram() {
termAtt.setLength(len);
offsetAtt.setOffset(startOffset[index], endOffset[index]);
typeAtt.setType(SINGLE_TYPE);
+ if (!outputUnigrams && deferredPosInc > 0) {
Review Comment:
Thanks for the review! I applied your suggestion and extended the same
reasoning to the other guards:
- flushUnigram(): removed the !outputUnigrams && condition (your suggestion).
- flushBigram(): added an if (deferredPosInc > 0) guard to skip the redundant
setPositionIncrement(1) call, since clearAttributes() already resets the
position increment to 1.
- incrementToken() (both segment boundary checks): removed !outputUnigrams
&& before hadBigrams. Since hadBigrams is only ever set to true inside the
!outputUnigrams branch of flushBigram(), the outer check is redundant.
I also fixed TestCJKAnalyzer (testJa2, testMix, testMix2,
testReusableTokenStream, testFinalOffset). The same position increment
updates are needed there because CJKAnalyzer uses CJKBigramFilter with
outputUnigrams=false.
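
For anyone who wants to double-check the incrementToken() point, here is a
minimal sketch of the redundancy argument in plain Java. It is not the actual
CJKBigramFilter code: the class GuardSketch and its methods are hypothetical
names made up for illustration, and the only assumption it encodes is the
invariant stated above (hadBigrams is true only when outputUnigrams is false).

```java
// Hypothetical model of the guard conditions; not taken from CJKBigramFilter.
public class GuardSketch {

    /** The old condition: explicit flag test plus hadBigrams. */
    static boolean oldCheck(boolean outputUnigrams, boolean hadBigrams) {
        return !outputUnigrams && hadBigrams;
    }

    /** The simplified condition: hadBigrams alone. */
    static boolean newCheck(boolean hadBigrams) {
        return hadBigrams;
    }

    /**
     * Assumed invariant: hadBigrams is only ever set to true in the
     * !outputUnigrams branch, so (outputUnigrams && hadBigrams) never occurs.
     */
    static boolean reachable(boolean outputUnigrams, boolean hadBigrams) {
        return !hadBigrams || !outputUnigrams;
    }

    /** Enumerates all four flag combinations and compares the two checks. */
    static boolean checksAgreeOnReachableStates() {
        boolean[] values = {false, true};
        for (boolean outputUnigrams : values) {
            for (boolean hadBigrams : values) {
                if (reachable(outputUnigrams, hadBigrams)
                        && oldCheck(outputUnigrams, hadBigrams) != newCheck(hadBigrams)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(checksAgreeOnReachableStates());
    }
}
```

The two checks differ only in the state (outputUnigrams=true, hadBigrams=true),
which the invariant rules out, so dropping the outer test should not change
behavior as long as that invariant keeps holding.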
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]