Re: [PR] BaseTokenStreamTestCase.assertAnalyzesTo fails when Analyzer contains… [lucene]

via GitHub Mon, 27 Nov 2023 12:15:36 -0800


msfroh commented on PR #12750:
URL: https://github.com/apache/lucene/pull/12750#issuecomment-1828469855


   I was looking into this and the approach used for (Edge)NGramTokenizer back 
in 2013: 
https://github.com/apache/lucene/commit/a03e38d5d05008aaef969a200071c03a1d6cb991
   
   The solution there is to *always* set the position increment and length to 
1: 
https://github.com/apache/lucene/blob/8ef6a0da56878177ff8d6880c92e8f7d0321d076/lucene/analysis/common/src/java/org/apache/lucene/analysis/ngram/NGramTokenizer.java#L186-L187
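
To illustrate what "always set the position increment and length to 1" means for token positions, here is a minimal plain-Java sketch. It is not Lucene code: `Token`, `NGramSketch`, `tokenize`, and `positions` are hypothetical names, and the emission order (by start offset, then by gram size) is an assumption of the sketch. It only models the two attributes discussed above and shows that every token then lands on its own consecutive position.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model, NOT Lucene code: Token and NGramSketch are hypothetical names.
final class Token {
    final String term;
    final int positionIncrement; // distance from the previous token's position
    final int positionLength;    // number of positions the token spans

    Token(String term, int positionIncrement, int positionLength) {
        this.term = term;
        this.positionIncrement = positionIncrement;
        this.positionLength = positionLength;
    }
}

final class NGramSketch {
    // Emits every gram of length minGram..maxGram; each gets increment 1 and length 1.
    // The emission order (by start offset, then by gram size) is an assumption here.
    static List<Token> tokenize(String text, int minGram, int maxGram) {
        List<Token> tokens = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= text.length(); len++) {
                tokens.add(new Token(text.substring(start, start + len), 1, 1));
            }
        }
        return tokens;
    }

    // Absolute position of each token: running sum of increments, starting from -1.
    static int[] positions(List<Token> tokens) {
        int[] result = new int[tokens.size()];
        int position = -1;
        for (int i = 0; i < tokens.size(); i++) {
            position += tokens.get(i).positionIncrement;
            result[i] = position;
        }
        return result;
    }
}
```

Because no token in this model has an increment of 0, no two tokens share a position, which is the behavioural difference that makes such a change visible to existing tests.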
   
   With that change, your test passes (but I had to change every other test): 
https://github.com/msfroh/lucene/commit/0d05366c65a79aabc407e0662537520ba9c56737
   
   Given that it's not backward-compatible, I imagine it would have to be a 
change for 10.0? Also, whatever we do should probably be applied to 
ReversePathHierarchyTokenizer too.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


