Re: [PR] Enhance DictionaryCompoundWordTokenFilter [lucene]

2025-02-27 Thread via GitHub
renatoh commented on PR #14278: URL: https://github.com/apache/lucene/pull/14278#issuecomment-2688845145 @rmuir Could we reduce it to only two 'valid' behaviors: onlyLongestMatch=true with reuseChars=false, and onlyLongestMatch=false with reuseChars=true? If we think only these two cases mak
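
To make the two flags discussed here concrete, the following is a minimal standalone sketch in plain Java. It is not Lucene's implementation and does not use the Lucene API: the class `DecompoundSketch`, the method `decompose`, and the sample dictionary are hypothetical. It also assumes that `reuseChars=false` means "after emitting the longest match, skip forward past the characters it consumed", as the thread describes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; names and semantics are assumptions, not Lucene code.
final class DecompoundSketch {

    // Returns the dictionary subwords found in `word`, scanning left to right.
    static List<String> decompose(String word, Set<String> dict, int minSub, int maxSub,
                                  boolean onlyLongestMatch, boolean reuseChars) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + minSub <= word.length(); i++) {
            String longest = null;
            for (int len = minSub; len <= maxSub && i + len <= word.length(); len++) {
                String candidate = word.substring(i, i + len);
                if (!dict.contains(candidate)) {
                    continue;
                }
                if (onlyLongestMatch) {
                    longest = candidate; // len only grows, so the latest hit is the longest
                } else {
                    out.add(candidate);  // emit every dictionary hit at this position
                }
            }
            if (longest != null) {
                out.add(longest);
                if (!reuseChars) {
                    // Assumed semantics: do not look for matches inside consumed characters.
                    i += longest.length() - 1;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> dict = Set.of("fuss", "fussball", "ball", "pumpe");
        System.out.println(decompose("fussballpumpe", dict, 3, 15, true, false));
        System.out.println(decompose("fussballpumpe", dict, 3, 15, false, true));
    }
}
```

Under these assumptions, the first proposed combination (longest match, no character reuse) yields only `fussball` and `pumpe`, while the second (all matches, reuse allowed) yields `fuss`, `fussball`, `ball`, and `pumpe`. Combining longest match with character reuse would additionally emit `ball`, which is the extra token the discussion treats as surprising.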

Re: [PR] Enhance DictionaryCompoundWordTokenFilter [lucene]

2025-02-27 Thread via GitHub
rmuir commented on PR #14278: URL: https://github.com/apache/lucene/pull/14278#issuecomment-2688769041 @renatoh Feel free to open another PR, if you have time, to try to improve defaults around this for the next version of Lucene. If I ask for "longest match" I don't expect to have addition

Re: [PR] Enhance DictionaryCompoundWordTokenFilter [lucene]

2025-02-27 Thread via GitHub
renatoh commented on PR #14278: URL: https://github.com/apache/lucene/pull/14278#issuecomment-2688748464 > Thanks @renatoh! Thanks for your input and for reviewing it! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

Re: [PR] Enhance DictionaryCompoundWordTokenFilter [lucene]

2025-02-26 Thread via GitHub
rmuir merged PR #14278: URL: https://github.com/apache/lucene/pull/14278

Re: [PR] Enhance DictionaryCompoundWordTokenFilter [lucene]

2025-02-26 Thread via GitHub
rmuir commented on PR #14278: URL: https://github.com/apache/lucene/pull/14278#issuecomment-2684994611 Thanks @renatoh!

Re: [PR] Enhance DictionaryCompoundWordTokenFilter [lucene]

2025-02-26 Thread via GitHub
renatoh commented on PR #14278: URL: https://github.com/apache/lucene/pull/14278#issuecomment-2684233234 > Yes, I'm just suggesting to split it. We can add this new parameter here, backport to minor release 10.2.0, no breaking changes. Separately we can default it to `true` for 11.0?

Re: [PR] Enhance DictionaryCompoundWordTokenFilter [lucene]

2025-02-24 Thread via GitHub
rmuir commented on PR #14278: URL: https://github.com/apache/lucene/pull/14278#issuecomment-2679383631 Yes, I'm just suggesting to split it. We can add this new parameter here, backport to minor release 10.2.0, no breaking changes. Separately we can default it to `true` for 11.0?

Re: [PR] Enhance DictionaryCompoundWordTokenFilter [lucene]

2025-02-24 Thread via GitHub
rmuir commented on PR #14278: URL: https://github.com/apache/lucene/pull/14278#issuecomment-2679239148 For changing defaults, my go-to would be to do that as a follow-up PR, for a major release. We can expose this parameter in a minor release without hurting anyone, but if w

Re: [PR] Enhance DictionaryCompoundWordTokenFilter [lucene]

2025-02-24 Thread via GitHub
renatoh commented on PR #14278: URL: https://github.com/apache/lucene/pull/14278#issuecomment-2679257613 I would argue that, at least in German, nothing but longestMatch=true with skipping forward makes any sense. Without skipping forward the filter extracts a lot of nonsense and in my opinio

Re: [PR] Enhance DictionaryCompoundWordTokenFilter [lucene]

2025-02-24 Thread via GitHub
rmuir commented on PR #14278: URL: https://github.com/apache/lucene/pull/14278#issuecomment-2679218267 I'm not really opinionated on it, was just brainstorming because I had to look at the source code to figure out what the parameter was doing. And I agree, it is surprising behavior

Re: [PR] Enhance DictionaryCompoundWordTokenFilter [lucene]

2025-02-24 Thread via GitHub
renatoh commented on PR #14278: URL: https://github.com/apache/lucene/pull/14278#issuecomment-2679176831 > looks good to me. I wonder about the name of the parameter, maybe "greedy" would be more intuitive as a way to describe what it is doing? not saying "consumeChars" is a good name

Re: [PR] Enhance DictionaryCompoundWordTokenFilter [lucene]

2025-02-24 Thread via GitHub
rmuir commented on code in PR #14278: URL: https://github.com/apache/lucene/pull/14278#discussion_r1967791155 ## lucene/analysis/common/src/test/org/apache/lucene/analysis/compound/TestCompoundWordTokenFilter.java: ## @@ -682,4 +687,41 @@ protected TokenStreamComponents createCo

Re: [PR] Enhance DictionaryCompoundWordTokenFilter [lucene]

2025-02-24 Thread via GitHub
rmuir commented on PR #14278: URL: https://github.com/apache/lucene/pull/14278#issuecomment-2678643974 Looks good to me. I wonder about the name of the parameter, maybe "greedy" would be more intuitive as a way to describe what it is doing?