[GitHub] [lucene] uschindler commented on issue #9231: HyphenationCompoundWordTokenFilter creates overlapping tokens with onlyLongestMatch enabled [LUCENE-8183]

via GitHub Thu, 13 Jul 2023 04:57:26 -0700


uschindler commented on issue #9231:
URL: https://github.com/apache/lucene/issues/9231#issuecomment-1634114514


   Hi, thanks @MartinDemberger,
   the PR looks good: you also added tests and a hyphenation XML file,
although I have not looked closely into the internals of what you are actually
doing.
   
   I think it should be fine to merge this into head, but I'd like @rmuir, who
was one of the committers working on that TokenFilter, to take another look. If
this also fixes the problems with my dictionary and the configuration presented
in its repository (https://github.com/uschindler/german-decompounder), I am more
than happy.
   
   To be clear: apart from reordering tokens, the new features don't introduce
any backwards compatibility issues, right? From what I understood, the change
only removes useless tokens, and the order of tokens at the same position does
not matter. So somebody with an index created by the older version of that
filter won't see any serious issues; some imprecise matches may just no longer
be returned (because either the token is no longer present in newly indexed
documents, or the generated query no longer contains it). So it would only
return fewer matches, but no wrong matches.
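The "fewer matches, but no wrong matches" argument can be sketched with plain sets. This is a hypothetical model, not Lucene's API: it assumes a disjunctive (OR) query where a document matches if it shares at least one term with the query, and it assumes the new filter's output for a compound word is a subset of the old output. The token lists below are invented for illustration and are not what HyphenationCompoundWordTokenFilter actually emits.

```java
import java.util.Set;

public class TokenSubsetSketch {
    // Hypothetical OR-query model: a document matches if it shares
    // at least one term with the query.
    static boolean matches(Set<String> docTerms, Set<String> queryTerms) {
        for (String term : queryTerms) {
            if (docTerms.contains(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Invented token sets for one compound word; the "new" output is
        // assumed to be a subset of the "old" output (useless tokens removed).
        Set<String> oldTokens = Set.of("fussballweltmeisterschaft", "fussball",
                "ball", "weltmeister", "meister", "meisterschaft");
        Set<String> newTokens = Set.of("fussballweltmeisterschaft", "fussball",
                "meisterschaft");

        // A document indexed with the old filter is still found by a query
        // built with the new filter, because the query terms are a subset.
        System.out.println(matches(oldTokens, newTokens));

        // A query for a dropped subword: old documents still match it,
        // newly indexed documents no longer do. The match is lost, not wrong.
        Set<String> ballQuery = Set.of("ball");
        System.out.println(matches(oldTokens, ballQuery));
        System.out.println(matches(newTokens, ballQuery));
    }
}
```

Because removing terms from either side of an OR query can only shrink the set of shared terms, every match under the new analysis was also a match under the old one.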
   
   To me this looks fine.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


