[ https://issues.apache.org/jira/browse/LUCENE-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16999146#comment-16999146 ]
Michael Sokolov commented on LUCENE-8596:
-----------------------------------------

That looks like a real problem we should fix. There's no way to include a token with a "#" in a user dictionary. However, it's problematic, since any change here will not be backwards-compatible.

We can just change the behavior to respect comments only at the beginning of the line, and document the breaking change. Some users may get bitten when they upgrade (if they have been using the other comment style, with a "#" comment after an entry on the same line, in their dictionaries).

Or, we can introduce some API change to support both styles of commenting. This seems overly complex for a pretty small edge case, though: I favor fixing the behavior with a breaking change plus documentation in CHANGES.

If there's no objection, I'll merge this and add a note to CHANGES.

> The replacement of comments is a bug, in "UserDictionary.java"
> --------------------------------------------------------------
>
>                 Key: LUCENE-8596
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8596
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/analysis
>            Reporter: miyaharas
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> [https://github.com/apache/lucene-solr/blob/1d85cd783863f75cea133fb9c452302214165a4d/lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/UserDictionary.java#L68]
>
> hi
> I think that this is a bug.
> I think the following is correct:
> {code:java}
> line = line.replaceAll("^#.*$", "");
> {code}
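To make the two behaviors discussed above concrete, here is a minimal sketch. It assumes the existing code strips everything from the first "#" to the end of the line, which is what the comment above implies. The class name, method names, and the dictionary entry are hypothetical and are not taken from UserDictionary.java.

```java
public class CommentStripDemo {

    // Assumed existing behavior: remove everything from the first '#' to end of line.
    static String stripAnywhere(String line) {
        return line.replaceAll("#.*$", "");
    }

    // Proposed behavior: treat a line as a comment only when '#' is its first character.
    static String stripLeadingOnly(String line) {
        return line.replaceAll("^#.*$", "");
    }

    public static void main(String[] args) {
        // Hypothetical entry whose surface form contains '#'.
        String entry = "C#,C#,shi-sha-pu,custom-noun";
        String comment = "# a whole-line comment";

        // The entry is truncated at the first '#', so the token is lost.
        System.out.println("[" + stripAnywhere(entry) + "]");      // prints [C]

        // With the anchored pattern the entry survives unchanged.
        System.out.println("[" + stripLeadingOnly(entry) + "]");   // prints [C#,C#,shi-sha-pu,custom-noun]

        // Whole-line comments are still removed.
        System.out.println("[" + stripLeadingOnly(comment) + "]"); // prints []
    }
}
```

With the anchored pattern, a line such as "entry # note" would no longer have its trailing note removed. That is the backwards-incompatible part that would need a mention in CHANGES.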