[ https://issues.apache.org/jira/browse/LUCENE-9581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237197#comment-17237197 ]

ASF subversion and git services commented on LUCENE-9581:
---------------------------------------------------------

Commit 24d7fe5c62b4815d90a9e21ce295483e55a5460b in lucene-solr's branch refs/heads/branch_8x from jimczi
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=24d7fe5 ]

LUCENE-9581: Japanese tokenizer should discard the compound token instead of disabling the decomposition of long tokens when discardCompoundToken is activated.

> Clarify discardCompoundToken behavior in the JapaneseTokenizer
> --------------------------------------------------------------
>
>                 Key: LUCENE-9581
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9581
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Jim Ferenczi
>            Priority: Minor
>         Attachments: LUCENE-9581.patch, LUCENE-9581.patch, LUCENE-9581.patch
>
>
> At first sight, the discardCompoundToken option added in LUCENE-9123 seems
> redundant with the NORMAL mode of the Japanese tokenizer. When set to true,
> the current behavior is to disable the decomposition for compounds, which is
> exactly what the NORMAL mode does.
> So I wonder whether the right semantics of the option would be to keep only
> the decomposition of the compound, or whether the option is really needed.
> If the goal is to make the output compatible with a graph token filter, the
> current workaround of setting the mode to NORMAL should be enough.
> That's consistent with the mode that should be used to preserve positions in
> the index, since we don't handle position length on the indexing side.
> Am I missing something regarding the new option? Is there a compelling case
> where it differs from the NORMAL mode?

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
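
The difference between the NORMAL mode and the behavior the commit message describes can be illustrated with a toy model. This is only a sketch and does not use or reproduce Lucene's JapaneseTokenizer: the class `CompoundTokenSketch`, its `Mode` enum, the `tokenize` method, and the hard-coded compound with its three parts are all hypothetical names chosen for illustration. The assumption it encodes is the one stated in the commit message: with discardCompoundToken activated, the decomposition is kept and only the compound token is dropped.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the behaviors discussed in LUCENE-9581; this is not Lucene code. */
final class CompoundTokenSketch {

    /** Hypothetical stand-in for the tokenizer mode. */
    enum Mode { NORMAL, SEARCH }

    // Hypothetical compound and its decomposition, hard-coded for illustration.
    static final String COMPOUND = "関西国際空港";
    static final List<String> PARTS = List.of("関西", "国際", "空港");

    /** Returns the tokens this toy model emits for the hard-coded compound. */
    static List<String> tokenize(Mode mode, boolean discardCompoundToken) {
        List<String> out = new ArrayList<>();
        if (mode == Mode.NORMAL) {
            // No decomposition at all: only the compound is emitted,
            // and the flag has nothing to discard in favor of.
            out.add(COMPOUND);
            return out;
        }
        // Decomposing mode: the flag decides whether the compound is kept
        // alongside its parts (assumed semantics, per the commit message).
        if (!discardCompoundToken) {
            out.add(COMPOUND);
        }
        out.addAll(PARTS);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize(Mode.NORMAL, false)); // compound only
        System.out.println(tokenize(Mode.SEARCH, false)); // compound and parts
        System.out.println(tokenize(Mode.SEARCH, true));  // parts only
    }
}
```

In this sketch the option is no longer redundant with NORMAL: NORMAL yields the single compound, while the decomposing mode with the flag set yields only the parts. The order of the compound relative to its parts is arbitrary here and says nothing about the positions Lucene assigns.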