[ https://issues.apache.org/jira/browse/LUCENE-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17453268#comment-17453268 ]
ASF subversion and git services commented on LUCENE-10243:
----------------------------------------------------------

Commit c8f5b9127d29f3ccd3650fe79c77b5e639a7f200 in lucene's branch refs/heads/main from Robert Muir
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=c8f5b91 ]

LUCENE-10243: increase unicode versions of tokenizers to 12.1 (#465)

* Bump %unicode 9 -> %unicode 12.1 for the 3 unicode grammars
* regenerate emoji conformance tests for unicode 12.1
* modify wordbreak conformance tests to use emoji data (which replaces old crazy E_base etc properties)
* regenerate wordbreak conformance tests
* Simplify grammar files and word-break conformance test generator, now that full-width numbers are WordBreak=Numeric
* Use jflex emoji properties rather than ICU-generated ones


> increase unicode versions of tokenizers to unicode 12.1
> -------------------------------------------------------
>
>                 Key: LUCENE-10243
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10243
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Robert Muir
>            Priority: Major
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Followup from LUCENE-10239
> Bump the Unicode version of these tokenizers from Unicode 9 to 12.1, which is
> the most recent supported by the jflex release.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)