[jira] [Commented] (LUCENE-10243) increase unicode versions of tokenizers to unicode 12.1

ASF subversion and git services (Jira) Fri, 03 Dec 2021 17:35:06 -0800


    [ https://issues.apache.org/jira/browse/LUCENE-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17453276#comment-17453276 ]


ASF subversion and git services commented on LUCENE-10243:
----------------------------------------------------------

Commit eff5430e5877d84a6a0754f2b2f2aa0befeb7291 in lucene's branch 
refs/heads/branch_9x from Robert Muir
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=eff5430 ]

LUCENE-10243: increase unicode versions of tokenizers to 12.1 (#465)

* Bump %unicode 9 -> %unicode 12.1 for the 3 Unicode grammars
* Regenerate emoji conformance tests for Unicode 12.1
* Modify wordbreak conformance tests to use emoji data (which replaces the old
  crazy E_base etc. properties)
* Regenerate wordbreak conformance tests
* Simplify grammar files and the word-break conformance test generator, now that
  full-width numbers are WordBreak=Numeric
* Use JFlex emoji properties rather than ICU-generated ones


> increase unicode versions of tokenizers to unicode 12.1
> -------------------------------------------------------
>
>                 Key: LUCENE-10243
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10243
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Robert Muir
>            Priority: Major
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Follow-up from LUCENE-10239.
> Bump the Unicode version of these tokenizers from Unicode 9 to 12.1, which is
> the most recent version supported by the JFlex release.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
