[
https://issues.apache.org/jira/browse/OPENNLP-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17312439#comment-17312439
]
Tim Allison commented on OPENNLP-1270:
--------------------------------------
{noformat}
Adding (bin) leipzig/data/amh_community_2017-sentences.txt
Adding (bin) leipzig/data/amh_wikipedia_2014_30K-sentences.txt
Adding (bin) leipzig/data/amh_wikipedia_2016_30K-sentences.txt
Adding (bin) leipzig/data/asm_community_2017-sentences.txt
Adding (bin) leipzig/data/asm_wikipedia_2014_30K-sentences.txt
Adding (bin) leipzig/data/asm_wikipedia_2016_30K-sentences.txt
Adding leipzig/data/azj-az_web_2015_1M-sentences.txt
Adding leipzig/data/azj_wikipedia_2007_10K-sentences.txt
Adding leipzig/data/ban-id_web_2013_30K-sentences.txt
Adding leipzig/data/ban_community_2017-sentences.txt
Adding (bin) leipzig/data/bih_wikipedia_2016_10K-sentences.txt
Adding (bin) leipzig/data/div-mv_web_2016_1M-sentences.txt
Adding (bin) leipzig/data/div_newscrawl_2015_300K-sentences.txt
Adding leipzig/data/gom_community_2017-sentences.txt
Adding leipzig/data/gom_newscrawl_2011_30K-sentences.txt
Adding leipzig/data/gom_wikipedia_2016_10K-sentences.txt
Adding leipzig/data/hat-ht_web_2015_30K-sentences.txt
Adding leipzig/data/hat_community_2017-sentences.txt
Adding (bin) leipzig/data/kas_community_2017-sentences.txt
Adding (bin) leipzig/data/khm_community_2017-sentences.txt
Adding leipzig/data/knn-in_web_2015_10K-sentences.txt
Adding leipzig/data/knn_community_2017-sentences.txt
Adding (bin) leipzig/data/lao_community_2017-sentences.txt
Adding (bin) leipzig/data/lao_community_2021-sentences.txt
Adding leipzig/data/mhr_wikipedia_2014_10K-sentences.txt
Adding leipzig/data/mhr_wikipedia_2016_30K-sentences.txt
Adding (bin) leipzig/data/mya_community_2017-sentences.txt
Adding leipzig/data/new_wikipedia_2010_30K-sentences.txt
Adding (bin) leipzig/data/new_wikipedia_2016_30K-sentences.txt
Adding (bin) leipzig/data/ori_community_2017-sentences.txt
Adding (bin) leipzig/data/ori_wikipedia_2014_30K-sentences.txt
Adding (bin) leipzig/data/ori_wikipedia_2016_30K-sentences.txt
Adding leipzig/data/tuk-tm_web_2015_30K-sentences.txt
Adding leipzig/data/tuk-tm_web_2016_100K-sentences.txt
Adding leipzig/data/tuk_community_2017-sentences.txt
Adding leipzig/data/tuk_wikipedia_2016_30K-sentences.txt
Adding (bin) leipzig/data/uig_community_2017-sentences.txt
Adding (bin) leipzig/data/uig_community_2021-sentences.txt
Adding leipzig/data/xho_community_2017-sentences.txt
Adding leipzig/data/xho_mixed_2016_30K-sentences.txt
Adding leipzig/data/yid_wikipedia_2010_30K-sentences.txt
Adding leipzig/data/yid_wikipedia_2016_30K-sentences.txt
Adding leipzig/data/zho-simp-tw_web_2014_300K-sentences.txt
Adding leipzig/data/zho-trad_newscrawl_2011_1M-sentences.txt
Adding leipzig/data/zsm_mixed-tufs4_2012_300K-sentences.txt
Adding leipzig/data/zsm_web-tufs13_2012_300K-sentences.txt
Adding leipzig/data/zsm_wikipedia-tufs16_2016_300K-sentences.txt
{noformat}
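For anyone wanting to tally the list above by language, here is a minimal sketch that parses lines of this shape. It assumes the language code is the part of the file name before the first "_" or "-" (so "zho-simp-tw_web_..." counts as "zho"); the function name `group_by_language` is made up for illustration and is not part of OpenNLP.

```python
import re
from collections import defaultdict

# Matches lines such as "Adding (bin) leipzig/data/amh_community_2017-sentences.txt".
LINE_RE = re.compile(r"^Adding\s+(\(bin\)\s+)?(\S+)$")


def group_by_language(log_text):
    """Map each language prefix to a list of (filename, flagged_binary) pairs."""
    groups = defaultdict(list)
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that is not an "Adding ..." line
        filename = match.group(2).rsplit("/", 1)[-1]
        # Assumption: the language code is everything before the first "_" or "-".
        lang = re.split(r"[_-]", filename, maxsplit=1)[0]
        groups[lang].append((filename, match.group(1) is not None))
    return dict(groups)
```

Calling `sorted(group_by_language(log_text))` on the pasted text would give the distinct language prefixes, and the boolean in each pair records whether the line carried the "(bin)" marker.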
> Add new languages to the language detector
> ------------------------------------------
>
> Key: OPENNLP-1270
> URL: https://issues.apache.org/jira/browse/OPENNLP-1270
> Project: OpenNLP
> Issue Type: Task
> Reporter: Tim Allison
> Assignee: Tim Allison
> Priority: Major
> Fix For: 1.9.4
>
> Attachments: report.txt, report.txt
>
>
> The Leipzig corpora include several other languages that might be useful to
> add to the language detector. I've selected some with more than 10k
> sentences. Once I build the model and evaluate its performance, I'll share
> the reports, the model, and a tgz of the *-sentences.txt files.
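The "more than 10k sentences" selection described in the issue could be checked along these lines. This is only a sketch, not the procedure actually used for this issue: it assumes each `*-sentences.txt` file holds one sentence per non-empty line, and the names `count_sentences` and `select_corpora` are hypothetical.

```python
from pathlib import Path

MIN_SENTENCES = 10_000  # threshold taken from the issue description


def count_sentences(path):
    """Count non-empty lines; assumes one sentence per line."""
    # Read as bytes so the count does not depend on the file's encoding.
    with open(path, "rb") as handle:
        return sum(1 for line in handle if line.strip())


def select_corpora(data_dir, min_sentences=MIN_SENTENCES):
    """Return {filename: count} for files with more than min_sentences lines."""
    selected = {}
    for path in sorted(Path(data_dir).glob("*-sentences.txt")):
        n = count_sentences(path)
        if n > min_sentences:
            selected[path.name] = n
    return selected
```

For example, `select_corpora("leipzig/data")` would return only the files that clear the threshold, with their line counts.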