[ https://issues.apache.org/jira/browse/OPENNLP-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17312439#comment-17312439 ]

Tim Allison commented on OPENNLP-1270:
--------------------------------------

{noformat}
Adding  (bin)  leipzig/data/amh_community_2017-sentences.txt
Adding  (bin)  leipzig/data/amh_wikipedia_2014_30K-sentences.txt
Adding  (bin)  leipzig/data/amh_wikipedia_2016_30K-sentences.txt
Adding  (bin)  leipzig/data/asm_community_2017-sentences.txt
Adding  (bin)  leipzig/data/asm_wikipedia_2014_30K-sentences.txt
Adding  (bin)  leipzig/data/asm_wikipedia_2016_30K-sentences.txt
Adding         leipzig/data/azj-az_web_2015_1M-sentences.txt
Adding         leipzig/data/azj_wikipedia_2007_10K-sentences.txt
Adding         leipzig/data/ban-id_web_2013_30K-sentences.txt
Adding         leipzig/data/ban_community_2017-sentences.txt
Adding  (bin)  leipzig/data/bih_wikipedia_2016_10K-sentences.txt
Adding  (bin)  leipzig/data/div-mv_web_2016_1M-sentences.txt
Adding  (bin)  leipzig/data/div_newscrawl_2015_300K-sentences.txt
Adding         leipzig/data/gom_community_2017-sentences.txt
Adding         leipzig/data/gom_newscrawl_2011_30K-sentences.txt
Adding         leipzig/data/gom_wikipedia_2016_10K-sentences.txt
Adding         leipzig/data/hat-ht_web_2015_30K-sentences.txt
Adding         leipzig/data/hat_community_2017-sentences.txt
Adding  (bin)  leipzig/data/kas_community_2017-sentences.txt
Adding  (bin)  leipzig/data/khm_community_2017-sentences.txt
Adding         leipzig/data/knn-in_web_2015_10K-sentences.txt
Adding         leipzig/data/knn_community_2017-sentences.txt
Adding  (bin)  leipzig/data/lao_community_2017-sentences.txt
Adding  (bin)  leipzig/data/lao_community_2021-sentences.txt
Adding         leipzig/data/mhr_wikipedia_2014_10K-sentences.txt
Adding         leipzig/data/mhr_wikipedia_2016_30K-sentences.txt
Adding  (bin)  leipzig/data/mya_community_2017-sentences.txt
Adding         leipzig/data/new_wikipedia_2010_30K-sentences.txt
Adding  (bin)  leipzig/data/new_wikipedia_2016_30K-sentences.txt
Adding  (bin)  leipzig/data/ori_community_2017-sentences.txt
Adding  (bin)  leipzig/data/ori_wikipedia_2014_30K-sentences.txt
Adding  (bin)  leipzig/data/ori_wikipedia_2016_30K-sentences.txt
Adding         leipzig/data/tuk-tm_web_2015_30K-sentences.txt
Adding         leipzig/data/tuk-tm_web_2016_100K-sentences.txt
Adding         leipzig/data/tuk_community_2017-sentences.txt
Adding         leipzig/data/tuk_wikipedia_2016_30K-sentences.txt
Adding  (bin)  leipzig/data/uig_community_2017-sentences.txt
Adding  (bin)  leipzig/data/uig_community_2021-sentences.txt
Adding         leipzig/data/xho_community_2017-sentences.txt
Adding         leipzig/data/xho_mixed_2016_30K-sentences.txt
Adding         leipzig/data/yid_wikipedia_2010_30K-sentences.txt
Adding         leipzig/data/yid_wikipedia_2016_30K-sentences.txt
Adding         leipzig/data/zho-simp-tw_web_2014_300K-sentences.txt
Adding         leipzig/data/zho-trad_newscrawl_2011_1M-sentences.txt
Adding         leipzig/data/zsm_mixed-tufs4_2012_300K-sentences.txt
Adding         leipzig/data/zsm_web-tufs13_2012_300K-sentences.txt
Adding         leipzig/data/zsm_wikipedia-tufs16_2016_300K-sentences.txt
{noformat}

> Add new languages to the language detector
> ------------------------------------------
>
>                 Key: OPENNLP-1270
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1270
>             Project: OpenNLP
>          Issue Type: Task
>            Reporter: Tim Allison
>            Assignee: Tim Allison
>            Priority: Major
>             Fix For: 1.9.4
>
>         Attachments: report.txt, report.txt
>
>
> Leipzig has several other languages that might be useful to add to the
> language detector. I've selected those with more than 10k sentences. Once I
> build the model and evaluate its performance, I'll share the reports, the
> model, and a .tgz archive of the *-sentences.txt files.
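The selection criterion in the description (corpora with more than 10k sentences) can be sketched as a small filter. This is a minimal Python sketch, not the script actually used for OPENNLP-1270. It assumes the Leipzig layout of one `id<TAB>sentence` record per line in each `*-sentences.txt` file. The `leipzig/data` directory is borrowed from the svn paths above, and `count_sentences` and `select_corpora` are hypothetical helper names.

```python
import sys
from pathlib import Path

# Threshold taken from the issue text: keep corpora with more than 10k sentences.
MIN_SENTENCES = 10_000


def count_sentences(path: Path) -> int:
    """Count non-empty lines. Assumes one `id<TAB>sentence` record per line."""
    with path.open(encoding="utf-8", errors="replace") as f:
        return sum(1 for line in f if line.strip())


def select_corpora(data_dir: str, min_sentences: int = MIN_SENTENCES) -> list[tuple[str, int]]:
    """Return (filename, sentence count) for *-sentences.txt files above the threshold."""
    selected = []
    # Sorted so the output order is stable across file systems.
    for path in sorted(Path(data_dir).glob("*-sentences.txt")):
        n = count_sentences(path)
        if n > min_sentences:
            selected.append((path.name, n))
    return selected


if __name__ == "__main__":
    # Hypothetical default directory, mirroring the svn paths in the comment.
    data_dir = sys.argv[1] if len(sys.argv) > 1 else "leipzig/data"
    for name, n in select_corpora(data_dir):
        print(f"{n:>8}  {name}")
```

Run it as `python select_corpora.py leipzig/data` (the script name is hypothetical). It prints the sentence count and filename of each corpus above the threshold. The sketch counts lines instead of trusting the size token in the filename (such as `30K`) because the `community_2017` files carry no size token.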



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
