[ 
https://issues.apache.org/jira/browse/OPENNLP-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martin Wiesner resolved OPENNLP-1661.
-------------------------------------
    Resolution: Fixed

> Fix custom models being wiped from OpenNLP user.home directory
> --------------------------------------------------------------
>
>                 Key: OPENNLP-1661
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1661
>             Project: OpenNLP
>          Issue Type: Bug
>          Components: Models
>    Affects Versions: 2.5.0
>            Reporter: Martin Wiesner
>            Assignee: Martin Wiesner
>            Priority: Major
>             Fix For: 2.5.1
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Currently, a Maven build ({{mvn clean test}}) wipes existing models in the 
> '{{user.home/.opennlp}}' directory, because 
> {{AbstractDownloadUtilTest#cleanupWhenOnline}} removes them before the 
> related methods in {{DownloadUtil}} are tested.
> This is a problem if custom-trained models with similar name patterns exist 
> in that directory, because the following calls are executed:
> _wipeExistingModelFiles("-tokens-");_
> _wipeExistingModelFiles("-sentence-");_
> _wipeExistingModelFiles("-pos-");_
> _wipeExistingModelFiles("-lemmas-");_
> Moreover, this causes considerable overhead for developers: each run of the 
> whole test suite cleans up either the target directory of the 
> {{opennlp-tools}} module or, even worse, the local '{{user.home/.opennlp}}' 
> directory, so at least 128 models (32 languages x 4 model types) are 
> downloaded over and over again.
> Aims:
> * Ensure no (custom) model is accidentally removed from 
> '{{user.home/.opennlp}}'.
> * Ensure model downloads aren't repeated if the models exist locally and are 
> "valid" (_sha512_)
> * Validate freshly downloaded models AND existing ones to discover broken 
> model files
> * Reduce download volume required for full (IT) builds
> * Reduce load for ASF infrastructure
> * Reduce overall ecological footprint
> Note: The same applies to the 'ci' Maven profile. As long as no 
> {{mvn clean}} is executed, existing models kept in a build's {{target}} 
> folder should neither be wiped nor re-downloaded per test suite execution.
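
The "skip the download if a valid local copy exists" aim can be sketched
roughly as follows. This is only an illustration and not the code in
DownloadUtil: the class name ModelCacheCheck and the methods sha512Hex and
needsDownload are hypothetical, and the expected checksum is assumed to be
available as a hex string (for example, read from a .sha512 file published
next to the model).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper, not part of OpenNLP: decides whether a model must be (re-)downloaded. */
public final class ModelCacheCheck {

  private ModelCacheCheck() {
  }

  /** Computes the SHA-512 digest of a file and returns it as a lowercase hex string. */
  static String sha512Hex(Path file) throws IOException {
    final MessageDigest digest;
    try {
      digest = MessageDigest.getInstance("SHA-512");
    } catch (NoSuchAlgorithmException e) {
      // SHA-512 is a mandatory algorithm on every Java platform.
      throw new IllegalStateException(e);
    }
    // Stream the file in chunks so large model files are not loaded into memory at once.
    try (InputStream in = Files.newInputStream(file)) {
      byte[] buffer = new byte[8192];
      int read;
      while ((read = in.read(buffer)) != -1) {
        digest.update(buffer, 0, read);
      }
    }
    StringBuilder hex = new StringBuilder();
    for (byte b : digest.digest()) {
      hex.append(String.format("%02x", b));
    }
    return hex.toString();
  }

  /**
   * Returns true if the local model file is missing or its checksum does not
   * match the expected one. An existing file is never deleted here.
   */
  static boolean needsDownload(Path localModel, String expectedSha512Hex) throws IOException {
    if (!Files.isRegularFile(localModel)) {
      return true;
    }
    return !sha512Hex(localModel).equalsIgnoreCase(expectedSha512Hex.trim());
  }
}
```

With a check like this, a test setup would only fetch a model when
needsDownload(...) returns true. Files that do not belong to the test, such
as custom-trained models, are never touched, and a corrupted existing file is
detected in the same way as a broken fresh download.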



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
