[ 
https://issues.apache.org/jira/browse/OPENNLP-1589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864241#comment-17864241
 ] 

ASF GitHub Bot commented on OPENNLP-1589:
-----------------------------------------

rzo1 opened a new pull request, #635:
URL: https://github.com/apache/opennlp/pull/635

   This (draft) PR introduces a more aggressive caching strategy in the cached 
feature generator, which no longer relies on `==` to compare arrays.
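   
   To illustrate the difference, here is a minimal sketch. It is not the actual 
`CachedFeatureGenerator` code, and the class and method names are hypothetical. 
`==` on arrays compares references, so a cache check written that way only hits 
when the very same array object is passed in again, whereas `Arrays.equals` 
compares the contents:
   
   ```java
   import java.util.Arrays;

   public class ArrayCacheCheck {

       // Reference comparison: true only if both variables point to the same array object.
       static boolean sameReference(String[] cached, String[] current) {
           return cached == current;
       }

       // Content comparison: true if both arrays hold equal elements in the same order.
       static boolean sameContent(String[] cached, String[] current) {
           return Arrays.equals(cached, current);
       }

       public static void main(String[] args) {
           String[] cached = {"Den", "Haag", "is", "mooi"};
           String[] current = cached.clone(); // equal tokens, but a different object

           System.out.println(sameReference(cached, current)); // false
           System.out.println(sameContent(cached, current));   // true
       }
   }
   ```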
   
   However, the eval results are a bit odd for the Conll02 dataset:
   
   Default (Spanish):
   ```
   opennlp.tools.util.featuregen.CachedFeatureGenerator@2ddc9a9f: hits=100385 misses=52923 hit%0.6547929657943486
   opennlp.tools.util.featuregen.CachedFeatureGenerator@76a4ebf2: hits=98677 misses=51533 hit%0.6569269689101924
   ```
   
   1589-Caching (Spanish):
   ```
   opennlp.tools.util.featuregen.CachedFeatureGenerator@2e385cce: hits=102229 misses=51079 hit%0.6668210399979126
   opennlp.tools.util.featuregen.CachedFeatureGenerator@6e6f2380: hits=99197 misses=51013 hit%0.6603887890286931
   ```
   
   Default (main) (Dutch):
   ```
   opennlp.tools.util.featuregen.CachedFeatureGenerator@2ddc9a9f: hits=67301 misses=37687 hit%0.6410351659237247
   opennlp.tools.util.featuregen.CachedFeatureGenerator@76a4ebf2: hits=123179 misses=68875 hit%0.6413769044123007
   ```
   
   1589-Caching (Dutch):
   ```
   opennlp.tools.util.featuregen.CachedFeatureGenerator@45d84a20: hits=68174 misses=36814 hit%0.6493504019506992
   opennlp.tools.util.featuregen.CachedFeatureGenerator@52f27fbd: hits=124618 misses=67436 hit%0.6488695887614941
   ```
   
   As you can see, the more aggressive mechanism yields a slightly higher hit 
rate: roughly 0.3 to 1.2 percentage points, depending on the generator and 
language.
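   
   For reference, the `hit%` values in the logs are consistent with 
`hits / (hits + misses)`, so they are fractions rather than percentages. A small 
sketch that recomputes the first Spanish pair from the logged counts (the class 
and method names are hypothetical):
   
   ```java
   public class HitRate {

       // Fraction of lookups served from the cache: hits / (hits + misses).
       static double hitRate(long hits, long misses) {
           long total = hits + misses;
           return total == 0 ? 0.0 : (double) hits / total;
       }

       public static void main(String[] args) {
           double before = hitRate(100385, 52923); // Default (Spanish), first generator
           double after = hitRate(102229, 51079);  // 1589-Caching (Spanish), first generator

           System.out.println("before = " + before);
           System.out.println("after  = " + after);
           System.out.println("delta  = " + (after - before));
       }
   }
   ```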
   
   It has no impact on the Spanish results or on any other eval test, **but** 
the conll02 results for **Dutch** are odd (see the changes in the eval F1 
scores).
   
   They are slightly better in some scenarios and slightly worse in others.
   
   I am wondering why we don't see such changes in F-measure for Spanish. 
Therefore, I am opening this PR so that you can also investigate what is going 
on here.
   
   @mawiesne is also having a look here, but we would appreciate some 
additional 👁️ 👁️ 
   
   




> Fix incorrect array check in CachedFeatureGenerator
> ---------------------------------------------------
>
>                 Key: OPENNLP-1589
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1589
>             Project: OpenNLP
>          Issue Type: Bug
>          Components: Name Finder
>    Affects Versions: 2.3.3
>            Reporter: Martin Wiesner
>            Assignee: Richard Zowalla
>            Priority: Major
>             Fix For: 2.4.0
>
>
> There is an invalid comparison for equality of two arrays in 
> CachedFeatureGenerator#createFeatures(..) in line 58.
> Currently, there are many situations in which the cache yields none of the 
> intended improvement.
> This should be repaired, and a test should show that the cache mechanism is 
> working correctly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
