[
https://issues.apache.org/jira/browse/OPENNLP-1589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864241#comment-17864241
]
ASF GitHub Bot commented on OPENNLP-1589:
-----------------------------------------
rzo1 opened a new pull request, #635:
URL: https://github.com/apache/opennlp/pull/635
This (draft) PR introduces a more aggressive caching strategy in the cached
feature generator, one that doesn't rely on `==`.
However, the eval results are a bit odd for the Conll02 dataset:
Default (Spanish):
```
opennlp.tools.util.featuregen.CachedFeatureGenerator@2ddc9a9f: hits=100385 misses=52923 hit%0.6547929657943486
opennlp.tools.util.featuregen.CachedFeatureGenerator@76a4ebf2: hits=98677 misses=51533 hit%0.6569269689101924
```
1589-Caching (Spanish)
```
opennlp.tools.util.featuregen.CachedFeatureGenerator@2e385cce: hits=102229 misses=51079 hit%0.6668210399979126
opennlp.tools.util.featuregen.CachedFeatureGenerator@6e6f2380: hits=99197 misses=51013 hit%0.6603887890286931
```
Default (main) (Dutch)
```
opennlp.tools.util.featuregen.CachedFeatureGenerator@2ddc9a9f: hits=67301 misses=37687 hit%0.6410351659237247
opennlp.tools.util.featuregen.CachedFeatureGenerator@76a4ebf2: hits=123179 misses=68875 hit%0.6413769044123007
```
1589-Caching (Dutch)
```
opennlp.tools.util.featuregen.CachedFeatureGenerator@45d84a20: hits=68174 misses=36814 hit%0.6493504019506992
opennlp.tools.util.featuregen.CachedFeatureGenerator@52f27fbd: hits=124618 misses=67436 hit%0.6488695887614941
```
As you can see, the more aggressive mechanism yields a slightly higher cache
hit rate (roughly 0.3 to 1.2 percentage points, depending on the generator).
It has no impact on the eval results for Spanish or for any other eval test,
**but** the results for Conll02 **Dutch** are odd (see the changes in the eval
F1 scores). They are slightly better in some scenarios but decrease in others.
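For reference, the `hit%` value in the logs above is consistent with
hits / (hits + misses), i.e. it is a fraction rather than a percentage. A small
sketch to recompute it and compare two runs (the class and method names are
hypothetical and not part of OpenNLP; the numbers are taken from the first
Dutch log line of each run):

```java
public class HitRateSketch {

  // Fraction of lookups that were answered from the cache.
  static double hitRate(long hits, long misses) {
    return (double) hits / (hits + misses);
  }

  public static void main(String[] args) {
    double baseline = hitRate(67301, 37687); // Default (main), Dutch
    double patched = hitRate(68174, 36814);  // 1589-Caching, Dutch
    System.out.println("baseline=" + baseline);
    System.out.println("patched=" + patched);
    System.out.println("delta=" + (patched - baseline));
  }
}
```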
I am wondering why we don't see such changes in F-measure for Spanish.
I am therefore opening this PR so that you can also investigate what is
going on here.
@mawiesne is also having a look here, but we would appreciate some
additional 👁️ 👁️
> Fix incorrect array check in CachedFeatureGenerator
> ---------------------------------------------------
>
> Key: OPENNLP-1589
> URL: https://issues.apache.org/jira/browse/OPENNLP-1589
> Project: OpenNLP
> Issue Type: Bug
> Components: Name Finder
> Affects Versions: 2.3.3
> Reporter: Martin Wiesner
> Assignee: Richard Zowalla
> Priority: Major
> Fix For: 2.4.0
>
>
> There is an invalid comparison for equality of two arrays in
> CachedFeatureGenerator#createFeatures(..) in line 58.
> Currently, there are many situations in which the caching does not achieve
> the intended improvement.
> This should be repaired and a test shall show the cache mechanism is working
> correctly.
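
To illustrate the kind of check the issue describes (a sketch only, not the
actual code of `CachedFeatureGenerator`): in Java, `==` on two arrays compares
references, so two token arrays with identical content are not considered
equal, whereas `Arrays.equals` compares the content. The class and helper
names below are hypothetical.

```java
import java.util.Arrays;

public class ArrayCheckSketch {

  // Hypothetical helper: true if the cached tokens have the same content as
  // the given tokens, regardless of whether they are the same array instance.
  static boolean sameSentence(String[] cached, String[] tokens) {
    return Arrays.equals(cached, tokens);
  }

  public static void main(String[] args) {
    String[] first = {"This", "is", "a", "test"};
    String[] second = first.clone(); // same content, different instance

    System.out.println(first == second);             // false: reference comparison
    System.out.println(sameSentence(first, second)); // true: content comparison
  }
}
```

With a reference comparison, a cache only hits when the caller passes the very
same array instance again; a content comparison also hits for an equal copy.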
--
This message was sent by Atlassian Jira
(v8.20.10#820010)