[https://issues.apache.org/jira/browse/OPENNLP-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653174#comment-17653174]
ASF GitHub Bot commented on OPENNLP-1428:
-----------------------------------------
rzo1 commented on code in PR #473:
URL: https://github.com/apache/opennlp/pull/473#discussion_r1059494145
##########
opennlp-tools/src/main/java/opennlp/tools/util/DownloadUtil.java:
##########
@@ -174,4 +143,82 @@ public static <T extends BaseModel> T downloadModel(URL url, Class<T> type) thro
}
}
+ @Internal
+ static class DownloadParser {
+
+ private static final Pattern LINK_PATTERN = Pattern.compile("<a href=\\\"(.*?)\\\">(.*?)</a>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
+ private final URL indexUrl;
+
+ DownloadParser(URL indexUrl) {
+ Objects.requireNonNull(indexUrl);
+ this.indexUrl = indexUrl;
+ }
+
+ Map<String, Map<ModelType, String>> getAvailableModels() {
+
+ final Matcher matcher = LINK_PATTERN.matcher(fetchPageIndex());
+
+ final List<String> links = new ArrayList<>();
+ while (matcher.find()) {
+ links.add(matcher.group(1));
+ }
+
+ return toMap(links);
+ }
+
+ private Map<String, Map<ModelType, String>> toMap(List<String> links) {
+
+ final Map<String, Map<ModelType, String>> result = new HashMap<>();
+
+ for (String link : links) {
Review Comment:
We are voting on models [1], so it might make sense to add the structure of
the model file names to [2]. I think the release process for models is slightly
different, so we may need a separate page, "Making a model release", with
instructions for conducting one.
However, the structure itself is already defined and described in [3],
although it is a bit hidden.
So maybe we should:
- Add a separate page, "Making a model release", to the website.
- Add the file name structure and some additional instructions to that page
as well.
[1] https://lists.apache.org/thread/t2v946x3s24yvvfc6hpq41lqnxshx2b0
[2] https://opennlp.apache.org/release.html
[3] https://dist.apache.org/repos/dist/dev/opennlp/ud-models-1.0/README
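To make the point about the naming structure concrete: once the structure is written down, the truncated `toMap` loop in the diff can derive the language and model type directly from each file name. The following is a minimal sketch, assuming names of the form `opennlp-<lang>-ud-<treebank>-<type>-<modelVersion>-<toolsVersion>.bin` (e.g. `opennlp-en-ud-ewt-sentence-1.0-1.9.3.bin`). The `ModelFileNames` class, the `ModelKind` enum and its tokens are hypothetical stand-ins for OpenNLP's real `ModelType` and for the layout described in [3], not the code from PR #473.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration, not OpenNLP code: ModelKind and the assumed
// file-name layout stand in for the real ModelType and the README convention.
public class ModelFileNames {

  // Hypothetical model kinds, each keyed by the token assumed to appear in the file name.
  enum ModelKind {
    SENTENCE_DETECTOR("sentence"),
    TOKENIZER("tokens"),
    POS("pos"),
    LEMMATIZER("lemmas");

    final String token;

    ModelKind(String token) {
      this.token = token;
    }

    // Returns the kind for a file-name token, or null if the token is unknown.
    static ModelKind fromToken(String token) {
      for (ModelKind kind : values()) {
        if (kind.token.equals(token)) {
          return kind;
        }
      }
      return null;
    }
  }

  // Assumed layout: opennlp-<lang>-ud-<treebank>-<type>-<modelVersion>-<toolsVersion>.bin
  private static final Pattern NAME_PATTERN =
      Pattern.compile("opennlp-([a-z]{2})-ud-([a-z0-9]+)-([a-z]+)-[0-9.]+-[0-9.]+\\.bin");

  // Groups file names by language code, then by model kind.
  // Names that do not follow the assumed layout (README, checksums, signatures) are skipped.
  static Map<String, Map<ModelKind, String>> toMap(List<String> fileNames) {
    final Map<String, Map<ModelKind, String>> result = new HashMap<>();
    for (String name : fileNames) {
      final Matcher matcher = NAME_PATTERN.matcher(name);
      if (!matcher.matches()) {
        continue;
      }
      final ModelKind kind = ModelKind.fromToken(matcher.group(3));
      if (kind == null) {
        continue;
      }
      result.computeIfAbsent(matcher.group(1), lang -> new EnumMap<ModelKind, String>(ModelKind.class))
          .put(kind, name);
    }
    return result;
  }

  public static void main(String[] args) {
    final Map<String, Map<ModelKind, String>> models = toMap(List.of(
        "opennlp-en-ud-ewt-sentence-1.0-1.9.3.bin",
        "opennlp-en-ud-ewt-tokens-1.0-1.9.3.bin",
        "opennlp-en-ud-ewt-tokens-1.0-1.9.3.bin.sha512",
        "opennlp-de-ud-gsd-pos-1.0-1.9.3.bin",
        "README"));
    System.out.println(models.get("en").get(ModelKind.TOKENIZER));
  }
}
```

Anything that does not match, such as `README` or the `.sha512` and `.asc` companions of each model, is skipped rather than rejected, so the parser keeps working if unrelated files appear in the directory. The flip side is that a silent change to the naming scheme would make models disappear from the map, which is one more reason to document the convention on the release page.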
> Enhance DownloadUtil to avoid the use of hard-coded model urls
> --------------------------------------------------------------
>
> Key: OPENNLP-1428
> URL: https://issues.apache.org/jira/browse/OPENNLP-1428
> Project: OpenNLP
> Issue Type: Improvement
> Reporter: Richard Zowalla
> Assignee: Richard Zowalla
> Priority: Major
>
> As pointed out in https://github.com/apache/opennlp/pull/472, we should not
> rely on hard-coded URLs in DownloadUtil.
> Instead, we can parse the content of
> https://dlcdn.apache.org/opennlp/models/ud-models-1.0/ and automatically
> derive the related model files from it.
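The approach proposed in the issue can be sketched as follows: scan the directory listing for `<a href=...>` links with the same `LINK_PATTERN` as in the diff, keep only the model binaries, and resolve each relative link against the index URL. This is a hedged, offline sketch: the `IndexLinks` class and `modelUrls` method are hypothetical, and the HTML is a made-up excerpt of an Apache directory listing passed in as a string instead of being downloaded.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of OpenNLP.
public class IndexLinks {

  // Same expression as LINK_PATTERN in the DownloadParser diff.
  private static final Pattern LINK_PATTERN =
      Pattern.compile("<a href=\\\"(.*?)\\\">(.*?)</a>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

  // Collects the href of every link ending in ".bin" and resolves it against the index URL.
  // Parent-directory links, sort links, checksums and signatures are ignored.
  static List<URI> modelUrls(URI indexUrl, String html) {
    final List<URI> urls = new ArrayList<>();
    final Matcher matcher = LINK_PATTERN.matcher(html);
    while (matcher.find()) {
      final String href = matcher.group(1);
      if (href.endsWith(".bin")) {
        urls.add(indexUrl.resolve(href));
      }
    }
    return urls;
  }

  public static void main(String[] args) {
    final URI index = URI.create("https://dlcdn.apache.org/opennlp/models/ud-models-1.0/");
    // Made-up excerpt of a directory listing, used instead of a network fetch.
    final String html = "<a href=\"../\">Parent Directory</a>\n"
        + "<a href=\"opennlp-en-ud-ewt-tokens-1.0-1.9.3.bin\">opennlp-en-ud-ewt-tokens-1.0-1.9.3.bin</a>\n"
        + "<a href=\"opennlp-en-ud-ewt-tokens-1.0-1.9.3.bin.sha512\">opennlp-en-ud-ewt-tokens-1.0-1.9.3.bin.sha512</a>\n";
    for (URI url : modelUrls(index, html)) {
      System.out.println(url);
    }
  }
}
```

Resolving with `URI.resolve` relies on the index URL ending in `/`; without the trailing slash, the last path segment would be replaced. A regular expression is also sensitive to markup details such as attribute order or single quotes, so it only holds as long as the listing format stays stable.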
--
This message was sent by Atlassian Jira
(v8.20.10#820010)