[ https://issues.apache.org/jira/browse/OPENNLP-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653128#comment-17653128 ]
ASF GitHub Bot commented on OPENNLP-1428:
-----------------------------------------
jzonthemtn commented on code in PR #473:
URL: https://github.com/apache/opennlp/pull/473#discussion_r1059381083
##########
opennlp-tools/src/main/java/opennlp/tools/util/DownloadUtil.java:
##########
@@ -174,4 +143,82 @@ public static <T extends BaseModel> T downloadModel(URL url, Class<T> type) thro
}
}
+ @Internal
+ static class DownloadParser {
+
+ private static final Pattern LINK_PATTERN = Pattern.compile("<a href=\\\"(.*?)\\\">(.*?)</a>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
+ private final URL indexUrl;
+
+ DownloadParser(URL indexUrl) {
+ Objects.requireNonNull(indexUrl);
+ this.indexUrl = indexUrl;
+ }
+
+ Map<String, Map<ModelType, String>> getAvailableModels() {
+
+ final Matcher matcher = LINK_PATTERN.matcher(fetchPageIndex());
+
+ final List<String> links = new ArrayList<>();
+ while (matcher.find()) {
+ links.add(matcher.group(1));
+ }
+
+ return toMap(links);
+ }
+
+ private Map<String, Map<ModelType, String>> toMap(List<String> links) {
+
+ final Map<String, Map<ModelType, String>> result = new HashMap<>();
+
+ for (String link : links) {
Review Comment:
Do you have any suggestions on where we should document the structure of the
model file names?
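One possible answer is to document the naming scheme right next to the code that relies on it, for example in the Javadoc of the method that splits file names into language and model type. The sketch below illustrates that idea under a hypothetical scheme `opennlp-<language>-ud-<treebank>-<type>-<version>.bin`. The scheme, the `ModelType` stand-in with its suffixes, and the `groupByLanguage` helper are assumptions for illustration only; they are not the PR's `toMap` implementation or OpenNLP's actual `ModelType`.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ModelNameSketch {

  // Hypothetical stand-in for the ModelType used in the diff; the suffixes are assumptions.
  enum ModelType {
    SENTENCE_DETECTOR("sentence"), TOKENIZER("tokens"), POS("pos");

    final String suffix;

    ModelType(String suffix) {
      this.suffix = suffix;
    }
  }

  // Captures language (group 1), treebank (group 2) and model type suffix (group 3).
  private static final Pattern NAME_PATTERN =
      Pattern.compile("opennlp-([a-z]{2})-ud-([a-z]+)-([a-z]+)-.*\\.bin");

  /**
   * Groups model file names by language, then by model type.
   *
   * <p>Assumed file name structure (hypothetical, for illustration only):
   * {@code opennlp-<language>-ud-<treebank>-<type>-<version>.bin},
   * e.g. {@code opennlp-en-ud-ewt-pos-1.0-1.9.3.bin}. Names that do not fit this
   * scheme, or whose type suffix is unknown, are skipped.
   */
  static Map<String, Map<ModelType, String>> groupByLanguage(List<String> links) {
    final Map<String, Map<ModelType, String>> result = new HashMap<>();
    for (String link : links) {
      final Matcher m = NAME_PATTERN.matcher(link);
      if (!m.matches()) {
        continue; // e.g. checksum files or unrelated links
      }
      for (ModelType type : ModelType.values()) {
        if (type.suffix.equals(m.group(3))) {
          result.computeIfAbsent(m.group(1), k -> new EnumMap<>(ModelType.class))
              .put(type, link);
        }
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, Map<ModelType, String>> models = groupByLanguage(List.of(
        "opennlp-en-ud-ewt-pos-1.0-1.9.3.bin",
        "opennlp-en-ud-ewt-tokens-1.0-1.9.3.bin",
        "opennlp-de-ud-gsd-sentence-1.0-1.9.3.bin",
        "opennlp-en-ud-ewt-pos-1.0-1.9.3.bin.sha512"));
    System.out.println(models.size()); // 2
    System.out.println(models.get("en").get(ModelType.POS)); // opennlp-en-ud-ewt-pos-1.0-1.9.3.bin
  }
}
```

Keeping the documented structure and the pattern that enforces it in one place means a change to the naming convention only has to be made, and reviewed, once.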
> Enhance DownloadUtil to avoid the use of hard-coded model urls
> --------------------------------------------------------------
>
> Key: OPENNLP-1428
> URL: https://issues.apache.org/jira/browse/OPENNLP-1428
> Project: OpenNLP
> Issue Type: Improvement
> Reporter: Richard Zowalla
> Assignee: Richard Zowalla
> Priority: Major
>
> As pointed out in https://github.com/apache/opennlp/pull/472, we should not
> rely on hard-coded URLs in DownloadUtil.
> Instead, we can parse the content of
> https://dlcdn.apache.org/opennlp/models/ud-models-1.0/ and automatically
> derive the related model files from it.
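The first half of this approach, pulling the links out of the index page, is what `getAvailableModels()` in the diff does with `LINK_PATTERN`. The following minimal sketch reuses that regex on a made-up fragment of an Apache directory listing. The sample HTML, the file names, and the hypothetical `modelFiles` helper that keeps only `.bin` links are assumptions for illustration, not part of the PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkExtractionSketch {

  // Same regex as LINK_PATTERN in the diff: group(1) is the href, group(2) the link text.
  private static final Pattern LINK_PATTERN = Pattern.compile(
      "<a href=\\\"(.*?)\\\">(.*?)</a>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

  // Collects every href on the page, mirroring the loop in getAvailableModels().
  static List<String> extractLinks(String html) {
    final Matcher matcher = LINK_PATTERN.matcher(html);
    final List<String> links = new ArrayList<>();
    while (matcher.find()) {
      links.add(matcher.group(1));
    }
    return links;
  }

  // Hypothetical filter: keeps only model binaries, dropping the parent-directory
  // link and any checksum or signature files listed next to the models.
  static List<String> modelFiles(List<String> links) {
    final List<String> result = new ArrayList<>();
    for (String link : links) {
      if (link.endsWith(".bin")) {
        result.add(link);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Made-up listing fragment; the real index page may differ.
    String html = "<a href=\"/opennlp/models/\">Parent Directory</a>\n"
        + "<a href=\"opennlp-en-ud-ewt-pos-1.0-1.9.3.bin\">opennlp-en-ud-ewt-pos-1.0-1.9.3.bin</a>\n"
        + "<a href=\"opennlp-en-ud-ewt-pos-1.0-1.9.3.bin.sha512\">opennlp-en-ud-ewt-pos-1.0-1.9.3.bin.sha512</a>\n";
    List<String> links = extractLinks(html);
    System.out.println(links.size()); // 3
    System.out.println(modelFiles(links)); // [opennlp-en-ud-ewt-pos-1.0-1.9.3.bin]
  }
}
```

Because the regex matches every anchor on the page, some filtering step like this is needed before the remaining links can be treated as model files.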