[ 
https://issues.apache.org/jira/browse/OPENNLP-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653173#comment-17653173
 ] 

ASF GitHub Bot commented on OPENNLP-1428:
-----------------------------------------

rzo1 commented on code in PR #473:
URL: https://github.com/apache/opennlp/pull/473#discussion_r1059494145


##########
opennlp-tools/src/main/java/opennlp/tools/util/DownloadUtil.java:
##########
@@ -174,4 +143,82 @@ public static <T extends BaseModel> T downloadModel(URL url, Class<T> type) thro
     }
   }
 
+  @Internal
+  static class DownloadParser {
+
+    private static final Pattern LINK_PATTERN = Pattern.compile("<a href=\\\"(.*?)\\\">(.*?)</a>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
+    private final URL indexUrl;
+
+    DownloadParser(URL indexUrl) {
+      Objects.requireNonNull(indexUrl);
+      this.indexUrl = indexUrl;
+    }
+
+    Map<String, Map<ModelType, String>> getAvailableModels() {
+
+      final Matcher matcher = LINK_PATTERN.matcher(fetchPageIndex());
+
+      final List<String> links = new ArrayList<>();
+      while (matcher.find()) {
+        links.add(matcher.group(1));
+      }
+
+      return toMap(links);
+    }
+
+    private Map<String, Map<ModelType, String>> toMap(List<String> links) {
+
+      final Map<String, Map<ModelType, String>> result = new HashMap<>();
+
+      for (String link : links) {

Review Comment:
   We are voting on models [1], so it might make sense to add the structure of 
the model file names to [2].
   
   I think the release process for models is slightly different, so we may 
need a separate page, "Making a model release", with instructions for 
conducting one.
   
   However, the structure itself is already defined and described, see [3]. It 
is a bit hidden, though.
   
   
   [1] https://lists.apache.org/thread/t2v946x3s24yvvfc6hpq41lqnxshx2b0
   [2] https://opennlp.apache.org/release.html
   [3] https://dist.apache.org/repos/dist/dev/opennlp/ud-models-1.0/README
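
To illustrate why a documented file name structure matters for this parser, here is a minimal sketch, not the code from the PR. It assumes file names of the form `opennlp-<lang>-ud-<treebank>-<type>-<version>.bin`; that pattern, the class name `ModelIndexSketch`, and the use of plain strings instead of `ModelType` are assumptions made for illustration only. The authoritative structure is the one described in [3].

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ModelIndexSketch {

  // Captures the href of each anchor in an HTML index page.
  private static final Pattern LINK = Pattern.compile(
      "<a href=\"(.*?)\">(.*?)</a>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

  // ASSUMPTION (hypothetical): opennlp-<lang>-ud-<treebank>-<type>-<version>.bin
  private static final Pattern NAME = Pattern.compile(
      "opennlp-([a-z]{2,3})-ud-([a-z0-9]+)-([a-z]+)-(.+)\\.bin");

  // Returns language -> (model type -> link) for every link matching NAME.
  static Map<String, Map<String, String>> parse(String html) {
    Map<String, Map<String, String>> result = new LinkedHashMap<>();
    Matcher links = LINK.matcher(html);
    while (links.find()) {
      String href = links.group(1);
      Matcher name = NAME.matcher(href);
      if (!name.matches()) {
        continue; // skips checksums, signatures, README, parent links, ...
      }
      result.computeIfAbsent(name.group(1), k -> new LinkedHashMap<>())
          .put(name.group(3), href);
    }
    return result;
  }

  public static void main(String[] args) {
    String html = "<a href=\"opennlp-en-ud-ewt-tokens-1.0-1.9.3.bin\">tokens</a>\n"
        + "<a href=\"opennlp-en-ud-ewt-tokens-1.0-1.9.3.bin.sha512\">sha512</a>\n"
        + "<a href=\"opennlp-de-ud-gsd-sentence-1.0-1.9.3.bin\">sentence</a>\n";
    System.out.println(parse(html));
  }
}
```

Any link that does not match the assumed naming pattern is silently dropped, which is why the structure should be written down somewhere visible: a change in the naming scheme would otherwise make models disappear from the result without an error.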





> Enhance DownloadUtil to avoid the use of hard-coded model urls
> --------------------------------------------------------------
>
>                 Key: OPENNLP-1428
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1428
>             Project: OpenNLP
>          Issue Type: Improvement
>            Reporter: Richard Zowalla
>            Assignee: Richard Zowalla
>            Priority: Major
>
> As pointed out in https://github.com/apache/opennlp/pull/472, we should not 
> rely on hard-coded URLs in DownloadUtil.
> Instead we can parse the content of 
> https://dlcdn.apache.org/opennlp/models/ud-models-1.0/ and automatically 
> derive the related model files from it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
