[ 
https://issues.apache.org/jira/browse/OPENNLP-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17854961#comment-17854961
 ] 

ASF GitHub Bot commented on OPENNLP-1567:
-----------------------------------------

mawiesne commented on code in PR #606:
URL: https://github.com/apache/opennlp/pull/606#discussion_r1639398105


##########
opennlp-tools-models/src/main/java/opennlp/tools/models/ClasspathModelFinder.java:
##########
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package opennlp.tools.models;
+
+import java.net.URI;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Objects;
+import java.util.Optional;
+import java.util.Set;
+
+import io.github.classgraph.ClassGraph;

Review Comment:
   I'm not against classgraph as a library. 
   
   Nevertheless: is there an easy way to scan/look for model files in the 
classpath _without_ introducing this 3rd party dependency? Aka: Can we realize 
it with plain JDK classes?
   
   wdyt @jzonthemtn @kinow @rzo1 ?



##########
opennlp-tools-models/src/main/java/opennlp/tools/models/ClasspathModelFinder.java:
##########
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package opennlp.tools.models;
+
+import java.net.URI;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Objects;
+import java.util.Optional;
+import java.util.Set;
+
+import io.github.classgraph.ClassGraph;
+import io.github.classgraph.ResourceList;
+import io.github.classgraph.ScanResult;
+
+
+/**
+ * Enables the detection of OpenNLP models in the classpath.
+ */
+public class ClasspathModelFinder {
+
+  private static final String OPENNLP_MODEL_JAR_PREFIX = 
"opennlp-models-*.jar";
+  private final String jarModelPrefix;
+  private Set<ClassPathModelEntry> models;
+
+  /**
+   * By default, it scans for "opennlp-models-*.jar".
+   */
+  public ClasspathModelFinder() {
+    this(OPENNLP_MODEL_JAR_PREFIX);
+  }
+
+  /**
+   * @param modelJarPrefix The leafnames of the jars that should be canned 
(e.g. "opennlp.jar").
+   *                      May contain a wildcard glob ("opennlp-*.jar"). It 
must not be {@code null}.
+   */
+  public ClasspathModelFinder(String modelJarPrefix) {
+    Objects.requireNonNull(modelJarPrefix, "modelJarPrefix must not be null");
+    this.jarModelPrefix = modelJarPrefix;
+  }
+
+  /**
+   * Finds OpenNLP models within the classpath.
+   *
+   * @param reloadCache {@code true}, if the internal cache should explicitly 
be reloaded
+   * @return A Set of {@link ClassPathModelEntry ClassPathModelEntries}. It 
might be empty.
+   */
+  public Set<ClassPathModelEntry> findModels(boolean reloadCache) {
+
+    if (this.models == null || reloadCache) {
+      try (ScanResult sr = new 
ClassGraph().acceptJars(jarModelPrefix).disableDirScanning().scan()) {

Review Comment:
   See comment above on introducing ClassGraph as a dependency.



##########
opennlp-tools-models/src/main/java/opennlp/tools/models/ClasspathModelFinder.java:
##########
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package opennlp.tools.models;
+
+import java.net.URI;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Objects;
+import java.util.Optional;
+import java.util.Set;
+
+import io.github.classgraph.ClassGraph;
+import io.github.classgraph.ResourceList;
+import io.github.classgraph.ScanResult;
+
+
+/**
+ * Enables the detection of OpenNLP models in the classpath.
+ */
+public class ClasspathModelFinder {
+
+  private static final String OPENNLP_MODEL_JAR_PREFIX = 
"opennlp-models-*.jar";
+  private final String jarModelPrefix;
+  private Set<ClassPathModelEntry> models;
+
+  /**
+   * By default, it scans for "opennlp-models-*.jar".
+   */
+  public ClasspathModelFinder() {
+    this(OPENNLP_MODEL_JAR_PREFIX);
+  }
+
+  /**
+   * @param modelJarPrefix The leafnames of the jars that should be canned 
(e.g. "opennlp.jar").
+   *                      May contain a wildcard glob ("opennlp-*.jar"). It 
must not be {@code null}.
+   */
+  public ClasspathModelFinder(String modelJarPrefix) {
+    Objects.requireNonNull(modelJarPrefix, "modelJarPrefix must not be null");
+    this.jarModelPrefix = modelJarPrefix;
+  }
+
+  /**
+   * Finds OpenNLP models within the classpath.
+   *
+   * @param reloadCache {@code true}, if the internal cache should explicitly 
be reloaded
+   * @return A Set of {@link ClassPathModelEntry ClassPathModelEntries}. It 
might be empty.
+   */
+  public Set<ClassPathModelEntry> findModels(boolean reloadCache) {
+
+    if (this.models == null || reloadCache) {
+      try (ScanResult sr = new 
ClassGraph().acceptJars(jarModelPrefix).disableDirScanning().scan()) {
+
+        final List<URI> classpathModels = getResourcesMatchingWildcard(sr, 
"*.bin");
+        final List<URI> classPathProperties = getResourcesMatchingWildcard(sr, 
"model.properties");
+
+        this.models = new HashSet<>();
+
+        for (URI model : classpathModels) {
+          URI m = null;
+          for (URI prop : classPathProperties) {
+            if (jarPathsMatch(model, prop)) {
+              m = prop;
+              break;
+            }
+          }
+          this.models.add(new ClassPathModelEntry(model, 
Optional.ofNullable(m)));
+
+        }
+      }
+    }
+    return this.models;
+  }
+
+  private List<URI> getResourcesMatchingWildcard(final ScanResult sr, final 
String resourceWildcard) {
+    try (final ResourceList resources = 
sr.getResourcesMatchingWildcard(resourceWildcard)) {
+      return resources.getURIs();
+    }
+  }
+
+  private boolean jarPathsMatch(URI uri1, URI uri2) {
+    final String[] parts1 = parseJarURI(uri1);
+    final String[] parts2 = parseJarURI(uri2);
+
+    if (parts1 == null || parts2 == null) {
+      return false;
+    }
+
+    return parts1[0].equals(parts2[0]);
+  }
+
+  private String[] parseJarURI(URI uri) {
+    try {
+      if ("jar".equals(uri.getScheme())) {
+        final String ssp = uri.getSchemeSpecificPart();
+        final int separatorIndex = ssp.indexOf("!/");
+        if (separatorIndex > 0) {
+          final String jarFileUri = ssp.substring(0, separatorIndex);
+          final String entryPath = ssp.substring(separatorIndex + 2);
+          return new String[] {jarFileUri, entryPath};
+        }
+      }
+    } catch (Exception ignored) {

Review Comment:
   Pls leave a comment, why it safe to ignore all exceptions here.



##########
opennlp-tools-models/src/test/java/opennlp/tools/models/ClassPathModelFinderTest.java:
##########
@@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package opennlp.tools.models;
+
+import java.util.Set;
+
+import org.junit.jupiter.api.Test;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertFalse;
+import static org.junit.jupiter.api.Assertions.assertNotNull;
+
+public class ClassPathModelFinderTest {
+
+  @Test
+  public void testFindOpenNLPModels() {
+    final ClasspathModelFinder finder = new ClasspathModelFinder();

Review Comment:
   Please transparently prepare this "system under test" object in a 
@BeforeEach annotated _setup_ method. Avoids code duplication, see other test 
case.



##########
opennlp-tools-models/pom.xml:
##########
@@ -0,0 +1,95 @@
+<?xml version="1.0" encoding="UTF-8"?>
+
+<!--
+   Licensed to the Apache Software Foundation (ASF) under one
+   or more contributor license agreements.  See the NOTICE file
+   distributed with this work for additional information
+   regarding copyright ownership.  The ASF licenses this file
+   to you under the Apache License, Version 2.0 (the
+   "License"); you may not use this file except in compliance
+   with the License.  You may obtain a copy of the License at
+
+     http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing,
+   software distributed under the License is distributed on an
+   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+   KIND, either express or implied.  See the License for the
+   specific language governing permissions and limitations
+   under the License.
+-->
+
+<project xmlns="http://maven.apache.org/POM/4.0.0";
+         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance";
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
http://maven.apache.org/xsd/maven-4.0.0.xsd";>
+    <modelVersion>4.0.0</modelVersion>
+    <parent>
+        <groupId>org.apache.opennlp</groupId>
+        <artifactId>opennlp</artifactId>
+        <version>2.3.4-SNAPSHOT</version>
+    </parent>
+
+    <artifactId>opennlp-tools-models</artifactId>
+    <packaging>jar</packaging>
+    <name>Apache OpenNLP Tools Models</name>
+
+    <dependencies>
+        <dependency>
+            <groupId>org.apache.opennlp</groupId>
+            <artifactId>opennlp-tools</artifactId>
+            <version>${project.version}</version>
+            <scope>provided</scope>
+        </dependency>
+
+        <dependency>
+            <groupId>io.github.classgraph</groupId>
+            <artifactId>classgraph</artifactId>
+            <version>${classgraph.version}</version>
+        </dependency>
+
+        <dependency>
+            <groupId>org.slf4j</groupId>
+            <artifactId>slf4j-api</artifactId>
+        </dependency>
+
+        <dependency>
+            <groupId>org.junit.jupiter</groupId>
+            <artifactId>junit-jupiter-api</artifactId>
+            <scope>test</scope>
+        </dependency>
+
+        <dependency>
+            <groupId>org.junit.jupiter</groupId>
+            <artifactId>junit-jupiter-engine</artifactId>

Review Comment:
   Do we really need this as a test dep?



##########
opennlp-tools-models/src/main/java/opennlp/tools/models/ClasspathModelFinder.java:
##########
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package opennlp.tools.models;
+
+import java.net.URI;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Objects;
+import java.util.Optional;
+import java.util.Set;
+
+import io.github.classgraph.ClassGraph;
+import io.github.classgraph.ResourceList;
+import io.github.classgraph.ScanResult;
+
+
+/**
+ * Enables the detection of OpenNLP models in the classpath.
+ */
+public class ClasspathModelFinder {
+
+  private static final String OPENNLP_MODEL_JAR_PREFIX = 
"opennlp-models-*.jar";

Review Comment:
   This assumption should be made transparent to the user of this class in the 
JavaDoc of that class. Please add a hint, so the caller knows, what models will 
be accepted/scanned for, that is, what the naming pattern is, "starts with..."
   
   If required otherwise, a custom pattern can be used, see constructor 
alternative.
   
   It should be helpful, to read of this directly at the class' JavaDoc.



##########
opennlp-tools-models/src/test/java/opennlp/tools/models/ClassPathModelLoaderTest.java:
##########
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package opennlp.tools.models;
+
+import java.util.Set;
+
+import org.junit.jupiter.api.Test;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertNotNull;
+
+public class ClassPathModelLoaderTest {
+
+  @Test
+  public void testLoadOpenNLPModel() throws Exception {
+    final ClasspathModelFinder finder = new 
ClasspathModelFinder("opennlp-models-langdetect-*.jar");
+
+    final Set<ClassPathModelEntry> models = finder.findModels(false);
+    assertNotNull(models);
+    assertEquals(1, models.size());
+
+    final ClassPathModelEntry entry = models.iterator().next();
+    assertNotNull(entry);
+    assertNotNull(entry.model());
+    assertNotNull(entry.properties());
+
+    //test
+    final ClassPathModelLoader loader = new ClassPathModelLoader();

Review Comment:
   See previous comment on @BeforeEach for SUT classes.



##########
opennlp-tools-models/src/main/java/opennlp/tools/models/ClasspathModelFinder.java:
##########
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package opennlp.tools.models;
+
+import java.net.URI;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Objects;
+import java.util.Optional;
+import java.util.Set;
+
+import io.github.classgraph.ClassGraph;
+import io.github.classgraph.ResourceList;
+import io.github.classgraph.ScanResult;
+
+
+/**
+ * Enables the detection of OpenNLP models in the classpath.
+ */
+public class ClasspathModelFinder {
+
+  private static final String OPENNLP_MODEL_JAR_PREFIX = 
"opennlp-models-*.jar";
+  private final String jarModelPrefix;
+  private Set<ClassPathModelEntry> models;
+
+  /**
+   * By default, it scans for "opennlp-models-*.jar".
+   */
+  public ClasspathModelFinder() {
+    this(OPENNLP_MODEL_JAR_PREFIX);
+  }
+
+  /**
+   * @param modelJarPrefix The leafnames of the jars that should be canned 
(e.g. "opennlp.jar").
+   *                      May contain a wildcard glob ("opennlp-*.jar"). It 
must not be {@code null}.
+   */
+  public ClasspathModelFinder(String modelJarPrefix) {
+    Objects.requireNonNull(modelJarPrefix, "modelJarPrefix must not be null");
+    this.jarModelPrefix = modelJarPrefix;
+  }
+
+  /**
+   * Finds OpenNLP models within the classpath.
+   *
+   * @param reloadCache {@code true}, if the internal cache should explicitly 
be reloaded
+   * @return A Set of {@link ClassPathModelEntry ClassPathModelEntries}. It 
might be empty.
+   */
+  public Set<ClassPathModelEntry> findModels(boolean reloadCache) {
+
+    if (this.models == null || reloadCache) {
+      try (ScanResult sr = new 
ClassGraph().acceptJars(jarModelPrefix).disableDirScanning().scan()) {
+
+        final List<URI> classpathModels = getResourcesMatchingWildcard(sr, 
"*.bin");
+        final List<URI> classPathProperties = getResourcesMatchingWildcard(sr, 
"model.properties");
+
+        this.models = new HashSet<>();
+
+        for (URI model : classpathModels) {
+          URI m = null;
+          for (URI prop : classPathProperties) {
+            if (jarPathsMatch(model, prop)) {
+              m = prop;
+              break;
+            }
+          }
+          this.models.add(new ClassPathModelEntry(model, 
Optional.ofNullable(m)));
+
+        }
+      }
+    }
+    return this.models;
+  }
+
+  private List<URI> getResourcesMatchingWildcard(final ScanResult sr, final 
String resourceWildcard) {
+    try (final ResourceList resources = 
sr.getResourcesMatchingWildcard(resourceWildcard)) {
+      return resources.getURIs();
+    }
+  }
+
+  private boolean jarPathsMatch(URI uri1, URI uri2) {
+    final String[] parts1 = parseJarURI(uri1);
+    final String[] parts2 = parseJarURI(uri2);
+
+    if (parts1 == null || parts2 == null) {
+      return false;
+    }
+
+    return parts1[0].equals(parts2[0]);
+  }
+
+  private String[] parseJarURI(URI uri) {
+    try {
+      if ("jar".equals(uri.getScheme())) {

Review Comment:
   "jar" should be a constant.



##########
opennlp-tools-models/src/test/java/opennlp/tools/models/ClassPathModelUsageTest.java:
##########
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package opennlp.tools.models;
+
+import java.io.ByteArrayInputStream;
+import java.io.IOException;
+import java.util.Set;
+
+import org.junit.jupiter.api.Test;
+
+import opennlp.tools.langdetect.LanguageDetector;
+import opennlp.tools.langdetect.LanguageDetectorME;
+import opennlp.tools.langdetect.LanguageDetectorModel;
+import opennlp.tools.sentdetect.SentenceDetector;
+import opennlp.tools.sentdetect.SentenceDetectorME;
+import opennlp.tools.sentdetect.SentenceModel;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertNotNull;
+
+public class ClassPathModelUsageTest {
+
+  @Test
+  public void testLanguageDetection() throws IOException {
+
+    final ClassPathModel model = 
getClassPathModel("opennlp-models-langdetect-*.jar");

Review Comment:
   See previous comment on @ BeforeEach for SUT classes.





> OpenNLP Models: Provide a Finder / Loader Implementation
> --------------------------------------------------------
>
>                 Key: OPENNLP-1567
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1567
>             Project: OpenNLP
>          Issue Type: New Feature
>            Reporter: Richard Zowalla
>            Assignee: Richard Zowalla
>            Priority: Major
>             Fix For: 2.3.4, 2.4.0
>
>
> as the title says



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to