[
https://issues.apache.org/jira/browse/OPENNLP-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17854961#comment-17854961
]
ASF GitHub Bot commented on OPENNLP-1567:
-----------------------------------------
mawiesne commented on code in PR #606:
URL: https://github.com/apache/opennlp/pull/606#discussion_r1639398105
##########
opennlp-tools-models/src/main/java/opennlp/tools/models/ClasspathModelFinder.java:
##########
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package opennlp.tools.models;
+
+import java.net.URI;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Objects;
+import java.util.Optional;
+import java.util.Set;
+
+import io.github.classgraph.ClassGraph;
Review Comment:
I'm not against classgraph as a library.
Nevertheless: is there an easy way to scan/look for model files in the
classpath _without_ introducing this 3rd party dependency? Aka: Can we realize
it with plain JDK classes?
wdyt @jzonthemtn @kinow @rzo1 ?
##########
opennlp-tools-models/src/main/java/opennlp/tools/models/ClasspathModelFinder.java:
##########
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package opennlp.tools.models;
+
+import java.net.URI;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Objects;
+import java.util.Optional;
+import java.util.Set;
+
+import io.github.classgraph.ClassGraph;
+import io.github.classgraph.ResourceList;
+import io.github.classgraph.ScanResult;
+
+
+/**
+ * Enables the detection of OpenNLP models in the classpath.
+ */
+public class ClasspathModelFinder {
+
+ private static final String OPENNLP_MODEL_JAR_PREFIX =
"opennlp-models-*.jar";
+ private final String jarModelPrefix;
+ private Set<ClassPathModelEntry> models;
+
+ /**
+ * By default, it scans for "opennlp-models-*.jar".
+ */
+ public ClasspathModelFinder() {
+ this(OPENNLP_MODEL_JAR_PREFIX);
+ }
+
+ /**
+ * @param modelJarPrefix The leafnames of the jars that should be canned
(e.g. "opennlp.jar").
+ * May contain a wildcard glob ("opennlp-*.jar"). It
must not be {@code null}.
+ */
+ public ClasspathModelFinder(String modelJarPrefix) {
+ Objects.requireNonNull(modelJarPrefix, "modelJarPrefix must not be null");
+ this.jarModelPrefix = modelJarPrefix;
+ }
+
+ /**
+ * Finds OpenNLP models within the classpath.
+ *
+ * @param reloadCache {@code true}, if the internal cache should explicitly
be reloaded
+ * @return A Set of {@link ClassPathModelEntry ClassPathModelEntries}. It
might be empty.
+ */
+ public Set<ClassPathModelEntry> findModels(boolean reloadCache) {
+
+ if (this.models == null || reloadCache) {
+ try (ScanResult sr = new
ClassGraph().acceptJars(jarModelPrefix).disableDirScanning().scan()) {
Review Comment:
See comment above on introducing ClassGraph as a dependency.
##########
opennlp-tools-models/src/main/java/opennlp/tools/models/ClasspathModelFinder.java:
##########
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package opennlp.tools.models;
+
+import java.net.URI;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Objects;
+import java.util.Optional;
+import java.util.Set;
+
+import io.github.classgraph.ClassGraph;
+import io.github.classgraph.ResourceList;
+import io.github.classgraph.ScanResult;
+
+
+/**
+ * Enables the detection of OpenNLP models in the classpath.
+ */
+public class ClasspathModelFinder {
+
+ private static final String OPENNLP_MODEL_JAR_PREFIX =
"opennlp-models-*.jar";
+ private final String jarModelPrefix;
+ private Set<ClassPathModelEntry> models;
+
+ /**
+ * By default, it scans for "opennlp-models-*.jar".
+ */
+ public ClasspathModelFinder() {
+ this(OPENNLP_MODEL_JAR_PREFIX);
+ }
+
+ /**
+ * @param modelJarPrefix The leafnames of the jars that should be canned
(e.g. "opennlp.jar").
+ * May contain a wildcard glob ("opennlp-*.jar"). It
must not be {@code null}.
+ */
+ public ClasspathModelFinder(String modelJarPrefix) {
+ Objects.requireNonNull(modelJarPrefix, "modelJarPrefix must not be null");
+ this.jarModelPrefix = modelJarPrefix;
+ }
+
+ /**
+ * Finds OpenNLP models within the classpath.
+ *
+ * @param reloadCache {@code true}, if the internal cache should explicitly
be reloaded
+ * @return A Set of {@link ClassPathModelEntry ClassPathModelEntries}. It
might be empty.
+ */
+ public Set<ClassPathModelEntry> findModels(boolean reloadCache) {
+
+ if (this.models == null || reloadCache) {
+ try (ScanResult sr = new
ClassGraph().acceptJars(jarModelPrefix).disableDirScanning().scan()) {
+
+ final List<URI> classpathModels = getResourcesMatchingWildcard(sr,
"*.bin");
+ final List<URI> classPathProperties = getResourcesMatchingWildcard(sr,
"model.properties");
+
+ this.models = new HashSet<>();
+
+ for (URI model : classpathModels) {
+ URI m = null;
+ for (URI prop : classPathProperties) {
+ if (jarPathsMatch(model, prop)) {
+ m = prop;
+ break;
+ }
+ }
+ this.models.add(new ClassPathModelEntry(model,
Optional.ofNullable(m)));
+
+ }
+ }
+ }
+ return this.models;
+ }
+
+ private List<URI> getResourcesMatchingWildcard(final ScanResult sr, final
String resourceWildcard) {
+ try (final ResourceList resources =
sr.getResourcesMatchingWildcard(resourceWildcard)) {
+ return resources.getURIs();
+ }
+ }
+
+ private boolean jarPathsMatch(URI uri1, URI uri2) {
+ final String[] parts1 = parseJarURI(uri1);
+ final String[] parts2 = parseJarURI(uri2);
+
+ if (parts1 == null || parts2 == null) {
+ return false;
+ }
+
+ return parts1[0].equals(parts2[0]);
+ }
+
+ private String[] parseJarURI(URI uri) {
+ try {
+ if ("jar".equals(uri.getScheme())) {
+ final String ssp = uri.getSchemeSpecificPart();
+ final int separatorIndex = ssp.indexOf("!/");
+ if (separatorIndex > 0) {
+ final String jarFileUri = ssp.substring(0, separatorIndex);
+ final String entryPath = ssp.substring(separatorIndex + 2);
+ return new String[] {jarFileUri, entryPath};
+ }
+ }
+ } catch (Exception ignored) {
Review Comment:
Pls leave a comment, why it safe to ignore all exceptions here.
##########
opennlp-tools-models/src/test/java/opennlp/tools/models/ClassPathModelFinderTest.java:
##########
@@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package opennlp.tools.models;
+
+import java.util.Set;
+
+import org.junit.jupiter.api.Test;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertFalse;
+import static org.junit.jupiter.api.Assertions.assertNotNull;
+
+public class ClassPathModelFinderTest {
+
+ @Test
+ public void testFindOpenNLPModels() {
+ final ClasspathModelFinder finder = new ClasspathModelFinder();
Review Comment:
Please transparently prepare this "system under test" object in a
@BeforeEach annotated _setup_ method. Avoids code duplication, see other test
case.
##########
opennlp-tools-models/pom.xml:
##########
@@ -0,0 +1,95 @@
+<?xml version="1.0" encoding="UTF-8"?>
+
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+ xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
http://maven.apache.org/xsd/maven-4.0.0.xsd">
+ <modelVersion>4.0.0</modelVersion>
+ <parent>
+ <groupId>org.apache.opennlp</groupId>
+ <artifactId>opennlp</artifactId>
+ <version>2.3.4-SNAPSHOT</version>
+ </parent>
+
+ <artifactId>opennlp-tools-models</artifactId>
+ <packaging>jar</packaging>
+ <name>Apache OpenNLP Tools Models</name>
+
+ <dependencies>
+ <dependency>
+ <groupId>org.apache.opennlp</groupId>
+ <artifactId>opennlp-tools</artifactId>
+ <version>${project.version}</version>
+ <scope>provided</scope>
+ </dependency>
+
+ <dependency>
+ <groupId>io.github.classgraph</groupId>
+ <artifactId>classgraph</artifactId>
+ <version>${classgraph.version}</version>
+ </dependency>
+
+ <dependency>
+ <groupId>org.slf4j</groupId>
+ <artifactId>slf4j-api</artifactId>
+ </dependency>
+
+ <dependency>
+ <groupId>org.junit.jupiter</groupId>
+ <artifactId>junit-jupiter-api</artifactId>
+ <scope>test</scope>
+ </dependency>
+
+ <dependency>
+ <groupId>org.junit.jupiter</groupId>
+ <artifactId>junit-jupiter-engine</artifactId>
Review Comment:
Do we really need this as a test dep?
##########
opennlp-tools-models/src/main/java/opennlp/tools/models/ClasspathModelFinder.java:
##########
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package opennlp.tools.models;
+
+import java.net.URI;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Objects;
+import java.util.Optional;
+import java.util.Set;
+
+import io.github.classgraph.ClassGraph;
+import io.github.classgraph.ResourceList;
+import io.github.classgraph.ScanResult;
+
+
+/**
+ * Enables the detection of OpenNLP models in the classpath.
+ */
+public class ClasspathModelFinder {
+
+ private static final String OPENNLP_MODEL_JAR_PREFIX =
"opennlp-models-*.jar";
Review Comment:
This assumption should be made transparent to the user of this class in the
JavaDoc of that class. Please add a hint, so the caller knows, what models will
be accepted/scanned for, that is, what the naming pattern is, "starts with..."
If required otherwise, a custom pattern can be used, see constructor
alternative.
It should be helpful, to read of this directly at the class' JavaDoc.
##########
opennlp-tools-models/src/test/java/opennlp/tools/models/ClassPathModelLoaderTest.java:
##########
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package opennlp.tools.models;
+
+import java.util.Set;
+
+import org.junit.jupiter.api.Test;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertNotNull;
+
+public class ClassPathModelLoaderTest {
+
+ @Test
+ public void testLoadOpenNLPModel() throws Exception {
+ final ClasspathModelFinder finder = new
ClasspathModelFinder("opennlp-models-langdetect-*.jar");
+
+ final Set<ClassPathModelEntry> models = finder.findModels(false);
+ assertNotNull(models);
+ assertEquals(1, models.size());
+
+ final ClassPathModelEntry entry = models.iterator().next();
+ assertNotNull(entry);
+ assertNotNull(entry.model());
+ assertNotNull(entry.properties());
+
+ //test
+ final ClassPathModelLoader loader = new ClassPathModelLoader();
Review Comment:
See previous comment on @BeforeEach for SUT classes.
##########
opennlp-tools-models/src/main/java/opennlp/tools/models/ClasspathModelFinder.java:
##########
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package opennlp.tools.models;
+
+import java.net.URI;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Objects;
+import java.util.Optional;
+import java.util.Set;
+
+import io.github.classgraph.ClassGraph;
+import io.github.classgraph.ResourceList;
+import io.github.classgraph.ScanResult;
+
+
+/**
+ * Enables the detection of OpenNLP models in the classpath.
+ */
+public class ClasspathModelFinder {
+
+ private static final String OPENNLP_MODEL_JAR_PREFIX =
"opennlp-models-*.jar";
+ private final String jarModelPrefix;
+ private Set<ClassPathModelEntry> models;
+
+ /**
+ * By default, it scans for "opennlp-models-*.jar".
+ */
+ public ClasspathModelFinder() {
+ this(OPENNLP_MODEL_JAR_PREFIX);
+ }
+
+ /**
+ * @param modelJarPrefix The leafnames of the jars that should be canned
(e.g. "opennlp.jar").
+ * May contain a wildcard glob ("opennlp-*.jar"). It
must not be {@code null}.
+ */
+ public ClasspathModelFinder(String modelJarPrefix) {
+ Objects.requireNonNull(modelJarPrefix, "modelJarPrefix must not be null");
+ this.jarModelPrefix = modelJarPrefix;
+ }
+
+ /**
+ * Finds OpenNLP models within the classpath.
+ *
+ * @param reloadCache {@code true}, if the internal cache should explicitly
be reloaded
+ * @return A Set of {@link ClassPathModelEntry ClassPathModelEntries}. It
might be empty.
+ */
+ public Set<ClassPathModelEntry> findModels(boolean reloadCache) {
+
+ if (this.models == null || reloadCache) {
+ try (ScanResult sr = new
ClassGraph().acceptJars(jarModelPrefix).disableDirScanning().scan()) {
+
+ final List<URI> classpathModels = getResourcesMatchingWildcard(sr,
"*.bin");
+ final List<URI> classPathProperties = getResourcesMatchingWildcard(sr,
"model.properties");
+
+ this.models = new HashSet<>();
+
+ for (URI model : classpathModels) {
+ URI m = null;
+ for (URI prop : classPathProperties) {
+ if (jarPathsMatch(model, prop)) {
+ m = prop;
+ break;
+ }
+ }
+ this.models.add(new ClassPathModelEntry(model,
Optional.ofNullable(m)));
+
+ }
+ }
+ }
+ return this.models;
+ }
+
+ private List<URI> getResourcesMatchingWildcard(final ScanResult sr, final
String resourceWildcard) {
+ try (final ResourceList resources =
sr.getResourcesMatchingWildcard(resourceWildcard)) {
+ return resources.getURIs();
+ }
+ }
+
+ private boolean jarPathsMatch(URI uri1, URI uri2) {
+ final String[] parts1 = parseJarURI(uri1);
+ final String[] parts2 = parseJarURI(uri2);
+
+ if (parts1 == null || parts2 == null) {
+ return false;
+ }
+
+ return parts1[0].equals(parts2[0]);
+ }
+
+ private String[] parseJarURI(URI uri) {
+ try {
+ if ("jar".equals(uri.getScheme())) {
Review Comment:
"jar" should be a constant.
##########
opennlp-tools-models/src/test/java/opennlp/tools/models/ClassPathModelUsageTest.java:
##########
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package opennlp.tools.models;
+
+import java.io.ByteArrayInputStream;
+import java.io.IOException;
+import java.util.Set;
+
+import org.junit.jupiter.api.Test;
+
+import opennlp.tools.langdetect.LanguageDetector;
+import opennlp.tools.langdetect.LanguageDetectorME;
+import opennlp.tools.langdetect.LanguageDetectorModel;
+import opennlp.tools.sentdetect.SentenceDetector;
+import opennlp.tools.sentdetect.SentenceDetectorME;
+import opennlp.tools.sentdetect.SentenceModel;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertNotNull;
+
+public class ClassPathModelUsageTest {
+
+ @Test
+ public void testLanguageDetection() throws IOException {
+
+ final ClassPathModel model =
getClassPathModel("opennlp-models-langdetect-*.jar");
Review Comment:
See previous comment on @ BeforeEach for SUT classes.
> OpenNLP Models: Provide a Finder / Loader Implementation
> --------------------------------------------------------
>
> Key: OPENNLP-1567
> URL: https://issues.apache.org/jira/browse/OPENNLP-1567
> Project: OpenNLP
> Issue Type: New Feature
> Reporter: Richard Zowalla
> Assignee: Richard Zowalla
> Priority: Major
> Fix For: 2.3.4, 2.4.0
>
>
> as the title says
--
This message was sent by Atlassian Jira
(v8.20.10#820010)