[
https://issues.apache.org/jira/browse/OPENNLP-1664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17904447#comment-17904447
]
ASF GitHub Bot commented on OPENNLP-1664:
-----------------------------------------
rzo1 commented on code in PR #194:
URL: https://github.com/apache/opennlp-sandbox/pull/194#discussion_r1877807446
##########
opennlp-wsd/src/main/java/opennlp/tools/disambiguator/AbstractWSDisambiguator.java:
##########
@@ -22,37 +22,41 @@
import java.security.InvalidParameterException;
import java.util.ArrayList;
import java.util.List;
+import java.util.regex.Pattern;
+
+import net.sf.extjwnl.JWNLException;
Review Comment:
Just checked the license, which is BSD-3 -> ok for us.
##########
opennlp-wsd/src/main/java/opennlp/tools/disambiguator/AbstractWSDisambiguator.java:
##########
@@ -140,14 +126,65 @@ public List<String> disambiguate(String[] tokenizedContext,
}
}
}
-
return senses;
}
/**
- * @param sample
- * @return result as an array of WordNet IDs
+ * Conducts disambiguation via available {@link Synset synsets} for the specified
+ * {@code wordTag}.
+ *
+ * @param wordTag A combination of word and POS tag, separated by a {@code .} character.
+ * @return The disambiguated sense and key if disambiguation was successful,
+ * {@code null} otherwise.
*/
- public abstract String disambiguate(WSDSample sample);
+ protected String disambiguate(String wordTag) {
+
+ String[] splitWordTag = SPLIT.split(wordTag);
+
+ String word = splitWordTag[0];
Review Comment:
Is it guaranteed that `wordTag != null` and `splitWordTag.length == 2`? (This was most
likely copied and pasted as part of a pull-up-method refactoring.)
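One way to make both conditions explicit is sketched below. It is not taken from the PR: the class `WordTagParser` and its `parse` method are hypothetical names, and the sketch assumes that `SPLIT` is a `Pattern` matching a literal `.` character, as the Javadoc suggests.

```java
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the PR under review.
final class WordTagParser {

  // Assumption: the PR's SPLIT constant splits on a literal '.' character.
  private static final Pattern SPLIT = Pattern.compile("\\.");

  private WordTagParser() {
  }

  /**
   * Splits a "word.tag" string into its two parts.
   *
   * @return A two-element array {word, tag}, or an empty Optional if the
   *         input is null or does not consist of exactly two non-empty parts.
   */
  static Optional<String[]> parse(String wordTag) {
    if (wordTag == null) {
      return Optional.empty();
    }
    String[] parts = SPLIT.split(wordTag);
    // Pattern.split drops trailing empty strings, so "bank." yields one part,
    // while ".n" yields an empty first part that is rejected here.
    if (parts.length != 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
      return Optional.empty();
    }
    return Optional.of(parts);
  }
}
```

With such a guard, `disambiguate(String wordTag)` could return `null` early for malformed input instead of risking a `NullPointerException` or an `ArrayIndexOutOfBoundsException`. Note that this strict check also rejects words that themselves contain a `.` character; whether that is acceptable depends on the inputs the component actually receives.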
> Switch to pre-trained UD models in WSD component
> ------------------------------------------------
>
> Key: OPENNLP-1664
> URL: https://issues.apache.org/jira/browse/OPENNLP-1664
> Project: OpenNLP
> Issue Type: Task
> Components: wsd
> Reporter: Martin Wiesner
> Assignee: Martin Wiesner
> Priority: Major
> Fix For: 2.5.2
>
>
> At the moment, the opennlp-wsd sandbox component uses old (v1.5) models for
> testing, which are stored as binary artifacts in the test resources directory.
> Aims:
> - Get rid of this dependency on old model files
> - Switch to new pre-trained UD models (via OPENNLP_DOWNLOAD_HOME); Maven
> artifacts can be added in a separate issue
> - Make the existing tests and integration / evaluation tests pass with the
> UD-based models
> - Modernize and tidy up some existing code structures in terms of API and
> efficiency