[ https://issues.apache.org/jira/browse/OPENNLP-1664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17904447#comment-17904447 ]

ASF GitHub Bot commented on OPENNLP-1664:
-----------------------------------------

rzo1 commented on code in PR #194:
URL: https://github.com/apache/opennlp-sandbox/pull/194#discussion_r1877807446


##########
opennlp-wsd/src/main/java/opennlp/tools/disambiguator/AbstractWSDisambiguator.java:
##########
@@ -22,37 +22,41 @@
 import java.security.InvalidParameterException;
 import java.util.ArrayList;
 import java.util.List;
+import java.util.regex.Pattern;
+
+import net.sf.extjwnl.JWNLException;

Review Comment:
   Just checked the license, which is BSD-3 -> ok for us.



##########
opennlp-wsd/src/main/java/opennlp/tools/disambiguator/AbstractWSDisambiguator.java:
##########
@@ -140,14 +126,65 @@ public List<String> disambiguate(String[] tokenizedContext,
         }
       }
     }
-
     return senses;
   }
 
   /**
-   * @param sample
-   * @return result as an array of WordNet IDs
+   * Conducts disambiguation via available {@link Synset synsets} for the specified
+   * {@code wordTag}.
+   *
+   * @param wordTag A combination of word and POS tag, separated by a {@code .} character.
+   * @return The disambiguated sense and key if disambiguation was successful,
+   *         {@code null} otherwise.
    */
-  public abstract String disambiguate(WSDSample sample);
+  protected String disambiguate(String wordTag) {
+
+    String[] splitWordTag = SPLIT.split(wordTag);
+
+    String word = splitWordTag[0];

Review Comment:
   Is it guaranteed that `wordTag != null` and that `splitWordTag.length == 2`?
   (Most likely this was copied and pasted as-is when the method was pulled up.)
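
One way to address the question above is to validate the input before indexing into the split result. The following is only a minimal sketch, not the code from the PR: the class name `WordTagGuard`, the helper `splitWordTag`, and the assumption that `SPLIT` matches a literal `.` (as the Javadoc suggests) are all hypothetical, since the diff does not show how `SPLIT` is defined.

```java
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; it is not part of opennlp-wsd.
final class WordTagGuard {

  // Assumption: the separator is a literal '.', as described in the Javadoc.
  // The real SPLIT pattern is not visible in the diff.
  private static final Pattern SPLIT = Pattern.compile("\\.");

  private WordTagGuard() {
  }

  /**
   * Splits a "word.tag" string defensively.
   *
   * @return the two parts (word, tag), or an empty Optional if the input is
   *         null or does not consist of exactly two non-empty parts.
   */
  static Optional<String[]> splitWordTag(String wordTag) {
    if (wordTag == null) {
      return Optional.empty();
    }
    String[] parts = SPLIT.split(wordTag);
    // Reject inputs such as "bank", "bank.", ".n" or "a.b.c".
    if (parts.length != 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
      return Optional.empty();
    }
    return Optional.of(parts);
  }
}
```

With a guard like this, a caller could return `null` (the documented "unsuccessful" result) when the `Optional` is empty instead of risking a `NullPointerException` or an `ArrayIndexOutOfBoundsException`. Whether to return `null` or throw an `IllegalArgumentException` for malformed input is a design choice left to the PR author.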





> Switch to pre-trained UD models in WSD component
> ------------------------------------------------
>
>                 Key: OPENNLP-1664
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1664
>             Project: OpenNLP
>          Issue Type: Task
>          Components: wsd
>            Reporter: Martin Wiesner
>            Assignee: Martin Wiesner
>            Priority: Major
>             Fix For: 2.5.2
>
>
> At the moment, the opennlp-wsd sandbox component uses old (v1.5) models for testing, contained as binary artifacts in the test resources directory.
> Aims:
> - Get rid of this dependency on old model files
> - Switch to new pre-trained UD models (via OPENNLP_DOWNLOAD_HOME); Maven artifacts can be added in a separate issue
> - Make the existing tests and integration / evaluation tests pass with the UD-based models
> - Modernize and tidy up some existing code structures in terms of API and efficiency



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
