[
https://issues.apache.org/jira/browse/OPENNLP-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643021#comment-17643021
]
ASF GitHub Bot commented on OPENNLP-1404:
-----------------------------------------
kinow commented on code in PR #446:
URL: https://github.com/apache/opennlp/pull/446#discussion_r1038987138
##########
opennlp-tools/src/main/java/opennlp/tools/postag/WordTagSampleStream.java:
##########
@@ -58,6 +61,7 @@ public POSSample read() throws IOException {
try {
sample = POSSample.parse(sentence);
} catch (InvalidFormatException e) {
+ // TODO: An exception in error case should be thrown.
Review Comment:
:+1:
##########
opennlp-tools/src/main/java/opennlp/tools/postag/POSTaggerFactory.java:
##########
@@ -363,11 +382,11 @@ public void validateArtifactMap() throws
InvalidFormatException {
if (ngramDictEntry != null && !(ngramDictEntry instanceof Dictionary)) {
throw new InvalidFormatException("NGram dictionary has wrong type!");
}
-
}
+ // reduced visibility to ensure deprecation is respected in future versions
@Deprecated
Review Comment:
We are more strict about changing visibility or removing public methods in
Commons between non-major releases, but I think it should be fine for OpenNLP,
as its API is not expected to be used by most users (i.e. most should be using
the tools to train/infer with models instead). Even then, worth adding a note
to the changelog about it IMO.
##########
opennlp-tools/src/main/java/opennlp/tools/postag/POSTagger.java:
##########
@@ -26,14 +26,38 @@ public interface POSTagger {
/**
* Assigns the sentence of tokens pos tags.
+ *
* @param sentence The sentence of tokens to be tagged.
- * @return an array of pos tags for each token provided in sentence.
+ * @return An array of pos tags for each token provided in {@code sentence}.
*/
String[] tag(String[] sentence);
+ /**
+ * Assigns the sentence of tokens pos tags.
+ *
+ * @param sentence The sentence of tokens to be tagged.
+ * @param additionalContext The context to provide additional information
with.
+ *
+ * @return An array of pos tags for each token provided in {@code sentence}.
+ */
String[] tag(String[] sentence, Object[] additionalContext);
+ /**
+ * Assigns the sentence the top-k {@link Sequence sequences}.
+ *
+ * @param sentence The sentence of tokens to be tagged.
+ *
+ * @return An array of {@link Sequence sequeneces} for each token provided
in {@code sentence}.
+ */
Sequence[] topKSequences(String[] sentence);
+ /**
+ * Assigns the sentence the top-k {@link Sequence sequences}.
+ *
+ * @param sentence The sentence of tokens to be tagged.
+ * @param additionalContext The context to provide additional information
with.
+ *
+ * @return An array of {@link Sequence sequeneces} for each token provided
in {@code sentence}.
Review Comment:
s/sequeneces/sequences
##########
opennlp-tools/src/main/java/opennlp/tools/postag/POSTaggerFactory.java:
##########
@@ -95,8 +83,19 @@ public POSTaggerFactory(byte[] featureGeneratorBytes, final
Map<String, Object>
this.posDictionary = posDictionary;
}
+
+ // reduced visibility to ensure deprecation is respected in future versions
+ @Deprecated
+ POSTaggerFactory(Dictionary ngramDictionary, TagDictionary posDictionary) {
+ this.init(ngramDictionary, posDictionary);
+
+ // TODO: This could be made functional by creating some default feature
generation
+ // which uses the dictionary ...
Review Comment:
:ok_man:
##########
opennlp-tools/src/main/java/opennlp/tools/postag/TagDictionary.java:
##########
@@ -25,11 +25,18 @@
public interface TagDictionary {
/**
- * Returns a list of valid tags for the specified word.
+ * Retrieves a list of valid tags for the specified {@code word}.
*
* @param word The word.
- * @return A list of valid tags for the specified word or null if no
information
- * is available for that word.
+ * @return An array of valid tags for the specified {@code word} or {@code
null} if
+ * no information is available for that word.
*/
String[] getTags(String word);
+
+ /**
+ * Whether if the dictionary is case-sensitive or not.
Review Comment:
I think we can drop the `if` (i.e. Whether the dictionary is... or not).
##########
opennlp-tools/src/main/java/opennlp/tools/postag/POSTaggerFactory.java:
##########
@@ -225,49 +240,51 @@ public void setTagDictionary(TagDictionary dictionary) {
this.posDictionary = dictionary;
}
+ /**
+ * @return The key-value based resources map, or an empty map.
+ */
protected Map<String, Object> getResources() {
-
-
if (resources != null) {
return resources;
}
return Collections.emptyMap();
}
+ /**
+ * @return The feature generator bytes used.
+ */
protected byte[] getFeatureGenerator() {
return featureGeneratorBytes;
}
+ /**
+ * @return The {@link TagDictionary} used.
+ */
public TagDictionary getTagDictionary() {
if (this.posDictionary == null && artifactProvider != null)
this.posDictionary =
artifactProvider.getArtifact(TAG_DICTIONARY_ENTRY_NAME);
return this.posDictionary;
}
- /**
- * @deprecated this will be reduced in visibility and later removed
- */
- @Deprecated
- public Dictionary getDictionary() {
+ @Deprecated // will be removed when only 8 series models are supported
+ private Dictionary getDictionary() {
Review Comment:
Ditto the note below, about changing public methods :point_down:
> Enhance JavaDoc in opennlp.tools.postag package
> -----------------------------------------------
>
> Key: OPENNLP-1404
> URL: https://issues.apache.org/jira/browse/OPENNLP-1404
> Project: OpenNLP
> Issue Type: Improvement
> Components: Documentation, POS Tagger
> Affects Versions: 2.1.0
> Reporter: Martin Wiesner
> Priority: Minor
> Fix For: 2.1.1
>
>
> The JavaDoc of the _opennlp.tools.postag_ package suffers from several
> inconsistencies and missing descriptions. Moreover, several typos are present
> that need sanitizing.
> It needs enhancements and/or additions to provide more clarity for readers of
> this part of the OpenNLP API.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)