[
https://issues.apache.org/jira/browse/OPENNLP-660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18067504#comment-18067504
]
Martin Wiesner commented on OPENNLP-660:
----------------------------------------
Thanks [~warun26] for sharing your ideas and the work you have done. Are you
still interested and willing to contribute? The GitHub repo is the way to go.
> Include list of stop words for various languages
> ------------------------------------------------
>
> Key: OPENNLP-660
> URL: https://issues.apache.org/jira/browse/OPENNLP-660
> Project: OpenNLP
> Issue Type: New Feature
> Components: Parser, Stemmer
> Affects Versions: tools-1.5.3
> Environment: all
> Reporter: Martin Wunderlich
> Priority: Minor
> Labels: features, language, model
> Original Estimate: 0.05h
> Remaining Estimate: 0.05h
>
> This feature request is for the inclusion of stop word lists for various
> languages. These lists can be used to reduce the noise caused by frequent
> but irrelevant words, e.g. when tokenizing texts. For a first iteration, each
> list could simply contain single words, but it could also include multi-word
> stop words that apply to n-grams (i.e. a word in the list will serve to
> "stop" a multi-word n-gram).
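To make the two proposed modes concrete, here is a minimal sketch in plain Java. It does not use any OpenNLP API; the class `StopWordFilter` and its methods are hypothetical names chosen for illustration. It assumes the text has already been tokenized and that matching is case-insensitive. For the multi-word case it takes one possible reading of the request: a stop entry may contain spaces (e.g. "in order to"), and an n-gram is dropped when its joined text exactly matches such an entry.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical stop word filter (not part of the OpenNLP API).
 * Entries may be single words ("the") or multi-word phrases ("in order to").
 */
final class StopWordFilter {

    private final Set<String> stopEntries = new HashSet<>();

    StopWordFilter(List<String> entries) {
        for (String entry : entries) {
            // Normalize case so "The" and "the" match the same entry.
            stopEntries.add(entry.toLowerCase(Locale.ROOT).trim());
        }
    }

    /** Removes single-token stop words from an already tokenized text. */
    List<String> filterTokens(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!stopEntries.contains(token.toLowerCase(Locale.ROOT))) {
                kept.add(token);
            }
        }
        return kept;
    }

    /**
     * Builds all contiguous n-grams of size n and drops those that exactly
     * match a multi-word stop entry (assumed reading of "multi-stopwords").
     */
    List<String> filterNGrams(List<String> tokens, int n) {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            String ngram = String.join(" ", tokens.subList(i, i + n));
            if (!stopEntries.contains(ngram.toLowerCase(Locale.ROOT))) {
                kept.add(ngram);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        StopWordFilter filter = new StopWordFilter(
                Arrays.asList("the", "a", "of", "in order to"));
        List<String> tokens = Arrays.asList(
                "In", "order", "to", "parse", "the", "text");
        // prints [In, order, to, parse, text]
        System.out.println(filter.filterTokens(tokens));
        // prints [order to parse, to parse the, parse the text]
        System.out.println(filter.filterNGrams(tokens, 3));
    }
}
```

Note that the single-word pass only removes "the" here, because "in order to" is stored as one phrase entry; the trigram pass is what catches it. A real implementation would likely load one list per language from a resource file rather than hard-coding it.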