[jira] [Commented] (OPENNLP-1185) Tokenizers should be able to output a new line token

ASF GitHub Bot (Jira) Tue, 29 Mar 2022 07:12:16 -0700


    [ https://issues.apache.org/jira/browse/OPENNLP-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17514117#comment-17514117 ]


ASF GitHub Bot commented on OPENNLP-1185:
-----------------------------------------

jzonthemtn commented on pull request #337:
URL: https://github.com/apache/opennlp/pull/337#issuecomment-1081922737


   I built and tested this branch without problems. The changes neither affect 
the current behavior of the tokenizers nor change the interfaces. It took a very 
long time to get to this, but thanks @SchmaR for your contribution.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.



> Tokenizers should be able to output a new line token
> ----------------------------------------------------
>
>                 Key: OPENNLP-1185
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1185
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Tokenizer
>            Reporter: Jörn Kottmann
>            Priority: Major
>              Labels: ctakes
>
> Some use cases need the tokenizers to also output new line tokens. This is 
> needed, e.g., by cTakes to process clinical notes, or by the name finder to 
> process lists of names where each name is written on its own line. It also 
> helps the name finder process news articles.
> To fix this issue, add an option to all three tokenizers to emit new line 
> tokens.
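The requested option can be illustrated with a whitespace tokenizer that takes a flag and, when the flag is enabled, emits each line break as a separate "\n" token. This is a minimal, hypothetical sketch: the class name `NewLineAwareTokenizer` and its `keepNewLines` constructor parameter are illustrative assumptions, not the OpenNLP API and not the code from pull request #337. With the flag off, it behaves like a plain whitespace tokenizer, so existing output is unchanged.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not the OpenNLP API): a whitespace tokenizer that can
 * optionally emit every line break as its own "\n" token.
 */
final class NewLineAwareTokenizer {

    // When false, line breaks are treated like any other whitespace.
    private final boolean keepNewLines;

    NewLineAwareTokenizer(boolean keepNewLines) {
        this.keepNewLines = keepNewLines;
    }

    List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\n') {
                // End the pending token, then optionally emit the line break itself.
                flush(current, tokens);
                if (keepNewLines) {
                    tokens.add("\n");
                }
            } else if (Character.isWhitespace(c)) {
                // Spaces, tabs and '\r' only separate tokens, so "\r\n" yields one "\n".
                flush(current, tokens);
            } else {
                current.append(c);
            }
        }
        flush(current, tokens);
        return tokens;
    }

    // Adds the accumulated characters as a token, if there are any.
    private static void flush(StringBuilder current, List<String> tokens) {
        if (current.length() > 0) {
            tokens.add(current.toString());
            current.setLength(0);
        }
    }
}
```

For the list-of-names case, `new NewLineAwareTokenizer(true).tokenize("John Smith\nJane Doe")` yields `John`, `Smith`, `\n`, `Jane`, `Doe`, so a downstream component such as a name finder can see where each name ends.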



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
