[ 
https://issues.apache.org/jira/browse/OPENNLP-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17651285#comment-17651285
 ] 

ASF GitHub Bot commented on OPENNLP-1214:
-----------------------------------------

rzo1 commented on PR #329:
URL: https://github.com/apache/opennlp/pull/329#issuecomment-1362847889

   Perhaps we should write some **JMH** benchmarks to measure any 
performance impact or drawbacks. Using `System.currentTimeMillis()` has its 
own drawbacks when used for benchmarking.
   
   If this is still a concern, such benchmarks would show whether there is a 
real difference (aside from JVM quirks / optimizations). Thoughts?




> use hash to avoid linear search in DefaultEndOfSentenceScanner
> --------------------------------------------------------------
>
>                 Key: OPENNLP-1214
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1214
>             Project: OpenNLP
>          Issue Type: Improvement
>    Affects Versions: 1.9.0
>            Reporter: Koji Sekiguchi
>            Assignee: Koji Sekiguchi
>            Priority: Minor
>
> When DefaultEndOfSentenceScanner scans a sentence, it uses a linear search to 
> check whether each character in the sentence is one of the eos characters. I 
> think we'd better use a HashSet to keep eosCharacters instead of a char[].
> In accordance with this replacement, I'd like to deprecate 
> getEndOfSentenceCharacters() because it returns char[] and nobody 
> in OpenNLP calls it at present, and I'd like to add an equivalent method 
> that returns a Set<Character> of eos chars. The Set cannot keep the order of 
> the eos chars, but I don't think that will be a problem.
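
The replacement described in the issue can be sketched roughly as follows. This is a minimal illustration, not the actual `DefaultEndOfSentenceScanner` source: the class `EosLookupSketch`, the methods `isEosLinear` / `isEosHashed`, and the new accessor name `getEosCharacters()` are all hypothetical. Only `getEndOfSentenceCharacters()` and the `char[]` / `Set<Character>` types come from the issue text.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; names other than getEndOfSentenceCharacters() are assumptions.
final class EosLookupSketch {

    private final char[] eosChars;        // representation the issue wants to replace
    private final Set<Character> eosSet;  // representation the issue proposes

    EosLookupSketch(char[] eosChars) {
        this.eosChars = eosChars.clone();
        Set<Character> set = new HashSet<>();
        for (char c : eosChars) {
            set.add(c); // autoboxes char to Character
        }
        this.eosSet = Collections.unmodifiableSet(set);
    }

    // Linear search: compares c against every eos character in turn.
    boolean isEosLinear(char c) {
        for (char eos : eosChars) {
            if (eos == c) {
                return true;
            }
        }
        return false;
    }

    // Hash lookup: a single contains() call; c is boxed to Character.
    boolean isEosHashed(char c) {
        return eosSet.contains(c);
    }

    // Kept for compatibility, marked deprecated as the issue suggests.
    @Deprecated
    char[] getEndOfSentenceCharacters() {
        return eosChars.clone();
    }

    // Assumed name for the new accessor; iteration order is not guaranteed.
    Set<Character> getEosCharacters() {
        return eosSet;
    }

    public static void main(String[] args) {
        EosLookupSketch scanner = new EosLookupSketch(new char[] {'.', '!', '?'});
        String text = "Hi there! How are you? Fine.";
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // Both strategies must agree on every character.
            if (scanner.isEosLinear(c) != scanner.isEosHashed(c)) {
                throw new AssertionError("lookup mismatch at index " + i);
            }
            if (scanner.isEosHashed(c)) {
                System.out.println("eos char '" + c + "' at index " + i);
            }
        }
    }
}
```

Both lookups return the same answers; they differ only in cost. The hash lookup boxes each `char`, and for a handful of eos characters a linear scan over a `char[]` may well be competitive, which is one reason a proper benchmark is worth running before drawing conclusions.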



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
