[ https://issues.apache.org/jira/browse/LUCENE-9457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176164#comment-17176164 ]

Dawid Weiss commented on LUCENE-9457:
-------------------------------------

bq. It could be hotspot noise maybe?  

Could be. Or it could be something else running in the background. It would be
good to monitor background CPU activity while these benchmarks run. I'm not much
of a sysop, though, so I can't help much here.
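One way that monitoring could be done is sketched below, under some assumptions. `BackgroundCpuMonitor` is a hypothetical helper, not part of luceneutil. It periodically samples machine-wide CPU load and the benchmark JVM's own CPU load through the HotSpot/OpenJDK-specific `com.sun.management.OperatingSystemMXBean`, then treats the difference as a rough estimate of what other processes are using. That estimate is approximate because the two loads are sampled over slightly different windows, and platforms that cannot report a value return a negative number, which the sketch skips.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper (not part of luceneutil): while a benchmark runs, it
 * periodically samples machine-wide CPU load and this JVM's own CPU load.
 * Their difference roughly approximates CPU used by other processes.
 * Assumes a HotSpot/OpenJDK JVM that provides com.sun.management.
 */
public class BackgroundCpuMonitor implements AutoCloseable {
    private final com.sun.management.OperatingSystemMXBean os =
        ManagementFactory.getPlatformMXBean(com.sun.management.OperatingSystemMXBean.class);
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "background-cpu-monitor");
            t.setDaemon(true); // never keep the JVM alive just for monitoring
            return t;
        });
    private final List<Double> backgroundSamples = new ArrayList<>();

    public BackgroundCpuMonitor(long periodMillis) {
        scheduler.scheduleAtFixedRate(this::sample, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    private void sample() {
        // Both loads are fractions in [0.0, 1.0]; a negative value means "not available".
        @SuppressWarnings("deprecation") // JDK 14+ also offers getCpuLoad()
        double system = os.getSystemCpuLoad();
        double process = os.getProcessCpuLoad();
        if (system < 0 || process < 0) {
            return; // skip samples the platform could not provide
        }
        synchronized (backgroundSamples) {
            // Load not attributable to this JVM, clamped at zero.
            backgroundSamples.add(Math.max(0.0, system - process));
        }
    }

    /** Highest background load seen so far, or NaN if no valid sample was taken. */
    public double maxBackgroundLoad() {
        synchronized (backgroundSamples) {
            return backgroundSamples.stream()
                .mapToDouble(Double::doubleValue)
                .max()
                .orElse(Double.NaN);
        }
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) {
        try (BackgroundCpuMonitor monitor = new BackgroundCpuMonitor(250)) {
            // Placeholder workload standing in for the real tokenization benchmark.
            long checksum = 0;
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(2);
            while (System.nanoTime() < deadline) {
                checksum += Long.bitCount(checksum + 1);
            }
            System.out.printf("checksum=%d, max background CPU load=%.1f%%%n",
                checksum, 100 * monitor.maxBackgroundLoad());
        }
    }
}
```

If the slow nights lined up with a high maximum background load, that would point at the machine rather than Kuromoji. A consistently quiet reading would make JVM-level causes (JIT, GC) more likely suspects.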

> Why is Kuromoji tokenization throughput bimodal?
> ------------------------------------------------
>
>                 Key: LUCENE-9457
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9457
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Priority: Major
>
> With the recent accidental regression of Japanese (Kuromoji) tokenization 
> throughput due to exciting FST optimizations, we 
> [added new nightly Lucene benchmarks|https://github.com/mikemccand/luceneutil/issues/64] 
> to measure tokenization throughput for {{JapaneseTokenizer}}: 
> [https://home.apache.org/~mikemccand/lucenebench/analyzers.html]
> It has already been running for ~5-6 weeks now!  But for some reason, it 
> looks bimodal?  "Normally" it is ~.45 M tokens/sec, but for two data points 
> it dropped down to ~.33 M tokens/sec, which is odd.  It could be hotspot 
> noise maybe?  But it would be good to get to the root cause and fix it if 
> possible.
> Hotspot noise that randomly steals ~27% of your tokenization throughput is no 
> good!!
> Or does anyone have any other ideas about what could be bimodal in Kuromoji?  I 
> don't think 
> [this performance test|https://github.com/mikemccand/luceneutil/blob/master/src/main/perf/TestAnalyzerPerf.java]
> has any randomness in it...
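For reference, the ~27% figure follows from the two reported throughputs: (0.45 - 0.33) / 0.45 ≈ 0.267. Since both inputs are themselves approximate ("~"), so is the result. A tiny check of that arithmetic, with class and method names made up for illustration:

```java
/** Hypothetical helper for checking the throughput numbers quoted in LUCENE-9457. */
public class ThroughputDrop {
    /** Fraction of throughput lost going from {@code normal} to {@code degraded}. */
    public static double relativeDrop(double normal, double degraded) {
        if (normal <= 0) {
            throw new IllegalArgumentException("normal throughput must be positive");
        }
        return (normal - degraded) / normal;
    }

    public static void main(String[] args) {
        double normal = 0.45;   // ~M tokens/sec on "normal" nights, per the issue
        double degraded = 0.33; // ~M tokens/sec on the two slow nights
        System.out.printf("throughput drop: %.1f%%%n", 100 * relativeDrop(normal, degraded));
    }
}
```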



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
