Hi Micke,
There is some good research in here - have you had a chance to create
some issues in Jira from this?

On Fri, Feb 23, 2018 at 6:28 AM, Michael Burman <mibur...@redhat.com> wrote:
> Hi,
>
> I was referring to this article by Shipilev (there are a few small issues
> that the url you pasted glosses over):
>
> https://shipilev.net/blog/2014/nanotrusting-nanotime/
>
> And his lovely recommendation on it: "System.nanoTime is as bad as
> String.intern now: you can use it, but use it wisely." And Cassandra uses it
> quite a lot, in the write path at least. There isn't necessarily a better
> option in Java for it, but for that reason we shouldn't push these calls
> everywhere in the code "for fun".
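>
> If anyone wants to see the cost on their own hardware, a minimal JMH sketch
> along these lines is enough (the class name is just illustrative, not
> anything in the tree):
>
>     import java.util.concurrent.TimeUnit;
>     import org.openjdk.jmh.annotations.*;
>
>     // Compares the raw cost of the two clock calls; the absolute numbers
>     // depend heavily on the kernel clock source, as Shipilev explains.
>     @BenchmarkMode(Mode.Throughput)
>     @OutputTimeUnit(TimeUnit.MICROSECONDS)
>     @State(Scope.Thread)
>     public class ClockBench {
>
>         @Benchmark
>         public long nanoTime() {
>             return System.nanoTime();
>         }
>
>         @Benchmark
>         public long currentTimeMillis() {
>             return System.currentTimeMillis();
>         }
>     }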
>
>   - Micke
>
>
>
> On 02/22/2018 06:08 PM, Jeremiah D Jordan wrote:
>>
>> re: nanoTime vs currentTimeMillis, there is a good blog post here about the
>> timing of both, and how your choice of Linux clock source can drastically
>> affect the speed of the calls. It also shows that, in general, on Linux
>> there is no perf improvement for one over the other.
>> http://pzemtsov.github.io/2017/07/23/the-slow-currenttimemillis.html
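>>
>> For anyone checking their own boxes, something along these lines (Linux
>> only; the sysfs path is the standard kernel one) prints which clock source
>> is in use - "tsc" is the fast case, "hpet" and "acpi_pm" are the slow ones
>> described in that post:
>>
>>     import java.nio.file.Files;
>>     import java.nio.file.Paths;
>>
>>     // Reads the kernel's currently selected clock source on Linux.
>>     public class ClockSourceCheck {
>>         public static void main(String[] args) throws Exception {
>>             String path = "/sys/devices/system/clocksource/clocksource0/current_clocksource";
>>             String cs = new String(Files.readAllBytes(Paths.get(path))).trim();
>>             System.out.println("current clocksource: " + cs);
>>         }
>>     }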
>>
>>> On Feb 22, 2018, at 11:01 AM, Blake Eggleston <beggles...@apple.com>
>>> wrote:
>>>
>>> Hi Micke,
>>>
>>> This is really cool, thanks for taking the time to investigate this. I
>>> believe the metrics around memtable insert time come in handy in identifying
>>> high partition contention in the memtable. I know I've been involved in a
>>> situation over the past year where we got actionable info from this metric.
>>> Reducing resolution to milliseconds is probably a no go since most things in
>>> this path should complete in less than a millisecond.
>>>
>>> Revisiting the use of the codahale metrics in a hot path like this
>>> definitely seems like a good idea though. I don't think it's something
>>> we've talked about a lot, and it looks like we could benefit from using
>>> something more specialized here. I think it's worth doing, especially
>>> since there won't be any major changes to how we do threading in 4.0. It's
>>> probably also worth opening a JIRA and investigating the calls to nanoTime.
>>> We at least need microsecond resolution here, but maybe there's an approach
>>> we haven't thought of? It's worth a look at least.
>>>
>>> Thanks,
>>>
>>> Blake
>>>
>>> On 2/22/18, 6:10 AM, "Michael Burman" <mibur...@redhat.com> wrote:
>>>
>>>     Hi,
>>>
>>>     I wanted to get some input from the mailing list before creating a JIRA
>>>     and working on potential fixes. I'll touch on performance more in the
>>>     latter part, but first there's one important question regarding where
>>>     the write latency metric is recorded. Currently we measure the
>>>     writeLatency (and the metric write sampler) in ColumnFamilyStore.apply(),
>>>     and this is also the metric we then replicate to the Keyspace metrics etc.
>>>
>>>     This is an odd place for writeLatency. Not only does it sit in the hot
>>>     path of Memtable modifications, it also does not measure the real write
>>>     latency, since it completely ignores the CommitLog latency in that same
>>>     process. Is the intention really to measure Memtable-modification
>>>     latency only, or the actual write latencies?
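>>>
>>>     To make the question concrete, here is a sketch of the two placements
>>>     (stand-in runnables instead of the real calls, so this is illustrative
>>>     only, not the actual code):
>>>
>>>         import java.util.concurrent.TimeUnit;
>>>         import com.codahale.metrics.Timer;
>>>
>>>         public class LatencyPlacement {
>>>             static final Timer writeLatency = new Timer();
>>>
>>>             // What CFS.apply() effectively measures today: only the
>>>             // memtable modification, the commit log append is outside.
>>>             static void memtableOnly(Runnable commitLogAppend, Runnable memtableApply) {
>>>                 commitLogAppend.run();
>>>                 long start = System.nanoTime();
>>>                 memtableApply.run();
>>>                 writeLatency.update(System.nanoTime() - start, TimeUnit.NANOSECONDS);
>>>             }
>>>
>>>             // The alternative: time the whole local write, including
>>>             // the CommitLog latency.
>>>             static void wholeLocalWrite(Runnable commitLogAppend, Runnable memtableApply) {
>>>                 long start = System.nanoTime();
>>>                 commitLogAppend.run();
>>>                 memtableApply.run();
>>>                 writeLatency.update(System.nanoTime() - start, TimeUnit.NANOSECONDS);
>>>             }
>>>         }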
>>>
>>>     Then the real issue: this single metric is a source of huge overhead in
>>>     Memtable processing. There are several metrics / events in the CFS apply
>>>     method, including the metric sampler, storageHook reportWrite,
>>>     colUpdateTimeDeltaHistogram and metric.writeLatency. None of these are
>>>     free when it comes to processing. I made a small JMH benchmark here:
>>>     https://gist.github.com/burmanm/b5b284bc9f1d410b1d635f6d3dac3ade
>>>     that I'll be referring to.
>>>
>>>     The worst offender of all these is the writeLatency metric. It updates
>>>     the latency in codahale's Timer (a histogram update) and then also walks
>>>     all the parent metrics, updating the keyspace writeLatency and the
>>>     global writeLatency. When measuring the performance of Memtable.put with
>>>     a single partition (to reduce the impact of the ConcurrentSkipListMap
>>>     search speed - that's a separate issue and will take a bit longer to
>>>     solve, although I've started to prototype something), on my machine I
>>>     see 1.3M ops/s with the metric enabled, and when it is disabled the
>>>     performance climbs to 4M ops/s. So the overhead of this single metric is
>>>     ~2/3 of total performance. That's insane. My perf stats indicate that
>>>     the CPU is starved, as it can't get enough data in.
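>>>
>>>     In other words, every single insert pays for roughly this fan-out
>>>     (field names are illustrative, not the real ones):
>>>
>>>         import java.util.concurrent.TimeUnit;
>>>         import com.codahale.metrics.Timer;
>>>
>>>         // Three reservoir/histogram updates per memtable insert: the
>>>         // table timer plus its keyspace and global parents.
>>>         public class FanOutTimer {
>>>             final Timer tableWriteLatency    = new Timer();
>>>             final Timer keyspaceWriteLatency = new Timer();
>>>             final Timer globalWriteLatency   = new Timer();
>>>
>>>             void update(long latencyNanos) {
>>>                 tableWriteLatency.update(latencyNanos, TimeUnit.NANOSECONDS);
>>>                 keyspaceWriteLatency.update(latencyNanos, TimeUnit.NANOSECONDS);
>>>                 globalWriteLatency.update(latencyNanos, TimeUnit.NANOSECONDS);
>>>             }
>>>         }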
>>>
>>>     Removing the write-time replication from TableMetrics to the Keyspace &
>>>     global latencies (and doing it when the metrics are requested instead)
>>>     improves the performance to 2.1M ops/s on my machine. That's an
>>>     improvement, but the overhead is still huge. Even when we pressure the
>>>     ConcurrentSkipListMap with 100 000 partitions in one active Memtable,
>>>     the performance drops by about 40% due to this metric, so it's never
>>>     free.
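>>>
>>>     A rough sketch of what I mean by read-time aggregation (simplified -
>>>     proper percentile merging would need mergeable reservoirs, this only
>>>     shows the direction):
>>>
>>>         import java.util.List;
>>>         import com.codahale.metrics.Snapshot;
>>>         import com.codahale.metrics.Timer;
>>>
>>>         // Children update only their own Timer on the write path; the
>>>         // keyspace/global view walks them when the metric is requested.
>>>         public class ReadTimeAggregate {
>>>             private final List<Timer> children;
>>>
>>>             public ReadTimeAggregate(List<Timer> children) {
>>>                 this.children = children;
>>>             }
>>>
>>>             public long totalCount() {
>>>                 long count = 0;
>>>                 for (Timer child : children)
>>>                     count += child.getCount();
>>>                 return count;
>>>             }
>>>
>>>             public long maxLatency() {
>>>                 long max = 0;
>>>                 for (Timer child : children) {
>>>                     Snapshot s = child.getSnapshot();
>>>                     max = Math.max(max, s.getMax());
>>>                 }
>>>                 return max;
>>>             }
>>>         }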
>>>
>>>     I did not find any discussion about replacing the metric processing with
>>>     something faster, so has this been considered before? At least for these
>>>     performance-sensitive ones. The other issue is obviously the use of
>>>     System.nanoTime(), which by itself is quite slow (two System.nanoTime()
>>>     calls eat another ~1M ops/s from the performance).
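>>>
>>>     One possible direction (just a sketch, not a concrete proposal): a
>>>     cached clock that trades resolution for hot-path cost, refreshed by a
>>>     background thread so the write path only pays a volatile read:
>>>
>>>         import java.util.concurrent.Executors;
>>>         import java.util.concurrent.ScheduledExecutorService;
>>>         import java.util.concurrent.TimeUnit;
>>>
>>>         // Resolution drops to the refresh interval (and to whatever the
>>>         // scheduler can actually deliver), but the hot path avoids the
>>>         // clock syscall entirely.
>>>         public class ApproximateClock {
>>>             private static volatile long cachedNanos = System.nanoTime();
>>>
>>>             static {
>>>                 ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor(r -> {
>>>                     Thread t = new Thread(r, "approx-clock");
>>>                     t.setDaemon(true);
>>>                     return t;
>>>                 });
>>>                 ses.scheduleAtFixedRate(() -> cachedNanos = System.nanoTime(),
>>>                                         100, 100, TimeUnit.MICROSECONDS);
>>>             }
>>>
>>>             public static long nanos() {
>>>                 return cachedNanos;
>>>             }
>>>         }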
>>>
>>>     My personal quick fix would be to move writeLatency to Keyspace.apply,
>>>     change the write-time aggregates to read-time processing (metrics are
>>>     read far less often than we write data), and maybe even drop from
>>>     nanoTime to currentTimeMillis (despite its lower precision). That is -
>>>     if these metrics make sense at the CFS level at all? Maybe they should
>>>     be measured from the network processing time (including all the
>>>     deserializations and such)? Especially if at some point the smarter
>>>     threading / event-looping changes go forward (in which case requests
>>>     might sleep in some "queue" for a while).
>>>
>>>        - Micke
>>>
>>>