Re: [DISCUSSION] Handling precision when reporting stream_throughput_outbound_megabits_per_sec and inter_dc_stream_throughput_outbound_megabits_per_sec

2022-07-25 Thread Ekaterina Dimitrova
For the record, I hit a failing unit test, JMXGetterTest, which reminded me
that tools like JConsole will just show “unavailable” with no reason,
no matter what type of exception we throw. I guess in that case I will error
out only in nodetool, where people can see the error, and loudly document
the rounding everywhere for the JMX methods. In that case I will probably
deprecate those and also add new ones that return megabits as a double.
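
To illustrate why the legacy integer getters cannot always report exactly, here is a minimal Python sketch of the conversion and of the "error out instead of rounding" behaviour. It assumes 1 megabit = 1,000,000 bits and 1 MiB = 1,048,576 bytes; the function names are hypothetical and are not Cassandra's JMX or nodetool API.

```python
from fractions import Fraction

# Assumed unit definitions (not taken from the Cassandra source).
BITS_PER_MEGABIT = 1_000_000
BYTES_PER_MEBIBYTE = 1 << 20  # 1,048,576


def mib_per_sec_to_megabits_per_sec(mib_per_sec):
    """Convert MiB/s to megabits/s exactly, using rational arithmetic."""
    return Fraction(mib_per_sec) * BYTES_PER_MEBIBYTE * 8 / BITS_PER_MEGABIT


def legacy_megabits_getter(mib_per_sec):
    """Hypothetical legacy getter: return an int, or fail loudly instead of rounding."""
    value = mib_per_sec_to_megabits_per_sec(mib_per_sec)
    if value.denominator != 1:
        raise ValueError(
            f"{float(value)} megabits/s is not an integer; "
            "use the MiB/s getter for the exact value"
        )
    return int(value)


if __name__ == "__main__":
    # A value set in MiB/s generally has no exact integer megabits/s equivalent.
    print(float(mib_per_sec_to_megabits_per_sec(24)))
    try:
        legacy_megabits_getter(24)
    except ValueError as err:
        print("error:", err)
```

Under these assumptions, 24 MiB/s corresponds to 201.326592 megabits/s, so an integer getter would have to round. Among whole-number MiB/s settings, only multiples of 15625 convert to a whole number of megabits/s.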

On Thu, 21 Jul 2022 at 16:56, Ekaterina Dimitrova wrote:

> Thanks Caleb, it seems you and I are on the same page here.
> I will assume lazy consensus with the others and finish the respective
> ticket tomorrow. Thanks
> New methods - return double, old - complain loudly and error out
>
> On Wed, 20 Jul 2022 at 13:03, Caleb Rackliffe wrote:
>
>> My $0.02...
>>
>> Options from best to worst:
>>
>> 1.) Use double when possible (chopping off non-sig-digs after the decimal
>> if you want)
>> 2.) Complain loudly that we would have to round and error out.
>> --line of acceptable behavior
>> 100.) Round silently
>>
>> ;)
>>
>> On Mon, Jul 18, 2022 at 9:26 AM Ekaterina Dimitrova <
>> e.dimitr...@gmail.com> wrote:
>>
>>>
>>> Hi everyone,
>>> In 4.0 stream_throughput_outbound_megabits_per_sec and 
>>> inter_dc_stream_throughput_outbound_megabits_per_sec
>>> were introduced.
>>> It was requested that all data rate parameters support mebibytes/s,
>>> kibibytes/s and bytes/s with the new config in 4.1 while remaining backward
>>> compatible. This is in place, but there is one outstanding question I
>>> wanted to discuss with the community. As we know, converting between
>>> megabits and mebibytes produces non-integer (double) values, which we
>>> already handle internally. Now, the question is how we want to handle this in
>>> JMX/nodetool when reporting to the users. The new getters in mebibytes will
>>> be reporting in double to the users (I have a patch for that as we just
>>> added the mebibytes flags in nodetool for stream_throughput_outbound
>>> and inter_dc_stream_throughput_outbound in 4.1).
>>> I guess changing the existing getters to report double now would be a
>>> breaking change, so the options I see are:
>>> - We throw an error if we cannot report an integer (the property was set in
>>> mebibytes, not megabits), ask the users to check the MiB versions, and
>>> otherwise report only integers
>>> - We document that there can be a loss of precision and that people should
>>> look at the MiB one, which will report exact numbers
>>>
>>> I guess I would prefer option 1 as people can easily miss anything
>>> documented.
>>>
>>> The question is similar for the Settings Virtual Table, but I guess
>>> there we can switch everything to double reporting or set NA in case the old
>>> properties (in megabits/s) are not integers.
>>>
>>> Best regards,
>>> Ekaterina
>>>
>>


New episode of the Apache Cassandra(R) Corner Podcast

2022-07-25 Thread Aaron Ploetz
Link to next episode:

Ep6 - Jeff Beck (SmartThings)
https://drive.google.com/file/d/1_dhnMUlSF_8qByHdcoWQc0GD0rhsZjVf/view?usp=sharing

(You may have to download it to listen)

It will remain in staging for 72 hours, going live (assuming no objections)
on Thursday, July 28th.

If anyone should have any questions, comments, or if you want to be a
guest, please reach out to me.

Thanks everyone!

Aaron Ploetz


Re: [DISCUSS] Improve Commitlog write path

2022-07-25 Thread Elliott Sims
Not sure how many people this would end up applying to, but how does this
affect write latency in batch or group mode? I'm running across what seems
to be a latency bottleneck in NVMe drives that lack a capacitor-backed
cache, where writes take ~5-10ms in batch mode even though the drive is of
course capable of far more throughput than that under higher concurrency.
So I'm quite curious whether this could give better write latency on existing
hardware.
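
As a back-of-envelope illustration of the latency-versus-throughput point above, here is a toy Python model. It assumes syncs are fully serialized, each sync takes a fixed time, and each blocked writer contributes one write per sync. This is an assumption made for illustration, not a description of Cassandra's commitlog implementation, and the function name is hypothetical.

```python
def serialized_sync_model(sync_latency_ms, concurrent_writers):
    """Toy model: one sync at a time, each covering every write queued at its start.

    Returns (writes_per_sec, worst_case_ack_ms).
    """
    # Only one sync can be in flight, so the sync rate is bounded by its latency.
    syncs_per_sec = 1000.0 / sync_latency_ms
    # Each blocked writer gets one write acknowledged per sync.
    writes_per_sec = syncs_per_sec * concurrent_writers
    # A write that arrives just after a sync starts waits for that sync,
    # then for the next one that actually covers it.
    worst_case_ack_ms = 2 * sync_latency_ms
    return writes_per_sec, worst_case_ack_ms


if __name__ == "__main__":
    for latency_ms in (5, 10):
        for writers in (1, 8, 64):
            rate, worst = serialized_sync_model(latency_ms, writers)
            print(f"sync={latency_ms}ms writers={writers}: "
                  f"{rate:.0f} writes/s, worst-case ack {worst}ms")
```

In this model, throughput scales with the number of concurrent writers while per-write latency stays pinned to the sync time, which matches the shape of the behaviour described: high throughput under concurrency, but a latency floor of several milliseconds per write.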

On Fri, Jul 22, 2022 at 11:59 AM Pawar, Amit wrote:

> [AMD Official Use Only - General]
>
>
>
> Hi Scott,
>
>
>
> Thank you for your reply. I didn’t mean to argue, and I am sorry if it
> appeared that way.
>
>
>
> I see that compaction becomes a complex activity once the data grows large,
> and that at some point the improvement will disappear on some workloads
> because of other factors. I will leave it to the community to decide about
> this patch. tlp-stress testing has not been done yet; I will look into how to do it.
>
>
>
> Regarding direct IO, it is interesting and I need to think about how to
> implement it. Java 11 contains native API support and Java 8 does not, so a
> JNI-based implementation is required to enable it for all Java versions.
> There are some blockers, and I am trying to find out how to overcome them.
> Hopefully I will succeed in implementing it.
>
>
>
> Thanks,
>
> Amit
>
>
>
> *From:* C. Scott Andreas 
> *Sent:* Friday, July 22, 2022 10:24 PM
> *To:* dev@cassandra.apache.org
> *Cc:* Bowen Song ; Raghavendra, Prakash <
> prakash.raghaven...@amd.com>
> *Subject:* Re: [DISCUSS] Improve Commitlog write path
>
>
>
> [CAUTION: External Email]
>
> Amit, welcome and thank you for contributing the results from your test
> and opening this discussion.
>
>
>
> I don’t think anyone is arguing that the database shouldn’t take advantage
> of available hardware.
>
>
>
> A few things important to keep in mind when considering a patch like this:
>
>
>
> - Where the actual bottleneck in the database will be for increased write
> throughput. As Bowen and Benedict mentioned, the amount of work performed
> by the commitlog versus the accrued cost of integrating flushed SSTables
> into the LSM tree is dramatically weighted toward compaction. A multi-day
> benchmark that allows the database to accrue and incorporate a sizable
> amount of data is much more likely to produce measurements that approximate
> what users of Cassandra may experience in production use.
>
>
>
> - Making something multi-threaded doesn’t reduce the amount of work done;
> it redistributes it. In a saturated system, this means resources are
> allocated in an environment of tradeoffs. Allocating additional resources
> to the front door will reduce the resources available to compaction, live
> serving, etc. in environments where cores are not limitless and free. This
> is why the holistic view of performance others are speaking to is important.
>
>
>
> - How such a change alters the balance of the database’s threading model,
> and where the bottleneck moved to. Users who overrun the commitlog’s
> capability today are likely to be even more negatively impacted by
> compaction overhead if backpressure is lost at the front door. The
> meta-point to consider is “how does this change affect the performance
> characteristics of a live database?”
>
>
>
> - We also need to balance complexity and correctness in the
> implementation. If the patch is straightforward, has a well-defined locking
> scheme, and ideally a suite of randomized tests, that can help mitigate
> concerns related to this.
>
>
>
> It sounds like several would welcome such a patch for review. Just want to
> signpost that the gains and tradeoffs aren’t always clear cut, especially
> in cases where the improvement is a rebalancing of the database’s threading
> model rather than reducing the amount of work performed.
>
>
>
> The second item you mentioned - a direct IO path for commitlog writes -
> sounds like an interesting potential addition.
>
>
>
> One thing that may be useful to post along with your patch is a result
> from an extended tlp-stress run that includes both the live write path as
> well as the deferred compaction of data written.
>
>
>
> - Scott
>
>
>
> On Jul 22, 2022, at 9:14 AM, Pawar, Amit wrote:
>
> 
>
>
> Hi Benedict,
>
>
>
> The whole point is that Cassandra, as software, should take advantage of
> hardware wherever possible. Reducing the commitlog bottleneck may help some
> workloads, though not all. I am already working on trunk and will share the
> patch. If the changes look good and are not very complex, please give your
> feedback. Your input might help reduce the complexity of the change so that
> the patch can possibly be accepted.
>
>
>
> Thanks,
>
> Amit
>
>
>
> *From:* Benedict 
> *Sent:* Friday, July 22, 2022 3:56 PM
> *To:* dev@cassandra.apache.org
> *Cc:* Bowen Song ; Raghavendra, Prakash <
> prakash.raghaven...@amd.com>
> *Subject:* Re: [DISCUSS] Improve Commitlog write path
>
>
>
>