You are absolutely spot on. This is what I was trying to explain: we
are wrapping it in try-catch and acting as if nothing happened when
falling back to the default compressor we have. That is also the
reason why I did not want to see _this kind of fallback_. There is
nothing wrong with picking the compressor once upon startup, as you
suggested, and working with that one implementation only. And if it
is meant to fail, so be it. What I am against is this "dynamic
switching" inside the very de/compression methods.

On Wed, Dec 17, 2025 at 3:57 PM Joseph Lynch <[email protected]> wrote:
>
> Does QAT not provide a way to detect what the hardware supports and 
> return that capability at construction time, so we can pick the 
> fastest implementation it supports? That seems like a more robust 
> approach than inline exception handling, and consistent with how we 
> do the other native fallbacks, where we probe whether they are 
> available and fall back to a different instance entirely if not.
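>
> Something like the following is what I have in mind. To be clear, 
> these names are hypothetical; I do not know the actual QAT API:
>
>     final class QatProbe
>     {
>         // Made-up capability levels a binding could report.
>         enum Level { HARDWARE, SOFTWARE_EMULATED, NONE }
>
>         // A real version would call into the QAT library here,
>         // once, when the compressor is constructed.
>         static Level supportedLevel()
>         {
>             return Level.NONE; // placeholder
>         }
>     }
>
>     // at startup: switch on QatProbe.supportedLevel() and pick the
>     // fastest implementation; nothing on the hot path catches.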
>
> Agree the inline try-catch is inelegant and implies that QAT can 
> sometimes succeed and sometimes fail. That should not be the case: 
> either hardware acceleration exists or it does not.
>
> -Joey
>
> On Wed, Dec 17, 2025 at 9:31 AM Štefan Miklošovič <[email protected]> 
> wrote:
>>
>> To be explicit, we are talking about this kind of fallback:
>>
>> https://gist.githubusercontent.com/smiklosovic/8efcdefadae0b6aae5c7eedd6cc948f7/raw/ae5716d077c1a37b4db901f81620f09d957dd303/gistfile1.txt
>>
>> I made a gist of that part of the PR in case it gets updated or
>> overwritten.
>>
>> The logic here is that the "QAT backed compressor" is used first,
>> and when it fails we fall back to the one we have in Cassandra. I
>> have not found the implementation of that plugin; it is said to be
>> added later on.
>>
>> So it is not "we start Cassandra, pick what to de/compress with
>> based on what is available, and if QAT is not available we fall
>> back to what we already have". It is rather "we put this plugin on
>> the class path, which effectively overrides the compressor we are
>> using, AND IF THAT FAILS while we are de/compressing, we fall back
>> to the default one we have".
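>>
>> Paraphrasing the gist as a minimal sketch (the names are mine, not
>> from the PR), this per-chunk switching is what I object to:
>>
>>     import java.io.IOException;
>>     import java.nio.ByteBuffer;
>>
>>     interface Compressor
>>     {
>>         void uncompress(ByteBuffer in, ByteBuffer out) throws IOException;
>>     }
>>
>>     final class InlineFallback
>>     {
>>         private final Compressor plugin;    // the QAT-backed one
>>         private final Compressor fallback;  // the stock software one
>>
>>         InlineFallback(Compressor plugin, Compressor fallback)
>>         {
>>             this.plugin = plugin;
>>             this.fallback = fallback;
>>         }
>>
>>         // Every single chunk may silently switch implementations;
>>         // the operator never learns the accelerator stopped working.
>>         void uncompress(ByteBuffer in, ByteBuffer out) throws IOException
>>         {
>>             try
>>             {
>>                 plugin.uncompress(in, out);
>>             }
>>             catch (IOException e)
>>             {
>>                 fallback.uncompress(in, out);
>>             }
>>         }
>>     }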
>>
>> Do you see the subtle difference in the semantics here when we talk
>> about "falling back"?
>>
>>
>> On Wed, Dec 17, 2025 at 3:14 PM Joseph Lynch <[email protected]> wrote:
>> >
>> > Just noticed the discussion here. I think this is just another 
>> > case of "native" code like we've done in the past: we try to load 
>> > the native library (try to load up QAT), and if that fails we try 
>> > to find the fastest implementation that works on the hardware they 
>> > have. If you're running on, say, arm, we are already falling back 
>> > to pure java implementations of many things (afaik we only have 
>> > native implementations for crypto, compression and hashing on x86, 
>> > but I might have missed the arm patches).
>> >
>> > So instead of, say, x86 native -> fast java (unsafe) -> slow java, 
>> > it would be qat -> x86 native -> slow java (since afaik we don't 
>> > want to use unsafe anymore). A log line helps the operator know 
>> > _which_ of these they've ended up with, so they can debug why they 
>> > are spending so many cycles where they are, but I don't think the 
>> > fallback is intrinsically hazardous (we already do transparent 
>> > fallbacks for TLS, Compression and Hashing afaik).
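>> >
>> > A minimal sketch of that probe-at-startup chain (illustrative 
>> > names, not from the PR), called once with candidates ordered 
>> > qat, x86 native, slow java:
>> >
>> >     import java.util.List;
>> >     import java.util.function.Supplier;
>> >
>> >     final class CompressorChain
>> >     {
>> >         interface Compressor { /* methods elided for the sketch */ }
>> >
>> >         static Compressor pickFirst(List<Supplier<Compressor>> candidates)
>> >         {
>> >             for (Supplier<Compressor> candidate : candidates)
>> >             {
>> >                 try
>> >                 {
>> >                     // construction probes the hardware/library
>> >                     Compressor chosen = candidate.get();
>> >                     // the log line that tells the operator where
>> >                     // their cycles will go
>> >                     System.out.println("Using compressor: " + chosen);
>> >                     return chosen;
>> >                 }
>> >                 catch (Throwable t)
>> >                 {
>> >                     // unavailable here; try the next, once, at startup
>> >                 }
>> >             }
>> >             throw new IllegalStateException("no compressor available");
>> >         }
>> >     }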
>> >
>> > -Joey
>> >
>> > On Wed, Dec 17, 2025 at 1:53 AM Štefan Miklošovič <[email protected]> 
>> > wrote:
>> >>
>> >> As mentioned, some combination of logging + metrics + maybe dying or
>> >> something else?
>> >>
>> >> I don't know for now; it is too soon / too specific to settle
>> >> that, but _something_ should be done, heh. I do not want to block
>> >> otherwise helpful and valuable contributions on these
>> >> technicalities, but they should be addressed.
>> >>
>> >> The "interesting" aspect of this acceleration hardware is that if it
>> >> is baked into the CPU and that fails, what are we actually supposed to
>> >> do with it? I do not know the details too much here but if it
>> >> hypothetically failed then we are supposed to do what, replace CPU?
>> >> Does a failure mean that the hardware as such is broken or the failure
>> >> was just intermittent? If a disk fails we can replace it and restart
>> >> the machine and rebuild or whatever, or we can just replace the whole
>> >> node.
>> >>
>> >> Anyway, we can always think about that more in follow-up tickets after
>> >> the initial delivery, but logging in a non-spamming manner + metrics
>> >> would be the minimum here imho.
>> >>
>> >> On Wed, Dec 17, 2025 at 1:27 AM Josh McKenzie <[email protected]> 
>> >> wrote:
>> >> >
>> >> > What if we went the same route we do for disk failure: have a 
>> >> > sane default we collectively believe covers the "majority case", 
>> >> > but also have a configuration knob in cassandra.yaml to choose a 
>> >> > hard stop on failure if so inclined? Complexity is low, and the 
>> >> > maintenance burden should be low.
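>> >> >
>> >> > Sketched, with disk_failure_policy as the model (the yaml key 
>> >> > and values below are hypothetical, not an existing setting):
>> >> >
>> >> >     // # cassandra.yaml (hypothetical knob)
>> >> >     // # accelerator_failure_policy: warn
>> >> >     enum AcceleratorFailurePolicy
>> >> >     {
>> >> >         warn,  // sane default: log + metric, keep serving via software
>> >> >         die    // hard stop, for operators who prefer to replace the node
>> >> >     }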
>> >> >
>> >> > These discussions end up spinning while trying to find the One 
>> >> > Right Answer when there isn't one. You're right, Stefan. And so 
>> >> > is Scott. It depends. :)
>> >> >
>> >> > On Tue, Dec 16, 2025, at 2:11 PM, Štefan Miklošovič wrote:
>> >> >
>> >> > In the scenarios Scott described it does make sense to fall
>> >> > back, but I am not sure about it when production traffic is
>> >> > flowing, we rely on hardware de/compression, and _that_ fails
>> >> > silently.
>> >> >
>> >> > It is one thing not to fail catastrophically when upgrading or
>> >> > changing nodes, or when machines with that hardware are not
>> >> > present, etc. It is something different to expect that data
>> >> > will be de/compressed with some acceleration while we just
>> >> > swallow the exception and de/compress in software.
>> >> >
>> >> > My perception here is that Cassandra embraces the philosophy
>> >> > that if hardware fails, let it fail and replace the hardware.
>> >> > Heck, we have a whole class of logic around what should happen
>> >> > when there is some kind of disk failure.
>> >> >
>> >> > Yet here, when the very hardware I am supposed to de/compress
>> >> > with fails to do so, we just fall back to software and ...
>> >> > that's it? Should there not be some kind of mechanism to also
>> >> > die when something goes wrong here?
>> >> >
>> >> > On Tue, Dec 16, 2025 at 7:10 PM Josh McKenzie <[email protected]> 
>> >> > wrote:
>> >> > >
>> >> > > As a user, I'd rather have a WARN in my logs than to be unable to 
>> >> > > start the database without changing cluster-wide configuration like 
>> >> > > schema / compaction parameters.
>> >> > >
>> >> > > Strong +1 here.
>> >> > >
>> >> > > While on the one hand we expect homogeneous hardware 
>> >> > > environments for clusters, to Scott's point that's not always 
>> >> > > going to hold true in containerized and cloud-based 
>> >> > > environments. I definitely think we need to let the operators 
>> >> > > know, but graceful degradation of the database (in a step-wise, 
>> >> > > plateau-based fashion like this, not a death-spiral scenario, 
>> >> > > to be clear) is much preferred IMO.
>> >> > >
>> >> > > On Tue, Dec 16, 2025, at 10:32 AM, Štefan Miklošovič wrote:
>> >> > >
>> >> > > Okay, I guess that is a good compromise to make here. So a
>> >> > > warning in the logs + metrics? I think metrics would be cool
>> >> > > to have so we might chart how often it happens etc.
>> >> > >
>> >> > > On Tue, Dec 16, 2025 at 4:27 PM C. Scott Andreas 
>> >> > > <[email protected]> wrote:
>> >> > > >
>> >> > > > One example where the lack of a fallback would be 
>> >> > > > problematic is:
>> >> > > >
>> >> > > > – User provisions AWS metal-class instances that expose 
>> >> > > > hardware QAT, and adopts it.
>> >> > > > – User needs to expand the cluster or replace failed 
>> >> > > > hardware.
>> >> > > > – Insufficient hardware-QAT-capable machines are available 
>> >> > > > from AWS.
>> >> > > > – Cassandra is unable to start on replacement/expanded 
>> >> > > > machines due to the lack of a fallback.
>> >> > > >
>> >> > > > There are a handful of cases where the database performs similar 
>> >> > > > fallbacks today, such as attempting mlockall on startup for 
>> >> > > > improved memory locality and to avoid allocation stalls.
>> >> > > >
>> >> > > > As a user, I'd rather have a WARN in my logs than to be unable to 
>> >> > > > start the database without changing cluster-wide configuration like 
>> >> > > > schema / compaction parameters.
>> >> > > >
>> >> > > > – Scott
>> >> > > >
>> >> > > > On Dec 16, 2025, at 5:18 AM, Štefan Miklošovič 
>> >> > > > <[email protected]> wrote:
>> >> > > >
>> >> > > >
>> >> > > > I am open to adding some kind of metrics for when it falls
>> >> > > > back, to track whether / how often the hardware failed, etc.
>> >> > > > I am wondering what others think about falling back just
>> >> > > > like that. I feel like something is not transparent to a
>> >> > > > user who relies on hardware compression in the first place.
>> >> > > >
>> >> > > > On Tue, Dec 16, 2025 at 1:52 PM Štefan Miklošovič
>> >> > > > <[email protected]> wrote:
>> >> > > >
>> >> > > >
>> >> > > > My personal preference is to not do any fallback. The
>> >> > > > reason is that failures should be transparent, and if it is
>> >> > > > meant to fail, so be it.
>> >> > > >
>> >> > > > If we wrap it in try-catch and fall back, then a user
>> >> > > > thinks that everything is just fine, right? There is no
>> >> > > > visibility into whether and how often it failed, so a user
>> >> > > > cannot act on that. By falling back, a user is somewhat
>> >> > > > misled: they think all is just fine, while they cannot wrap
>> >> > > > their head around the fact that they bought hardware which
>> >> > > > promises accelerated compression, yet looking at their
>> >> > > > dashboards they every now and then see the same performance
>> >> > > > as if they were compressing in software.
>> >> > > >
>> >> > > > If they see that it is failing, they can reach out to the
>> >> > > > vendor of such hardware and raise complaints and issues, so
>> >> > > > the vendor's engineers can look into why it failed and how
>> >> > > > to fix it, instead of us wrapping it in one try-catch and
>> >> > > > acting like all is actually fine. A user bought hardware to
>> >> > > > compress with; I do not think they are interested in
>> >> > > > "best effort" here. If that hardware fails, or the software
>> >> > > > which manages it is erroneous, then it should be either
>> >> > > > fixed or replaced.
>> >> > > >
>> >> > > > On Tue, Dec 16, 2025 at 2:29 AM Kokoori, Shylaja
>> >> > > > <[email protected]> wrote:
>> >> > > > >
>> >> > > > > Hi Stefan,
>> >> > > > > Thank you very much for the feedback.
>> >> > > > > You are correct, QAT is on-die and not hot-plugged, and 
>> >> > > > > under normal circumstances we shouldn't encounter this 
>> >> > > > > exception. However, I wanted to add reverting to the base 
>> >> > > > > compressor to make it fault-tolerant.
>> >> > > > >
>> >> > > > > While the QAT software stack includes built-in retries 
>> >> > > > > and software fallbacks for scenarios where devices end up 
>> >> > > > > being busy etc., I didn't want operations to fail due to 
>> >> > > > > transient hardware issues when they otherwise would have 
>> >> > > > > succeeded. For example, if some unrecoverable error occurs 
>> >> > > > > during a compress/decompress operation, whether due to a 
>> >> > > > > hardware issue or to related software libraries, the 
>> >> > > > > system can gracefully revert to the base compressor rather 
>> >> > > > > than failing the operation entirely.
>> >> > > > >
>> >> > > > > I am open to other suggestions.
>> >> > > > > Thanks,
>> >> > > > > Shylaja
>> >> > > > > ________________________________
>> >> > > > > From: Štefan Miklošovič <[email protected]>
>> >> > > > > Sent: Monday, December 15, 2025 2:50 PM
>> >> > > > > To: [email protected] <[email protected]>
>> >> > > > > Subject: Re: [VOTE] CEP-49: Hardware-accelerated compression
>> >> > > > >
>> >> > > > > Hi Shylaja,
>> >> > > > >
>> >> > > > > I am going through the CEP so I can make a decision when 
>> >> > > > > voting, and I want to clarify a few things.
>> >> > > > >
>> >> > > > > You say there:
>> >> > > > >
>> >> > > > > Both the default compressor instance and a plugin compressor 
>> >> > > > > instance
>> >> > > > > (obtained from the provider), will be maintained by Cassandra. For
>> >> > > > > subsequent read/write operations, the plugin compressor will be 
>> >> > > > > used.
>> >> > > > > However, if the plugin version encounters an error, the default
>> >> > > > > compressor will handle the operation.
>> >> > > > >
>> >> > > > > Why are we doing this kind of "fallback"? Under what
>> >> > > > > circumstances would "the plugin version encounter an
>> >> > > > > error"? Why would it? It might be understandable to do it
>> >> > > > > like that if the compression accelerator were "plug and
>> >> > > > > play", or if we could just remove it from a running
>> >> > > > > machine. But that does not seem to be the case? The QAT
>> >> > > > > you are mentioning is baked into the CPU, right? It is not
>> >> > > > > like we would suddenly decide to turn it off at runtime,
>> >> > > > > so that the database would need to deal with it.
>> >> > > > >
>> >> > > > > The reason I am asking is that I just briefly went over
>> >> > > > > the PR, and the way it works there is that if plugin
>> >> > > > > de/compression is not possible (it throws an IOException),
>> >> > > > > it defaults to a software solution. This is done for every
>> >> > > > > single de/compression of a chunk.
>> >> > > > >
>> >> > > > > Is this design an absolute must?
>> >> > > > >
>> >> > > > >
>> >> > > > > On Mon, Dec 15, 2025 at 8:14 PM Josh McKenzie 
>> >> > > > > <[email protected]> wrote:
>> >> > > > > >
>> >> > > > > > Yes but it's in reply to the discussion thread and so it 
>> >> > > > > > threads that way in clients
>> >> > > > > >
>> >> > > > > > Apparently not in fastmail's client because it shows up as its 
>> >> > > > > > own thread for me. /sigh
>> >> > > > > >
>> >> > > > > > Hence the confusion. Makes sense now.
>> >> > > > > >
>> >> > > > > > On Mon, Dec 15, 2025, at 1:18 PM, Kokoori, Shylaja wrote:
>> >> > > > > >
>> >> > > > > > Thank you for your feedback, Patrick & Brandon. I have created 
>> >> > > > > > a new email thread like you suggested. Hopefully, this works.
>> >> > > > > >
>> >> > > > > > -Shylaja
>> >> > > > > >
>> >> > > > > > ________________________________
>> >> > > > > > From: Patrick McFadin <[email protected]>
>> >> > > > > > Sent: Monday, December 15, 2025 9:26 AM
>> >> > > > > > To: [email protected] <[email protected]>
>> >> > > > > > Subject: Re: [VOTE] CEP-49: Hardware-accelerated compression
>> >> > > > > >
>> >> > > > > > That was my point. It's a [DISCUSS] thread.
>> >> > > > > >
>> >> > > > > > On Mon, Dec 15, 2025 at 9:22 AM Brandon Williams 
>> >> > > > > > <[email protected]> wrote:
>> >> > > > > >
>> >> > > > > > On Mon, Dec 15, 2025 at 11:13 AM Josh McKenzie 
>> >> > > > > > <[email protected]> wrote:
>> >> > > > > > >
>> >> > > > > > > Can you put this into a [VOTE] thread?
>> >> > > > > > >
>> >> > > > > > > I'm confused - isn't the subject of this email [VOTE]?
>> >> > > > > >
>> >> > > > > > Yes but it's in reply to the discussion thread and so it 
>> >> > > > > > threads that
>> >> > > > > > way in clients, making it easy to overlook.
>> >> > > > > >
>> >> > > > > > Kind Regards,
>> >> > > > > > Brandon
>> >> > > > > >
>> >> > > > > >
>> >> > > >
>> >> > > >
>> >> > > >
>> >> > >
>> >> > >
>> >> >
>> >> >
