[
https://issues.apache.org/jira/browse/CASSANDRA-21209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stefan Miklosovic updated CASSANDRA-21209:
------------------------------------------
Description:
We can specify how much sampling data there should be for training of a
dictionary. The buffer for training is a direct buffer. If we say that we will
be training on 2GiB, then it will try to create a direct buffer of size 2GiB.
This problem is visible e.g. when starting Cassandra via IDEA, which uses a 1GiB
heap. I believe the maximum direct memory size for the JVM defaults to the same
value as -Xmx, so the allocation will fail.
The fix would consist of wrapping the creation of that buffer in a try-catch and
propagating the error in some sanitized way up to the caller, informing them that
it is not possible to create such a big sampling buffer.
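A minimal sketch of the proposed fix. The class and method names here are illustrative, not Cassandra's actual API: the point is wrapping ByteBuffer.allocateDirect in a try-catch so that a failed allocation surfaces as a sanitized exception instead of a raw OutOfMemoryError.

```java
import java.nio.ByteBuffer;

public final class SamplingBufferAllocator
{
    /**
     * Allocates a direct buffer for dictionary-training sample data,
     * translating allocation failures into a caller-friendly exception.
     */
    public static ByteBuffer allocateSamplingBuffer(long sizeInBytes)
    {
        // Direct buffers are capped at Integer.MAX_VALUE bytes, so a 2GiB
        // (2^31 bytes) request cannot be satisfied by a single buffer anyway.
        if (sizeInBytes <= 0 || sizeInBytes > Integer.MAX_VALUE)
            throw new IllegalArgumentException("Invalid sampling buffer size: " + sizeInBytes);

        try
        {
            return ByteBuffer.allocateDirect((int) sizeInBytes);
        }
        catch (OutOfMemoryError e)
        {
            // Direct buffers are accounted against -XX:MaxDirectMemorySize
            // (which defaults to -Xmx), not against free heap, so a large
            // request can fail even on an otherwise idle JVM.
            throw new IllegalArgumentException(
                "Unable to allocate a direct sampling buffer of " + sizeInBytes +
                " bytes for dictionary training; reduce the sampling size or " +
                "increase -XX:MaxDirectMemorySize", e);
        }
    }
}
```

A caller can then catch IllegalArgumentException and report the message to the operator rather than crashing on an OutOfMemoryError.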
was:
We can specify how much sampling data there should be for training of a
dictionary. The buffer for training is a direct buffer. If we say that we will
be training on 2GiB, then it will try to create a direct buffer of size 2GiB.
This problem is visible e.g. when starting Cassandra via IDEA which uses 1GiB
heap. I think it works is that max direct memory size for JVM is basically same
as xmx so it will fail.
The fix would consist of wrapping creation of that buffer in a try-catch and
propagate the error in some sanitized way up to a caller informing them it is
not possible to create such a big sampling buffer.
> Fix exception when training ZSTD compression dictionary and using bigger
> buffer for sampling data than possible to create
> -------------------------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-21209
> URL: https://issues.apache.org/jira/browse/CASSANDRA-21209
> Project: Apache Cassandra
> Issue Type: Bug
> Components: Feature/Compression
> Reporter: Stefan Miklosovic
> Priority: Normal
>
> We can specify how much sampling data there should be for training of a
> dictionary. The buffer for training is a direct buffer. If we say that we
> will be training on 2GiB, then it will try to create a direct buffer of size
> 2GiB.
> This problem is visible e.g. when starting Cassandra via IDEA, which uses a 1GiB
> heap. I believe the maximum direct memory size for the JVM defaults to the same
> value as -Xmx, so the allocation will fail.
> The fix would consist of wrapping the creation of that buffer in a try-catch and
> propagating the error in some sanitized way up to the caller, informing them that
> it is not possible to create such a big sampling buffer.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]