Re: [DISCUSS] Replacement of SSTable's partition cardinality implementation from stream-lib to Apache Datasketches

2025-01-03 Thread Dmitry Konstantinov
Hi Brian, I wanted it to be created under
https://cwiki.apache.org/confluence/display/CASSANDRA/Discussion but it
looks like I do not have permission to add a page there, so Confluence
automatically selected this space to store the page.
I do not have permission to move it either :-(
Can I get permission to create pages under
https://cwiki.apache.org/confluence/display/CASSANDRA/ ?

Thank you,
Dmitry

On Fri, 3 Jan 2025 at 14:12, Brian Proffitt  wrote:

> Dmitry:
>
> You are using a section of the Confluence wiki that is dedicated to
> Community Over Code, the Apache Conference. Please move that page to a more
> appropriate part of the Apache wiki as soon as you can.
>
> Thanks!
> BKP

Re: [DISCUSS] Replacement of SSTable's partition cardinality implementation from stream-lib to Apache Datasketches

2025-01-03 Thread Brian Proffitt
This would be an infra issue, as I don't have page-creation rights under 
Cassandra, either. You should file a ticket.

BKP

On 2025/01/03 14:18:22 Dmitry Konstantinov wrote:
> Hi Brian, I wanted it to be created under
> https://cwiki.apache.org/confluence/display/CASSANDRA/Discussion but it
> looks like I do not have permission to add a page there, so Confluence
> automatically selected this space to store the page.
> I do not have permission to move it either :-(
> Can I get permission to create pages under
> https://cwiki.apache.org/confluence/display/CASSANDRA/ ?
> 
> Thank you,
> Dmitry

Re: [DISCUSS] Replacement of SSTable's partition cardinality implementation from stream-lib to Apache Datasketches

2025-01-03 Thread Brandon Williams
I've granted access to the account "Dmitry Konstantinov (netudima)"

Kind Regards,
Brandon

On Fri, Jan 3, 2025 at 8:18 AM Dmitry Konstantinov  wrote:
>
> Hi Brian, I wanted it to be created under
> https://cwiki.apache.org/confluence/display/CASSANDRA/Discussion but it looks
> like I do not have permission to add a page there, so Confluence automatically
> selected this space to store the page.
> I do not have permission to move it either :-(
> Can I get permission to create pages under
> https://cwiki.apache.org/confluence/display/CASSANDRA/ ?
>
> Thank you,
> Dmitry

Re: [DISCUSS] Replacement of SSTable's partition cardinality implementation from stream-lib to Apache Datasketches

2025-01-03 Thread Dmitry Konstantinov
Thank you, Brandon!

I have moved the page to
https://cwiki.apache.org/confluence/display/CASSANDRA/SSTable%27s+partition+cardinality+implementation

On Fri, 3 Jan 2025 at 14:45, Brandon Williams  wrote:

> I've granted access to the account "Dmitry Konstantinov (netudima)"
>
> Kind Regards,
> Brandon

Re: [DISCUSS] Replacement of SSTable's partition cardinality implementation from stream-lib to Apache Datasketches

2025-01-03 Thread Štefan Miklošovič
Right ... that sounds reasonable. Let's "sleep on it" for a while. It is
not something which is urgent to deal with right now, but I quite often
find myself identifying functionality where we go to the disk more
often than necessary, and this was next on my list to look at after
reading CASSANDRA-13338. So I took a look ... and here we are.

If you guys are going to bump the SSTable version in 5.1 / 6.0, this change
might just be shipped with that too.

On Fri, Jan 3, 2025 at 1:47 PM Benedict  wrote:

> I’ve had a quick skim of the data sketches library, and it does seem to
> have made some more efficient decisions in its design than clearspring,
> appears to maybe support off-heap representations, and has reasonably good
> documentation about the theoretical properties of the sketches. The chair
> of the project is a published author on the topic, and the library has
> newer algorithms for cardinality estimation than HLL.
>
> So, honestly, it might not be a bad idea to (carefully) consider a
> migration, even if the current library isn’t broken for our needs.
>
> It would not be high up my priority list for the project, but I would
> support it if it scratches someone’s itch.
>
> On 3 Jan 2025, at 12:16, Štefan Miklošovič  wrote:
>
>
> Okay ... first problems.
>
> The 2000 bytes I mentioned in my response to Chris were indeed correct,
> but that was with Datasketches when the main parameter of the HLL sketch
> (DEFAULT_LG_K) was 12. When I changed that to 13 to match what we
> currently have in Cassandra with Clearspring, the size doubled to
> ~4000 bytes.
>
> When we do not use Datasketches, what Clearspring generates is ~5000
> bytes for the array itself, but that array is wrapped in a Clearspring
> ICardinality object, and we need that object in order to merge another
> ICardinality into it. So we would need to cache the ICardinality object
> instead of just the array. If we don't cache the whole ICardinality, we
> would need to do basically what
> CompactionMetadata.CompactionMetadataSerializer.deserialize is doing, which
> would allocate a lot / often (ICardinality cardinality =
> HyperLogLogPlus.Builder.build(that_cached_array)).
>
> To avoid the allocations every time we compute, we would just cache the
> whole Clearspring ICardinality, but that whole object measures about
> 11-12 KB. So even 10k tables would occupy about 100 MB, and 50k tables
> about 500 MB. That is becoming quite a problem.
>
> On the other hand, the HllSketch of Datasketches, array included, adds
> minimal overhead: if the array has 5000 bytes, the whole object has about
> 5500. You get the idea ...
>
> If we are still OK with these sizes, sure ... I am just being transparent
> about the consequences here.
>
> A user would opt in to this (by default it would be turned off).
>
> On the other hand, if we have 10k SSTables and reading that 10+ KB from
> disk takes around 2-3 ms each, we would spend 20-30 seconds reading from
> disk every time we hit that metric (and we haven't even started to merge
> the logs).
>
> If this is still not enough to sell Datasketches as a viable alternative,
> then I guess we need to stick with these numbers and cache it all
> with Clearspring, occupying way more memory.
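
The figures quoted above can be reproduced with a little arithmetic. The Java below is only a back-of-the-envelope sketch: the class and method names are made up for illustration, and the "4 bits per bucket" sizing is an assumption chosen because it matches the ~2000 / ~4000 byte figures for lgK 12 and 13, not a statement about how either library lays out its data.

```java
public class SketchCacheEstimate {

    // Approximate register-array size in bytes for 2^lgK buckets.
    // ASSUMPTION: 4 bits per bucket, picked to match the ~2000 / ~4000 byte
    // figures reported in the thread for lgK = 12 and lgK = 13.
    public static long registerBytes(int lgK) {
        return (1L << lgK) / 2;
    }

    // Heap used when one object of the given size is cached per table.
    public static long cacheBytes(long tables, long bytesPerObject) {
        return tables * bytesPerObject;
    }

    // Total time spent when every SSTable costs one disk read.
    public static double readSeconds(long sstables, double millisPerRead) {
        return sstables * millisPerRead / 1000.0;
    }

    public static void main(String[] args) {
        System.out.println(registerBytes(12));                      // 2048
        System.out.println(registerBytes(13));                      // 4096

        // Caching whole ~11 KB Clearspring objects (lower bound of 11-12 KB).
        System.out.println(cacheBytes(10_000, 11_000) / 1_000_000); // 110 (MB)
        System.out.println(cacheBytes(50_000, 11_000) / 1_000_000); // 550 (MB)

        // Caching ~5500 byte objects instead.
        System.out.println(cacheBytes(10_000, 5_500) / 1_000_000);  // 55 (MB)

        // 10k SSTables at 2-3 ms per read.
        System.out.println(readSeconds(10_000, 2.0));               // 20.0
        System.out.println(readSeconds(10_000, 3.0));               // 30.0
    }
}
```

The results line up with the rounded numbers in the message: roughly 100 MB and 500 MB for the larger cached objects, about half of that for the smaller ones, and 20-30 seconds of disk reads when nothing is cached.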
>
> On Thu, Jan 2, 2025 at 10:15 PM Benedict  wrote:
>
>> I would like to see somebody who has some experience writing data
>> structures, preferably someone we trust as a community to be competent at
>> this (i.e. having some experience within the project contributing at this
>> level), look at the code as if they were at least lightly reviewing the
>> feature as a contribution to this project.
>>
>> This should be the bar for any new library really, but triply so for
>> replacing a library that works fine.
>>
>> On 2 Jan 2025, at 21:02, Štefan Miklošovič 
>> wrote:
>>
>>
>> Point 2) is pretty hard to fulfil; I cannot imagine what would be
>> "enough" for you to be persuaded. What should concretely happen? Whoever
>> comes and says "yeah, this is a good lib, it works" is probably not
>> going to be enough given the vague requirements you put under 2). What
>> exactly would you like to see?
>>
>> The way it looks to me, this is just being shut down because of the
>> perceived churn it would cause, and there will always be some argument
>> against it.
>>
>> Based on (1), I don't think what we have is bug-free.
>>
>> Jeff:
>>
>> Thank you for that answer. I think we are on the same page that caching
>> it is just fine; that's what I got from your last two paragraphs.
>>
>> So the options from here are:
>>
>> 1) add datasketches and cache
>> 2) don't add datasketches and cache it anyway
>>
>> Introducing the datasketches lib is not an absolute must in order to
>> achieve that; we can cache / compute it in parallel with Clearspring as
>> well. It is just a bittersweet solution which doesn't feel right.
>>
>> (1) https://github.com/addthis/stream-lib/issues
>>
>> On Thu, Jan 2, 2025 at 9:26 PM Benedict  wrote:
>>
>>> Your message se

Re: [DISCUSS] Replacement of SSTable's partition cardinality implementation from stream-lib to Apache Datasketches

2025-01-03 Thread Dmitry Konstantinov
I have summarized the information from this mail thread at
https://cwiki.apache.org/confluence/display/COC/SSTable%27s+partition+cardinality+implementation
It can probably be turned into a CEP later.
Regarding the experience and publications of the DataSketches library's
authors, there is a good summary in the Background section of
https://cwiki.apache.org/confluence/display/INCUBATOR/DataSketchesProposal
It looks good.

On Fri, 3 Jan 2025 at 13:06, Štefan Miklošovič 
wrote:

> Right ... that sounds reasonable. Let's "sleep on it" for a while. It is
> not something which is urgent to deal with right now, but I quite often
> find myself identifying functionality where we go to the disk more
> often than necessary, and this was next on my list to look at after
> reading CASSANDRA-13338. So I took a look ... and here we are.
>
> If you guys are going to bump the SSTable version in 5.1 / 6.0, this change
> might just be shipped with that too.

[DISCUSS] CASSANDRA-20163 DELETE partition IF static column condition is currently blocked

2025-01-03 Thread David Capwell
As part of the Accord work we have been extending the Harry models to support
even more of the CQL domain and have added more test coverage for different
features; in doing so, testing found that the following query is currently
blocked:

-- delete partition if static column is in a given state
-- confirmed that this does not lead to a full partition read in CAS, but only
-- a static row read
-- we already support delete static column if static condition, which has the
-- same read cost as delete partition if static condition
DELETE
FROM tbl
WHERE pk = ? -- pk is the only partition key, but there are clustering keys
IF s0 = ? -- s0 is static


I took a stab at fixing this in
https://issues.apache.org/jira/browse/CASSANDRA-20163 and, after speaking with
Benedict, we agreed that this should be brought to the ML for visibility.

https://issues.apache.org/jira/browse/CASSANDRA-6430, added in 2.0.11/2.1.1,
blocked this type of query. The main argument seems to be that some deletes
are confusing and so should be rejected (for a table with static columns:
delete a regular column if the partition exists, delete if a regular column
matches a condition, etc.). I argue that the case above isn't ambiguous: the
condition is on a static column, so we are implicitly working at the partition
level, i.e. this is a partition delete.

Are people ok with this change?

Another case that was found in testing was

-- delete partition if exists
DELETE
FROM tbl
WHERE pk = ? -- pk is the only partition key, but there are clustering keys
IF EXISTS

I confirmed that this case does a full partition read (even though it doesn’t 
need to; it just needs liveness info). My patch keeps this blocked not because 
it’s ambiguous, but because it has an unbounded cost (if you do this to a 1 TB 
partition, you read 1 TB… or try to at least…).  We could fix this, but felt it 
was out of scope of the work above.

Thanks!

[VOTE] Release Apache Cassandra Java Driver 3.12.0

2025-01-03 Thread Bret McGuire
Greetings all!

   I’m proposing the Cassandra Java Driver 3.12.0 for release.

sha1: dd437b2a973dbccb415a612c4c997659edffd638

git: https://github.com/apache/cassandra-java-driver/tree/3.x

Maven Artifacts:
https://repository.apache.org/content/repositories/orgapachecassandra-1353

   The Source release is available here:

https://dist.apache.org/repos/dist/dev/cassandra/cassandra-java-driver/3.12.0/

   This is the first release of the 3.x Java driver since its donation.
This release is nearly identical to the last DataStax release (3.11.5).
The full changelog can be found at
https://github.com/apache/cassandra-java-driver/tree/3.x/changelog#3120

   The vote will be open for 120 hours (longer if needed) due to the
upcoming weekend. Everyone who has tested the build is invited to vote.
Votes by PMC members are considered binding. A vote passes if there are at
least three binding +1s and no -1's.

   Thanks!


- Bret -


Re: [DISCUSS] Replacement of SSTable's partition cardinality implementation from stream-lib to Apache Datasketches

2025-01-03 Thread Benedict
I’ve had a quick skim of the data sketches library, and it does seem to have
made some more efficient decisions in its design than clearspring, appears to
maybe support off-heap representations, and has reasonably good documentation
about the theoretical properties of the sketches. The chair of the project is
a published author on the topic, and the library has newer algorithms for
cardinality estimation than HLL.

So, honestly, it might not be a bad idea to (carefully) consider a migration,
even if the current library isn’t broken for our needs.

It would not be high up my priority list for the project, but I would support
it if it scratches someone’s itch.

On 3 Jan 2025, at 12:16, Štefan Miklošovič  wrote:

> Okay ... first problems.
>
> These 2000 bytes I have mentioned in my response to Chris were indeed
> correct, but that was with Datasketches and the main parameter for the HLL
> sketch (DEFAULT_LG_K) was 12. When I changed that to 13 to match what we
> currently have in Cassandra with Clearspring, that doubled the size to
> ~4000 bytes.
>
> When we do not use Datasketches, what Clearspring generates is about ~5000
> bytes for the array itself but that array is wrapped into an ICardinality
> object of Clearspring and we need that object in order to merge another
> ICardinality into that. So, we would need to cache this ICardinality object
> instead of just an array itself. If we don't cache whole ICardinality, we
> would then need to do basically what
> CompactionMetadata.CompactionMetadataSerializer.deserialize is doing which
> would allocate a lot / often (ICardinality cardinality =
> HyperLogLogPlus.Builder.build(that_cached_array)).
>
> To avoid the allocations every time we compute, we would just cache that
> whole ICardinality of Clearspring, but that whole object measures like
> 11/12 KB. So even 10k tables would occupy like 100MB. 50k tables 500MB.
> That is becoming quite a problem.
>
> On the other hand, HllSketch of Datasketches, array included, adds minimal
> overhead. Like an array has 5000 bytes and the whole object like 5500. You
> got the idea ...
>
> If we are still OK with these sizes, sure ... I am just being transparent
> about the consequences here.
>
> A user would just opt-in into this (by default it would be turned off).
>
> On the other hand, if we have 10k SSTables, reading that 10+KB from disk
> takes around 2-3ms so we would read the disk 20/30 seconds every time we
> would hit that metric (and we haven't even started to merge the logs).
>
> If this is still not something which would sell Datasketches as a viable
> alternative then I guess we need to stick to these numbers and cache it all
> with Clearspring, occupying way more memory.
>
> On Thu, Jan 2, 2025 at 10:15 PM Benedict  wrote:
>
>> I would like to see somebody who has some experience writing data
>> structures, preferably someone we trust as a community to be competent at
>> this (ie having some experience within the project contributing at this
>> level), look at the code like they were at least lightly reviewing the
>> feature as a contribution to this project.
>>
>> This should be the bar for any new library really, but triply so for
>> replacing a library that works fine.
>>
>> On 2 Jan 2025, at 21:02, Štefan Miklošovič  wrote:
>>
>>> Point 2) is pretty hard to fulfil, I can not imagine what would be
>>> "enough" for you to be persuaded. What should concretely happen? Because
>>> whoever comes and says "yeah this is a good lib, it works" is probably not
>>> going to be enough given the vague requirements you put under 2) You would
>>> like to see exactly what?
>>>
>>> The way it looks to me is to just shut it down because of perceived churn
>>> caused by that and there will always be some argument against that.
>>>
>>> Based on (1) I don't think what we have is bug free.
>>>
>>> Jeff:
>>>
>>> Thank you for that answer, I think we are on the same page that caching it
>>> is just fine, that's what I got from your last two paragraphs.
>>>
>>> So the path from here is
>>>
>>> 1) add datasketches and cache
>>> 2) don't add datasketches and cache it anyway
>>>
>>> The introduction of datasketches lib is not the absolute must in order to
>>> achieve that, we can cache / compute it parallel with Clearspring as well,
>>> it is just a bitter-sweet solution which just doesn't feel right.
>>>
>>> (1) https://github.com/addthis/stream-lib/issues
>>>
>>> On Thu, Jan 2, 2025 at 9:26 PM Benedict  wrote:
>>>
>>>> Your message seemed to be all about the caching proposal, which I have
>>>> proposed we separate, hence my confusion.
>>>>
>>>> To restate my answer to your question, I think that unless the new
>>>> library actually offers us concrete benefits we can point to that we
>>>> actually care about then yes it’s a bad idea to incur the churn of
>>>> migration.
>>>>
>>>> I’m not inherently opposed to a migration but simply “new is better” is
>>>> just plain wrong. Nothing you’ve presented yet convinces me this library
>>>> is worth the effort of vetting given our current solution works fine.
>>>>
>>>> My position is that for any new library we should:
>>>>
>>>> 1) Point to something it solves that we actually want and is worth the
>>>> time investment
>>>> 2) Solicit folk in the community competent i

Re: [DISCUSS] Replacement of SSTable's partition cardinality implementation from stream-lib to Apache Datasketches

2025-01-03 Thread Brian Proffitt
Dmitry:

You are using a section of the Confluence wiki that is dedicated to Community 
Over Code, the Apache Conference. Please move that page to a more appropriate 
part of the Apache wiki as soon as you can.

Thanks!
BKP

On 2025/01/03 13:55:49 Dmitry Konstantinov wrote:
> I have summarized information from this mail thread to
> https://cwiki.apache.org/confluence/display/COC/SSTable%27s+partition+cardinality+implementation
> Probably later it can be transformed to a CEP..
> Regarding experience of DataSketches library's authors and publications
> here there is a good summary in Background section:
> https://cwiki.apache.org/confluence/display/INCUBATOR/DataSketchesProposal
> . It looks good..
> 
> On Fri, 3 Jan 2025 at 13:06, Štefan Miklošovič 
> wrote:
> 
> > Right ... that sounds reasonable. Let's "sleep on it" for a while. It is
> > not something which is urgent to deal with right now but I find myself
> > quite often to identify the functionality where we go to the disk more
> > often than necessary and this was next on the list to take a look at
> > reading CASSANDRA-13338. So I took a look ... and here we are.
> >
> > If you guys go to bump SSTable version in 5.1 / 6.0, this change might be
> > just shipped with that too.
> >
> > On Fri, Jan 3, 2025 at 1:47 PM Benedict  wrote:
> >
> >> I’ve had a quick skim of the data sketches library, and it does seem to
> >> have made some more efficient decisions in its design than clearspring,
> >> appears to maybe support off-heap representations, and has reasonably good
> >> documentation about the theoretical properties of the sketches. The chair
> >> of the project is a published author on the topic, and the library has
> >> newer algorithms for cardinality estimation than HLL.
> >>
> >> So, honestly, it might not be a bad idea to (carefully) consider a
> >> migration, even if the current library isn’t broken for our needs.
> >>
> >> It would not be high up my priority list for the project, but I would
> >> support it if it scratches someone’s itch.
> >>
> >> On 3 Jan 2025, at 12:16, Štefan Miklošovič 
> >> wrote:
> >>
> >>
> >> Okay ... first problems.
> >>
> >> These 2000 bytes I have mentioned in my response to Chris were indeed
> >> correct, but that was with Datasketches and the main parameter for HLL
> >> sketch (DEFAULT_LG_K) was 12. When I changed that to 13 to match what we
> >> currently have in Cassandra with Clearspring, that doubled the size to
> >> ~4000 bytes.
> >>
> >> When we do not use Datasketches, what Clearspring generates is about
> >> ~5000 bytes for the array itself but that array is wrapped into an
> >> ICardinality object of Clearspring and we need that object in order to
> >> merge another ICardinality into that. So, we would need to cache this
> >> ICardinality object instead of just an array itself. If we don't cache
> >> whole ICardinality, we would then need to do basically what
> >> CompactionMetadata.CompactionMetadataSerializer.deserialize is doing which
> >> would allocate a lot / often (ICardinality cardinality =
> >> HyperLogLogPlus.Builder.build(that_cached_array)).
> >>
> >> To avoid the allocations every time we compute, we would just cache that
> >> whole ICardinality of Clearspring, but that whole object measures like
> >> 11/12 KB. So even 10k tables would occupy like 100MB. 50k tables 500MB.
> >> That is becoming quite a problem.
> >>
> >> On the other hand, HllSketch of Datasketches, array included, adds
> >> minimal overhead. Like an array has 5000 bytes and the whole object like
> >> 5500. You got the idea ...
> >>
> >> If we are still OK with these sizes, sure ... I am just being transparent
> >> about the consequences here.
> >>
> >> A user would just opt-in into this (by default it would be turned off).
> >>
> >> On the other hand, if we have 10k SSTables, reading that 10+KB from disk
> >> takes around 2-3ms so we would read the disk 20/30 seconds every time we
> >> would hit that metric (and we haven't even started to merge the logs).
> >>
> >> If this is still not something which would sell Datasketches as a viable
> >> alternative then I guess we need to stick to these numbers and cache it all
> >> with Clearspring, occupying way more memory.
> >>
> >> On Thu, Jan 2, 2025 at 10:15 PM Benedict  wrote:
> >>
> >>> I would like to see somebody who has some experience writing data
> >>> structures, preferably someone we trust as a community to be competent at
> >>> this (ie having some experience within the project contributing at this
> >>> level), look at the code like they were at least lightly reviewing the
> >>> feature as a contribution to this project.
> >>>
> >>> This should be the bar for any new library really, but triply so for
> >>> replacing a library that works fine.
> >>>
> >>> On 2 Jan 2025, at 21:02, Štefan Miklošovič 
> >>> wrote:
> >>>
> >>>
> >>> Point 2) is pretty hard to fulfil, I can not imagine what would be
> >>> "enough" for you to be persuaded. What should concretely happen? 

Re: [DISCUSS] Replacement of SSTable's partition cardinality implementation from stream-lib to Apache Datasketches

2025-01-03 Thread Štefan Miklošovič
Okay ... first problems.

The 2000 bytes I mentioned in my response to Chris were indeed
correct, but that was with Datasketches and the main parameter for the HLL
sketch (DEFAULT_LG_K) set to 12. When I changed that to 13 to match what we
currently have in Cassandra with Clearspring, that doubled the size to
~4000 bytes.
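
As a rough sanity check on that doubling: a HyperLogLog sketch keeps 2^lgK
registers, so each increment of lgK doubles the register array. The sketch
below assumes 4 bits per register (which I believe is the default HLL_4 layout
in Datasketches) and ignores any serialization header. The helper
register_array_bytes is hypothetical and not part of either library.

```python
def register_array_bytes(lg_k: int, bits_per_register: int = 4) -> int:
    """Bytes needed for 2**lg_k registers, excluding any header.

    Assumption: registers are packed at a fixed width (4 bits by default).
    """
    registers = 1 << lg_k
    return registers * bits_per_register // 8


# Compare the two settings discussed in the thread.
for lg_k in (12, 13):
    print(f"lgK={lg_k}: {register_array_bytes(lg_k)} bytes of registers")
```

Under that assumption the register array alone lines up with the "2000" and
"~4000" byte figures above; the real serialized form adds a small header.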

When we do not use Datasketches, what Clearspring generates is about ~5000
bytes for the array itself but that array is wrapped into an ICardinality
object of Clearspring and we need that object in order to merge another
ICardinality into that. So, we would need to cache this ICardinality object
instead of just an array itself. If we don't cache whole ICardinality, we
would then need to do basically what
CompactionMetadata.CompactionMetadataSerializer.deserialize is doing which
would allocate a lot / often (ICardinality cardinality =
HyperLogLogPlus.Builder.build(that_cached_array)).
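
To illustrate why merging needs the sketch state and not just a final
estimate, here is a toy HyperLogLog in Python. It is only a sketch under
stated assumptions: TinyHll is a made-up class, it is not how Clearspring or
Datasketches implement their sketches, and it leaves out the estimator
entirely. The point is that a merge is a register-wise max, so whatever we
cache has to be (or be cheaply convertible to) the register array.

```python
import hashlib


def _hash64(item: str) -> int:
    """64-bit hash derived from SHA-256 (chosen only for determinism)."""
    return int.from_bytes(hashlib.sha256(item.encode()).digest()[:8], "big")


class TinyHll:
    """Toy HyperLogLog state: 2**lg_k registers, each holding a max rank."""

    def __init__(self, lg_k: int = 4):
        self.lg_k = lg_k
        self.registers = [0] * (1 << lg_k)

    def add(self, item: str) -> None:
        h = _hash64(item)
        width = 64 - self.lg_k
        index = h >> width                 # top lg_k bits pick the register
        rest = h & ((1 << width) - 1)      # remaining bits
        # rank = leading zeros of the remaining bits, plus one
        rank = width - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def merge(self, other: "TinyHll") -> None:
        if other.lg_k != self.lg_k:
            raise ValueError("sketches must share lg_k")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]


# Two "SSTables" with overlapping partition keys, plus the true union.
a, b, union = TinyHll(), TinyHll(), TinyHll()
for i in range(100):
    a.add(f"pk-{i}")
    union.add(f"pk-{i}")
for i in range(50, 150):
    b.add(f"pk-{i}")
    union.add(f"pk-{i}")

a.merge(b)
print(a.registers == union.registers)
```

Merging the two per-SSTable sketches gives the same registers as sketching the
union directly, which is what makes per-SSTable caching workable at all.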

To avoid the allocations every time we compute, we would just cache that
whole ICardinality of Clearspring, but that whole object measures about
11-12 KB. So even 10k tables would occupy roughly 100 MB, and 50k tables
roughly 500 MB. That is becoming quite a problem.

On the other hand, HllSketch of Datasketches, array included, adds minimal
overhead: the array has about 5000 bytes and the whole object about 5500. You
get the idea ...
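
For anyone who wants to plug in their own numbers, the back-of-the-envelope
calculation above looks roughly like this. The per-object sizes are the ones
quoted in this mail (taking the lower end of 11-12 KB for Clearspring); the
function name cache_bytes is just for illustration, and real heap usage will
differ with JVM object layout.

```python
KB = 1024
MB = 1024 * 1024


def cache_bytes(cached_objects: int, bytes_per_object: int) -> int:
    """Total bytes held if every cached sketch object has the same size."""
    return cached_objects * bytes_per_object


for count in (10_000, 50_000):
    clearspring = cache_bytes(count, 11 * KB)   # whole ICardinality object
    datasketches = cache_bytes(count, 5500)     # whole HllSketch object
    print(
        f"{count} objects: Clearspring ~{clearspring / MB:.0f} MB, "
        f"Datasketches ~{datasketches / MB:.0f} MB"
    )
```

With these inputs the Datasketches cache comes out at about half the size of
the Clearspring one.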

If we are still OK with these sizes, sure ... I am just being transparent
about the consequences here.

A user would just opt in to this (by default it would be turned off).

On the other hand, if we have 10k SSTables, reading that 10+ KB from disk
takes around 2-3 ms per SSTable, so we would spend 20-30 seconds reading from
disk every time we hit that metric (and we haven't even started to merge the
logs).
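
The same estimate as a small calculation, assuming the reads happen one after
another with nothing served from the page cache (both are my simplifications,
and total_read_seconds is a hypothetical helper):

```python
def total_read_seconds(sstables: int, ms_per_read: float) -> float:
    """Wall-clock seconds to read one sketch per SSTable sequentially."""
    return sstables * ms_per_read / 1000.0


low = total_read_seconds(10_000, 2)
high = total_read_seconds(10_000, 3)
print(f"10k SSTables: between {low:.0f} and {high:.0f} seconds per metric read")
```

Parallel reads or a warm page cache would lower this, but the cost still grows
linearly with the number of SSTables.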

If this is still not something which would sell Datasketches as a viable
alternative then I guess we need to stick to these numbers and cache it all
with Clearspring, occupying way more memory.

On Thu, Jan 2, 2025 at 10:15 PM Benedict  wrote:

> I would like to see somebody who has some experience writing data
> structures, preferably someone we trust as a community to be competent at
> this (ie having some experience within the project contributing at this
> level), look at the code like they were at least lightly reviewing the
> feature as a contribution to this project.
>
> This should be the bar for any new library really, but triply so for
> replacing a library that works fine.
>
> On 2 Jan 2025, at 21:02, Štefan Miklošovič  wrote:
>
>
> Point 2) is pretty hard to fulfil, I can not imagine what would be
> "enough" for you to be persuaded. What should concretely happen? Because
> whoever comes and says "yeah this is a good lib, it works" is probably not
> going to be enough given the vague requirements you put under 2) You would
> like to see exactly what?
>
> The way it looks to me is to just shut it down because of perceived churn
> caused by that and there will always be some argument against that.
>
> Based on (1) I don't think what we have is bug free.
>
> Jeff:
>
> Thank you for that answer, I think we are on the same page that caching it
> is just fine, that's what I got from your last two paragraphs.
>
> So the path from here is
>
> 1) add datasketches and cache
> 2) don't add datasketches and cache it anyway
>
> The introduction of datasketches lib is not the absolute must in order to
> achieve that, we can cache / compute it parallel with Clearspring as well,
> it is just a bitter-sweet solution which just doesn't feel right.
>
> (1) https://github.com/addthis/stream-lib/issues
>
> On Thu, Jan 2, 2025 at 9:26 PM Benedict  wrote:
>
>> Your message seemed to be all about the caching proposal, which I have
>> proposed we separate, hence my confusion.
>>
>> To restate my answer to your question, I think that unless the new
>> library actually offers us concrete benefits we can point to that we
>> actually care about then yes it’s a bad idea to incur the churn of
>> migration.
>>
>> I’m not inherently opposed to a migration but simply “new is better” is
>> just plain wrong. Nothing you’ve presented yet convinces me this library is
>> worth the effort of vetting given our current solution works fine.
>>
>> My position is that for any new library we should:
>>
>> 1) Point to something it solves that we actually want and is worth the
>> time investment
>> 2) Solicit folk in the community competent in the relevant data
>> structures to vet the library for the proposed functionality
>>
>> The existing solution never went through (2) because it dates from the
>> dark ages where we just threw dependencies in willynilly. But it has the
>> benefit of having been used for a very long time without incident.
>>
>>
>>
>> On 2 Jan 2025, at 20:12, Štefan Miklošovič 
>> wrote:
>>
>>
>> Hi Benedict,
>>
>> you wrote:
>>
>> I am strongly opposed to updating libraries simply for the sake of it.
>> Something like HLL does not need much ongoing maintenance if it works.
>> We’re simply asking for extra work and bugs 

Re: [DISCUSS] CASSANDRA-20163 DELETE partition IF static column condition is currently blocked

2025-01-03 Thread J. D. Jordan
SGTM.

Also I think you do actually need to resolve the full partition for the second 
one. You have to merge tombstones+columns from all replicas to decide if the 
partition exists. It’s the same reason we have to read all row data for a given 
row during regular reads to decide if the row exists, even if only one column 
is selected.
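
A very simplified model of what I mean, in Python. This is not Cassandra code:
ReplicaResponse and partition_exists are made-up names, and the model ignores
row and range tombstones, cell tombstones and TTLs. It only shows that no
single replica's view is enough to answer "does the partition exist".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ReplicaResponse:
    # Timestamp of a partition-level tombstone seen by this replica, if any.
    partition_deletion_ts: Optional[int] = None
    # Write timestamps of the cells this replica returned.
    cell_write_ts: List[int] = field(default_factory=list)


def partition_exists(responses: List[ReplicaResponse]) -> bool:
    """Merge all replicas, then check whether any cell outlives the deletion."""
    deletions = [
        r.partition_deletion_ts
        for r in responses
        if r.partition_deletion_ts is not None
    ]
    cells = [ts for r in responses for ts in r.cell_write_ts]
    if not deletions:
        return bool(cells)
    newest_deletion = max(deletions)
    # On a timestamp tie the deletion wins, hence the strict comparison.
    return any(ts > newest_deletion for ts in cells)


has_data = ReplicaResponse(cell_write_ts=[10])
has_tombstone = ReplicaResponse(partition_deletion_ts=20)
has_newer_write = ReplicaResponse(cell_write_ts=[30])

print(partition_exists([has_data]))
print(partition_exists([has_data, has_tombstone]))
print(partition_exists([has_data, has_tombstone, has_newer_write]))
```

The answer flips as each replica's data is merged in, which is why the read
cannot stop at the first live cell it sees.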

> On Jan 3, 2025, at 3:23 PM, David Capwell  wrote:
> 
> As part of the Accord work we have been extending the harry models to 
> support even more of the CQL domain and added more test coverage for 
> different features; in doing so it found the following query is currently 
> blocked
> 
> -- delete partition if static column is in a given state
> -- confirmed that this does not lead to a full partition read in CAS, but only 
> -- a static row read
> -- We already support delete static column if static condition, which has the 
> -- same read cost as delete partition if static condition
> DELETE
> FROM tbl
> WHERE pk = ? -- pk is the only partition key, but there are clustering keys
> IF s0 = ? -- s0 is static
> 
> 
> I took a stab at fixing this in 
> https://issues.apache.org/jira/browse/CASSANDRA-20163 and speaking with 
> Benedict it was deemed that we should bring this up to the ML for visibility.
> 
> In 2.0.11/2.1.1 https://issues.apache.org/jira/browse/CASSANDRA-6430 was 
> added which blocked this type of query.  The main argument seems to be that 
> some deletes are confusing so should block instead ((assuming static columns) 
> delete regular column if the partition exists, delete if regular column 
> matches condition, etc.), but I argue that the case above isn’t ambiguous as 
> we are working with static columns, so implicitly we are working at the 
> partition level; aka partition delete.
> 
> Are people ok with this change?
> 
> Another case that was found in testing was
> 
> -- delete partition if exists
> DELETE
> FROM tbl
> WHERE pk = ? -- pk is the only partition key, but there are clustering keys
> IF EXISTS
> 
> I confirmed that this case does a full partition read (even though it doesn’t 
> need to, just needs liveness info), my patch keeps this blocked not because 
> its ambiguous, but because it has an unbounded cost (if you do this to a 1tb 
> partition, you read 1tb… or try to at least…).  We could fix this but felt 
> was out of scope of the work above.
> 
> Thanks!