I would like to see somebody who has some experience writing data structures, preferably someone we trust as a community to be competent at this (i.e. having some experience within the project contributing at this level), look at the code as if they were at least lightly reviewing the feature as a con
Point 2) is pretty hard to fulfil; I cannot imagine what would be "enough"
for you to be persuaded. What should concretely happen? Because whoever
comes and says "yeah this is a good lib, it works" is probably not going to
be enough given the vague requirements you put under 2). You would like to
s
Your message seemed to be all about the caching proposal, which I have proposed we separate, hence my confusion. To restate my answer to your question, I think that unless the new library actually offers us concrete benefits we can point to that we actually care about, then yes, it’s a bad idea to inc
I think it makes sense to make the options clearer. I would suggest a
Google sheet or a table within a JIRA ticket with the options and a comparison
(it looks like the majority of the confusion in this topic is caused by different
ways of interpreting the suggestion :-) )
I see a table like this:
+--
I’m going to type a lot of extra words mostly for people just barely familiar
with this part of the codebase, because it may or may not be useful to passive
observers (it wasn’t to me, so I’m mostly echoing the things I just went and
learned this morning):
The HLL cardinality is used for bas
Hi Benedict,
you wrote:
I am strongly opposed to updating libraries simply for the sake of it.
Something like HLL does not need much ongoing maintenance if it works.
We’re simply asking for extra work and bugs by switching, and some risk
without understanding the quality control for the new libra
Hi,
Can somebody help with reviewing of
https://issues.apache.org/jira/browse/CASSANDRA-20132.
When tombstones are expired they become almost invisible from a monitoring
point of view: you do not see them in metrics and tracing except as a latency
impact. I have observed such cases in production when co
-> about 800 live SSTables
Well, at 2000 bytes each, those HyperLogLogs would occupy about 1.5MB.
That's peanuts compared to going to the disk 800 times every minute.
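For anyone who wants to check the arithmetic behind that estimate, here is a back-of-the-envelope sketch. It only uses the two figures quoted in this thread (about 800 live SSTables, 2000 bytes per HyperLogLog); the function name is made up for illustration and is not a Cassandra API.

```python
# Rough memory footprint of keeping one HyperLogLog per live SSTable in memory.
# Both inputs are the figures quoted in the thread, not measured values.
SSTABLE_COUNT = 800        # "about 800 live SSTables"
BYTES_PER_SKETCH = 2000    # size of one serialized HyperLogLog, as quoted


def cache_footprint_bytes(sstables: int, bytes_per_sketch: int) -> int:
    """Hypothetical helper: total bytes needed to hold every sketch at once."""
    return sstables * bytes_per_sketch


total = cache_footprint_bytes(SSTABLE_COUNT, BYTES_PER_SKETCH)
total_mib = total / (1024 * 1024)
print(f"{total} bytes is about {total_mib:.2f} MiB")
```

So the "1.5MB" figure is the binary-megabyte reading of 1,600,000 bytes. It ignores per-object overhead on the JVM heap, so the real number would be somewhat higher.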
On Thu, Jan 2, 2025 at 8:18 PM Chris Lohfink wrote:
> > Regarding allocation details. The DB host had the following stats at
> th
I’m confused, Stefan: in what way do you protest? How is your proposal to cache these collections tied to the topic you started here? This should be a separate proposal, discussed on its own merits independently, should it not? I am not opposed to it happening, only to conflating the two concerns. On
> Regarding allocation details. The DB host had the following stats at that
time: 5K/sec local reads, 3K/sec local writes, about 800 live SSTables, the
profile was collected with duration = 5 minutes. I do not have
allocation rate info for that time period.
What was the allocation rate on heap
Let me clarify my comment regarding allocation. I am not saying that
switching to another implementation will make it better and we need to do
it right now :-). Any such switch is subject to a pros/cons analysis (and
memory allocation, I think, should be one of the criteria). What I wanted to say:
this
I am strongly opposed to updating libraries simply for the sake of it. Something like HLL does not need much ongoing maintenance if it works. We’re simply asking for extra work and bugs by switching, and some risk without understanding the quality control for the new library project’s releases. That
Indeed, I plan to measure it and compare; maybe a benchmark test would be
good to add.
I strongly suspect that the primary reason for the slowness (if it is
verified to be true) is us going to the disk every time and reading stats
for every SSTable all over again.
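To make the suspicion concrete, here is a minimal sketch of what "not going to the disk every time" could look like: load each SSTable's serialized estimator once and reuse it afterwards. Everything here is hypothetical (the `SketchCache` class, the loader, the string ids); it is not how Cassandra is implemented, and it says nothing about which HLL library would sit behind it.

```python
from typing import Callable, Dict


class SketchCache:
    """Hypothetical in-memory cache of per-SSTable cardinality sketches."""

    def __init__(self, load_from_disk: Callable[[str], bytes]) -> None:
        self._load = load_from_disk
        self._cache: Dict[str, bytes] = {}
        self.disk_reads = 0  # counts how often we actually hit the loader

    def get(self, sstable_id: str) -> bytes:
        # Only read from disk the first time a given SSTable is requested.
        if sstable_id not in self._cache:
            self.disk_reads += 1
            self._cache[sstable_id] = self._load(sstable_id)
        return self._cache[sstable_id]

    def invalidate(self, sstable_id: str) -> None:
        # Drop the entry when the SSTable goes away (e.g. after compaction).
        self._cache.pop(sstable_id, None)


def fake_loader(sstable_id: str) -> bytes:
    # Stand-in for reading the stats component; returns 2000 dummy bytes.
    return b"\0" * 2000


cache = SketchCache(fake_loader)
ids = [f"sstable-{i}" for i in range(800)]
for _ in range(3):          # three metric refreshes
    for sstable_id in ids:
        cache.get(sstable_id)
print(cache.disk_reads)     # loader calls across all three refreshes
```

Without the cache the same three refreshes would call the loader 2400 times; whether that is really where the time goes still needs to be measured.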
While datasketches say that it
Sounds interesting. I took a look at the issue but I'm not seeing any data
to back up "expensive". Can this be quantified a bit more?
Anytime we have a performance related issue, there should be some data to
back it up, even if it seems obvious.
Jon
On Thu, Jan 2, 2025 at 8:20 AM Štefan Mikloš
Hello,
I just stumbled upon this library we are using for estimating
the number of partitions in an SSTable, which is used e.g. in the
EstimatedPartitionCount metric. (1)
A user reported in (1) that it is an expensive operation. When one looks
into what it is doing, it calls SSTableReader.