Re: [VOTE] Release Apache Cassandra 4.1.0 (take2)

2022-12-12 Thread Benedict
I’m unsure that without more information it is very helpful to highlight in the release notes. We don’t even have a strong hypothesis tying this issue to 4.1.0 specifically, and don’t have a general policy of highlighting undiagnosed issues in release notes? > On 13 Dec 2022, at 00:48, Jon Me

Re: [DISCUSSION] New dependencies for SAI CEP-7

2022-12-14 Thread Benedict
I don’t believe we are ready to be prescriptive about how our randomised tests are written. 1) We want as many people to write randomised tests as possible, so do not want to create impediments. 2) We don’t, I expect, all agree on what a good randomised test looks like. I think Mike should include car

Re: [VOTE] CEP-25: Trie-indexed SSTable format

2022-12-19 Thread Benedict
+1 > On 19 Dec 2022, at 13:00, Branimir Lambov wrote: > >  > Hi everyone, > > I'd like to propose CEP-25 for approval. > > Proposal: > https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-25%3A+Trie-indexed+SSTable+format > Discussion: https://lists.apache.org/thread/3dpdg6dgm3rqxj96cyh

Re: [DISCUSS] CEP-26: Unified Compaction Strategy

2022-12-21 Thread Benedict
I’m personally very excited by this work. Compaction could do with a spring clean and this feels to formalise things much more cleanly, but density tiering in particular is something I’ve wanted to incorporate for years now, as it should significantly improve STCS behaviour (most importantly reduci

Re: [DISCUSSION] Cassandra's code style and source code analysis

2022-12-22 Thread Benedict
I like 3 or 4. We need to be sure we have a way of deactivating the check with code comments tho, as Java 8 has some bug with import order that can rarely break compilation, so we need to have some mechanism for permitting a different import order. Did we decide any changes to star imports?

Merging CEP-15 to trunk

2023-01-16 Thread Benedict
Hi Everyone, I hope you all had a lovely holiday period. Those who have been following along will have seen a steady drip of progress into the cep-15-accord feature branch over the past year. We originally discussed that feature branches would merge periodically into trunk, and we are long ove

Intra-project dependencies

2023-01-16 Thread Benedict
Those of us who have developed the in-jvm-dtest-api will know that the project’s approach to developing libraries is untenable for more complex projects. Accord makes this a pressing concern, but we would also benefit from separating utilities to their own library for use by the dtest-api and Ac

Re: Intra-project dependencies

2023-01-16 Thread Benedict
I guess option 5 is what we have today in cep-15, have the build file grab the relevant SHA for the library. This way you maintain a precise SHA for builds and scripts don’t have to be modified. I believe this is also possible with git submodules, but I’m happy to bake this into our build file

Re: Merging CEP-15 to trunk

2023-01-16 Thread Benedict
s there been review and +1 by two committer? > > If the code in the feature branch meets all of the merging criteria of the > project then I see no reason to keep it in a feature branch for ever. > > -Jeremiah > > >> On Jan 16, 2023, at 3:21 AM, Benedict wrote: >

Re: Intra-project dependencies

2023-01-16 Thread Benedict
les to build? https://git-scm.com/docs/git-submodule/#Documentation/git-submodule.txt-status--cached--recursive--ltpathgt82308203 It seems like our use case is one of the primary ones git submodules are designed to address. On Mon, Jan 16, 2023, at 6:40 AM, Benedict wrote: I guess option 5 is what we have

Re: Merging CEP-15 to trunk

2023-01-16 Thread Benedict
mental? 1. Same tests pass on the branch as to the root it's merging back to 2. 2 committers eyes on (author + reviewer or 2 reviewers, etc) 3. Disabled by default w/flag to enable So really only the 3rd thing is different right? Probably ought to add an informal step 4 which Benedict is doing here

Re: Merging CEP-15 to trunk

2023-01-16 Thread Benedict
, so that the syntax is known to the users and they can quickly get into speed, hopefully reporting any problems soon. Jacek Lewandowski On Mon, 16 Jan 2023 at 17:52, Benedict <bened...@apache.org> wrote: That’s fair, though for long term contributors probab

Re: Intra-project dependencies

2023-01-16 Thread Benedict
k that is consuming a Cassandra version, while for Accord it's Cassandra that depends on a specific Accord version. Because of this, the same solution may or may not be right for both of them. henrik On Mon, Jan 16, 2023 at 6:44 PM Benedict <bened...@apache.org> wrote: How often have we modif

Re: Intra-project dependencies

2023-01-16 Thread Benedict
Benedict, experience based on developing one feature against one branch doesn't face the problems of working, and switching frequently, between branches. Mick, please take a look at the ongoing development. Over the last week I have been actively developing five separate PRs against

Re: Intra-project dependencies

2023-01-17 Thread Benedict
The answer to all your questions is “like any other library” - this is a procedural hack to ease development. There are alternative isomorphic hacks, like compiling source jars from Accord and including them in the C* tree, if it helps your mental model. > you stated that a goal was to avoid ma

Re: Merging CEP-15 to trunk

2023-01-17 Thread Benedict
> but the pre-commit gateway here is higher than the previous tickets being > worked on Which tickets, and why? > On 17 Jan 2023, at 07:43, Mick Semb Wever wrote: > >  > > >> Could you file a bug report with more detail about which classes you think >> are lacking adequate documentation in

Re: Intra-project dependencies

2023-01-17 Thread Benedict
ery project I've worked with that uses > submodules you would never use HEAD, because the submodule itself > already records the *exact* commit associated with the parent. > > Cheers, > > Derek > > On Tue, Jan 17, 2023 at 2:28 AM Benedict <bened...@apache.org> w

Re: Intra-project dependencies

2023-01-17 Thread Benedict
> You would reference the snapshot dependency by the timestamped snapshot. This > makes it a reproducible build. How confident are we that the repository will not alter or delete them? > linking in the source code into in-tree is a significant thing to do Could you explain why? I thought your p

Re: Intra-project dependencies

2023-01-18 Thread Benedict
> Linking or merging while it is still also being a separate library and repo. I am still unclear why you think this is “a significant thing”? > On 18 Jan 2023, at 10:41, Mick Semb Wever wrote: > >  > > >>> You would reference the snapshot dependency by the timestamped snapshot. >>> This ma

Re: Intra-project dependencies

2023-01-18 Thread Benedict
ld be best if the project had automation to make sure everyone “does the right thing”? On Jan 18, 2023, at 3:06 AM, Benedict <bened...@apache.org> wrote: Linking or merging while it is still also being a separate library and repo. I am still unclear why you think this is “a significant thing”? On 18

Re: Merging CEP-15 to trunk

2023-01-20 Thread Benedict
wrote: On Tue, 17 Jan 2023 at 10:29, Benedict <bened...@apache.org> wrote: but the pre-commit gateway here is higher than the previous tickets being worked on Which tickets, and why? All tickets resolved in the feature branch to which you are now bringing from feature branch into trunk. A quick scan

Re: Merging CEP-15 to trunk

2023-01-23 Thread Benedict
There is no merge-then-review. The work has been reviewed. This is identical to how reviews work as normal. If it helps your mental model, consider this a convenient atomic merge of many Jira that have independently met the standard project procedural requirements, as that is what it is. Squas

Re: Merging CEP-15 to trunk

2023-01-24 Thread Benedict
No, that is not the normal process. What is it you think you would be reviewing? There are no diffs produced as part of rebasing, and the purpose of review is to ensure code meets our standards, not that the committer is competent at rebasing or squashing. Nor are you familiar with the code as i

Re: Merging CEP-15 to trunk

2023-01-24 Thread Benedict
tedious, but has the benefit of making explicit what was only a change due to rebasing.) Depending on which approach you take when rebasing, a reviewer would then review accordingly. henrik On Tue, Jan 24, 2023 at 11:14 AM Benedict <bened...@apache.org> wrote: No, that is not the normal process

Re: Merging CEP-15 to trunk

2023-01-25 Thread Benedict
he person to review all active feature branches just in case. As for 2 and 3, I certainly observe an assumption that contributors have expected to review after a rebase. But I don't see this as a significant topic to argue about. If indeed the rebase is as easy as Benedict advertised, then we sh

Re: Merging CEP-15 to trunk

2023-01-27 Thread Benedict
case is encouraging, as it also suggests the changes to Cassandra code are less invasive than maybe I and others had imagined. henrik On Wed, Jan 25, 2023 at 1:51 PM Benedict <bened...@apache.org> wrote: contributors who didn't actively work on Accord, have assumed that they will be invited to review

Internal Documentation Contribution Guidance

2023-01-30 Thread Benedict
During public and private discussions around CEP-15, it became apparent that we lack any guidance on internal documentation and commenting in the “style guide” - which I also propose we rename to “Contribution Guide” or “Contribution and Style Guide” to better describe the broader role it has ta

Re: Internal Documentation Contribution Guidance

2023-01-30 Thread Benedict
Apologies, I think it should be opened up for comments now. On 30 Jan 2023, at 11:29, Ekaterina Dimitrova wrote: Thank you Benedict! Can you, please, give us comment access to the doc? On Mon, 30 Jan 2023 at 6:14, Benedict <bened...@apache.org> wrote: During public and private discussions arou

Re: Merging CEP-15 to trunk

2023-01-30 Thread Benedict
uple will likely yet (there isn't a rush). But it is novel to propose that such optional reviews be treated as blocking. On 30 Jan 2023, at 23:04, Henrik Ingo wrote: Ooops, I missed copy pasting this reply into my previous email: On Fri, Jan 27, 2023 at 11:21 PM Benedict <bened...@apache

Re: Merging CEP-15 to trunk

2023-01-30 Thread Benedict
comments. On Mon, 30 Jan 2023 at 20:16, Benedict <bened...@apache.org> wrote: Review should primarily ask: "is this correct?" and "could this be done differently (clearer, faster, more correct, etc)?" Blocking reviews especially, because why else would a reasonable contributor wa

Re: [DISCUSS] API modifications and when to raise a thread on the dev ML

2023-02-02 Thread Benedict
there. Should we consider them public APIs too, and require a DISCUSS thread for every change on them? Should that include new methods that wouldn't break compatibility? On Thu, 2 Feb 2023 at 09:29, Benedict Elliott Smith <bened...@apache.org> wrote: Closing the loop on seeking consens

Re: Cassandra 5.0 Documentation Plan

2023-02-02 Thread Benedict
This looks good to me, thanks Lorina > On 1 Feb 2023, at 19:24, Lorina Poland wrote: > >  > Hey all - > > I presented a potential Docs Information Architecture recently, and promised > a Doc Plan for the upcoming C* 5.0 release. Please give me feedback, > especially if you feel that the prio

Re: [DISCUSS] API modifications and when to raise a thread on the dev ML

2023-02-02 Thread Benedict
I think it’s fine to separate the systems from the policy? We are agreeing a policy for systems we want to make guarantees about to our users (regarding maintenance and compatibility). For me, this is (at minimum) CQL and virtual tables. But I don’t think the policy differs based on the contents of t

Re: [DISCUSS] API modifications and when to raise a thread on the dev ML

2023-02-02 Thread Benedict
move into that direction) On Thu, 2 Feb 2023 at 8:12, Benedict <bened...@apache.org> wrote: I think it’s fine to separate the systems from the policy? We are agreeing a policy for systems we want to make guarantees about to our users (regarding maintenance and compatibility). For me, this is (at m

Re: [DISCUSS] API modifications and when to raise a thread on the dev ML

2023-02-02 Thread Benedict
023 at 8:54, Benedict <bened...@apache.org> wrote: I think lazy consensus is fine for all of these things. If a DISCUSS thread is crickets, or just positive responses, then definitely it can proceed without further ceremony. I think “with heads-up to the mailing list” is very close to B? Only tha

Re: Implicitly enabling ALLOW FILTERING on virtual tables

2023-02-03 Thread Benedict
Why not introduce a general table option that toggles ALLOW FILTERING behaviour and just flip it for virtual tables we want this behaviour for? Users can do it too, for their own tables for which it’s suitable. On 3 Feb 2023, at 20:59, Andrés de la Peña wrote: For those eventual big virtual tables

Re: [VOTE] CEP-21 Transactional Cluster Metadata

2023-02-06 Thread Benedict
+1 On 6 Feb 2023, at 16:17, Brandon Williams wrote: +1 On Mon, Feb 6, 2023, 10:15 AM Sam Tunnicliffe wrote: Hi everyone, I would like to start a vote on this CEP. Proposal: https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-21%3A+Transactional+Cluster+Metadata Discussion: https://

Re: Downgradability

2023-02-20 Thread Benedict
In a self-organising community, things that aren’t self-policed naturally end up policed in an ad hoc manner, and with difficulty. I’m not sure that’s the same as arbitrary enforcement. It seems to me the real issue is nobody noticed this was agreed and/or forgot and didn’t think about it much. But,

Re: Downgradability

2023-02-20 Thread Benedict
, so we have to make sure the data validity checks are consistent with the format we write. It isn’t as simple as writing an earlier version in this case (unless we permit truncating the TTL, perhaps) On 20 Feb 2023, at 20:24, Benedict wrote: In a self-organising community, things that aren’t self

Re: Downgradability

2023-02-21 Thread Benedict
n doc for the change. Also, if I should create a separate ticket from CASSANDRA-8110 for the clarity of the goal of the change, please let me know. On Tue, Feb 21, 2023 at 5:31 AM Benedict <bened...@apache.org> wrote: FWIW I think 8110 is the right approach, even if it isn’t a panacea. We will

Re: Downgradability

2023-02-21 Thread Benedict
cus (though Cassandra needs to implement the way to write SSTable in older versions, so it is somewhat related.) I'm preparing the design doc for the change. Also, if I should create a separate ticket from CASSANDRA-8110 for the clarity of the goal of the change, please let me know. On Tue, Feb 2

Re: [DISCUSS] Allow UPDATE on settings virtual table to change running configuration

2023-02-22 Thread Benedict
Could you describe the issues? Config that is globally exposed should ideally be immutable with final members, in which case volatile is only necessary if you’re using the config parameter in a tight loop that you need to witness a new value - which shouldn’t apply to any of our config. There are so

Re: Downgradability

2023-02-22 Thread Benedict
Ok I will be honest, I was fairly sure we hadn’t yet broken downgrade - but I was wrong. CASSANDRA-18061 introduced a new column to a system table, which is a breaking change. But that’s it, as far as I can tell. I have run a downgrade test successfully after reverting that ticket, using the one li

Re: Downgradability

2023-02-22 Thread Benedict
21:23, Jeremiah D Jordan wrote: We have multiple tickets about to merge that introduce new on disk format changes. I see no reason to block those indefinitely while we figure out how to do the on disk format downgrade stuff. -Jeremiah On Feb 22, 2023, at 3:12 PM, Benedict wrote: Ok I will be honest

Re: Downgradability

2023-02-23 Thread Benedict
Forget downgradeability for a moment: we should not be breaking format compatibility without good reason. Bumping a major version isn’t enough of a reason. Can somebody explain to me why this is being fought tooth and nail, when the work involved is absolutely minimal? Regarding tests: what more do

Re: Downgradability

2023-02-23 Thread Benedict
23, 2023 at 11:57 AM Benedict <bened...@apache.org> wrote: Can somebody explain to me why this is being fought tooth and nail, when the work involved is absolutely minimal? I don't know how each individual has been thinking about this, but it seems to me just looking at all the tasks th

Re: Downgradability

2023-02-23 Thread Benedict
her one considers "a switch" to exist already or not, might be subjective in this case, because people have different assumptions on the definition of done of such a switch. henrik On Thu, Feb 23, 2023 at 2:53 PM Benedict <bened...@apache.org> wrote: I don’t think there’s anything ab

Re: [DISCUSS] Next release date

2023-02-28 Thread Benedict
I agree, we shouldn’t be branching annually, we should be releasing annually - and we shouldn’t assume that will take six months. We should be aiming for 1-2mo and so long as our trajectory towards that is good, I don’t think there’s anything to worry about (and we’ll get our first comparative data

Re: [DISCUSS] Allow UPDATE on settings virtual table to change running configuration

2023-03-01 Thread Benedict
econdsBound("30s"); > @Mutable > public double phi_convict_threshold = 8.0; > public String partitioner; // assume immutable by default? > > > > On Feb 22, 2023, at 6:20 AM, Benedict <bened...@apache.org> wrote: > > > > Could you describe the issues

Re: [DISCUSS] Next release date

2023-03-01 Thread Benedict
It doesn’t look like we agreed to a policy of annual branch dates, only annual releases and that we would schedule this for 4.1 based on 4.0’s branch date. Given this was the reasoning proposed I can see why folk would expect this would happen for the next release. I don’t think there was a stro

Re: Degradation of availability when using NTS and RF > number of racks

2023-03-07 Thread Benedict
My view is that this is a pretty serious bug. I wonder if transactional metadata will make it possible to safely fix this for users without rebuilding (only via opt-in, of course). > On 7 Mar 2023, at 15:54, Miklosovic, Stefan > wrote: > > Thanks everybody for the feedback. > > I think t

Re: hsqldb test dependency in simulator code

2023-03-14 Thread Benedict
I’m sure we can use a different hash map there. > On 14 Mar 2023, at 11:49, Miklosovic, Stefan > wrote: > > Hi list, > > while removing Hadoop code in trunk, as agreed on ML recently, I did that but > we need to do this (1). By removing all Hadoop dependencies, we also removed > hsqldb libr

Re: [DISCUSS] cep-15-accord, cep-21-tcm, and trunk

2023-03-24 Thread Benedict
Accord doesn’t have a hard dependency on CEP-21 fwiw, it just needs linearizable epochs. This could be achieved with a much more modest patch, essentially avoiding almost all of the insertion points of cep-21, just making sure that joining and leaving nodes update some state via Paxos instead of vi

Re: [DISCUSS] cep-15-accord, cep-21-tcm, and trunk

2023-03-24 Thread Benedict
iver both of these things on their own timetable seems like a pretty valuable thing assuming the lift required would be modest. On Fri, Mar 24, 2023, at 6:15 AM, Benedict wrote: Accord doesn’t have a hard dependency on CEP-21 fwiw, it just needs linearizable epochs. This could be achieved with a m

Re: [DISCUSS] cep-15-accord, cep-21-tcm, and trunk

2023-03-24 Thread Benedict
le thing assuming the lift required would be modest. On Fri, Mar 24, 2023, at 6:15 AM, Benedict wrote: Accord doesn’t have a hard dependency on CEP-21 fwiw, it just needs linearizable epochs. This could be achieved with a much more modest patch, essentially avoiding almost all of the insertion points

Re: [DISCUSS] cep-15-accord, cep-21-tcm, and trunk

2023-03-27 Thread Benedict
vid in the sense that getting ourselves planted on top of TCM as soon as possible is a good idea. On Mar 24, 2023, at 3:04 PM, Benedict <bened...@apache.org> wrote: It’s not even clear such an effort would need to be different from that used by cep-21. The point is that there’s not much point litig

Re: [DISCUSS] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics

2023-03-28 Thread Benedict
Fwiw I’m sceptical of the performance angle long term. You can do a lot more to control QoS when you understand what each query is doing, and what your SLOs are. You can also more efficiently apportion your resources (not leaving any lying fallow to ensure it’s free later) But, we’re a long way

Re: [DISCUSS] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics

2023-03-28 Thread Benedict
I disagree with the first claim, as the process has all the information it chooses to utilise about which resources it’s using and what it’s using those resources for. The inability to isolate GC domains is something we cannot address, but also probably not a problem if we were doing everything with

Re: [DISCUSS] Introduce DATABASE as an alternative to KEYSPACE

2023-04-06 Thread Benedict
KEYSPACE is fine. If we want to introduce a standard nomenclature like DATABASE that’s also fine. Inventing brand new ones is not fine, there’s no benefit. I think it would be fine to introduce some arbitrary unrelated concept for assigning tables with similar behaviours some configuration that

Re: [DISCUSS] Next release date

2023-04-18 Thread Benedict
Finally, I expect most Europeans to be on vacation 33% of that time. Non-Europeans may want to try it too! The more northerly Europeans maybe :) On 18 Apr 2023, at 01:24, Henrik Ingo wrote: Trying to collect a few loose ends from across this thread > I'm receptive to another definition of "stabilize"

Re: Adding vector search to SAI with heirarchical navigable small world graph index

2023-04-26 Thread Benedict
We probably at least need to bike shed naming as we already have FLOAT, DOUBLE, and LIST - which are similar/overlapping types, and we should be consistent. If we introduce FLOAT32 we probably need that to be an alias of FLOAT and introduce FLOAT64 to alias DOUBLE for consistency. DENSE seem

Re: [DISCUSS] New data type for vector search

2023-04-27 Thread Benedict
That’s a bounded ring buffer, not a fixed length array. This definitely isn’t a tuple because the types are all the same, which is pretty crucial for matrix operations. Matrix libraries generally work on arrays of known dimensionality, or sparse representations. Whether we draw any semantic link betw

Re: [DISCUSS] New data type for vector search

2023-04-28 Thread Benedict
Could even be NON NULL TYPE[size] On Apr 27, 2023, at 9:00 AM, Benedict wrote: That’s a bounded ring buffer, not a fixed length array. This definitely isn’t a tuple because the types are all the same, which is pretty crucial for matrix operations. Matrix libraries generally work on a

Re: [DISCUSS] New data type for vector search

2023-04-28 Thread Benedict
) QUALIFIER TYPE[size] - QUALIFIER is just a Term we use to denote this semantic…. Could even be NON NULL TYPE[size] On Apr 27, 2023, at 9:00 AM, Benedict <bened...@apache.org> wrote: That’s a bounded ring buffer, not a fixed length array. This definitely isn’t a tuple because the types are all t

Re: [DISCUSS] New data type for vector search

2023-04-28 Thread Benedict
tring and collection and other datatypes don't make sense, typical ordered indexes don't make sense, etc. It's just a different beast from arrays, for a different use case. On Fri, Apr 28, 2023 at 10:40 AM Benedict <bened...@apache.org> wrote: But you’re proposing introducing a ge

Re: [DISCUSS] New data type for vector search

2023-04-28 Thread Benedict
ives us 99% of the value and limits the scope so we can deliver quickly. > 2. Add a vector type for floats or bytes. This gives us another 1% of value in exchange for an extra 20% or so of effort. Is it possible to implement 1 in a way that makes 2 possible in a future version? henrik henrik On Fri,

Re: [DISCUSS] New data type for vector search

2023-05-01 Thread Benedict
Has anybody yet claimed it would be hard? Several folk seem ready to jump to the conclusion that this would be onerous, but as somebody with a good understanding of the storage layer I can assert with reasonable confidence that it would not be. As previously stated, the implementation largely al

Re: [DISCUSS] New data type for vector search

2023-05-01 Thread Benedict
! What you (David) and Benedict write beautifully supports `VECTOR FLOAT[n]` imho. You are definitely bringing up valid implementation details, and that can be dealt with during patch review. This thread is about the CQL API addition. No matter which way the technical review goes with the

Re: [DISCUSS] New data type for vector search

2023-05-01 Thread Benedict
ieve it to be, I wouldn't recommend anyone go that route. On Mon, May 1, 2023, at 4:17 PM, Benedict wrote: I have explained repeatedly why I am opposed to ML-specific data types. If we want to make an ML-specific data type, it should be in an ML plug-in. We should not pollute the general purpose la

Re: [DISCUSS] New data type for vector search

2023-05-02 Thread Benedict
correctly -- are you saying that you're fine with a vector type, but you want to see it implemented as a special case of arrays, or that you are not fine with a vector type because you would prefer to only add arrays and that should be "good enough" for ML? On Mon, May 1, 2023 at 4

Re: [POLL] Vector type for ML

2023-05-02 Thread Benedict
This is not the poll I thought we would be conducting, and I don’t really support its framing. There are two parallel questions: what the functionality should be and how they should be exposed. This poll compresses the optionality poorly. Whether or not we support a “vector” concept (or something is

Re: [POLL] Vector type for ML

2023-05-02 Thread Benedict
Could folk voting against a general purpose type (that could well be called a vector) briefly explain their reasoning? We established in the other thread that it’s technically trivial, meaning folk must think it is strictly superior to only support float rather than eg all numeric types (note: for t

Re: [POLL] Vector type for ML

2023-05-02 Thread Benedict
ions are broken, I'm voting for what I think will be awesome for users sooner. Patrick On Tue, May 2, 2023 at 12:29 PM Benedict <bened...@apache.org> wrote: Could folk voting against a general purpose type (that could well be called a vector) briefly explain their reasoning? We establishe

Re: [POLL] Vector type for ML

2023-05-04 Thread Benedict
Hurrah for initial agreement. For syntax, I think one option was just FLOAT[N]. In VECTOR FLOAT[N], VECTOR is redundant - FLOAT[N] is fully descriptive by itself. I don’t think VECTOR should be used to simply imply non-null, as this would be very unintuitive. More logical would be NONNULL, if t

Re: [POLL] Vector type for ML

2023-05-04 Thread Benedict
is a well known term in the ML space) keyword that could be used when the array is going to be used for ML workloads. This would be optional and would function similarly to FROZEN in that it would limit the functionality of the array to ML usage. On Thu, 4 May 2023 at 09:45, Benedict <be

Re: CEP-30: Approximate Nearest Neighbor(ANN) Vector Search via Storage-Attached Indexes

2023-05-09 Thread Benedict
HNSW can in principle be made into a distributed index. But that would be quite a different paradigm to SAI. On 9 May 2023, at 19:30, Patrick McFadin wrote: Under the goals section, there is this line: Scatter/gather across replicas, combining topK from each to get global topK. But what I'm hearing i

Re: [DISCUSS] The future of CREATE INDEX

2023-05-10 Thread Benedict
I’m not convinced by the changing defaults argument here. The characteristics of the two index types are very different, and users with scripts that make indexes today shouldn’t have their behaviour change. We could introduce new syntax that properly appreciates there’s no default index, perhaps CRE

Re: [DISCUSS] The future of CREATE INDEX

2023-05-12 Thread Benedict
This creates huge headaches for everyone successfully using 2i today though, and SAI *is not* guaranteed to perform as well or better - it has a very different performance profile. I think we should deprecate CREATE INDEX, and introduce new syntax CREATE LOCAL INDEX to make clear that this is not a

Re: [DISCUSS] The future of CREATE INDEX

2023-05-12 Thread Benedict
yntax as an alias for today’s CREATE INDEX, the latter to be deprecated and (in very distant future) removed. On 12 May 2023, at 13:14, Benedict <bened...@apache.org> wrote: This creates huge headaches for everyone successfully using 2i today though, and SAI *is not* guaranteed to perform as we

Re: [DISCUSS] The future of CREATE INDEX

2023-05-12 Thread Benedict
M INDEX. Support the syntax for the foreseeable future. Can we live w/ this? I don't think any information about SAI we could possibly acquire before a 5.0 release would affect the reasonableness of this much. On Fri, May 12, 2023 at 10:54 AM Benedict <bened...@apache.org> wrote: if we didn

Re: [DISCUSS] The future of CREATE INDEX

2023-05-12 Thread Benedict
all, right? What if we just do #2 and #3 and punt on everything else? On Fri, May 12, 2023 at 11:56 AM Benedict <bened...@apache.org> wrote: A table is not a local concept at all, it has a global primary index - that’s the core idea of Cassandra. I agree with Brandon that changing CQL behaviour li

Re: [DISCUSS] The future of CREATE INDEX

2023-05-12 Thread Benedict
y 12, 2023 at 12:09 PM Benedict <bened...@apache.org> wrote: If the performance characteristics are as clear cut as you think, then maybe it will be an easy decision once the evidence is available for everyone to consider? If not, then we probably can’t do the hard cutover and so the answer is s

Re: [DISCUSS] The future of CREATE INDEX

2023-05-12 Thread Benedict
ter. On Fri, May 12, 2023 at 12:09 PM Benedict <bened...@apache.org> wrote: If the performance characteristics are as clear cut as you think, then maybe it will be an easy decision once the evidence is available for everyone to consider? If not, then we probably can’t do the hard cutover and so t

Re: [DISCUSS] The future of CREATE INDEX

2023-05-12 Thread Benedict
on as a whole. On Fri, May 12, 2023 at 12:28 PM Benedict <bened...@apache.org> wrote: I’m not convinced a default index makes any sense, no. The trade-offs in a distributed setting are much more pronounced. Indexes in a local-only RDBMS are much simpler affairs; the trade offs are much more subtl

Re: [DISCUSS] The future of CREATE INDEX

2023-05-12 Thread Benedict
If folk should be reading up on the index type, doesn’t that conflict with your support of a default? Should there be different global and local defaults, once we have global indexes, or should we always default to a local index? Or a global one? > On 12 May 2023, at 18:39, Mick Semb Wever wro

Re: [DISCUSS] The future of CREATE INDEX

2023-05-12 Thread Benedict
e can add GLOBAL to the syntax... sry, working on an ugly poll... On Fri, May 12, 2023 at 1:24 PM Benedict <bened...@apache.org> wrote: If folk should be reading up on the index type, doesn’t that conflict with your support of a default? Should there be different global and local defaults, once we

Re: [DISCUSS] The future of CREATE INDEX

2023-05-12 Thread Benedict
Given we have no data in front of us to make a decision regarding switching defaults, I don’t think it is suitable to include that option in this poll. In fact, until we have sufficient data to discuss that I’m going to put a hard veto on that on technical grounds. On 12 May 2023, at 19:41, Caleb Ra

Re: [DISCUSS] The future of CREATE INDEX

2023-05-15 Thread Benedict
3: CREATE INDEX (Otherwise 2) No If configurable, should be a distributed configuration. This is very different to other local configurations, as the 2i selected has semantic implications, not just performance (and the perf implications are also much greater) On 15 May 2023, at 10:45, Mike Adamson w

Re: [DISCUSS] Feature branch version hygiene

2023-05-16 Thread Benedict
Copying my reply on the ticket… We have this discussion roughly once per major. If you look back through dev@ you'll find the last one a few years back. I don't recall NA ever being the approved approach, though. ".x" lines are target versions, whereas concrete versions are the ones a fix landed

Re: [DISCUSS] Feature branch version hygiene

2023-05-18 Thread Benedict
I don’t think we should overcomplicate this with special CEP release targets. If we do, they shouldn’t be versioned. My personal view is that 5.0 should not be used for any resolved tickets - they should go to 5.0-alpha1, since this is the correct release for them. 5.0 can then be the target versio

Re: [DISCUSS] Feature branch version hygiene

2023-05-18 Thread Benedict
So we just rename alpha1 to beta1 if that happens? Or, we point resolved tickets straight to 5.0.0, and add 5.0-alpha1 to any tickets with *only* 5.0.0. This is probably the easiest for folk to understand when browsing. Finding new features is easy either way - look for 5.0.0. > On 18 May 2023,

Re: [DISCUSS] Feature branch version hygiene

2023-05-18 Thread Benedict
s. When the parent epic gets a new FixVersion on resolution, all children get that FixVersion (i.e. when we merge the CEP and update its FixVersion, we bulk update all children tickets). On Thu, May 18, 2023, at 9:08 AM, Benedict wrote: I don’t think we should overcomplicate this with special CEP re

Re: [DISCUSS] Bring cassandra-harry in tree as a submodule

2023-05-24 Thread Benedict
It’s not without hiccups, and I’m sure we have more to learn. But it mostly just works, and importantly it’s a million times better than the dtest-api process - which stymies development due to the friction. On 24 May 2023, at 08:39, Mick Semb Wever wrote: WRT git submodules and CASSANDRA-18204, ar

Re: [DISCUSS] Bring cassandra-harry in tree as a submodule

2023-05-24 Thread Benedict
closer to the main codebase as a forcing function to smooth out the rough edges, integrate it, and make it a collective artifact and first class citizen IMO. I have similar opinions about the dtest-api. On Wed, May 24, 2023, at 4:05 AM, Benedict wrote: It’s not without hiccups, and I’m sure we have

Re: [DISCUSS] Bring cassandra-harry in tree as a submodule

2023-05-25 Thread Benedict
through…) On May 24, 2023, at 6:54 PM, Benedict wrote: In this case Harry is a testing module - it’s not something we will develop in tandem with C* releases, and we will want improvements to be applied across all branches. So it seems a natural fit for submodules to me. On 24 May 2023, at 21:09, Caleb

Re: Agrona vs fastutil and fastutil-concurrent-wrapper

2023-05-25 Thread Benedict
Given they provide no data or explanation, and that benchmarking is hard, I’m not inclined to give much weight to their analysis. Agrona was favoured in large part due to the perceived quality of the library. I’m not inclined to swap it out without proper evidence that fastutils is both materially fa

Re: Agrona vs fastutil and fastutil-concurrent-wrapper

2023-05-25 Thread Benedict
the stance taken before was that we should allow multiple versions and the best one will win eventually… so I am cool having the same stance for primitive collections... On May 25, 2023, at 8:50 AM, Benedict wrote: Given they provide no data or explanation, and that benchmarking is hard, I’m not

Re: Agrona vs fastutil and fastutil-concurrent-wrapper

2023-05-25 Thread Benedict
bias, nobody performed an exhaustive analysis of agrona in November. If Branimir had proposed fastutils at the time that's what we'd be using today. On Thu, May 25, 2023 at 10:50 AM Benedict <bened...@apache.org> wrote: Given they provide no data or explanation, and that benchmar

Re: [DISCUSS] Limiting query results by size (CASSANDRA-11745)

2023-06-12 Thread Benedict
I agree that this is more suitable as a paging option, and not as a CQL LIMIT option. If it were to be a CQL LIMIT option though, then it should be accurate regarding result set IMO; there shouldn’t be any further results that could have been returned within the LIMIT. On 12 Jun 2023, at 10:16, Benj

Re: Improved DeletionTime serialization to reduce disk size

2023-06-23 Thread Benedict
If we’re doing this, why don’t we delta encode a vint from some per-sstable minimum value? I’d expect that to commonly compress to a single byte or so. > On 23 Jun 2023, at 12:55, Aleksey Yeshchenko wrote: > > Distant future people will not be happy about this, I can already tell you > now. >
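
For readers unfamiliar with the suggestion above, here is a minimal sketch of delta-encoding timestamps as variable-length integers relative to a per-file minimum. It is an illustration only: it uses a generic LEB128-style unsigned varint rather than Cassandra's actual vint format, and the function names (encode_uvarint, encode_deletion_times and so on) are hypothetical, not taken from the Cassandra codebase.

```python
def encode_uvarint(value):
    """Encode a non-negative int as a LEB128-style varint (7 bits per byte)."""
    if value < 0:
        raise ValueError("value must be non-negative")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_uvarint(data, offset=0):
    """Decode one varint starting at offset; return (value, next_offset)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7


def encode_deletion_times(times):
    """Return (base, body): the minimum value plus varint-encoded deltas.

    Assumes a non-empty list; the base would be stored once per file.
    """
    base = min(times)
    body = b"".join(encode_uvarint(t - base) for t in times)
    return base, body


def decode_deletion_times(base, body, count):
    """Inverse of encode_deletion_times for a known number of entries."""
    times = []
    offset = 0
    for _ in range(count):
        delta, offset = decode_uvarint(body, offset)
        times.append(base + delta)
    return times
```

With values clustered near the minimum, each delta below 128 takes a single byte, which is the effect the message above anticipates; values far from the minimum simply take more bytes rather than overflowing a fixed-width field.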
