There remains the question of what the new syntax is - whether it augments CREATE INDEX to replace CREATE CUSTOM INDEX or if we introduce new syntax because we think it’s clearer.
I can accept settling for modifying CREATE INDEX … USING, but I maintain that CREATE LOCAL <type> INDEX is better On 12 May 2023, at 18:31, Caleb Rackliffe <calebrackli...@gmail.com> wrote:
Even if we don't want to allow a default, we can keep the same CREATE INDEX syntax in place, and have a guardrail forcing (or not) the selection of an implementation, right? This would be no worse than the YAML option we already have for enabling 2i creation as a whole. I’m not convinced a default index makes any sense, no. The trade-offs in a distributed setting are much more pronounced.
Indexes in a local-only RDBMS are much simpler affairs; the trade offs are much more subtle than here. > Now, giving this thread, there is pushback for a config to allow default impl to change… but there is 0 pushback for new syntax to make this explicit…. So maybe we should [POLL] for what syntax people want?
I think the essential question is whether we want the concept of a default index. If we do, we need to figure that out now. If we don't then a new syntax that forces it becomes interesting.
Given it seems most DBs have a default index (see Postgres, etc.), I tend to lean toward having one, but that's me... I really dislike the idea of the same CQL doing different things based upon a per-node configuration.
I agree with Brandon that changing CQL behaviour like this based on node config is really not ideal.
I am cool adding such a config, and also cool keeping CREATE INDEX disabled by default…. But would like to point out that we have many configs that impact CQL and they are almost always local configs…
Is CREATE INDEX even allowed? This is a per node config. Right now you can block globally, enable on a single instance, create the index for your users, then revert the config change on the instance….
All guardrails that define what we can do are per node configs…
Now, giving this thread, there is pushback for a config to allow default impl to change… but there is 0 pushback for new syntax to make this explicit…. So maybe we should [POLL] for what syntax people want?
if we decide before the 5.0 release that we have enough information to change the default (#1), we can change it in a matter of minutes.
I am strongly against this… SAI is new for 5.0 so should be disabled by default; else we disrespect the idea that new features are disabled by default. I am cool with our docs recommending if we do find its better in most cases, but we should not change the default in the same reason it lands in.
I don't want to cut over for 5.0 either way. I was more contrasting a configurable cutover in 5.0 vs. a hard cutover later. If the performance characteristics are as clear cut as you think, then maybe it will be an easy decision once the evidence is available for everyone to consider?
If not, then we probably can’t do the hard cutover and so the answer is still pretty simple? I don't particularly like the YAML solution either, but absent that, we're back to fighting about whether we introduce entirely new syntax or hard cut over to SAI at some point.
We already have per-node configuration in the YAML that determines whether or not we can create a 2i at all, right?
What if we just do #2 and #3 and punt on everything else? A table is not a local concept at all, it has a global primary index - that’s the core idea of Cassandra.
I agree with Brandon that changing CQL behaviour like this based on node config is really not ideal. New syntax is by far the simplest and safest solution to this IMO. It doesn’t have to use the word LOCAL, but I think that’s anyway an improvement, personally.
In future we will hopefully offer GLOBAL indexes, and IMO it is better to reify the distinction in the syntax. We don't need to know everything about SAI's performance profile to plan and execute some small, reasonable things now for 5.0. I'm going to try to summarize the least controversial package of ideas from the discussion above. I've left out creating any new syntax. For example, I think CREATE LOCAL INDEX, while explicit, is just not necessary. We don't use CREATE LOCAL TABLE, although it has the same locality as our indexes.
Okay, so the proposal for 5.0...
1.) Add a YAML option that specifies a default implementation for CREATE INDEX, and make this the legacy 2i for now. No existing DDL breaks. We don't have to commit to the absolute superiority of SAI. 2.) Add USING...WITH... support to CREATE INDEX, so we don't have to go to market using CREATE CUSTOM INDEX, which feels...not so polished. (The backend for this already exists w/ CREATE CUSTOM INDEX.) 3.) Leave in place but deprecate (client warnings could work?) CREATE CUSTOM INDEX. Support the syntax for the foreseeable future.
Can we live w/ this?
I don't think any information about SAI we could possibly acquire before a 5.0 release would affect the reasonableness of this much.
if we didn't have copious amounts of (not all public, I know, working on it) evidence
If that’s the assumption on which this proposal is based, let’s discuss the evidence base first, as given the fundamentally different way they work (almost diametrically opposite), I would want to see a very high quality of evidence to support the claim.
I don’t think we can resolve this conversation effectively until this question is settled.
> This creates huge headaches for everyone successfully using 2i today though, and SAI *is not* guaranteed to perform as well or better - it has a very different performance profile.
We wouldn't have even advanced it to this point if we didn't have copious amounts of (not all public, I know, working on it) evidence it did for the vast majority of workloads. Having said that, I don't strongly agree that we should make it the default in 5.0, because performance isn't the only concern. (correctness, DDL back-compat, which we've sort of touched w/ the YAML default option, etc.)
This conversation is now going in like 3 different directions, or at least 3 different "packages" of ideas, so there isn't even a single thing to vote on. Let me read through again and try to distill into something that we might be able to do so with... This.
I would also consider adding CREATE LEGACY INDEX syntax as an alias for today’s CREATE INDEX, the latter to be deprecated and (in very distant future) removed.
This creates huge headaches for everyone successfully using 2i today though, and SAI *is not* guaranteed to perform as well or better - it has a very different performance profile.
I think we should deprecate CREATE INDEX, and introduce new syntax CREATE LOCAL INDEX to make clear that this is not a global index, and that this should require the USING syntax to avoid this problem in future.
We should report warnings to the client when CREATE INDEX is used, indicating it is deprecated.
|