> I don't even think we have to think about *new* SAI features to see where it > will benefit from further *local* optimization... You make good points IMO. After Caleb's reasoning it makes sense to me to start working on query optimization w/even just our initial SAI feature-set given querying across multiple indices.
On Fri, Dec 22, 2023, at 8:42 AM, J. D. Jordan wrote: > > The CEP-29 “rejected alternatives” section mentions one such use case. Being > able to put NOT arbitrarily in a query. Adding an OR operator is another > thing we are likely to want to do in the near future that would benefit from > this work, those benefit from the syntax tree and reordering parts of the > proposal. > > But I think we already have enough complexity available to us to justify a > query optimizer in the fact of multi index queries today. Especially when you > have the new ANN OF operator in use combined with index queries. Depending > on what order you query the indexes in, it can dramatically change the > performance of the query. We are seeing and working through such issues in > Astra today. > > -Jeremiah > > >> On Dec 21, 2023, at 12:00 PM, Josh McKenzie <jmcken...@apache.org> wrote: >> >>> we are already late. We have several features running in production that we >>> chose to not open source yet because implementing phase 1 of the CEP would >>> have heavily simplify their designs. The cost of developing them was much >>> higher than what it would have been if the CEP had already been >>> implemented. We are also currently working on some SAI features that need >>> cost based optimization. >> Are there DISCUSS threads or CEP's for any of that work? For us to have a >> useful discussion about whether we're at a point in the project where a >> query optimizer is appropriate for the project this information would be >> vital. >> >> On Thu, Dec 21, 2023, at 12:33 PM, Benjamin Lerer wrote: >>> Hey German, >>> >>> To clarify things, we intend to push cardinalities across nodes, not costs. >>> It will be up to the Cost Model to estimate cost based on those >>> cardinalities. We will implement some functionalities to collect costs on >>> query execution to be able to provide them as the output of EXPLAIN ANALYZE. >>> >>> We will provide more details on how we will collect and distribute >>> cardinalities. We will probably not go into details on how we will estimate >>> costs before the patch for it is ready. The main reason being that there >>> are a lot of different parts that you need to account for and that it will >>> require significant testing and experimentation. >>> >>> Regarding multi-tenancy, even if you use query cost, do not forget that you >>> will have to account also for background tasks such as compaction, repair, >>> backup, ... which is not included in this CEP. >>> >>> Le jeu. 21 déc. 2023 à 00:18, German Eichberger via dev >>> <dev@cassandra.apache.org> a écrit : >>>> All, >>>> >>>> very much agree with Scott's reasoning. >>>> >>>> It seems expedient given the advent of ACCORD transactions to be more like >>>> the other distributed SQL databases and just support SQL. But just because >>>> it's expedient it isn't right and we should work out the relational >>>> features in more detail before we embark on tying us to some query >>>> planning design. >>>> >>>> The main problem in this space is pushing cost / across nodes based on >>>> data density. I understand that TCM will level out data density but the >>>> cost based optimizer proposal does a lot of hand waving when it comes to >>>> collecting/estimating costs for each node. I like to see more details on >>>> this since otherwise it will be fairly limiting. >>>> >>>> I am less tied to ALLOW FILTERING - many of my customers find allowing >>>> filtering beneficial for their workloads so I think removing it makes >>>> sense to me (and yes we try to discourage them 🙂) >>>> >>>> I am also intrigued by this proposal when I think about multi tenancy and >>>> resource governance: We have heard from several operator who run multiple >>>> internal teams on the same Cassandra cluster jut to optimize costs. Having >>>> a way to attribute those costs more fairly by adding up the costs the >>>> optimizer calculates might be hugely beneficial. There could also be a >>>> way to have a "cost budget" on a keyspace to minimize the noisy neighbor >>>> problem and do more intelligent request throttling. >>>> >>>> In summary I support the proposal with the caveats raised above. >>>> >>>> Thanks, >>>> German >>>> >>>> >>>> *From:* C. Scott Andreas <sc...@paradoxica.net> >>>> *Sent:* Wednesday, December 20, 2023 8:15 AM >>>> *To:* dev@cassandra.apache.org <dev@cassandra.apache.org> >>>> *Cc:* dev@cassandra.apache.org <dev@cassandra.apache.org> >>>> *Subject:* [EXTERNAL] Re: [DISCUSS] CEP-39: Cost Based Optimizer >>>> >>>> >>>> You don't often get email from sc...@paradoxica.net. Learn why this is >>>> important >>>> <https://urldefense.com/v3/__https://aka.ms/LearnAboutSenderIdentification__;!!PbtH5S7Ebw!c_MD0RyXN_XIPN5B2C2--bChBdBa13WInHO75rCG_5gAk6by8K8t8sXgskQFjr1aAFaW3mhiNYlQAurs4aEL$> >>>> >>>> Thanks for this proposal and apologies for my delayed engagement during >>>> the Cassandra Summit last week. Benjamin, I appreciate your work on this >>>> and your engagement on this thread – I know it’s a lot of discussion to >>>> field. >>>> >>>> On ALLOW FILTERING: >>>> >>>> I share Chris Lohfink’s experience in operating clusters that have made >>>> heavy use of ALLOW FILTERING. It is a valuable guardrail for the database >>>> to require users specifically annotate queries that may cost 1000x+ that >>>> of a simple lookup for a primary key. For my own purposes, I’d actually >>>> like to go a step further and disable queries that require ALLOW FILTERING >>>> by default unless explicitly reviewed - but haven’t taken the step of >>>> adding such a guardrail yet. >>>> >>>> CBOs, CQL, and SQL: >>>> >>>> The CBO proposal cuts to the heart of one of the fundamental differences >>>> between SQL and CQL that I haven’t seen exercised yet. >>>> >>>> SQL allows users to define schemas that provide structure to data and to >>>> issue queries over them based on a relational algebra. SQL’s purpose is to >>>> decouple the on-disk representation of data from the query language used >>>> to access and aggregate it. This produces a very flexible query language >>>> that can be used to ask a database anything - but at a cost of execution >>>> that may be effectively infinite (think recursive subqueries). >>>> >>>> CQL is very different. While SQL is designed to decouple query language >>>> and on-disk representation, CQL is designed specifically to couple them. A >>>> CQL schema declares data placement, query routing, and disk serialization, >>>> and sorting to enable efficient retrieval. This is a very different design >>>> goal from a general-purpose query language. In time CQL may gain many >>>> SQL-like capabilities (and I hope it does!), but it will require careful >>>> work to do so without creating many footguns. >>>> >>>> Feature evolution: >>>> >>>> I agree that in the coming years, Cassandra is likely to gain >>>> semi-relational features via maturation of the byte-ordered partitioner >>>> (via range splitting, via TCM); the availability of SAI and its evolution >>>> (e.g., via new functionality enabled by Lucene libraries); potentially >>>> joins via BOP; and others. This is a really exciting future, and one that >>>> probably requires a planner and optimizer. >>>> >>>> My general inclination is that a planner + optimizer seem valuable for >>>> Cassandra, but that the proposal feels a year or two early. The database >>>> doesn’t yet have a plan of record to add support for some of the >>>> semirelational constructs we’ve talked about, and I’m not aware of active >>>> CEPs that propose designs for features like these yet. >>>> >>>> Like Jeff, I’d find this much easier to discuss in the context of a >>>> database gaining support for these features with specific designs >>>> available to discuss. The ALLOW FILTERING and constant folding examples >>>> are a little slim. Index selection is probably the best one I can think of >>>> right now - e.g., if we wanted to add the ability to issue >>>> partition-restricted queries over a base table with multiple indexes >>>> defined without users specifically declaring an index. I haven’t seen an >>>> at-scale use case that would be better served by planner-driven index >>>> selection vs. user-driven, but they might be out there. >>>> >>>> It’s not my role to suggest changes in prioritization for work that isn’t >>>> mine. But I feel that the project could design better interfaces and a >>>> better planner/optimizer if that work were oriented toward improving >>>> particular features that are in wide use. >>>> >>>> To summarize my thoughts: >>>> >>>> – I agree that it is valuable for Apache Cassandra to gain a >>>> planner/optimizer. >>>> – I disagree with removing or deprecating ALLOW FILTERING and see this as >>>> a necessary guardrail. >>>> – I think the proposal surfaces the differences between the design goals >>>> of CQL and SQL, but I don’t feel that it quite addresses it. >>>> – I think we could collectively build a stronger planner/optimizer once >>>> some of the features it’s meant to optimize are in place. >>>> – I’m not quite sold on the need for the implementation to be bespoke >>>> based on discussion so far (vs. Calcite/Catalyst etc), but haven’t done >>>> the legwork to investigate this myself. >>>> – I *love* the idea of capturing many of the execution and hotness >>>> statistics that are proposed in the CEP. It would be very valuable to >>>> surface query cost to users independent of a CBO. Stats like these would >>>> also be valuable toward retrofitting Cassandra for multitenancy by >>>> bounding or rate-limiting users on query cost. Tracking SSTable hotness >>>> would also be useful toward evaluating feasibility of tiered storage, too. >>>> >>>> Thanks for this proposal and discussion so far — appreciate and enjoying >>>> it. >>>> >>>> – Scott >>>> >>>>> On Dec 20, 2023, at 7:52 AM, Benjamin Lerer <ble...@apache.org> wrote: >>>>> >>>>> >>>>>> If we are to address that within the CEP itself then we should discuss >>>>>> it here, as I would like to fully understand the approach as well as how >>>>>> it relates to consistency of execution and the idea of triggering >>>>>> re-optimisation. >>>>> >>>>> Sure, that was my plan. >>>>> >>>>> >>>>>> I’m not sold on the proposed set of characteristics, and think my >>>>>> coupling an execution plan to a given prepared statement for clients to >>>>>> supply is perhaps simpler to implement and maintain, and has corollary >>>>>> benefits - such as providing a mechanism for users to specify their own >>>>>> execution plan. >>>>>> >>>>>> Note, my proposal cuts across all of these elements of the CEP. There is >>>>>> no obvious need for a cross-cluster re-optimisation event or cross >>>>>> cluster statistic management. >>>>> >>>>> I think that I am missing one part of your proposal. How do you plan to >>>>> build the initial execution plan for a prepared statement? >>>>> >>>>> Le mer. 20 déc. 2023 à 14:05, Benedict <bened...@apache.org> a écrit : >>>>>> >>>>>> If we are to address that within the CEP itself then we should discuss >>>>>> it here, as I would like to fully understand the approach as well as how >>>>>> it relates to consistency of execution and the idea of triggering >>>>>> re-optimisation. These ideas are all interrelated. >>>>>> >>>>>> I’m not sold on the proposed set of characteristics, and think my >>>>>> coupling an execution plan to a given prepared statement for clients to >>>>>> supply is perhaps simpler to implement and maintain, and has corollary >>>>>> benefits - such as providing a mechanism for users to specify their own >>>>>> execution plan. >>>>>> >>>>>> Note, my proposal cuts across all of these elements of the CEP. There is >>>>>> no obvious need for a cross-cluster re-optimisation event or cross >>>>>> cluster statistic management. >>>>>> >>>>>> We still also need to discuss more concretely how the base statistics >>>>>> themselves will be derived, as there is little detail here today in the >>>>>> proposal. >>>>>> >>>>>> >>>>>>> On 20 Dec 2023, at 12:58, Benjamin Lerer <b.le...@gmail.com> wrote: >>>>>>> >>>>>>> After the second phase of the CEP, we will have two optimizer >>>>>>> implementations. One will be similar to what we have today and the >>>>>>> other one will be the CBO. As those implementations will be behind the >>>>>>> new Optimizer API interfaces they will both have support for EXPLAIN >>>>>>> and they will both benefit from the simplification/normalization rules. >>>>>>> Such as the ones that David mentioned. >>>>>>> >>>>>>> Regarding functions, we are already able to determine which ones are >>>>>>> deterministic >>>>>>> (https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/functions/Function.java#L55). >>>>>>> We simply do not take advantage of it. >>>>>>> >>>>>>> I removed the ALLOW FILTERING part and will open a discussion about it >>>>>>> at the beginning of next year. >>>>>>> >>>>>>> Regarding the statistics management part, I would like to try to >>>>>>> address it within the CEP itself, if feasible. If it turns out to be >>>>>>> too complicated, I will separate it into its own CEP. >>>>>>> >>>>>>> Le mar. 19 déc. 2023 à 22:23, David Capwell <dcapw...@apple.com> a >>>>>>> écrit : >>>>>>>>> even if the only outcome of all this work were to tighten up >>>>>>>>> inconsistencies in our grammar and provide more robust EXPLAIN and >>>>>>>>> EXPLAIN ANALYZE functionality to our end users, I think that would be >>>>>>>>> highly valuable >>>>>>>> >>>>>>>> In my mental model a no-op optimizer just becomes what we have today >>>>>>>> (since all new features really should be disabled by default, I would >>>>>>>> hope we support this), so we benefit from having a logical AST + >>>>>>>> ability to mutate it before we execute it and we can use this to make >>>>>>>> things nicer for users (as you are calling out) >>>>>>>> >>>>>>>> Here is one example that stands out to me in accord >>>>>>>> >>>>>>>> LET a = (select * from tbl where pk=0); >>>>>>>> Insert into tbl2 (pk, …) values (a.pk >>>>>>>> <https://urldefense.com/v3/__http://a.pk/__;!!PbtH5S7Ebw!c_MD0RyXN_XIPN5B2C2--bChBdBa13WInHO75rCG_5gAk6by8K8t8sXgskQFjr1aAFaW3mhiNYlQArzRaSAP$>, >>>>>>>> …); — this is not allowed as we don’t know the primary key… but this >>>>>>>> could trivially be written to replace a.pk >>>>>>>> <https://urldefense.com/v3/__http://a.pk/__;!!PbtH5S7Ebw!c_MD0RyXN_XIPN5B2C2--bChBdBa13WInHO75rCG_5gAk6by8K8t8sXgskQFjr1aAFaW3mhiNYlQArzRaSAP$> >>>>>>>> with 0… >>>>>>>> >>>>>>>> With this work we could also rethink what functions are deterministic >>>>>>>> and which ones are not (not trying to bike shed)… simple example is >>>>>>>> “now” (select now() from tbl; — each row will have a different >>>>>>>> timestamp), if we make this deterministic we can avoid calling it for >>>>>>>> each row and instead just replace it with a constant for the query… >>>>>>>> >>>>>>>> Even if the CBO is dropped in favor of no-op (what we do today), I >>>>>>>> still see value in this work. >>>>>>>> >>>>>>>> I do think that the CBO really doesn’t solve the fact some features >>>>>>>> don’t work well, if anything it could just mask it until it’s too >>>>>>>> late…. If user builds an app using filtering and everything is going >>>>>>>> well in QA, but once they see a spike in traffic in prod we start >>>>>>>> rejecting… this is a bad user experience IMO… we KNOW you must think >>>>>>>> about this before you go this route, so a CBO letting you ignore it >>>>>>>> till you hit a wall I don’t think is the best (not saying ALLOW >>>>>>>> FILTERING is the solution to this… but it at least is a signal to >>>>>>>> users to think through their data model). >>>>>>>> >>>>>>>> >>>>>>>>> On Dec 15, 2023, at 6:38 PM, Josh McKenzie <jmcken...@apache.org> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Goals >>>>>>>>>> • Introduce a Cascades(2) query optimizer with rules easily >>>>>>>>>> extendable >>>>>>>>>> • Improve query performance for most common queries >>>>>>>>>> • Add support for EXPLAIN and EXPLAIN ANALYZE to help with query >>>>>>>>>> optimization and troubleshooting >>>>>>>>>> • Lay the groundwork for the addition of features like joins, >>>>>>>>>> subqueries, OR/NOT and index ordering >>>>>>>>>> • Put in place some performance benchmarks to validate query >>>>>>>>>> optimizations >>>>>>>>> I think these are sensible goals. We're possibly going to face a >>>>>>>>> chicken-or-egg problem with a feature like this that so heavily >>>>>>>>> intersects with other as-yet written features where much of the value >>>>>>>>> is in the intersection of them; if we continue down the current "one >>>>>>>>> heuristic to rule them all" query planning approach we have now, >>>>>>>>> we'll struggle to meaningfully explore or conceptualize the value of >>>>>>>>> potential alternatives different optimizers could present us. Flip >>>>>>>>> side, to Benedict's point, until SAI hits and/or some other potential >>>>>>>>> future things we've all talked about, this cbo would likely fall >>>>>>>>> directly into the same path that we effectively have hard-coded today >>>>>>>>> (primary index path only). >>>>>>>>> >>>>>>>>> One thing I feel pretty strongly about: even if the only outcome of >>>>>>>>> all this work were to tighten up inconsistencies in our grammar and >>>>>>>>> provide more robust EXPLAIN and EXPLAIN ANALYZE functionality to our >>>>>>>>> end users, I think that would be highly valuable. This path of "only" >>>>>>>>> would be predicated on us not having successful introduction of a >>>>>>>>> robust secondary index implementation and a variety of other things >>>>>>>>> we have a lot of interest in, so I find it unlikely, but worth >>>>>>>>> calling out. >>>>>>>>> >>>>>>>>> re: the removal of ALLOW FILTERING - is there room for compromise >>>>>>>>> here and instead converting it to a guardrail that defaults to being >>>>>>>>> enabled? That could theoretically give us a more gradual path to >>>>>>>>> migration to a cost-based guardrail for instance, and would preserve >>>>>>>>> the current robustness of the system while making it at least a touch >>>>>>>>> more configurable. >>>>>>>>> >>>>>>>>> On Fri, Dec 15, 2023, at 11:03 AM, Chris Lohfink wrote: >>>>>>>>>> Thanks for time in addressing concerns. At least with initial >>>>>>>>>> versions, as long as there is a way to replace it with noop or >>>>>>>>>> disable it I would be happy. This is pretty standard practice with >>>>>>>>>> features nowadays but I wanted to highlight it as this might require >>>>>>>>>> some pretty tight coupling. >>>>>>>>>> >>>>>>>>>> Chris >>>>>>>>>> >>>>>>>>>> On Fri, Dec 15, 2023 at 7:57 AM Benjamin Lerer <ble...@apache.org> >>>>>>>>>> wrote: >>>>>>>>>>> Hey Chris, >>>>>>>>>>> You raise some valid points. >>>>>>>>>>> >>>>>>>>>>> I believe that there are 3 points that you mentioned: >>>>>>>>>>> 1) CQL restrictions are some form of safety net and should be kept >>>>>>>>>>> 2) A lot of Cassandra features do not scale and/or are too easy to >>>>>>>>>>> use in a wrong way that can make the whole system collapse. We >>>>>>>>>>> should not add more to that list. Especially not joins. >>>>>>>>>>> >>>>>>>>>>> 3) Should we not start to fix features like secondary index rather >>>>>>>>>>> than adding new ones? Which is heavily linked to 2). >>>>>>>>>>> >>>>>>>>>>> Feel free to correct me if I got them wrong or missed one. >>>>>>>>>>> >>>>>>>>>>> Regarding 1), I believe that you refer to the "Removing unnecessary >>>>>>>>>>> CQL query limitations and inconsistencies" section. We are not >>>>>>>>>>> planning to remove any safety net here. >>>>>>>>>>> What we want to remove is a certain amount of limitations which >>>>>>>>>>> make things confusing for a user trying to write a query for no >>>>>>>>>>> good reason. Like "why can I define a column alias but not use it >>>>>>>>>>> anywhere in my query?" or "Why can I not create a list with 2 bind >>>>>>>>>>> parameters?". While refactoring some CQL code, I kept on finding >>>>>>>>>>> those types of exceptions that we can easily remove while >>>>>>>>>>> simplifying the code at the same time. >>>>>>>>>>> >>>>>>>>>>> For 2), I agree that at a certain scale or for some scenarios, some >>>>>>>>>>> features simply do not scale or catch users by surprise. The goal >>>>>>>>>>> of the CEP is to improve things in 2 ways. One is by making >>>>>>>>>>> Cassandra smarter in the way it chooses how to process queries, >>>>>>>>>>> hopefully improving its overall scalability. The other by being >>>>>>>>>>> transparent about how Cassandra will execute the queries through >>>>>>>>>>> the use of EXPLAIN. One problem of GROUP BY for example is that >>>>>>>>>>> most users do not realize what is actually happening under the hood >>>>>>>>>>> and therefore its limitations. I do not believe that EXPLAIN will >>>>>>>>>>> change everything but it will help people to get a better >>>>>>>>>>> understanding of the limitations of some features. >>>>>>>>>>> >>>>>>>>>>> I do not know which features will be added in the future to C*. >>>>>>>>>>> That will be discussed through some future CEPs. Nevertheless, I do >>>>>>>>>>> not believe that it makes sense to write a CEP for a query >>>>>>>>>>> optimizer without taking into account that we might at some point >>>>>>>>>>> add some level of support for joins or subqueries. We have been too >>>>>>>>>>> often delivering features without looking at what could be the >>>>>>>>>>> possible evolutions which resulted in code where adding new >>>>>>>>>>> features was more complex than it should have been. I do not want >>>>>>>>>>> to make the same mistake. I want to create an optimizer that can be >>>>>>>>>>> improved easily and considering joins or other features simply help >>>>>>>>>>> to build things in a more generic way. >>>>>>>>>>> >>>>>>>>>>> Regarding feature stabilization, I believe that it is happening. I >>>>>>>>>>> have heard plans of how to solve MVs, range queries, hot >>>>>>>>>>> partitions, ... and there was a lot of thinking behind those plans. >>>>>>>>>>> Secondary indexes are being worked on. We hope that the optimizer >>>>>>>>>>> will also help with some index queries. >>>>>>>>>>> >>>>>>>>>>> It seems to me that this proposal is going toward the direction >>>>>>>>>>> that you want without introducing new problems for scalability. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Le jeu. 14 déc. 2023 à 16:47, Chris Lohfink <clohfin...@gmail.com> >>>>>>>>>>> a écrit : >>>>>>>>>>>> I don't wanna be a blocker for this CEP or anything but did want >>>>>>>>>>>> to put my 2 cents in. This CEP is horrifying to me. >>>>>>>>>>>> >>>>>>>>>>>> I have seen thousands of clusters across multiple companies and >>>>>>>>>>>> helped them get working successfully. A vast majority of that >>>>>>>>>>>> involved blocking the use of MVs, GROUP BY, secondary indexes, and >>>>>>>>>>>> even just simple _range queries_. The "unncessary restrictions of >>>>>>>>>>>> cql" are not only necessary IMHO, more restrictions are necessary >>>>>>>>>>>> to be successful at scale. The idea of just opening up CQL to >>>>>>>>>>>> general purpose relational queries and lines like "supporting >>>>>>>>>>>> queries with joins in an efficient way" ... I would really like us >>>>>>>>>>>> to make secondary indexes be a viable option before we start >>>>>>>>>>>> opening up floodgates on stuff like this. >>>>>>>>>>>> >>>>>>>>>>>> Chris >>>>>>>>>>>> >>>>>>>>>>>> On Thu, Dec 14, 2023 at 9:37 AM Benedict <bened...@apache.org> >>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> > So yes, this physical plan is the structure that you have in >>>>>>>>>>>>> > mind but the idea of sharing it is not part of the CEP. >>>>>>>>>>>>> >>>>>>>>>>>>> I think it should be. This should form a major part of the API on >>>>>>>>>>>>> which any CBO is built. >>>>>>>>>>>>> >>>>>>>>>>>>> > It seems that there is a difference between the goal of your >>>>>>>>>>>>> > proposal and the one of the CEP. The goal of the CEP is first >>>>>>>>>>>>> > to ensure optimal performance. It is ok to change the execution >>>>>>>>>>>>> > plan for one that delivers better performance. What we want to >>>>>>>>>>>>> > minimize is having a node performing queries in an inefficient >>>>>>>>>>>>> > way for a long period of time. >>>>>>>>>>>>> >>>>>>>>>>>>> You have made a goal of the CEP synchronising summary statistics >>>>>>>>>>>>> across the whole cluster in order to achieve some degree of >>>>>>>>>>>>> uniformity of query plan. So this is explicitly a goal of the >>>>>>>>>>>>> CEP, and synchronising summary statistics is a hard problem and >>>>>>>>>>>>> won’t provide strong guarantees. >>>>>>>>>>>>> >>>>>>>>>>>>> > The client side proposal targets consistency for a given query >>>>>>>>>>>>> > on a given driver instance. In practice, it would be possible >>>>>>>>>>>>> > to have 2 similar queries with 2 different execution plans on >>>>>>>>>>>>> > the same driver >>>>>>>>>>>>> >>>>>>>>>>>>> This would only be possible if the driver permitted it. A driver >>>>>>>>>>>>> could (and should) enforce that it only permits one query plan >>>>>>>>>>>>> per query. >>>>>>>>>>>>> >>>>>>>>>>>>> The opposite is true for your proposal: some queries may begin >>>>>>>>>>>>> degrading because they touch specific replicas that optimise the >>>>>>>>>>>>> query differently, and this will be hard to debug. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> On 14 Dec 2023, at 15:30, Benjamin Lerer <b.le...@gmail.com> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> The binding of the parser output to the schema (what is today >>>>>>>>>>>>>> the Raw.prepare call) will create the logical plan, expressed as >>>>>>>>>>>>>> a tree of relational operators. Simplification and normalization >>>>>>>>>>>>>> will happen on that tree to produce a new equivalent logical >>>>>>>>>>>>>> plan. That logical plan will be used as input to the optimizer. >>>>>>>>>>>>>> The output will be a physical plan producing the output >>>>>>>>>>>>>> specified by the logical plan. A tree of physical operators >>>>>>>>>>>>>> specifying how the operations should be performed. >>>>>>>>>>>>>> >>>>>>>>>>>>>> That physical plan will be stored as part of the statements >>>>>>>>>>>>>> (SelectStatement, ModificationStatement, ...) in the prepared >>>>>>>>>>>>>> statement cache. Upon execution, variables will be bound and the >>>>>>>>>>>>>> RangeCommands/Mutations will be created based on the physical >>>>>>>>>>>>>> plan. >>>>>>>>>>>>>> >>>>>>>>>>>>>> The string representation of a physical plan will effectively >>>>>>>>>>>>>> represent the output of an EXPLAIN statement but outside of that >>>>>>>>>>>>>> the physical plan will stay encapsulated within the statement >>>>>>>>>>>>>> classes. >>>>>>>>>>>>>> Hints will be parameters provided to the optimizer to enforce >>>>>>>>>>>>>> some specific choices. Like always using an Index Scan instead >>>>>>>>>>>>>> of a Table Scan, ignoring the cost comparison. >>>>>>>>>>>>>> >>>>>>>>>>>>>> So yes, this physical plan is the structure that you have in >>>>>>>>>>>>>> mind but the idea of sharing it is not part of the CEP. I did >>>>>>>>>>>>>> not document it because it will simply be a tree of physical >>>>>>>>>>>>>> operators used internally. >>>>>>>>>>>>>> >>>>>>>>>>>>>>> My proposal is that the execution plan of the coordinator that >>>>>>>>>>>>>>> prepares a query gets serialised to the client, which then >>>>>>>>>>>>>>> provides the execution plan to all future coordinators, and >>>>>>>>>>>>>>> coordinators provide it to replicas as necessary. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> This means it is not possible for any conflict to arise for a >>>>>>>>>>>>>>> single client. It would guarantee consistency of execution for >>>>>>>>>>>>>>> any single client (and avoid any drift over the client’s >>>>>>>>>>>>>>> sessions), without necessarily guaranteeing consistency for all >>>>>>>>>>>>>>> clients. >>>>>>>>>>>>>> >>>>>>>>>>>>>> It seems that there is a difference between the goal of your >>>>>>>>>>>>>> proposal and the one of the CEP. The goal of the CEP is first to >>>>>>>>>>>>>> ensure optimal performance. It is ok to change the execution >>>>>>>>>>>>>> plan for one that delivers better performance. What we want to >>>>>>>>>>>>>> minimize is having a node performing queries in an inefficient >>>>>>>>>>>>>> way for a long period of time. >>>>>>>>>>>>>> >>>>>>>>>>>>>> The client side proposal targets consistency for a given query >>>>>>>>>>>>>> on a given driver instance. In practice, it would be possible to >>>>>>>>>>>>>> have 2 similar queries with 2 different execution plans on the >>>>>>>>>>>>>> same driver making things really confusing. Identifying the >>>>>>>>>>>>>> source of an inefficient query will also be pretty hard. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Interestingly, having 2 nodes with 2 different execution plans >>>>>>>>>>>>>> might not be a serious problem. It simply means that based on >>>>>>>>>>>>>> cardinality at t1, the optimizer on node 1 chose plan 1 while >>>>>>>>>>>>>> the one on node 2 chose plan 2 at t2. In practice if the cost >>>>>>>>>>>>>> estimates reflect properly the actual cost those 2 plans should >>>>>>>>>>>>>> have pretty similar efficiency. The problem is more about the >>>>>>>>>>>>>> fact that you would ideally want a uniform behavior around your >>>>>>>>>>>>>> cluster. >>>>>>>>>>>>>> Changes of execution plans should only occur at certain points. >>>>>>>>>>>>>> So the main problematic scenario is when the data distribution >>>>>>>>>>>>>> is around one of those points. Which is also the point where the >>>>>>>>>>>>>> change should have the least impact. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Le jeu. 14 déc. 2023 à 11:38, Benedict <bened...@apache.org> a >>>>>>>>>>>>>> écrit : >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> There surely needs to be a more succinct and abstract >>>>>>>>>>>>>>> representation in order to perform transformations on the query >>>>>>>>>>>>>>> plan? You don’t intend to manipulate the object graph directly >>>>>>>>>>>>>>> as you apply any transformations when performing simplification >>>>>>>>>>>>>>> or cost based analysis? This would also (I expect) be the form >>>>>>>>>>>>>>> used to support EXPLAIN functionality, and probably also HINTs >>>>>>>>>>>>>>> etc. This would ideally *not* be coupled to the CBO itself, and >>>>>>>>>>>>>>> would ideally be succinctly serialised. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I would very much expect the query plan to be represented >>>>>>>>>>>>>>> abstractly as part of this work, and for there to be a >>>>>>>>>>>>>>> mechanism that translates this abstract representation into the >>>>>>>>>>>>>>> object graph that executes it. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> If I’m incorrect, could you please elaborate more specifically >>>>>>>>>>>>>>> how you intend to go about this? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On 14 Dec 2023, at 10:33, Benjamin Lerer <b.le...@gmail.com> >>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I mean that an important part of this work - not specified in >>>>>>>>>>>>>>>>> the CEP (AFAICT) - should probably be to define some standard >>>>>>>>>>>>>>>>> execution model, that we can manipulate and serialise, for >>>>>>>>>>>>>>>>> use across (and without) optimisers. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I am confused because for me an execution model defines how >>>>>>>>>>>>>>>> operations are executed within the database in a conceptual >>>>>>>>>>>>>>>> way, which is not something that this CEP intends to change. >>>>>>>>>>>>>>>> Do you mean the physical/execution plan? >>>>>>>>>>>>>>>> Today this plan is somehow represented for reads by the >>>>>>>>>>>>>>>> SelectStatement and its components (Selections, >>>>>>>>>>>>>>>> StatementRestrictions, ...) it is then converted at execution >>>>>>>>>>>>>>>> time after parameter binding into a ReadCommand which is sent >>>>>>>>>>>>>>>> to the replicas. >>>>>>>>>>>>>>>> We plan to refactor SelectStatement and its components but the >>>>>>>>>>>>>>>> ReadCommands change should be relatively small. What you are >>>>>>>>>>>>>>>> proposing is not part of the scope of this CEP. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Le jeu. 14 déc. 2023 à 10:24, Benjamin Lerer >>>>>>>>>>>>>>>> <b.le...@gmail.com> a écrit : >>>>>>>>>>>>>>>>>> Can you share the reasons why Apache Calcite is not suitable >>>>>>>>>>>>>>>>>> for this case and why it was rejected >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> My understanding is that Calcite was made for two main >>>>>>>>>>>>>>>>> things: to help with optimizing SQL-like languages and to let >>>>>>>>>>>>>>>>> people query different kinds of data sources together. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> We could think about using it for our needs, but there are >>>>>>>>>>>>>>>>> some big problems: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 1. CQL is not SQL. There are significant differences between >>>>>>>>>>>>>>>>> the 2 languages >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 2. Cassandra has its own specificities that will influence >>>>>>>>>>>>>>>>> the cost model and the way we deal with optimizations: >>>>>>>>>>>>>>>>> partitions, replication factors, consistency levels, LSM tree >>>>>>>>>>>>>>>>> storage, ... >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 3. Every framework comes with its own limitations and >>>>>>>>>>>>>>>>> additional cost >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> From my view, there are too many big differences between what >>>>>>>>>>>>>>>>> Calcite does and what we need in Cassandra. If we used >>>>>>>>>>>>>>>>> Calcite, it would also mean relying a lot on another system >>>>>>>>>>>>>>>>> that everyone would have to learn and adjust to. The problems >>>>>>>>>>>>>>>>> and extra work this would bring don't seem worth the benefits >>>>>>>>>>>>>>>>> we might get >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Le mer. 13 déc. 2023 à 18:06, Benjamin Lerer >>>>>>>>>>>>>>>>> <b.le...@gmail.com> a écrit : >>>>>>>>>>>>>>>>>> One thing that I did not mention is the fact that this CEP >>>>>>>>>>>>>>>>>> is only a high level proposal. There will be deeper >>>>>>>>>>>>>>>>>> discussions on the dev list around the different parts of >>>>>>>>>>>>>>>>>> this proposal when we reach those parts and have enough >>>>>>>>>>>>>>>>>> details to make those discussions more meaningful. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> The maintenance and distribution of summary statistics in >>>>>>>>>>>>>>>>>>> particular is worthy of its own CEP, and it might be >>>>>>>>>>>>>>>>>>> preferable to split it out. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> For maintaining node statistics the idea is to re-use the >>>>>>>>>>>>>>>>>> current Memtable/SSTable mechanism and relies on mergeable >>>>>>>>>>>>>>>>>> statistics. That will allow us to easily build node level >>>>>>>>>>>>>>>>>> statistics for a given table by merging all the statistics >>>>>>>>>>>>>>>>>> of its memtable and SSTables. For the distribution of these >>>>>>>>>>>>>>>>>> node statistics we are still exploring different options. We >>>>>>>>>>>>>>>>>> can come back with a precise proposal once we have hammered >>>>>>>>>>>>>>>>>> all the details. >>>>>>>>>>>>>>>>>> Is it for you a blocker for this CEP or do you just want to >>>>>>>>>>>>>>>>>> make sure that this part is discussed in deeper details >>>>>>>>>>>>>>>>>> before we implement it? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> The proposal also seems to imply we are aiming for >>>>>>>>>>>>>>>>>>> coordinators to all make the same decision for a query, >>>>>>>>>>>>>>>>>>> which I think is challenging, and it would be worth >>>>>>>>>>>>>>>>>>> fleshing out the design here a little (perhaps just in >>>>>>>>>>>>>>>>>>> Jira). >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> The goal is that the large majority of nodes preparing a >>>>>>>>>>>>>>>>>> query at a given point in time should make the same decision >>>>>>>>>>>>>>>>>> and that over time all nodes should converge toward the same >>>>>>>>>>>>>>>>>> decision. This part is dependent on the node statistics >>>>>>>>>>>>>>>>>> distribution, the cost model and the triggers for >>>>>>>>>>>>>>>>>> re-optimization (that will require some experimentation). >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> There’s also not much discussion of the execution model: I >>>>>>>>>>>>>>>>>>> think it would make most sense for this to be independent >>>>>>>>>>>>>>>>>>> of any cost and optimiser models (though they might want to >>>>>>>>>>>>>>>>>>> operate on them), so that EXPLAIN and hints can work across >>>>>>>>>>>>>>>>>>> optimisers (a suitable hint might essentially bypass the >>>>>>>>>>>>>>>>>>> optimiser, if the optimiser permits it, by providing a >>>>>>>>>>>>>>>>>>> standard execution model) >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> It is not clear to me what you mean by "a standard execution >>>>>>>>>>>>>>>>>> model"? Otherwise, we were not planning to have the >>>>>>>>>>>>>>>>>> execution model or the hints depending on the optimizer. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I think it would be worth considering providing the >>>>>>>>>>>>>>>>>>> execution plan to the client as part of query preparation, >>>>>>>>>>>>>>>>>>> as an opaque payload to supply to coordinators on first >>>>>>>>>>>>>>>>>>> contact, as this might simplify the problem of ensuring >>>>>>>>>>>>>>>>>>> queries behave the same without adopting a lot of >>>>>>>>>>>>>>>>>>> complexity for synchronising statistics (which will never >>>>>>>>>>>>>>>>>>> provide strong guarantees). Of course, re-preparing a query >>>>>>>>>>>>>>>>>>> might lead to a new plan, though any coordinators with the >>>>>>>>>>>>>>>>>>> query in their cache should be able to retrieve it cheaply. >>>>>>>>>>>>>>>>>>> If the execution model is efficiently serialised this might >>>>>>>>>>>>>>>>>>> have the ancillary benefit of improving the occupancy of >>>>>>>>>>>>>>>>>>> our prepared query cache. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I am not sure that I understand your proposal. If 2 nodes >>>>>>>>>>>>>>>>>> build a different execution plan how do you solve that >>>>>>>>>>>>>>>>>> conflict? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Le mer. 13 déc. 2023 à 09:55, Benedict <bened...@apache.org> >>>>>>>>>>>>>>>>>> a écrit : >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> A CBO can only make worse decisions than the status quo for >>>>>>>>>>>>>>>>>>> what I presume are the majority of queries - i.e. those >>>>>>>>>>>>>>>>>>> that touch only primary indexes. In general, there are >>>>>>>>>>>>>>>>>>> plenty of use cases that prefer determinism. So I agree >>>>>>>>>>>>>>>>>>> that there should at least be a CBO implementation that >>>>>>>>>>>>>>>>>>> makes the same decisions as the status quo, >>>>>>>>>>>>>>>>>>> deterministically. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I do support the proposal, but would like to see some >>>>>>>>>>>>>>>>>>> elements discussed in more detail. The maintenance and >>>>>>>>>>>>>>>>>>> distribution of summary statistics in particular is worthy >>>>>>>>>>>>>>>>>>> of its own CEP, and it might be preferable to split it out. >>>>>>>>>>>>>>>>>>> The proposal also seems to imply we are aiming for >>>>>>>>>>>>>>>>>>> coordinators to all make the same decision for a query, >>>>>>>>>>>>>>>>>>> which I think is challenging, and it would be worth >>>>>>>>>>>>>>>>>>> fleshing out the design here a little (perhaps just in >>>>>>>>>>>>>>>>>>> Jira). >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> While I’m not a fan of ALLOW FILTERING, I’m not convinced >>>>>>>>>>>>>>>>>>> that this CEP deprecates it. It is a concrete qualitative >>>>>>>>>>>>>>>>>>> guard rail, that I expect some users will prefer to a >>>>>>>>>>>>>>>>>>> cost-based guard rail. Perhaps this could be left to the >>>>>>>>>>>>>>>>>>> CBO to decide how to treat. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> There’s also not much discussion of the execution model: I >>>>>>>>>>>>>>>>>>> think it would make most sense for this to be independent >>>>>>>>>>>>>>>>>>> of any cost and optimiser models (though they might want to >>>>>>>>>>>>>>>>>>> operate on them), so that EXPLAIN and hints can work across >>>>>>>>>>>>>>>>>>> optimisers (a suitable hint might essentially bypass the >>>>>>>>>>>>>>>>>>> optimiser, if the optimiser permits it, by providing a >>>>>>>>>>>>>>>>>>> standard execution model) >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I think it would be worth considering providing the >>>>>>>>>>>>>>>>>>> execution plan to the client as part of query preparation, >>>>>>>>>>>>>>>>>>> as an opaque payload to supply to coordinators on first >>>>>>>>>>>>>>>>>>> contact, as this might simplify the problem of ensuring >>>>>>>>>>>>>>>>>>> queries behave the same without adopting a lot of >>>>>>>>>>>>>>>>>>> complexity for synchronising statistics (which will never >>>>>>>>>>>>>>>>>>> provide strong guarantees). Of course, re-preparing a query >>>>>>>>>>>>>>>>>>> might lead to a new plan, though any coordinators with the >>>>>>>>>>>>>>>>>>> query in their cache should be able to retrieve it cheaply. >>>>>>>>>>>>>>>>>>> If the execution model is efficiently serialised this might >>>>>>>>>>>>>>>>>>> have the ancillary benefit of improving the occupancy of >>>>>>>>>>>>>>>>>>> our prepared query cache. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On 13 Dec 2023, at 00:44, Jon Haddad <j...@jonhaddad.com> >>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I think it makes sense to see what the actual overhead is >>>>>>>>>>>>>>>>>>>> of CBO before making the assumption it'll be so high that >>>>>>>>>>>>>>>>>>>> we need to have two code paths. I'm happy to provide >>>>>>>>>>>>>>>>>>>> thorough benchmarking and analysis when it reaches a >>>>>>>>>>>>>>>>>>>> testing phase. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I'm excited to see where this goes. I think it sounds >>>>>>>>>>>>>>>>>>>> very forward looking and opens up a lot of possibilities. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Jon >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Tue, Dec 12, 2023 at 4:25 PM guo Maxwell >>>>>>>>>>>>>>>>>>>> <cclive1...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>> Nothing expresses my thoughts better than +1 >>>>>>>>>>>>>>>>>>>>> ,It feels like it means a lot to Cassandra. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I have a question. Is it easy to turn off cbo's optimizer >>>>>>>>>>>>>>>>>>>>> or by pass in some way? Because some simple read and >>>>>>>>>>>>>>>>>>>>> write requests will have better performance without cbo, >>>>>>>>>>>>>>>>>>>>> which is also the advantage of Cassandra compared to some >>>>>>>>>>>>>>>>>>>>> rdbms. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> David Capwell <dcapw...@apple.com>于2023年12月13日 周三上午3:37写道: >>>>>>>>>>>>>>>>>>>>>> Overall LGTM. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> On Dec 12, 2023, at 5:29 AM, Benjamin Lerer >>>>>>>>>>>>>>>>>>>>>>> <ble...@apache.org> wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Hi everybody, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> I would like to open the discussion on the introduction >>>>>>>>>>>>>>>>>>>>>>> of a cost based optimizer to allow Cassandra to pick >>>>>>>>>>>>>>>>>>>>>>> the best execution plan based on the data >>>>>>>>>>>>>>>>>>>>>>> distribution.Therefore, improving the overall query >>>>>>>>>>>>>>>>>>>>>>> performance. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> This CEP should also lay the groundwork for the future >>>>>>>>>>>>>>>>>>>>>>> addition of features like joins, subqueries, OR/NOT and >>>>>>>>>>>>>>>>>>>>>>> index ordering. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> The proposal is here: >>>>>>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-39%3A+Cost+Based+Optimizer >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Thank you in advance for your feedback. >>>> >>