Re: [DISCUSS] Single specification, multiple implementations

Julian Hyde Sun, 22 Feb 2026 19:07:45 -0800

Gavin,

I have tried reaching out. I had a long conversation with Andrew Lamb while we 
were in Santiago for SIGMOD, and I sent a couple of proposals on the Substrait 
list. Those approaches have come to nothing because, not surprisingly, the 
projects are busy working on their goals and don’t want to work on mine. They 
want to serve the needs of their existing users, not some hypothetical set of 
users that might exist if they made a major engineering investment.


In a very loose sense, DataFusion and Substrait are forks of Calcite. They saw 
patterns and APIs in Calcite that worked and implemented them in a new 
language/framework. But no one expects them to ‘un-fork’. They are under no 
legal or moral obligation to contribute their changes upstream, and we have 
not pulled in their changes either. So, they continue to diverge.

To illustrate my point, let me pick on DataFusion for a minute.

It sounds as if you (Gavin) are happy with DataFusion because it is in Rust, 
uses JSON as its plan format, and has a multi-threaded engine. But I’m not 
totally happy with DataFusion. As I write the Rust implementation of Morel, in 
addition to Rust support and a multi-threaded engine, I also want a planner 
that can generate heterogeneous plans, query backend databases in multiple 
dialects of SQL, and use materialized views: the things that Calcite does well.

I don’t really mind whether the project that implements this test suite in Rust 
is DataFusion or some new project. If we do a little work to prime the pump — 
make tests like RelOptRulesTest runnable from other languages — then an 
implementation may just happen.

Julian

> On Feb 22, 2026, at 3:25 AM, Gavin Ray <[email protected]> wrote:
> 
> Let me voice a somewhat-alternative idea:
> 
> For Calcite-like tools, you essentially have Calcite in Java and
> DataFusion in Rust.
> The ecosystem of adjacent tooling usually either supports one or the other.
> 
> For example -- at my workplace, when they rebuilt our core platform, they
> adopted Rust and DataFusion for our query engine.
> (We also evaluated using Substrait as our Query Plan IR, but it's both
> fairly verbose and lacked some of the nodes we'd need. Plus, being tied to
> Protobuf (we use JSON) wasn't ideal.)
> 
> Personally, I feel like there'd be more value in reaching out to Andy (and
> maybe other owners of tools like Trino, etc) to see whether a Common
> Interface for these query engines could be agreed upon.
> Language-agnostic integration with a platform of Query Engine-like tools
> seems like it would unlock a lot of value.
> 
> This is quite a large endeavor, though, and potentially not possible.
> 
> Can I ask: suppose Calcite had an API defined, something like "class
> MyLanguageEngine implements CalciteImplementation".
> Do folks think implementations in other languages would pop up? It seems
> like quite a hefty/herculean task, unless you only ported parts of it.
> 
> 
> On Sun, Feb 22, 2026 at 5:48 AM jensen <[email protected]> wrote:
> 
>> I personally strongly support this idea! It would allow us to fully
>> leverage Calcite’s strengths in SQL parsing and optimization. I think
>> Substrait is a relatively good reference at this stage (though there might
>> be even better alternatives). Currently, substrait-java already uses
>> Calcite as the reference implementation of the standard, which is a great
>> starting point.
>> I’ve previously worked on using Substrait to integrate with third-party
>> execution engines. As Alessandro mentioned, execution layers vary
>> significantly, making integration extremely challenging—even if you only
>> aim to maintain compatibility with just two different execution backends.
>> Third-party operators often differ substantially from Calcite’s model, so
>> the primary task would likely be defining a unified specification.
>> However, should this specification be developed by referencing both
>> Calcite and several recommended execution engines to ensure broad
>> applicability?
>> 
>> 
>> 
>> Best regards,
>> 
>> Zhen Chen
>> 
>> ---- Replied Message ----
>> | From | Alessandro Solimando<[email protected]> |
>> | Date | 2/22/2026 17:55 |
>> | To | <[email protected]> |
>> | Subject | [DISCUSS] Single specification, multiple implementations |
>> Following the very interesting take of Julian in [1], I'd like to start a
>> brainstorming discussion on the opportunity of turning Calcite into a
>> specification-first project, with the aim of providing multiple
>> implementations, and what that would look like.
>> 
>> I am supportive of the initiative, as I am seeing the same trends and
>> discussions Julian mentions. Projects like Substrait allow using Calcite
>> as a service, but lowering the barrier for new database projects in other
>> languages would surely help adoption and help the project stay relevant in
>> the future.
>> 
>> I have witnessed successful rewrites via LLMs as snapshots at time T, but I
>> don't have any experience in maintaining multiple versions over time, which
>> is probably the biggest challenge.
>> 
>> The first step Julian suggests is to turn the spec and tests into a
>> language-agnostic format (like Quidem), so that we have both a
>> "description" of what we want and a way to verify derived implementations,
>> and I can't agree more.
>> 
>> New features and bugs in the specification would be handled as changes to
>> the specification and tests, which are shared, but how would we ensure
>> "consistency" for implementation-specific concepts?
>> 
>> Since those concepts are per-implementation, the risk of drift from the
>> "snapshot rewrite at T" might become an issue.
>> 
>> I am not talking about lower-level consistency, as that is impossible to
>> achieve across radically different languages; I am talking about conceptual
>> consistency and the capability to match higher-level concepts and
>> interfaces (e.g., the metadata provider) across different implementations.
>> 
>> I wonder if a protobuf specification of the internal representation of
>> Calcite would be a good tool to keep the different implementations from
>> drifting.
>> 
>> If we move to multiple implementations, we also need to ensure that the
>> burden on maintainers doesn't double, as this is already a problem we face.
>> A "spec-first" approach should ideally automate the validation of all
>> implementations whenever a shared test changes.
>> 
>> There are many examples of projects with multiple implementations/bindings
>> (Substrait, Apache Arrow, to name a few projects which are spec-heavy), so
>> maybe looking at what those communities do could be useful.
>> 
>> Looking forward to hearing your thoughts and ideas!
>> 
>> Best regards,
>> Alessandro
>>
