I take it from your reply that there isn't really a "system field" concept like 
this in Calcite today, even though there appear to be things that are kind of 
similar. So we can discuss what it might look like.

Warning: I'm only vaguely familiar with how this works in other systems, and 
completely ignorant of whether the SQL standard touches on this. But that being 
said, these seem like reasonable answers to your questions:

>  * Does a system field appear in all tables (relations, including tables 
> derived via a query) or only base tables?

I only have a need for them in base tables, so I'd say only base tables. My 
feeling is that system fields are these quasi-physical things, so they make 
sense coming from a table, but they don't really make sense coming from any 
arbitrary relation. I have a hard time coming up with something they'd be 
useful for given that an optimizer can morph your relations any which way it 
chooses.

Btw, this makes me consider the case where an optimizer morphs a base table 
call (for example it can swap in a materialized view). There's various ways to 
deal with that, I suppose. One is to not allow base table calls to be replaced 
with something else if their system fields are being referred to. Another is to 
have the system fields show up as null if the base table is replaced. Another 
is to have the behavior be undefined and let implementors do whatever they 
want: they may be null, or they may take on values that make sense in the 
context of whatever the new plan is. (Like, perhaps selecting __filename would 
refer to a file that backs the materialized view.) My first thought is that 
it's best if the behavior is undefined at Calcite's level, i.e. dependent on 
the implementation of the particular optimization and the particular system 
field. But that's only because I can't immediately think of a specific behavior 
that is always going to be good.

>  * If there are many system fields (hundreds), how do we keep the plan of a 
> manageable size?

I don't know of systems out there that have hundreds of system fields, so this 
may be a non issue. In Druid we certainly aren't planning to add more than 1 
right now, and perhaps a small handful in the foreseeable future. Not sure if 
this is a real answer though :)

>  * Do system fields belong to the whole query (as CURRENT_DATE does), or just 
> a record (table alias)?

For the use case I have in mind, they'd need to belong to the table records. 
Different rows in different tables would have different values of the system 
fields.

On 2023/10/27 19:59:46 Julian Hyde wrote:
> You’re thinking about this the same way that I do — first agree what is the 
> desired behavior of a so-called ’system field’.
> 
> I have questions like:
>  * Does a system field appear in all tables (relations, including tables 
> derived via a query) or only base tables?
>  * If there are many system fields (hundreds), how do we keep the plan of a 
> manageable size?
>  * Do system fields belong to the whole query (as CURRENT_DATE does), or just 
> a record (table alias)?
> 
> I think the notion of a field that is present in the row but is not included 
> in the expansion of ‘*’ or ‘alias.*’ would be useful. It would be fairly easy 
> to implement because it only affects star expansion (and therefore only 
> affects the validator and sql-to-rel-converter, not the planner).
> 
> Julian
> 
> 
> > On Oct 27, 2023, at 12:45 PM, Gian Merlino <[email protected]> wrote:
> > 
> > I'm looking at adding system fields to certain tables in Druid: they should
> > be referenceable by name but shouldn't show up through star-expansion. For
> > example, we'd like to expose the name of the data file currently being read
> > as "__filename".
> > 
> > I'm trying to figure out if this is something we can do with Calcite right
> > now or if we need to do some Calcite changes first.
> > 
> > I did find a couple of things, but AFAICT they are not exactly what I'm
> > looking for:
> > 
> > - There is discussion in CALCITE-2755 about treating Elasticsearch's _id
> > field as a system field, but it looks like the _id field is a key in a map
> > rather than a top-level field, and its visibility is managed at runtime by
> > ElasticsearchProject rather than being handled at the validator.
> > 
> > - In SqlValidatorImpl expandStar, there's a variable "includeSystemVars"
> > that is set to false when called by SqlToRelConverter. Seems promisingly
> > named, although it doesn't seem to do anything as far as I can tell. I'm
> > wondering if I'm missing something, or if this is an incomplete feature.
> > It's also not clear to me how the validator would figure out which fields
> > are system fields anyway, since I don't see an obvious place to mark them
> > in a RelDataType, which I think is how the validator learns about available
> > fields from its tables.
> > 
> > Happy to do a contribution to Calcite if one is needed.
> > 
> > Thanks,
> > Gian
> 
> 

Reply via email to