In some projects where I was involved and there was the notion of "system fields" these used to be "special" columns on the table level that could be used in queries but they were ignored when expanding the "*".
Implementation wise, I think we introduced a new interface (not necessarily connected to Table interface) with a single method (something along the lines isSystemField(int pos)) and then customized the validator to check the method and act accordingly. It can be something really similar to how we handle rolled up columns in expandStar [1]. Best, Stamatis [1] https://github.com/apache/calcite/blob/782d327d24c04e2161102b22f8880204462befd4/core/src/main/java/org/apache/calcite/sql/validate/SqlValidatorImpl.java#L667 On Mon, Oct 30, 2023 at 11:30 PM Julian Hyde <[email protected]> wrote: > > That makes sense. > > I asked the questions because I have seen several other definitions of > ’system fields’ that seemed - to the person asking for them - to be the only > possible definition of ’system fields’. > > Rather than trying to come up with a grand notion of ‘system field’ I would > prefer to focus on specific behaviors. In this case, the desired behavior > seems to be a table [1] to declare its row type and somehow indicate that > some of those columns should be ignored when expanding “*” or “alias.*”. > > One way to accomplish this is modifying the Table interface. Say, add a > method ‘boolean includeInStarExpansion(RelDataTypeField field)’. To use this, > the schema author would have to change their implementation of Table. > > Another way is to extend RelDataType. Drill took this route, adding > RelDataType.isDynamicStruct(), when they wanted to support late-binding > schema [2]. To use this, the author would probably need to provide a new type > factory or type system. > > Another way is a global property, say a regular expression for field names to > ignore during star expansion. Simple, but rather inflexible, because it would > apply to all tables (and queries) regardless of their source. > > Whichever way we provide, it would need to be plumbed through to the > SqlValidatorImpl.expandStar method. Very likely there would be changes to > interface SqlValidatorNamespace. Or we could re-use > SqlValidatorNamespace.getRowTypeSansSystemColumns() method, which seems > purpose-built for this (because it was!) > > Julian > > [1] > https://calcite.apache.org/javadocAggregate/org/apache/calcite/schema/Table.html > > [2] https://issues.apache.org/jira/browse/CALCITE-1150 > > > > On Oct 27, 2023, at 3:06 PM, Gian Merlino <[email protected]> wrote: > > > > I take it from your reply that there isn't really a "system field" concept > > like this in Calcite today, even though there appear to be things that are > > kind of similar. So we can discuss what it might look like. > > > > Warning: I'm only vaguely familiar with how this works in other systems, > > and completely ignorant of whether the SQL standard touches on this. But > > that being said, these seem like reasonable answers to your questions: > > > >> * Does a system field appear in all tables (relations, including tables > >> derived via a query) or only base tables? > > > > I only have a need for them in base tables, so I'd say only base tables. My > > feeling is that system fields are these quasi-physical things, so they make > > sense coming from a table, but they don't really make sense coming from any > > arbitrary relation. I have a hard time coming up with something they'd be > > useful for given that an optimizer can morph your relations any which way > > it chooses. > > > > Btw, this makes me consider the case where an optimizer morphs a base table > > call (for example it can swap in a materialized view). There's various ways > > to deal with that, I suppose. One is to not allow base table calls to be > > replaced with something else if their system fields are being referred to. > > Another is to have the system fields show up as null if the base table is > > replaced. Another is to have the behavior be undefined and let implementors > > do whatever they want: they may be null, or they may take on values that > > make sense in the context of whatever the new plan is. (Like, perhaps > > selecting __filename would refer to a file that backs the materialized > > view.) My first thought is that it's best if the behavior is undefined at > > Calcite's level, i.e. dependent on the implementation of the particular > > optimization and the particular system field. But that's only because I > > can't immediately think of a specific behavior that is always going to be > > good. > > > >> * If there are many system fields (hundreds), how do we keep the plan of a > >> manageable size? > > > > I don't know of systems out there that have hundreds of system fields, so > > this may be a non issue. In Druid we certainly aren't planning to add more > > than 1 right now, and perhaps a small handful in the foreseeable future. > > Not sure if this is a real answer though :) > > > >> * Do system fields belong to the whole query (as CURRENT_DATE does), or > >> just a record (table alias)? > > > > For the use case I have in mind, they'd need to belong to the table > > records. Different rows in different tables would have different values of > > the system fields. > > > > On 2023/10/27 19:59:46 Julian Hyde wrote: > >> You’re thinking about this the same way that I do — first agree what is > >> the desired behavior of a so-called ’system field’. > >> > >> I have questions like: > >> * Does a system field appear in all tables (relations, including tables > >> derived via a query) or only base tables? > >> * If there are many system fields (hundreds), how do we keep the plan of a > >> manageable size? > >> * Do system fields belong to the whole query (as CURRENT_DATE does), or > >> just a record (table alias)? > >> > >> I think the notion of a field that is present in the row but is not > >> included in the expansion of ‘*’ or ‘alias.*’ would be useful. It would be > >> fairly easy to implement because it only affects star expansion (and > >> therefore only affects the validator and sql-to-rel-converter, not the > >> planner). > >> > >> Julian > >> > >> > >>> On Oct 27, 2023, at 12:45 PM, Gian Merlino <[email protected]> wrote: > >>> > >>> I'm looking at adding system fields to certain tables in Druid: they > >>> should > >>> be referenceable by name but shouldn't show up through star-expansion. For > >>> example, we'd like to expose the name of the data file currently being > >>> read > >>> as "__filename". > >>> > >>> I'm trying to figure out if this is something we can do with Calcite right > >>> now or if we need to do some Calcite changes first. > >>> > >>> I did find a couple of things, but AFAICT they are not exactly what I'm > >>> looking for: > >>> > >>> - There is discussion in CALCITE-2755 about treating Elasticsearch's _id > >>> field as a system field, but it looks like the _id field is a key in a map > >>> rather than a top-level field, and its visibility is managed at runtime by > >>> ElasticsearchProject rather than being handled at the validator. > >>> > >>> - In SqlValidatorImpl expandStar, there's a variable "includeSystemVars" > >>> that is set to false when called by SqlToRelConverter. Seems promisingly > >>> named, although it doesn't seem to do anything as far as I can tell. I'm > >>> wondering if I'm missing something, or if this is an incomplete feature. > >>> It's also not clear to me how the validator would figure out which fields > >>> are system fields anyway, since I don't see an obvious place to mark them > >>> in a RelDataType, which I think is how the validator learns about > >>> available > >>> fields from its tables. > >>> > >>> Happy to do a contribution to Calcite if one is needed. > >>> > >>> Thanks, > >>> Gian > >> > >> >
