emkornfield commented on code in PR #7504:
URL: https://github.com/apache/iceberg/pull/7504#discussion_r1186560318
##########
format/view-spec.md:
##########
@@ -328,3 +330,17 @@
s3://bucket/warehouse/default.db/event_agg/metadata/00002-(uuid).metadata.json
} ]
}
```
+
+## Appendix B: Well Known (canonical) dialects
+
+The following dialects names are reserved and indicate dialects for specific systems:
+
+| Dialect Name | Description | Versioning |
+|--------------|-------------------------------------------------------------------------------------------------------|-------------|
+|athena | [Amazon Athena Dialect](https://docs.aws.amazon.com/athena/latest/ug/ddl-sql-reference.html) | TBD |
+|google_sql | [Google's SQL Dialect](https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax) | TBD |
+|spark | [Apache Spark SQL Dialect](https://spark.apache.org/docs/latest/sql-ref-ansi-compliance.html) | TBD |
+|trino | [Trino SQL Dialect](https://trino.io/docs/current/language.html) | TBD |
+
+Dialect names starting with "iceberg." are reserved for future extension. Well known dialects
+may be added in the future with the prefix of "iceberg." (E.g. "iceberg.new_dialect_name").
Review Comment:
I thought I replied, so I apologize if this is a duplicate post.
> Do you mean spec is locked at this point (say after this PR is merged), and only "iceberg." dialects can be used in the future?
My understanding is that specifications are not official until they are officially ratified. So I see this PR as an evolution, and we should try to get more members of the community to chime in if there are other dialects to reserve (maybe Dremio).
The problem is that the dialect is a free-form string in the specification today. So I think there are three options:
1. Declare a list of "valid values" as part of the specification and say that any value not listed is unofficial and may have name collisions in the future if the community decides to add values to the list (essentially declaring view dialects closed).
2. Use this approach or something similar, which officially carves out a reserved namespace where we can make changes without breaking clients that choose to add their own dialect. Another option here is to specify that all dialects not declared in the specification must use a declared prefix (e.g. "custom.").
3. Have no guidance on how to populate the field, which means that people using their own dialect string might or might not have a name collision with names added to the specification.
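To make option 2 concrete, here is a minimal sketch of how a client might classify a dialect string under the proposed appendix: the four names in the table are well known, anything starting with `iceberg.` is reserved for the spec, and everything else is custom. The optional `custom_prefix` parameter models the "custom." variant of option 2. The function name, the category labels, and the exact, case-sensitive matching are all assumptions for illustration; none of this is part of the Iceberg spec or any Iceberg library.

```python
from typing import Optional

# Hypothetical helper illustrating option 2; not an Iceberg API.

# Well-known dialect names from the proposed Appendix B table.
WELL_KNOWN_DIALECTS = frozenset({"athena", "google_sql", "spark", "trino"})

# Prefix the proposal reserves for dialects the spec may define later.
RESERVED_PREFIX = "iceberg."


def classify_dialect(name: str, custom_prefix: Optional[str] = None) -> str:
    """Return 'well_known', 'reserved', 'custom', or 'invalid' for a dialect name.

    Matching is exact and case-sensitive (an assumption; the proposal is silent).
    If custom_prefix is given (e.g. "custom."), names outside the spec must use it.
    """
    if name in WELL_KNOWN_DIALECTS:
        return "well_known"
    if name.startswith(RESERVED_PREFIX):
        # Only the spec may assign names in this namespace.
        return "reserved"
    if custom_prefix is not None and not name.startswith(custom_prefix):
        # Stricter variant: undeclared dialects must carry the custom prefix.
        return "invalid"
    return "custom"


if __name__ == "__main__":
    for dialect in ["spark", "iceberg.new_dialect_name", "dremio"]:
        print(dialect, "->", classify_dialect(dialect))
```

With the reserved prefix alone, a writer is free to pick any non-`iceberg.` name; the stricter "custom." variant trades that freedom for a namespace that makes custom dialects obvious at a glance.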
I don't have a strong preference here other than avoiding 3, even though in practice we might never run into a problem. There might also be other options that I haven't considered.
> How do you compare that to updating the spec with the new dialect?
I'm not sure I fully understand this question, but I think the answer depends on which option above is chosen. If option 3 is chosen and a new dialect is added, it risks confusing downstream systems about how to interpret it (so technically it should require a new specification revision). Keeping a reserved namespace reduces the risk involved in adding dialects.
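A small sketch of why the reserved namespace reduces that risk, assuming writers under option 2 never choose names starting with `iceberg.`: under option 3 the spec would standardize a bare name that some writer may already use with its own meaning, while under option 2 the new name lands in a namespace no writer could have used. The function names and the set of in-use names (including `my_engine`) are hypothetical.

```python
from typing import Set

# Hypothetical illustration of option 2 vs. option 3; not an Iceberg API.
RESERVED_PREFIX = "iceberg."


def spec_name_for(new_dialect: str, reserved_namespace: bool) -> str:
    """Name the spec would assign to a newly standardized dialect."""
    return RESERVED_PREFIX + new_dialect if reserved_namespace else new_dialect


def collides(new_dialect: str, reserved_namespace: bool, names_in_use: Set[str]) -> bool:
    """Whether standardizing new_dialect reuses a name some writer already chose."""
    return spec_name_for(new_dialect, reserved_namespace) in names_in_use


# Custom dialect names already written into views (made up for the example).
in_use = {"dremio", "my_engine"}
print(collides("dremio", reserved_namespace=False, names_in_use=in_use))  # True (option 3)
print(collides("dremio", reserved_namespace=True, names_in_use=in_use))   # False (option 2)
```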
--
This is an automated message from the Apache Git Service.