emkornfield commented on code in PR #7504:
URL: https://github.com/apache/iceberg/pull/7504#discussion_r1186560318


##########
format/view-spec.md:
##########
@@ -328,3 +330,17 @@ s3://bucket/warehouse/default.db/event_agg/metadata/00002-(uuid).metadata.json
   } ]
 }
 ```
+
+## Appendix B: Well Known (canonical) dialects
+
+The following dialects names are reserved and indicate dialects for specific systems:
+
+| Dialect Name | Description                                                                                           | Versioning  |
+|--------------|-------------------------------------------------------------------------------------------------------|-------------|
+|athena        |  [Amazon Athena Dialect](https://docs.aws.amazon.com/athena/latest/ug/ddl-sql-reference.html)         | TBD         |
+|google_sql    |  [Google's SQL Dialect](https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax)   | TBD         |
+|spark         |  [Apache Spark SQL Dialect](https://spark.apache.org/docs/latest/sql-ref-ansi-compliance.html)        | TBD         |
+|trino         |  [Trino SQL Dialect](https://trino.io/docs/current/language.html)                                     | TBD         |
+
+Dialect names starting with "iceberg." are reserved for future extension. Well known dialects
+may be added in the future with the prefix of "iceberg." (E.g. "iceberg.new_dialect_name").

Review Comment:
   I thought I replied, so I apologize if this is a duplicate post.
   
   > Do you mean spec is locked at this point (say after this PR is merged), and only "iceberg." dialects can be used in the future?
   
   My understanding is that specifications are not official until they are officially ratified. So I see this PR as an evolution, and we should try to get more members of the community to chime in if there are other dialects to reserve (maybe Dremio).
   
   The problem is that the dialect is a free-form string in the specification today. So I think there are three options:
   1. Declare a list of "valid values" as part of the specification and say that any value not listed is unofficial and may have name collisions in the future if the community decides to add values to the list (essentially declaring view dialects closed).
   2. Use this approach or something similar, which officially carves out a reserved namespace where we can make changes without breaking clients that choose to add their own dialect. Another option here is to specify that all dialects not declared in the specification must use a declared prefix (e.g. "custom.").
   3. Have no guidance on how to populate the field, which means that people using their own dialect string might or might not have a name collision with names added to the specification.
   
   I don't have a strong preference here other than avoiding option 3, even though in practice we might never run into a problem. There might also be other options that I haven't considered.
   
   > How do you compare that to updating the spec with the new dialect?
   
   I'm not sure I fully understand this question, but I think it depends on which option above is chosen. If option 3 is chosen and a new dialect is added, it risks confusion about interpretation for downstream systems (so technically it should require a new specification revision). Keeping reserved namespaces reduces the risk of adding dialects.
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
