rdblue commented on PR #7526: URL: https://github.com/apache/iceberg/pull/7526#issuecomment-1563576351
This was not originally part of the spec because the Parquet/Avro/ORC schema is the source of truth. We don't want to have multiple places for the same information that could conflict with one another. That would make the spec more complicated and harder to implement. (Also, what if some lazy person writes `iceberg.schema` but not field IDs? We don't want to need to reconcile them.)

We wrote the Iceberg schema into file metadata primarily for debugging and informational purposes. I think we could add a recommendation to the spec to do the same, but I don't think that we should require `iceberg.schema`.

The problem with making it required now is not just that it would make the spec larger and complicate the source of truth for a schema. The issue is that this would not help because some writers don't currently write the property and there are existing files in tables without it. If we can't rely on it, what is the value?
