rdblue commented on PR #7526:
URL: https://github.com/apache/iceberg/pull/7526#issuecomment-1563576351

   This was not originally part of the spec because the Parquet/Avro/ORC schema 
is the source of truth. We don't want to have multiple places for the same 
information that could conflict with one another. That would make the spec more 
complicated and harder to implement. (Also, what if some lazy person writes 
`iceberg.schema` but not field IDs? We don't want to need to reconcile them.)
   
   We wrote the Iceberg schema into file metadata primarily for debugging and 
informational purposes. I think we could add a recommendation to the spec to do 
the same, but I don't think that we should require `iceberg.schema`.
   
   Making it required now would not only make the spec larger and complicate 
the source of truth for a schema. It also would not help, because some writers 
don't currently write the property and existing files in tables lack it. If we 
can't rely on it, what is the value?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

