bobmerevel commented on issue #16838:
URL: https://github.com/apache/iceberg/issues/16838#issuecomment-4722667824
After investigating this further, I think there is an opportunity to improve
the failure mode in ExpireSnapshotsSparkAction.
In my opinion, the action could validate metadata.metadataFileLocation()
before performing the commit. This would prevent a successful metadata commit
followed by a failure during expired file calculation due to an invalid
TableMetadata.
For example, immediately after:
TableMetadata originalMetadata = ops.current();
the action could verify that originalMetadata.metadataFileLocation() is not
null and fail with a descriptive error message if it is.
Similarly, after:
TableMetadata updatedMetadata = ops.refresh();
the refreshed metadata could be validated before it is used to construct
static tables.
Additionally, it may be beneficial to add a defensive validation in
BaseSparkAction.newStaticTable() itself, since it currently assumes that
metadata.metadataFileLocation() is always present:
protected Table newStaticTable(TableMetadata metadata, FileIO io) {
StaticTableOperations ops = new StaticTableOperations(metadata, io);
return new BaseTable(ops, metadata.metadataFileLocation());
}
At the moment, if metadataFileLocation() is null, the code eventually fails
with a NullPointerException, which makes the real root cause difficult to
identify.
A fail-fast validation with an explicit error message such as:
“Table metadata file location is null. This may indicate an invalid catalog
implementation or an invalid REST Catalog response missing the
metadata-location field.”
would make debugging much easier for catalog implementers.
This would not change the behavior of valid implementations, but it would
provide a much clearer diagnostic when the catalog contract is violated.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]