laskoviymishka commented on issue #701:
URL: https://github.com/apache/iceberg-go/issues/701#issuecomment-4000641684
This is a valid issue. The root cause is that `constructTableInput` always
serializes all schema columns (current + all historical) into
`StorageDescriptor.Columns` on every create/update call. For schemas with
thousands of fields this reliably hits Glue's request payload limit.
This has been discussed in the Java implementation before:
- apache/iceberg#7584 — request to hide non-current fields in Glue (May 2023)
- apache/iceberg#11334 — PR attempting a `glue.non-current-fields-disabled` option (Nov 2024, not merged)
- apache/iceberg#12664 — follow-up PR (Jun 2025, closed due to inactivity)
I opened a fix in https://github.com/apache/iceberg-go/pull/769 that adds a new
`glue.schema-columns` property (default `true`). Setting it to `false` omits
columns from the StorageDescriptor entirely; the table stays readable because
the true schema lives in the metadata JSON.
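
To make the intended behavior concrete, here is a minimal sketch of how such a property could gate the column list. It is not the code from the PR: `Column`, `StorageDescriptor`, `schemaColumnsEnabled`, and `buildStorageDescriptor` are hypothetical stand-ins for the Glue SDK types and for `constructTableInput`, and both the property lookup in a plain `map[string]string` and the fallback to `true` on an unparsable value are assumptions.

```go
package main

import (
	"fmt"
	"strconv"
)

// Column is a simplified, hypothetical stand-in for a Glue column entry.
type Column struct {
	Name string
	Type string
}

// StorageDescriptor is a simplified, hypothetical stand-in for the Glue
// storage descriptor; only the fields needed for this sketch are modeled.
type StorageDescriptor struct {
	Location string
	Columns  []Column
}

// schemaColumnsKey is the property name described in the comment above.
const schemaColumnsKey = "glue.schema-columns"

// schemaColumnsEnabled reports whether columns should be written.
// Assumption: a missing or unparsable value falls back to the default (true).
func schemaColumnsEnabled(props map[string]string) bool {
	v, ok := props[schemaColumnsKey]
	if !ok {
		return true
	}
	enabled, err := strconv.ParseBool(v)
	if err != nil {
		return true
	}
	return enabled
}

// buildStorageDescriptor attaches the columns only when the property allows
// it. With the property set to "false", Columns stays nil, so the request
// payload no longer grows with the number of schema fields.
func buildStorageDescriptor(location string, cols []Column, props map[string]string) StorageDescriptor {
	sd := StorageDescriptor{Location: location}
	if schemaColumnsEnabled(props) {
		sd.Columns = cols
	}
	return sd
}

func main() {
	cols := []Column{{Name: "id", Type: "bigint"}, {Name: "name", Type: "string"}}

	withCols := buildStorageDescriptor("s3://bucket/tbl", cols, nil)
	withoutCols := buildStorageDescriptor("s3://bucket/tbl", cols,
		map[string]string{schemaColumnsKey: "false"})

	fmt.Println(len(withCols.Columns), len(withoutCols.Columns)) // prints "2 0"
}
```

Keeping the default at `true` preserves the current behavior for existing tables, so only users who hit the payload limit need to opt out.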
Worth noting: **both pyiceberg and iceberg-rust have the same gap** —
neither exposes any property to opt out of writing columns to the
StorageDescriptor. It would be worth reviving the stalled Java PRs and filing
symmetric issues/PRs across all implementations to keep behavior consistent.