wypoon commented on code in PR #12217:
URL: https://github.com/apache/iceberg/pull/12217#discussion_r1951565008
########## docs/docs/spark-configuration.md: ##########
@@ -155,16 +155,18 @@ spark.read
 .table("catalog.db.table")
 ```
-| Spark option | Default | Description |
-| --------------- | --------------------- | ----------------------------------------------------------------------------------------- |
-| snapshot-id | (latest) | Snapshot ID of the table snapshot to read |
-| as-of-timestamp | (latest) | A timestamp in milliseconds; the snapshot used will be the snapshot current at this time. |
-| split-size | As per table property | Overrides this table's read.split.target-size and read.split.metadata-target-size |
-| lookback | As per table property | Overrides this table's read.split.planning-lookback |
-| file-open-cost | As per table property | Overrides this table's read.split.open-file-cost |
-| vectorization-enabled | As per table property | Overrides this table's read.parquet.vectorization.enabled |
-| batch-size | As per table property | Overrides this table's read.parquet.vectorization.batch-size |
-| stream-from-timestamp | (none) | A timestamp in milliseconds to stream from; if before the oldest known ancestor snapshot, the oldest will be used |
+| Spark option | Default | Description |
+|-------------------------------------| --------------------- |-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| snapshot-id | (latest) | Snapshot ID of the table snapshot to read |
+| as-of-timestamp | (latest) | A timestamp in milliseconds; the snapshot used will be the snapshot current at this time. |
+| split-size | As per table property | Overrides this table's read.split.target-size and read.split.metadata-target-size |
+| lookback | As per table property | Overrides this table's read.split.planning-lookback |
+| file-open-cost | As per table property | Overrides this table's read.split.open-file-cost |
+| vectorization-enabled | As per table property | Overrides this table's read.parquet.vectorization.enabled |
+| batch-size | As per table property | Overrides this table's read.parquet.vectorization.batch-size |
+| stream-from-timestamp | (none) | A timestamp in milliseconds to stream from; if before the oldest known ancestor snapshot, the oldest will be used |
+| streaming-max-files-per-micro-batch | INT_MAX | Maximum number of files per microbatch |
+| streaming-max-rows-per-micro-batch | INT_MAX | Maximum number of rows per microbatch. Note : smallest granuality supported is 1 file, please make sure number of records per file is always greater than the number of records in the largest file possible, otherwise it can lead to stream being stuck |

Review Comment:
Just curious: what is our usual practice for updating a Markdown table? Do we reformat every row to align the column widths (which is only partially done here), or should we minimize churn and leave the existing rows as they are?
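The proposed note on `streaming-max-rows-per-micro-batch` appears to mean that the row limit must be at least the record count of the largest data file, because a file is never split across micro-batches. A minimal Python model of that reasoning follows, assuming files are admitted whole and in order and that a micro-batch closes before exceeding either limit. `plan_micro_batches` and `INT_MAX` are hypothetical names for illustration, not Iceberg's actual micro-batch planning code.

```python
from typing import List, Tuple

INT_MAX = 2**31 - 1  # hypothetical stand-in for the INT_MAX defaults listed in the table


def plan_micro_batches(
    file_row_counts: List[int],
    max_files: int = INT_MAX,
    max_rows: int = INT_MAX,
) -> Tuple[List[List[int]], bool]:
    """Hypothetical model: group files (given by row count) into micro-batches.

    Files are admitted whole and in order; a batch closes before it would
    exceed max_files files or max_rows rows. Returns the planned batches and
    a flag that is True if planning stalled on a file larger than max_rows.
    """
    batches: List[List[int]] = []
    position = 0  # index of the next file not yet assigned to a batch
    while position < len(file_row_counts):
        batch: List[int] = []
        rows = 0
        while position < len(file_row_counts) and len(batch) < max_files:
            next_rows = file_row_counts[position]
            if rows + next_rows > max_rows:
                break  # files are never split, so close the batch here
            batch.append(next_rows)
            rows += next_rows
            position += 1
        if not batch:
            # The next file alone exceeds max_rows (or max_files < 1), so it can
            # never be admitted: every later batch would be empty.
            return batches, True
        batches.append(batch)
    return batches, False


# Two files fit under the 300-row limit, then the 300-row file gets its own batch.
print(plan_micro_batches([100, 200, 300], max_files=10, max_rows=300))
# → ([[100, 200], [300]], False)

# The 500-row file can never fit under a 300-row limit, so planning stalls.
print(plan_micro_batches([100, 500, 100], max_files=10, max_rows=300))
# → ([[100]], True)
```

In this model, raising the row limit to at least the largest file's record count (500 in the second call) lets every file through; whether Iceberg's streaming source behaves exactly this way should be checked against its implementation.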