Re: [PR] Spark 3.5: Fix flaky TestSparkExecutorCache [iceberg]
aokolnychyi closed pull request #9583: Spark 3.5: Fix flaky TestSparkExecutorCache URL: https://github.com/apache/iceberg/pull/9583
Re: [PR] Partition Evolution [iceberg-python]
HonahX commented on code in PR #245: URL: https://github.com/apache/iceberg-python/pull/245#discussion_r1469034571

## pyiceberg/table/__init__.py:

```python
@@ -533,6 +551,39 @@ def _(update: SetCurrentSchemaUpdate, base_metadata: TableMetadata, context: _Ta
     return base_metadata.model_copy(update={"current_schema_id": new_schema_id})


+@_apply_table_update.register(AddPartitionSpecUpdate)
+def _(update: AddPartitionSpecUpdate, base_metadata: TableMetadata, context: _TableMetadataUpdateContext) -> TableMetadata:
+    for spec in base_metadata.partition_specs:
+        if spec.spec_id == update.spec_id:
+            raise ValueError(f"Partition spec with id {spec.spec_id} already exists: {spec}")
+
+    context.add_update(update)
+    return base_metadata.model_copy(
+        update={
+            "partition_specs": base_metadata.partition_specs + [update.spec],
+        }
+    )
+
+
+@_apply_table_update.register(SetDefaultSpecUpdate)
+def _(update: SetDefaultSpecUpdate, base_metadata: TableMetadata, context: _TableMetadataUpdateContext) -> TableMetadata:
+    new_spec_id = update.spec_id
+    if new_spec_id == base_metadata.default_spec_id:
```

Review Comment: Shall we add the support for `-1` spec_id which represents the last spec added in this transaction? https://github.com/apache/iceberg/blob/6852278d6cf01bec2998be954d7bd6f6cc37cc94/open-api/rest-catalog-open-api.yaml#L2242

## pyiceberg/table/__init__.py:

```python
@@ -2271,3 +2325,240 @@ def commit(self) -> Snapshot:
     )
     return snapshot
+
+
+class UpdateSpec:
+    _table: Table
+    _name_to_field: Dict[str, PartitionField] = {}
+    _name_to_added_field: Dict[str, PartitionField] = {}
+    _transform_to_field: Dict[Tuple[int, str], PartitionField] = {}
+    _transform_to_added_field: Dict[Tuple[int, str], PartitionField] = {}
+    _renames: Dict[str, str] = {}
+    _added_time_fields: Dict[int, PartitionField] = {}
+    _case_sensitive: bool
+    _adds: List[PartitionField]
+    _deletes: Set[int]
+    _last_assigned_partition_id: int
+    _transaction: Optional[Transaction]
+    _unassigned_field_name = 'unassigned_field_name'
```

Review Comment: Thanks for the analysis! I agree that we should keep the validation. I was thinking a little bit of refactoring may make things look better. I've left a comment at the place where we provide the name for the new field. Please let me know what you think.
## pyiceberg/table/__init__.py:

```python
@@ -2271,3 +2325,243 @@ def commit(self) -> Snapshot:
     )
     return snapshot
+
+
+class UpdateSpec:
+    _table: Table
+    _name_to_field: Dict[str, PartitionField] = {}
+    _name_to_added_field: Dict[str, PartitionField] = {}
+    _transform_to_field: Dict[Tuple[int, str], PartitionField] = {}
+    _transform_to_added_field: Dict[Tuple[int, str], PartitionField] = {}
+    _renames: Dict[str, str] = {}
+    _added_time_fields: Dict[int, PartitionField] = {}
+    _case_sensitive: bool
+    _adds: List[PartitionField]
+    _deletes: Set[int]
+    _last_assigned_partition_id: int
+    _transaction: Optional[Transaction]
+    _unassigned_field_name = 'unassigned_field_name'
+
+    def __init__(self, table: Table, transaction: Optional[Transaction] = None, case_sensitive: bool = True) -> None:
+        self._table = table
+        self._name_to_field = {field.name: field for field in table.spec().fields}
+        self._name_to_added_field = {}
+        self._transform_to_field = {(field.source_id, repr(field.transform)): field for field in table.spec().fields}
+        self._transform_to_added_field = {}
+        self._adds = []
+        self._deletes = set()
+        if len(table.specs()) == 1:
+            self._last_assigned_partition_id = PARTITION_FIELD_ID_START - 1
+        else:
+            self._last_assigned_partition_id = table.spec().last_assigned_field_id
+        self._renames = {}
+        self._transaction = transaction
+        self._case_sensitive = case_sensitive
+        self._added_time_fields = {}
+
+    def add_field(
+        self,
+        source_column_name: str,
+        transform: Transform[Any, Any],
+        partition_field_name: Optional[str] = _unassigned_field_name,
+    ) -> UpdateSpec:
+        ref = Reference(source_column_name)
+        bound_ref = ref.bind(self._table.schema(), self._case_sensitive)
+        # verify transform can actually bind it
+        output_type = bound_ref.field.field_type
+        if not transform.can_transform(output_type):
+            raise ValueError(f"{transform} cannot transform {output_type} values from {bound_ref.field.name}")
+
+        transform_key = (bound_ref.field.field_id, repr(transform))
+        existing_partition_field = self._transform_to_field.get(transform_key)
+        if existing_partition_field and self._is_duplicate_partition(transform, existing_partition_field):
+            raise ValueError(f"Duplicate partition field f
```
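An aside on HonahX's `-1` question above: per the linked REST catalog spec, `-1` means "the last partition spec added in this transaction". A minimal standalone sketch of that resolution step (illustrative names only, not the committed pyiceberg implementation):

```python
from typing import List

LAST_ADDED_SPEC_ID = -1  # sentinel from the REST catalog spec: "last spec added in this transaction"


def resolve_spec_id(requested: int, added_spec_ids: List[int]) -> int:
    """Resolve a SetDefaultSpecUpdate spec_id, treating -1 as the last spec added."""
    if requested == LAST_ADDED_SPEC_ID:
        if not added_spec_ids:
            raise ValueError("Cannot set default spec to last added: no spec was added in this transaction")
        return added_spec_ids[-1]
    return requested


# e.g. resolve_spec_id(-1, [0, 1]) -> 1; resolve_spec_id(2, [0, 1]) -> 2
```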
Re: [PR] Flink: Implement enumerator metrics for pending splits, pending recor… [iceberg]
pvary merged PR #9524: URL: https://github.com/apache/iceberg/pull/9524
Re: [PR] Flink: Implement enumerator metrics for pending splits, pending recor… [iceberg]
pvary commented on PR #9524: URL: https://github.com/apache/iceberg/pull/9524#issuecomment-1916303375 @mas-chen: Please backport this to 1.17, 1.18 and continue with the Junit5 migration PR. Thanks @mas-chen for the PR and @stevenzwu for the review!
[PR] Docs: Enhance Java quickstart example [iceberg]
manuzhang opened a new pull request, #9585: URL: https://github.com/apache/iceberg/pull/9585 (no comment)
[I] Cannot drop table with AWS Glue Catalog [iceberg]
duwanqiebi opened a new issue, #9586: URL: https://github.com/apache/iceberg/issues/9586

### Apache Iceberg version
None
### Query engine
None
### Please describe the bug 🐞
Conf:
- "spark.sql.catalog.spark_catalog": "org.apache.iceberg.spark.SparkSessionCatalog"
- "spark.sql.catalog.spark_catalog.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog"
- "spark.sql.catalog.spark_catalog.io-impl": "org.apache.iceberg.aws.s3.S3FileIO"
- "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"

Creating the table, writing, and reading are all OK, but I get the following error when I drop the table:
```
spark.sql("drop table $db.test_iceberg_0 purge")
pyspark.errors.exceptions.captured.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table test_iceberg_0. StorageDescriptor#InputFormat cannot be null for table: test_iceberg_0 (Service: null; Status Code: 0; Error Code: null; Request ID: null; Proxy: null)
```
Re: [PR] Spark 3.5: Add an option not to delete files in ExpireSnapshots [iceberg]
manuzhang closed pull request #9584: Spark 3.5: Add an option not to delete files in ExpireSnapshots URL: https://github.com/apache/iceberg/pull/9584
[PR] feat: Add user guide for website. [iceberg-rust]
liurenjie1024 opened a new pull request, #178: URL: https://github.com/apache/iceberg-rust/pull/178 This PR adds a user guide for the website.
Re: [PR] feat: Add user guide for website. [iceberg-rust]
liurenjie1024 commented on PR #178: URL: https://github.com/apache/iceberg-rust/pull/178#issuecomment-1916332649 cc @Xuanwo @ZENOTME @Fokko PTAL
Re: [PR] feat: add parquet writer [iceberg-rust]
liurenjie1024 commented on code in PR #176: URL: https://github.com/apache/iceberg-rust/pull/176#discussion_r1470792480

## crates/iceberg/src/spec/manifest.rs:

```rust
@@ -924,7 +924,7 @@ impl TryFrom for ManifestStatus {
 }

 /// Data file carries data file path, partition tuple, metrics, …
-#[derive(Debug, PartialEq, Clone, Eq, TypedBuilder)]
+#[derive(Debug, PartialEq, Clone, Eq, Builder)]
```

Review Comment: Oh, I see, thanks!
Re: [I] Cannot drop table with AWS Glue Catalog [iceberg]
nastra commented on issue #9586: URL: https://github.com/apache/iceberg/issues/9586#issuecomment-1916390985 Might be similar to https://github.com/apache/iceberg/issues/5565, I wonder if you're missing the catalog name in the table identifier
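If a missing catalog name is indeed the cause, fully qualifying the identifier is a quick way to verify (a hedged PySpark sketch; `my_db` stands in for the reporter's `$db` placeholder):

```python
# Hypothetical check: qualify the table with the configured session catalog so
# Spark routes the DROP through SparkSessionCatalog/GlueCatalog instead of the
# plain Hive path, which fails on Iceberg's null StorageDescriptor#InputFormat.
spark.sql("DROP TABLE IF EXISTS spark_catalog.my_db.test_iceberg_0 PURGE")
```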
Re: [I] Remove `unwrap()` in `ManifestListWriter.close()` [iceberg-rust]
liurenjie1024 commented on issue #177: URL: https://github.com/apache/iceberg-rust/issues/177#issuecomment-1916433902 Thanks for reporting!
Re: [PR] feat: add support for catalogs with glue implementation to start [iceberg-go]
wolfeidau commented on PR #51: URL: https://github.com/apache/iceberg-go/pull/51#issuecomment-1916574894 @zeroshade done
Re: [PR] Python: Introduce setters [iceberg-python]
Fokko closed pull request #304: Python: Introduce setters URL: https://github.com/apache/iceberg-python/pull/304
Re: [PR] Spark: Rewrite identifier when using Subquery expressions in View [iceberg]
nastra commented on code in PR #9587: URL: https://github.com/apache/iceberg/pull/9587#discussion_r1471120727

## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java:

```java
@@ -877,6 +877,9 @@ public void dropV1View() {
             v1SessionCatalog()
                 .tableExists(new org.apache.spark.sql.catalyst.TableIdentifier(v1View)))
         .isFalse();
+
+    sql("USE spark_catalog");
+    sql("DROP TABLE IF EXISTS %s", tableName);
```

Review Comment: without this cleanup, the existence of that table in `spark_catalog` would mean the newly added tests don't hit the expected failures when doing a read against the table
[PR] Use `.read_bytes()` instead [iceberg-python]
Fokko opened a new pull request, #325: URL: https://github.com/apache/iceberg-python/pull/325 Noticed this while profiling. This simplifies the code a bit.
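The flavor of simplification `.read_bytes()` enables, shown with `pathlib` as a generic stand-in (a sketch, not the exact pyiceberg diff):

```python
from pathlib import Path

path = Path("metadata.json")

# Before: open a file object just to slurp the whole file
with path.open("rb") as f:
    data = f.read()

# After: one call does the open/read/close dance internally
data = path.read_bytes()
```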
[I] Performance optimization [iceberg]
Fokko opened a new issue, #9588: URL: https://github.com/apache/iceberg/issues/9588

### Apache Iceberg version
None
### Query engine
None
### Please describe the bug 🐞
Seeing if there is anything we can do to improve performance: (profiling screenshot)
Re: [I] Performance optimization [iceberg]
liurenjie1024 commented on issue #9588: URL: https://github.com/apache/iceberg/issues/9588#issuecomment-1916860122 It seems a lot of the time is spent on decompression.
Re: [I] Performance optimization [iceberg]
Fokko commented on issue #9588: URL: https://github.com/apache/iceberg/issues/9588#issuecomment-1916881774

s3fs: (profiling screenshot)

PyArrow: (profiling screenshot)

Also, the PyArrow import is very slow :(
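A common way to soften a slow top-level import is to defer it until first use; a generic sketch of the pattern (not necessarily what pyiceberg ended up doing):

```python
def write_parquet(columns: dict, path: str) -> None:
    # Deferred import: the pyarrow import cost is only paid when Parquet I/O
    # actually happens, so importing the calling package stays fast.
    import pyarrow as pa
    import pyarrow.parquet as pq

    pq.write_table(pa.table(columns), path)


# write_parquet({"id": [1, 2, 3]}, "out.parquet")
```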
Re: [PR] InMemory Catalog Implementation [iceberg-python]
syun64 commented on code in PR #289: URL: https://github.com/apache/iceberg-python/pull/289#discussion_r1471348272 ## tests/catalog/test_base.py: ## Review Comment: This is a _super-nit_ (please feel free to take it or leave it): should we rename this test module to **test_in_memory.py** to honor the fact that InMemory Catalog is now a proper catalog implementation?
Re: [I] Support deletion in Apache Flink [iceberg]
VidakM commented on issue #8718: URL: https://github.com/apache/iceberg/issues/8718#issuecomment-1917071229 Is `DELETE FROM` still on the roadmap for Flink, or could we help out in some way?
Re: [PR] Spark 3.5: Add an option not to delete files in ExpireSnapshots [iceberg]
nastra commented on code in PR #9584: URL: https://github.com/apache/iceberg/pull/9584#discussion_r1471390452

## api/src/main/java/org/apache/iceberg/ExpireSnapshots.java:

```java
@@ -118,4 +118,9 @@ public interface ExpireSnapshots extends PendingUpdate> {
    * @return this for method chaining
    */
   ExpireSnapshots cleanExpiredFiles(boolean clean);
+
+  /** Returns number of expired snapshots */
+  default long expiredSnapshotsCount() {
+    return 0;
```

Review Comment:
```suggestion
    return 0L;
```

## .palantir/revapi.yml:

```yaml
@@ -873,6 +873,13 @@ acceptedBreaks:
       new: "method void org.apache.iceberg.encryption.Ciphers::()"
       justification: "Static utility class - should not have public constructor"
   "1.4.0":
+    org.apache.iceberg:iceberg-api:
```

Review Comment: this shouldn't be necessary
Re: [PR] Spark 3.5: Add an option not to delete files in ExpireSnapshots [iceberg]
nastra commented on code in PR #9584: URL: https://github.com/apache/iceberg/pull/9584#discussion_r1471391266

## api/src/main/java/org/apache/iceberg/actions/ExpireSnapshots.java:

```java
@@ -99,6 +99,11 @@ public interface ExpireSnapshots extends Action
```
Re: [PR] Spark 3.5: Add an option not to delete files in ExpireSnapshots [iceberg]
nastra commented on code in PR #9584: URL: https://github.com/apache/iceberg/pull/9584#discussion_r1471397337

## core/src/main/java/org/apache/iceberg/RemoveSnapshots.java:

```java
@@ -85,6 +85,8 @@ public void accept(String file) {
   private ExecutorService planExecutorService = ThreadPools.getWorkerPool();
   private Boolean incrementalCleanup;
+  private List expiredSnapshots;
```

Review Comment: I think it would be better to keep a Set of snapshot IDs, because removing from a list is O(n)
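The complexity point in concrete terms, as a generic Python illustration (the snapshot IDs are made up):

```python
# Membership tests and removals on a set are O(1) on average...
expired = {6342120243489319388, 8140914951144636326}
expired.discard(6342120243489319388)       # O(1) average
print(8140914951144636326 in expired)      # O(1) average -> True

# ...while list.remove() scans linearly, so expiring k of n snapshots is O(k*n)
expired_list = [6342120243489319388, 8140914951144636326]
expired_list.remove(6342120243489319388)   # O(n) scan
```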
Re: [PR] Spark 3.5: Add an option not to delete files in ExpireSnapshots [iceberg]
nastra commented on code in PR #9584: URL: https://github.com/apache/iceberg/pull/9584#discussion_r1471399965

## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestExpireSnapshotsProcedure.java:

```java
@@ -541,6 +541,29 @@ public void testExpireSnapshotsWithPartitionStatisticFiles() {
         .exists();
   }

+  @Test
+  public void testExpireSnapshotsNotDeletingFiles() {
+    sql("CREATE TABLE %s (id bigint NOT NULL, data string) USING iceberg", tableName);
+
+    sql("INSERT INTO TABLE %s VALUES (1, 'a')", tableName);
+    sql("INSERT INTO TABLE %s VALUES (2, 'b')", tableName);
+
+    Table table = validationCatalog.loadTable(tableIdent);
+
+    Assert.assertEquals("Should be 2 snapshots", 2, Iterables.size(table.snapshots()));
```

Review Comment:
```suggestion
    assertThat(table.snapshots()).hasSize(2);
```
Re: [PR] Spark 3.5: Add an option not to delete files in ExpireSnapshots [iceberg]
nastra commented on code in PR #9584: URL: https://github.com/apache/iceberg/pull/9584#discussion_r1471405013

## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/ExpireSnapshotsSparkAction.java:

```java
@@ -67,6 +67,8 @@ public class ExpireSnapshotsSparkAction extends BaseSparkAction
```
Re: [PR] Partition Evolution [iceberg-python]
amogh-jahagirdar commented on code in PR #245: URL: https://github.com/apache/iceberg-python/pull/245#discussion_r1471453813

## pyiceberg/table/__init__.py:

```python
@@ -2271,3 +2325,243 @@ def commit(self) -> Snapshot:
     )
     return snapshot
+
+
+class UpdateSpec:
+    _table: Table
+    _name_to_field: Dict[str, PartitionField] = {}
+    _name_to_added_field: Dict[str, PartitionField] = {}
+    _transform_to_field: Dict[Tuple[int, str], PartitionField] = {}
+    _transform_to_added_field: Dict[Tuple[int, str], PartitionField] = {}
+    _renames: Dict[str, str] = {}
+    _added_time_fields: Dict[int, PartitionField] = {}
+    _case_sensitive: bool
+    _adds: List[PartitionField]
+    _deletes: Set[int]
+    _last_assigned_partition_id: int
+    _transaction: Optional[Transaction]
+    _unassigned_field_name = 'unassigned_field_name'
+
+    def __init__(self, table: Table, transaction: Optional[Transaction] = None, case_sensitive: bool = True) -> None:
+        self._table = table
+        self._name_to_field = {field.name: field for field in table.spec().fields}
+        self._name_to_added_field = {}
+        self._transform_to_field = {(field.source_id, repr(field.transform)): field for field in table.spec().fields}
+        self._transform_to_added_field = {}
+        self._adds = []
+        self._deletes = set()
+        if len(table.specs()) == 1:
+            self._last_assigned_partition_id = PARTITION_FIELD_ID_START - 1
+        else:
+            self._last_assigned_partition_id = table.spec().last_assigned_field_id
+        self._renames = {}
+        self._transaction = transaction
+        self._case_sensitive = case_sensitive
+        self._added_time_fields = {}
+
+    def add_field(
+        self,
+        source_column_name: str,
+        transform: Transform[Any, Any],
+        partition_field_name: Optional[str] = _unassigned_field_name,
+    ) -> UpdateSpec:
+        ref = Reference(source_column_name)
+        bound_ref = ref.bind(self._table.schema(), self._case_sensitive)
+        # verify transform can actually bind it
+        output_type = bound_ref.field.field_type
+        if not transform.can_transform(output_type):
+            raise ValueError(f"{transform} cannot transform {output_type} values from {bound_ref.field.name}")
+
+        transform_key = (bound_ref.field.field_id, repr(transform))
+        existing_partition_field = self._transform_to_field.get(transform_key)
+        if existing_partition_field and self._is_duplicate_partition(transform, existing_partition_field):
+            raise ValueError(f"Duplicate partition field for ${ref.name}=${ref}, ${existing_partition_field} already exists")
+
+        added = self._transform_to_added_field.get(transform_key)
+        if added:
+            raise ValueError(f"Already added partition: {added.name}")
+
+        new_field = self._partition_field((bound_ref.field.field_id, transform), partition_field_name)
+        if new_field.name == self._unassigned_field_name:
```

Review Comment: Ah yeah good point :) the unassigned name is really only temporarily needed. I'll update to do this
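The refactor being agreed to here presumably swaps the `_unassigned_field_name` sentinel for an `Optional[str] = None` default; a trimmed sketch of what that signature might look like (an assumption, not the merged code):

```python
from typing import Any, Optional


class UpdateSpec:  # trimmed to the relevant signature
    def add_field(
        self,
        source_column_name: str,
        transform: Any,
        partition_field_name: Optional[str] = None,  # None replaces the sentinel string
    ) -> "UpdateSpec":
        if partition_field_name is None:
            # Derive a visible default, e.g. "<column>_<transform>", only when
            # the caller did not pick a name explicitly.
            partition_field_name = f"{source_column_name}_{transform}"
        # ... validation and bookkeeping as in the quoted diff ...
        return self
```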
Re: [PR] add github add to check md link [iceberg-python]
kevinjqliu commented on PR #324: URL: https://github.com/apache/iceberg-python/pull/324#issuecomment-1917449075 Here's the result from the GitHub Action run from my fork https://github.com/kevinjqliu/iceberg-python/actions/runs/7707078539/job/21003669945
Re: [PR] InMemory Catalog Implementation [iceberg-python]
kevinjqliu commented on code in PR #289: URL: https://github.com/apache/iceberg-python/pull/289#discussion_r1471604886 ## tests/catalog/test_base.py: ## Review Comment: thanks for the review @syun64. Since the In-Memory catalog is not a "production" catalog, I'm inclined to keep `test_base.py` for now. I plan to repurpose it later so that we can test the set of behaviors we expect from all catalogs.
Re: [PR] InMemory Catalog Implementation [iceberg-python]
syun64 commented on code in PR #289: URL: https://github.com/apache/iceberg-python/pull/289#discussion_r1471609730 ## tests/catalog/test_base.py: ## Review Comment: Makes sense @kevinjqliu 💯
Re: [PR] Move nightly versioned docs to top-level docs directory [iceberg]
Fokko merged PR #9578: URL: https://github.com/apache/iceberg/pull/9578
Re: [PR] Spark 3.4: Fix writing of default values in CoW for rows with NULL columns which are unmatched [iceberg]
rdblue merged PR #9556: URL: https://github.com/apache/iceberg/pull/9556
Re: [PR] Spark 3.4: Fix writing of default values in CoW for rows with NULL columns which are unmatched [iceberg]
rdblue commented on PR #9556: URL: https://github.com/apache/iceberg/pull/9556#issuecomment-1917527435

Merging this. Thanks @amogh-jahagirdar!

> It is still not clear how a wrong nullability info in MergeRows led to inserting default values.

@aokolnychyi, I think what's happening is that the nullability flag is not checked when copying the row from the `MergeRows` output. The "default" values are null values coming from looking at unset values in `UnsafeRow`.
Re: [PR] Spark: Add support for describing/showing views [iceberg]
rdblue commented on code in PR #9513: URL: https://github.com/apache/iceberg/pull/9513#discussion_r1471656403

## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteViewCommands.scala:

```scala
@@ -60,6 +63,13 @@ case class RewriteViewCommands(spark: SparkSession) extends Rule[LogicalPlan] wi
         properties = properties,
         allowExisting = allowExisting,
         replace = replace)
+
+    case ShowViews(UnresolvedNamespace(Seq()), pattern, output) if isViewCatalog(catalogManager.currentCatalog) =>
+      ShowIcebergViews(ResolvedNamespace(catalogManager.currentCatalog, Seq.empty), pattern, output)
```

Review Comment: Won't this case be handled the same way by running `CatalogAndNamespace.unapply(Seq())` in the next rule?
Re: [PR] Spark: Add support for describing/showing views [iceberg]
rdblue commented on code in PR #9513: URL: https://github.com/apache/iceberg/pull/9513#discussion_r1471663971

## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ExtendedDataSourceV2Strategy.scala:

```scala
@@ -123,6 +128,18 @@ case class ExtendedDataSourceV2Strategy(spark: SparkSession) extends Strategy wi
         allowExisting = allowExisting,
         replace = replace) :: Nil

+    case d@DescribeRelation(ResolvedV2View(catalog, ident), _, isExtended, _) =>
```

Review Comment: I understand not wanting to bother with `output`, but I think the best practice is to preserve `output` from the logical plan to the physical plan. That's because resolution has occurred with the IDs from the logical plan and we don't want to reassign internal IDs in the physical plan. They should always match. That's why most Spark plans (like `DescribeTableExec`) copy and reuse the output attrs.
Re: [PR] Spark: Add support for describing/showing views [iceberg]
rdblue commented on code in PR #9513: URL: https://github.com/apache/iceberg/pull/9513#discussion_r1471666384

## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ExtendedDataSourceV2Strategy.scala:

```scala
@@ -123,6 +128,18 @@ case class ExtendedDataSourceV2Strategy(spark: SparkSession) extends Strategy wi
         allowExisting = allowExisting,
         replace = replace) :: Nil

+    case d@DescribeRelation(ResolvedV2View(catalog, ident), _, isExtended, _) =>
+      DescribeV2ViewExec(d.output, catalog.loadView(ident), isExtended) :: Nil
+
+    case show@ShowTableProperties(ResolvedV2View(catalog, ident), propertyKey, _) =>
+      ShowV2ViewPropertiesExec(show.output, catalog.loadView(ident), propertyKey) :: Nil
```

Review Comment: It's a little odd to use `show.output` but then name `propertyKey` in the pattern. Why not name `output` in the pattern and reference it directly?
Re: [PR] Spark: Add support for describing/showing views [iceberg]
rdblue commented on code in PR #9513: URL: https://github.com/apache/iceberg/pull/9513#discussion_r1471669400

## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java:

```java
@@ -1149,6 +1149,159 @@ public void createViewWithSubqueryExpressionUsingGlobalTempView() {
             "because it references to the temporary object global_temp.%s", globalTempView));
   }

+  @Test
+  public void describeView() {
+    String viewName = "describeView";
+
+    sql("CREATE VIEW %s AS SELECT id, data FROM %s WHERE id <= 3", viewName, tableName);
+    assertThat(sql("DESCRIBE %s", viewName))
+        .containsExactlyInAnyOrder(row("id", "int", ""), row("data", "string", ""));
```

Review Comment: I think `InAnyOrder` is incorrect. Don't we want these to be in the schema order?
Re: [PR] Spark: Add support for describing/showing views [iceberg]
rdblue commented on code in PR #9513: URL: https://github.com/apache/iceberg/pull/9513#discussion_r1471671329

## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DescribeV2ViewExec.scala:

```scala
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.v2
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.catalyst.util.escapeSingleQuotedString
+import org.apache.spark.sql.connector.catalog.View
+import org.apache.spark.sql.connector.catalog.ViewCatalog
+import org.apache.spark.sql.execution.LeafExecNode
+import scala.collection.JavaConverters._
+
+case class DescribeV2ViewExec(
+    output: Seq[Attribute],
+    view: View,
+    isExtended: Boolean) extends V2CommandExec with LeafExecNode {
+
+  import org.apache.spark.sql.connector.catalog.CatalogV2Implicits._
+
+  override protected def run(): Seq[InternalRow] = {
+    if (isExtended) {
+      (describeSchema :+ emptyRow) ++ describeExtended
+    } else {
+      describeSchema
+    }
+  }
+
+  private def describeSchema: Seq[InternalRow] =
+    view.schema().map { column =>
+      toCatalystRow(
+        column.name,
+        column.dataType.simpleString,
+        column.getComment().getOrElse(""))
+    }
+
+  private def emptyRow: InternalRow = toCatalystRow("", "", "")
+
+  private def describeExtended: Seq[InternalRow] = {
+    val outputColumns = view.queryColumnNames.mkString("[", ", ", "]")
+    val properties: Map[String, String] = view.properties.asScala.toMap -- ViewCatalog.RESERVED_PROPERTIES.asScala
+    val viewCatalogAndNamespace: Seq[String] = view.currentCatalog +: view.currentNamespace.toSeq
+    val viewProperties = properties.toSeq.sortBy(_._1).map {
+      case (key, value) =>
+        s"'${escapeSingleQuotedString(key)}' = '${escapeSingleQuotedString(value)}'"
+    }.mkString("[", ", ", "]")
+
+
+    toCatalystRow("# Detailed View Information", "", "") ::
+      toCatalystRow("Comment", view.properties.getOrDefault(ViewCatalog.PROP_COMMENT, ""), "") ::
+      toCatalystRow("View Text", view.query, "") ::
```

Review Comment: Do we want to include the view text? I think this should be suppressed in favor of showing it for `SHOW CREATE`. Otherwise the output will be formatted really strangely.
Re: [PR] Spark: Add support for describing/showing views [iceberg]
rdblue commented on code in PR #9513: URL: https://github.com/apache/iceberg/pull/9513#discussion_r1471676481

## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java:

```java
@@ -1149,6 +1149,159 @@ public void createViewWithSubqueryExpressionUsingGlobalTempView() {
             "because it references to the temporary object global_temp.%s", globalTempView));
   }

+  @Test
+  public void describeView() {
+    String viewName = "describeView";
+
+    sql("CREATE VIEW %s AS SELECT id, data FROM %s WHERE id <= 3", viewName, tableName);
+    assertThat(sql("DESCRIBE %s", viewName))
+        .containsExactlyInAnyOrder(row("id", "int", ""), row("data", "string", ""));
+  }
+
+  @Test
+  public void describeExtendedView() {
+    String viewName = "describeExtendedView";
+    String sql = String.format("SELECT id, data FROM %s WHERE id <= 3", tableName);
+
+    sql(
+        "CREATE VIEW %s (new_id COMMENT 'ID', new_data COMMENT 'DATA') COMMENT 'view comment' AS %s",
+        viewName, sql);
+    assertThat(sql("DESCRIBE EXTENDED %s", viewName))
+        .contains(
+            row("new_id", "int", "ID"),
+            row("new_data", "string", "DATA"),
+            row("Comment", "view comment", ""),
+            row("View Text", sql, ""),
+            row("View Catalog and Namespace", String.format("%s.%s", catalogName, NAMESPACE), ""),
+            row("View Query Output Columns", "[id, data]", ""),
+            row(
+                "View Properties",
+                String.format(
+                    "['format-version' = '1', 'location' = '/%s/%s', 'provider' = 'iceberg']",
+                    NAMESPACE, viewName),
+                ""));
+  }
+
+  @Test
+  public void showViewProperties() {
+    String viewName = "showViewProps";
+
+    sql(
+        "CREATE VIEW %s TBLPROPERTIES ('key1'='val1', 'key2'='val2') AS SELECT id, data FROM %s WHERE id <= 3",
+        viewName, tableName);
+    assertThat(sql("SHOW TBLPROPERTIES %s", viewName))
+        .contains(row("key1", "val1"), row("key2", "val2"));
+  }
+
+  @Test
+  public void showViewPropertiesByKey() {
+    String viewName = "showViewPropsByKey";
+
+    sql("CREATE VIEW %s AS SELECT id, data FROM %s WHERE id <= 3", viewName, tableName);
+    assertThat(sql("SHOW TBLPROPERTIES %s", viewName)).contains(row("provider", "iceberg"));
+
+    assertThat(sql("SHOW TBLPROPERTIES %s (provider)", viewName))
+        .contains(row("provider", "iceberg"));
+
+    assertThat(sql("SHOW TBLPROPERTIES %s (non.existing)", viewName))
+        .contains(
+            row(
+                "non.existing",
+                String.format(
+                    "View %s.%s.%s does not have property: non.existing",
+                    catalogName, NAMESPACE, viewName)));
+  }
+
+  @Test
+  public void showViews() throws NoSuchTableException {
+    insertRows(6);
+    String sql = String.format("SELECT * from %s", tableName);
+    sql("CREATE VIEW v1 AS %s", sql);
+    sql("CREATE VIEW prefixV2 AS %s", sql);
+    sql("CREATE VIEW prefixV3 AS %s", sql);
+    sql("CREATE GLOBAL TEMPORARY VIEW globalViewForListing AS %s", sql);
+    sql("CREATE TEMPORARY VIEW tempViewForListing AS %s", sql);
+
+    // spark stores temp views case-insensitive by default
+    Object[] tempView = row("", "tempviewforlisting", true);
+    assertThat(sql("SHOW VIEWS"))
+        .contains(
+            row(NAMESPACE.toString(), "prefixV2", false),
+            row(NAMESPACE.toString(), "prefixV3", false),
+            row(NAMESPACE.toString(), "v1", false),
+            tempView);
+
+    assertThat(sql("SHOW VIEWS IN %s", catalogName))
+        .contains(
+            row(NAMESPACE.toString(), "prefixV2", false),
+            row(NAMESPACE.toString(), "prefixV3", false),
+            row(NAMESPACE.toString(), "v1", false),
+            tempView);
```

Review Comment: I trust you that this is Spark's behavior. But this is weird because it isn't in the catalog.
Re: [PR] Spark: Add support for describing/showing views [iceberg]
rdblue commented on code in PR #9513: URL: https://github.com/apache/iceberg/pull/9513#discussion_r1471676989

## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java:

```java
@@ -1149,6 +1149,159 @@ public void createViewWithSubqueryExpressionUsingGlobalTempView() {
             "because it references to the temporary object global_temp.%s", globalTempView));
   }

+  @Test
+  public void describeView() {
+    String viewName = "describeView";
+
+    sql("CREATE VIEW %s AS SELECT id, data FROM %s WHERE id <= 3", viewName, tableName);
+    assertThat(sql("DESCRIBE %s", viewName))
+        .containsExactlyInAnyOrder(row("id", "int", ""), row("data", "string", ""));
+  }
+
+  @Test
+  public void describeExtendedView() {
+    String viewName = "describeExtendedView";
+    String sql = String.format("SELECT id, data FROM %s WHERE id <= 3", tableName);
+
+    sql(
+        "CREATE VIEW %s (new_id COMMENT 'ID', new_data COMMENT 'DATA') COMMENT 'view comment' AS %s",
+        viewName, sql);
+    assertThat(sql("DESCRIBE EXTENDED %s", viewName))
+        .contains(
+            row("new_id", "int", "ID"),
+            row("new_data", "string", "DATA"),
+            row("Comment", "view comment", ""),
+            row("View Text", sql, ""),
+            row("View Catalog and Namespace", String.format("%s.%s", catalogName, NAMESPACE), ""),
+            row("View Query Output Columns", "[id, data]", ""),
+            row(
+                "View Properties",
+                String.format(
+                    "['format-version' = '1', 'location' = '/%s/%s', 'provider' = 'iceberg']",
+                    NAMESPACE, viewName),
+                ""));
+  }
+
+  @Test
+  public void showViewProperties() {
+    String viewName = "showViewProps";
+
+    sql(
+        "CREATE VIEW %s TBLPROPERTIES ('key1'='val1', 'key2'='val2') AS SELECT id, data FROM %s WHERE id <= 3",
+        viewName, tableName);
+    assertThat(sql("SHOW TBLPROPERTIES %s", viewName))
+        .contains(row("key1", "val1"), row("key2", "val2"));
+  }
+
+  @Test
+  public void showViewPropertiesByKey() {
+    String viewName = "showViewPropsByKey";
+
+    sql("CREATE VIEW %s AS SELECT id, data FROM %s WHERE id <= 3", viewName, tableName);
+    assertThat(sql("SHOW TBLPROPERTIES %s", viewName)).contains(row("provider", "iceberg"));
+
+    assertThat(sql("SHOW TBLPROPERTIES %s (provider)", viewName))
+        .contains(row("provider", "iceberg"));
+
+    assertThat(sql("SHOW TBLPROPERTIES %s (non.existing)", viewName))
+        .contains(
+            row(
+                "non.existing",
+                String.format(
+                    "View %s.%s.%s does not have property: non.existing",
+                    catalogName, NAMESPACE, viewName)));
+  }
+
+  @Test
+  public void showViews() throws NoSuchTableException {
+    insertRows(6);
+    String sql = String.format("SELECT * from %s", tableName);
+    sql("CREATE VIEW v1 AS %s", sql);
+    sql("CREATE VIEW prefixV2 AS %s", sql);
+    sql("CREATE VIEW prefixV3 AS %s", sql);
+    sql("CREATE GLOBAL TEMPORARY VIEW globalViewForListing AS %s", sql);
+    sql("CREATE TEMPORARY VIEW tempViewForListing AS %s", sql);
+
+    // spark stores temp views case-insensitive by default
+    Object[] tempView = row("", "tempviewforlisting", true);
+    assertThat(sql("SHOW VIEWS"))
+        .contains(
+            row(NAMESPACE.toString(), "prefixV2", false),
+            row(NAMESPACE.toString(), "prefixV3", false),
+            row(NAMESPACE.toString(), "v1", false),
+            tempView);
+
+    assertThat(sql("SHOW VIEWS IN %s", catalogName))
+        .contains(
+            row(NAMESPACE.toString(), "prefixV2", false),
+            row(NAMESPACE.toString(), "prefixV3", false),
+            row(NAMESPACE.toString(), "v1", false),
+            tempView);
```

Review Comment: Also weird when filtering by NAMESPACE below.
Re: [PR] Spark: Add support for describing/showing views [iceberg]
rdblue commented on code in PR #9513: URL: https://github.com/apache/iceberg/pull/9513#discussion_r1471677391

## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java:

```java
@@ -1149,6 +1149,159 @@ public void createViewWithSubqueryExpressionUsingGlobalTempView() {
             "because it references to the temporary object global_temp.%s", globalTempView));
   }

+  @Test
+  public void describeView() {
+    String viewName = "describeView";
+
+    sql("CREATE VIEW %s AS SELECT id, data FROM %s WHERE id <= 3", viewName, tableName);
+    assertThat(sql("DESCRIBE %s", viewName))
+        .containsExactlyInAnyOrder(row("id", "int", ""), row("data", "string", ""));
+  }
+
+  @Test
+  public void describeExtendedView() {
+    String viewName = "describeExtendedView";
+    String sql = String.format("SELECT id, data FROM %s WHERE id <= 3", tableName);
+
+    sql(
+        "CREATE VIEW %s (new_id COMMENT 'ID', new_data COMMENT 'DATA') COMMENT 'view comment' AS %s",
+        viewName, sql);
+    assertThat(sql("DESCRIBE EXTENDED %s", viewName))
+        .contains(
+            row("new_id", "int", "ID"),
+            row("new_data", "string", "DATA"),
+            row("Comment", "view comment", ""),
+            row("View Text", sql, ""),
+            row("View Catalog and Namespace", String.format("%s.%s", catalogName, NAMESPACE), ""),
+            row("View Query Output Columns", "[id, data]", ""),
+            row(
+                "View Properties",
+                String.format(
+                    "['format-version' = '1', 'location' = '/%s/%s', 'provider' = 'iceberg']",
+                    NAMESPACE, viewName),
+                ""));
+  }
+
+  @Test
+  public void showViewProperties() {
+    String viewName = "showViewProps";
+
+    sql(
+        "CREATE VIEW %s TBLPROPERTIES ('key1'='val1', 'key2'='val2') AS SELECT id, data FROM %s WHERE id <= 3",
+        viewName, tableName);
+    assertThat(sql("SHOW TBLPROPERTIES %s", viewName))
+        .contains(row("key1", "val1"), row("key2", "val2"));
+  }
+
+  @Test
+  public void showViewPropertiesByKey() {
+    String viewName = "showViewPropsByKey";
+
+    sql("CREATE VIEW %s AS SELECT id, data FROM %s WHERE id <= 3", viewName, tableName);
+    assertThat(sql("SHOW TBLPROPERTIES %s", viewName)).contains(row("provider", "iceberg"));
+
+    assertThat(sql("SHOW TBLPROPERTIES %s (provider)", viewName))
+        .contains(row("provider", "iceberg"));
+
+    assertThat(sql("SHOW TBLPROPERTIES %s (non.existing)", viewName))
+        .contains(
+            row(
+                "non.existing",
+                String.format(
+                    "View %s.%s.%s does not have property: non.existing",
+                    catalogName, NAMESPACE, viewName)));
+  }
+
+  @Test
+  public void showViews() throws NoSuchTableException {
+    insertRows(6);
+    String sql = String.format("SELECT * from %s", tableName);
+    sql("CREATE VIEW v1 AS %s", sql);
+    sql("CREATE VIEW prefixV2 AS %s", sql);
+    sql("CREATE VIEW prefixV3 AS %s", sql);
+    sql("CREATE GLOBAL TEMPORARY VIEW globalViewForListing AS %s", sql);
+    sql("CREATE TEMPORARY VIEW tempViewForListing AS %s", sql);
+
+    // spark stores temp views case-insensitive by default
+    Object[] tempView = row("", "tempviewforlisting", true);
+    assertThat(sql("SHOW VIEWS"))
+        .contains(
+            row(NAMESPACE.toString(), "prefixV2", false),
+            row(NAMESPACE.toString(), "prefixV3", false),
+            row(NAMESPACE.toString(), "v1", false),
+            tempView);
+
+    assertThat(sql("SHOW VIEWS IN %s", catalogName))
+        .contains(
+            row(NAMESPACE.toString(), "prefixV2", false),
+            row(NAMESPACE.toString(), "prefixV3", false),
+            row(NAMESPACE.toString(), "v1", false),
+            tempView);
+
+    assertThat(sql("SHOW VIEWS IN %s.%s", catalogName, "default"))
```

Review Comment: Instead of "default", should this use `NAMESPACE.toString()`?
Re: [I] Support compaction [iceberg-python]
HonahX commented on issue #270: URL: https://github.com/apache/iceberg-python/issues/270#issuecomment-1917560747 I am interested in taking this if no one has started working on it.
Re: [PR] Spark: Add support for describing/showing views [iceberg]
rdblue commented on code in PR #9513: URL: https://github.com/apache/iceberg/pull/9513#discussion_r1471678490

## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java:

```java
@@ -1149,6 +1149,159 @@ public void createViewWithSubqueryExpressionUsingGlobalTempView() {
             "because it references to the temporary object global_temp.%s", globalTempView));
   }

+  @Test
+  public void describeView() {
+    String viewName = "describeView";
+
+    sql("CREATE VIEW %s AS SELECT id, data FROM %s WHERE id <= 3", viewName, tableName);
+    assertThat(sql("DESCRIBE %s", viewName))
+        .containsExactlyInAnyOrder(row("id", "int", ""), row("data", "string", ""));
+  }
+
+  @Test
+  public void describeExtendedView() {
+    String viewName = "describeExtendedView";
+    String sql = String.format("SELECT id, data FROM %s WHERE id <= 3", tableName);
+
+    sql(
+        "CREATE VIEW %s (new_id COMMENT 'ID', new_data COMMENT 'DATA') COMMENT 'view comment' AS %s",
+        viewName, sql);
+    assertThat(sql("DESCRIBE EXTENDED %s", viewName))
+        .contains(
+            row("new_id", "int", "ID"),
+            row("new_data", "string", "DATA"),
+            row("Comment", "view comment", ""),
+            row("View Text", sql, ""),
+            row("View Catalog and Namespace", String.format("%s.%s", catalogName, NAMESPACE), ""),
+            row("View Query Output Columns", "[id, data]", ""),
+            row(
+                "View Properties",
+                String.format(
+                    "['format-version' = '1', 'location' = '/%s/%s', 'provider' = 'iceberg']",
+                    NAMESPACE, viewName),
+                ""));
+  }
+
+  @Test
+  public void showViewProperties() {
+    String viewName = "showViewProps";
+
+    sql(
+        "CREATE VIEW %s TBLPROPERTIES ('key1'='val1', 'key2'='val2') AS SELECT id, data FROM %s WHERE id <= 3",
+        viewName, tableName);
+    assertThat(sql("SHOW TBLPROPERTIES %s", viewName))
+        .contains(row("key1", "val1"), row("key2", "val2"));
+  }
+
+  @Test
+  public void showViewPropertiesByKey() {
+    String viewName = "showViewPropsByKey";
+
+    sql("CREATE VIEW %s AS SELECT id, data FROM %s WHERE id <= 3", viewName, tableName);
+    assertThat(sql("SHOW TBLPROPERTIES %s", viewName)).contains(row("provider", "iceberg"));
+
+    assertThat(sql("SHOW TBLPROPERTIES %s (provider)", viewName))
+        .contains(row("provider", "iceberg"));
+
+    assertThat(sql("SHOW TBLPROPERTIES %s (non.existing)", viewName))
+        .contains(
+            row(
+                "non.existing",
+                String.format(
+                    "View %s.%s.%s does not have property: non.existing",
+                    catalogName, NAMESPACE, viewName)));
+  }
+
+  @Test
+  public void showViews() throws NoSuchTableException {
+    insertRows(6);
+    String sql = String.format("SELECT * from %s", tableName);
+    sql("CREATE VIEW v1 AS %s", sql);
+    sql("CREATE VIEW prefixV2 AS %s", sql);
+    sql("CREATE VIEW prefixV3 AS %s", sql);
+    sql("CREATE GLOBAL TEMPORARY VIEW globalViewForListing AS %s", sql);
+    sql("CREATE TEMPORARY VIEW tempViewForListing AS %s", sql);
+
+    // spark stores temp views case-insensitive by default
+    Object[] tempView = row("", "tempviewforlisting", true);
+    assertThat(sql("SHOW VIEWS"))
+        .contains(
+            row(NAMESPACE.toString(), "prefixV2", false),
+            row(NAMESPACE.toString(), "prefixV3", false),
+            row(NAMESPACE.toString(), "v1", false),
+            tempView);
+
+    assertThat(sql("SHOW VIEWS IN %s", catalogName))
+        .contains(
+            row(NAMESPACE.toString(), "prefixV2", false),
+            row(NAMESPACE.toString(), "prefixV3", false),
+            row(NAMESPACE.toString(), "v1", false),
+            tempView);
+
+    assertThat(sql("SHOW VIEWS IN %s.%s", catalogName, "default"))
+        .contains(
+            row(NAMESPACE.toString(), "prefixV2", false),
+            row(NAMESPACE.toString(), "prefixV3", false),
+            row(NAMESPACE.toString(), "v1", false),
+            tempView);
+
+    assertThat(sql("SHOW VIEWS LIKE 'pref*'"))
+        .contains(
+            row(NAMESPACE.toString(), "prefixV2", false),
+            row(NAMESPACE.toString(), "prefixV3", false));
+
+    assertThat(sql("SHOW VIEWS LIKE 'non-existing'")).isEmpty();
+
+    assertThat(sql("SHOW VIEWS IN spark_catalog.default")).contains(tempView);
+
+    assertThat(sql("SHOW VIEWS IN global_temp"))
+        .contains(
+            // spark stores temp views case-insensitive by default
+            row("global_temp", "globalviewforlisting", true), tempView);
+  }
+
+  @Test
+  public void showCreateSimpleView() {
+    String viewName = "showCreateSimpleView";
+    String sql = String.format("SELECT id, data FROM %s WHERE id <= 3", tableName);
+
+    sql("CREATE VIEW %s AS %s", viewName, sql);
+
+    String expected =
```
Re: [PR] Spark: Add support for describing/showing views [iceberg]
rdblue commented on code in PR #9513: URL: https://github.com/apache/iceberg/pull/9513#discussion_r1471683079

## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ShowCreateV2ViewExec.scala:

@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.v2
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.catalyst.util.escapeSingleQuotedString
+import org.apache.spark.sql.connector.catalog.View
+import org.apache.spark.sql.connector.catalog.ViewCatalog
+import org.apache.spark.sql.execution.LeafExecNode
+import scala.collection.JavaConverters._
+
+case class ShowCreateV2ViewExec(output: Seq[Attribute], view: View)
+  extends V2CommandExec with LeafExecNode {
+
+  override protected def run(): Seq[InternalRow] = {
+    val builder = new StringBuilder
+    builder ++= s"CREATE VIEW ${view.name} "
+    showColumns(view, builder)
+    showComment(view, builder)
+    showProperties(view, builder)
+    builder ++= s"AS\n${view.query}\n"
+
+    Seq(toCatalystRow(builder.toString))
+  }
+
+  private def showColumns(view: View, builder: StringBuilder): Unit = {
+    val columns = view.schema().fields

Review Comment: I think this is fine for now, but we should consider how to handle the aliases in the future.
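For context on the alias concern, one possible direction is sketched below. This is not code from the PR: it assumes the stored view schema carries per-column comments, and `showColumnsWithComments` is a hypothetical method meant to sit inside `ShowCreateV2ViewExec` above, reusing its existing `escapeSingleQuotedString` import. The idea is that the generated `CREATE VIEW` statement would then round-trip column comments.

```scala
// Hypothetical variant of showColumns: renders "(col COMMENT '...', ...)" so the
// generated CREATE VIEW statement preserves column-level comments.
private def showColumnsWithComments(view: View, builder: StringBuilder): Unit = {
  val columns = view.schema().fields.map { field =>
    field.getComment() match {
      // escapeSingleQuotedString guards against quotes inside the comment text
      case Some(comment) => s"${field.name} COMMENT '${escapeSingleQuotedString(comment)}'"
      case None => field.name
    }
  }
  if (columns.nonEmpty) {
    builder ++= columns.mkString("(", ", ", ") ")
  }
}
```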
Re: [PR] Spark: Add support for describing/showing views [iceberg]
rdblue commented on PR #9513: URL: https://github.com/apache/iceberg/pull/9513#issuecomment-1917565679

I pointed out a few things that I think I would change, but there is nothing blocking this. +1 overall. Please fix what you agree with (including the `SHOW VIEWS` behavior with temp views) and merge when you're ready!
Re: [PR] Spark: Support altering view properties [iceberg]
rdblue commented on code in PR #9582: URL: https://github.com/apache/iceberg/pull/9582#discussion_r1471690098

## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckViews.scala:

@@ -36,6 +38,13 @@ object CheckViews extends (LogicalPlan => Unit) {
       verifyColumnCount(ident, columnAliases, query)
       SchemaUtils.checkColumnNameDuplication(query.schema.fieldNames, SQLConf.get.resolver)

+    case UnsetViewProperties(ResolvedV2View(catalog, ident), propertyKeys, ifExists) =>

Review Comment: Check rules are intended to ensure that the SQL is correct and valid, not to check the execution. With this check here, `EXPLAIN ALTER VIEW ...` would fail, because it does execution work (checking for the property) in the planner. Another way to think about it: this isn't an analysis failure, it is a runtime failure due to data; the SQL is perfectly valid. This should either be done in `AlterV2ViewExec` or not at all. I would lean toward not doing this check at all. Shouldn't this be idempotent? Is there value in failing to remove a property that isn't there? I don't see any, so I'd ignore this and let the catalog throw an exception if it chooses to.
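To make the suggestion concrete, here is a minimal sketch of the execution-time variant. This is not the PR's code: the node name, constructor shape, and the `ViewChange.removeProperty` call are assumptions based on the API names quoted elsewhere in this thread.

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.Attribute
import org.apache.spark.sql.connector.catalog.{Identifier, ViewCatalog, ViewChange}
import org.apache.spark.sql.execution.LeafExecNode

// Hypothetical exec node: because the "property must exist" check lives in run(),
// it only fires when the plan is executed, so EXPLAIN ALTER VIEW ... stays valid.
case class UnsetViewPropertiesExec(
    catalog: ViewCatalog,
    ident: Identifier,
    propertyKeys: Seq[String],
    ifExists: Boolean) extends V2CommandExec with LeafExecNode {

  override def output: Seq[Attribute] = Nil

  override protected def run(): Seq[InternalRow] = {
    if (!ifExists) {
      val view = catalog.loadView(ident)
      // Runtime (not analysis) failure: the SQL is valid, the data is not.
      val missing = propertyKeys.filterNot(key => view.properties.containsKey(key))
      if (missing.nonEmpty) {
        throw new IllegalArgumentException(
          s"Cannot remove missing properties: ${missing.mkString(", ")}")
      }
    }

    catalog.alterView(ident, propertyKeys.map(key => ViewChange.removeProperty(key)): _*)
    Nil
  }
}
```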
Re: [PR] Spark: Support altering view properties [iceberg]
rdblue commented on code in PR #9582: URL: https://github.com/apache/iceberg/pull/9582#discussion_r1471691985

## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteViewCommands.scala:

@@ -21,19 +21,21 @@ package org.apache.spark.sql.catalyst.analysis

 import org.apache.spark.sql.AnalysisException
 import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.catalyst.analysis.ViewUtil.IcebergViewHelper
+import org.apache.spark.sql.catalyst.analysis.ViewUtil.isViewCatalog

Review Comment: Let's also avoid static imports in Scala. Can you import `ViewUtil` and call these methods from there?
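In other words, the preferred style looks like the following. This is illustrative only; the `CatalogPlugin` parameter type of `isViewCatalog` and the `RewriteExample` wrapper are assumptions.

```scala
import org.apache.spark.sql.connector.catalog.CatalogPlugin
// Avoid importing members statically:
//   import org.apache.spark.sql.catalyst.analysis.ViewUtil.isViewCatalog
// Import the object and qualify calls at the use site instead:
import org.apache.spark.sql.catalyst.analysis.ViewUtil

object RewriteExample {
  def handles(catalog: CatalogPlugin): Boolean =
    // the qualified call makes the origin of the helper obvious
    ViewUtil.isViewCatalog(catalog)
}
```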
[I] Add example of using PyIceberg with minimal external dependencies [iceberg-python]
kevinjqliu opened a new issue, #326: URL: https://github.com/apache/iceberg-python/issues/326

### Feature Request / Improvement

With #299 resolved, let's add an example that shows working with Iceberg using minimal external dependencies: get started with Iceberg using the local filesystem and a simple catalog (`Local Filesystem` + `SqlCatalog`).

## Catalog config

```
from pyiceberg.catalog.sql import SqlCatalog

catalog_path = "/tmp/warehouse/pyiceberg.db"
warehouse_path = "/tmp/warehouse/"

catalog = SqlCatalog(
    "default",
    **{
        "uri": f"sqlite:///{catalog_path}",
        "warehouse": f"file://{warehouse_path}",
    },
)
```

Note that the `/tmp/warehouse/` folder needs to exist first.

## Create a new table and write to it

https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/api.md#write-support

## Query the table with `pyiceberg`

`~/.pyiceberg.yaml` file:

```
catalog:
  default:
    uri: sqlite:////tmp/warehouse/pyiceberg.db
    warehouse: file:///tmp/warehouse/
```

## Run `pyiceberg` CLI

`pyiceberg list`
Re: [I] Add partition evolution [iceberg-python]
syun64 commented on issue #193: URL: https://github.com/apache/iceberg-python/issues/193#issuecomment-1917576022

@Fokko Should we move this to the 0.7.0 release milestone?
Re: [PR] Spark: Support altering view properties [iceberg]
rdblue commented on code in PR #9582: URL: https://github.com/apache/iceberg/pull/9582#discussion_r1471693576

## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteViewCommands.scala:

@@ -60,17 +62,18 @@ case class RewriteViewCommands(spark: SparkSession) extends Rule[LogicalPlan] wi
       properties = properties,
       allowExisting = allowExisting,
       replace = replace)
+
+    case u@UnresolvedView(ResolvedView(resolved), _, _, _) =>

Review Comment: Why does this need to be done here instead of in `ResolveViews`? This is just resolving the view. I think it's probably that we have to resolve early to avoid an `AnalysisException`, but it would be great to clarify that with a comment.
Re: [PR] Use `.read_bytes()` instead [iceberg-python]
HonahX merged PR #325: URL: https://github.com/apache/iceberg-python/pull/325
Re: [PR] Spark: Throw exception on `ALTER VIEW AS ` [iceberg]
nastra commented on code in PR #9510: URL: https://github.com/apache/iceberg/pull/9510#discussion_r1471696587

## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckViews.scala:

@@ -36,6 +38,9 @@ object CheckViews extends (LogicalPlan => Unit) {
       verifyColumnCount(ident, columnAliases, query)
       SchemaUtils.checkColumnNameDuplication(query.schema.fieldNames, SQLConf.get.resolver)

+    case AlterViewAs(ResolvedV2View(_, _, _), _, query) =>
+      SchemaUtils.checkColumnNameDuplication(query.schema.fieldNames, SQLConf.get.resolver)

Review Comment: To summarize this topic: we discussed it offline and decided not to support `ALTER VIEW ... AS` for now and to throw an exception instead (pointing users to `CREATE OR REPLACE VIEW`), because it leads to the surprising behavior that column aliases/comments are lost after the underlying SQL is updated.
Re: [PR] Spark: Support altering view properties [iceberg]
rdblue commented on code in PR #9582: URL: https://github.com/apache/iceberg/pull/9582#discussion_r1471696609

## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckViews.scala:

@@ -36,6 +38,13 @@ object CheckViews extends (LogicalPlan => Unit) {
       verifyColumnCount(ident, columnAliases, query)
       SchemaUtils.checkColumnNameDuplication(query.schema.fieldNames, SQLConf.get.resolver)

+    case UnsetViewProperties(ResolvedV2View(catalog, ident), propertyKeys, ifExists) =>

Review Comment: Okay, I missed the `ifExists` part. Looks like this behavior is required by Spark. In that case, this check should be moved to the Exec plan.
Re: [PR] Spark: Support altering view properties [iceberg]
rdblue commented on code in PR #9582: URL: https://github.com/apache/iceberg/pull/9582#discussion_r1471699613

## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java:

@@ -606,6 +607,29 @@ public View createView(
   @Override
   public View alterView(Identifier ident, ViewChange... changes)
       throws NoSuchViewException, IllegalArgumentException {
+    if (null != asViewCatalog) {
+      try {
+        org.apache.iceberg.view.View view = asViewCatalog.loadView(buildIdentifier(ident));
+        UpdateViewProperties updateViewProperties = view.updateProperties();

Review Comment: Can you also reject reserved properties? I don't think that you should be able to set `format-version`, `location`, or `provider`.
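A minimal sketch of the kind of guard being asked for, written in Scala for consistency with the other snippets in this digest even though the file under review is Java. The reserved-key set mirrors the properties named in the comment; the `ViewChange.SetProperty` accessor shape and the `ViewPropertyGuard` wrapper are assumptions, not the PR's actual code.

```scala
import org.apache.spark.sql.connector.catalog.ViewChange

object ViewPropertyGuard {
  // Reserved properties that users should not be able to set directly.
  private val Reserved = Set("format-version", "location", "provider")

  // Validate the requested changes before applying them to the view.
  def validate(changes: Seq[ViewChange]): Unit = {
    changes.foreach {
      case set: ViewChange.SetProperty if Reserved.contains(set.property) =>
        throw new UnsupportedOperationException(
          s"Cannot set reserved property: ${set.property}")
      case _ => // non-reserved changes pass through unchanged
    }
  }
}
```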
Re: [PR] Spark: Support altering view properties [iceberg]
nastra commented on code in PR #9582: URL: https://github.com/apache/iceberg/pull/9582#discussion_r1471703651

## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteViewCommands.scala:

@@ -60,17 +62,18 @@ case class RewriteViewCommands(spark: SparkSession) extends Rule[LogicalPlan] wi
       properties = properties,
       allowExisting = allowExisting,
       replace = replace)
+
+    case u@UnresolvedView(ResolvedView(resolved), _, _, _) =>

Review Comment: Yes, this needs to be done early because otherwise the Analyzer will resolve views and fail (since v2 views aren't supported). I'll add a comment.
Re: [PR] Spark: Rewrite identifier when using Subquery expressions in View [iceberg]
rdblue commented on PR #9587: URL: https://github.com/apache/iceberg/pull/9587#issuecomment-1917587797

Thanks, @nastra! Good to have this fixed.
Re: [PR] Spark: Rewrite identifier when using Subquery expressions in View [iceberg]
rdblue merged PR #9587: URL: https://github.com/apache/iceberg/pull/9587
Re: [PR] Build: Bump software.amazon.s3.accessgrants:aws-s3-accessgrants-java-plugin from 1.0.1 to 2.0.0 [iceberg]
jackye1995 merged PR #9575: URL: https://github.com/apache/iceberg/pull/9575
Re: [PR] Spark 3.5: Fix flaky TestSparkExecutorCache [iceberg]
aokolnychyi commented on PR #9583: URL: https://github.com/apache/iceberg/pull/9583#issuecomment-1917593034

@amogh-jahagirdar @szehon-ho @nastra, could you check this one? I had 5 successful CI runs.
Re: [PR] Build: Bump software.amazon.awssdk:bom from 2.23.2 to 2.23.12 [iceberg]
jackye1995 merged PR #9573: URL: https://github.com/apache/iceberg/pull/9573
Re: [I] [Proposal] Iceberg Materialized View Spec [iceberg]
szehon-ho commented on issue #6420: URL: https://github.com/apache/iceberg/issues/6420#issuecomment-1917629464

Great, this all makes sense to me except for the following points:

> 3. Question: Most people seem to be for Option 1

As we clarified in the google doc, it seems at least locally we are moving to Option 2: https://docs.google.com/document/d/1UnhldHhe3Grz8JBngwXPA6ZZord1xMedY5ukEhZYF-A/edit?pli=1&disco=AAABFByy0mU

> 3. As can be seen in the last point of https://docs.google.com/document/d/1UnhldHhe3Grz8JBngwXPA6ZZord1xMedY5ukEhZYF-A/edit?pli=1#heading=h.m5kli4l5q7ui, we agreed to leave the refresh strategy to the query engine. If available, an incremental refresh is always preferable to a full refresh.

Did we consider making this property configurable? We should at least store which refresh strategy was used in the snapshot, for debugging. We can discuss on the google doc.

So I'm open to either: we can make a PR, or fill out the Specification Draft part of the doc. The main barrier to more folks reviewing right now is that the doc is just a list of options and long threads, and it takes forever to see where the consensus is; it took me quite some time to follow. Having a central section that lists the concrete additions we are proposing would greatly streamline the review. Feel free to take the summarized spec changes in my previous comment as a starting point.
Re: [I] Create Iceberg Table from pyarrow Schema with no IDs [iceberg-python]
syun64 closed issue #278: Create Iceberg Table from pyarrow Schema with no IDs URL: https://github.com/apache/iceberg-python/issues/278
Re: [PR] Spark: Throw exception on `ALTER VIEW AS ` [iceberg]
rdblue commented on code in PR #9510: URL: https://github.com/apache/iceberg/pull/9510#discussion_r1471786657

## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteViewCommands.scala:

@@ -60,6 +62,9 @@ case class RewriteViewCommands(spark: SparkSession) extends Rule[LogicalPlan] wi
       properties = properties,
       allowExisting = allowExisting,
       replace = replace)
+
+    case AlterViewAs(ResolvedV2View(_, _), _, _) =>
+      throw new AnalysisException("ALTER VIEW AS is not supported. Use CREATE OR REPLACE VIEW instead")

Review Comment: Does this need to happen here instead of in `CheckViews` so that Spark doesn't throw a different analysis exception?
[PR] REST: Set fresh IDs on create-table [iceberg-python]
Fokko opened a new pull request, #327: URL: https://github.com/apache/iceberg-python/pull/327

For the other implementations we do this when writing the metadata, but for the REST catalog we just post the schema itself. The REST spec does not mention anything about setting fresh IDs, so it is best to do it ourselves.
Re: [PR] Remove ASF publish property [iceberg-docs]
rdblue merged PR #308: URL: https://github.com/apache/iceberg-docs/pull/308
Re: [PR] Move nightly versioned docs to top-level docs directory [iceberg]
rdblue commented on PR #9578: URL: https://github.com/apache/iceberg/pull/9578#issuecomment-1917693542

Thanks for reverting this, @Fokko! I think we need to make sure that the docs don't move, so that we keep their history and can validate the changes when merging nightly into these docs.
Re: [PR] New docs switch [iceberg]
rdblue merged PR #9520: URL: https://github.com/apache/iceberg/pull/9520
Re: [I] Problems in release doc about versioned documentation [iceberg]
rdblue closed issue #8151: Problems in release doc about versioned documentation URL: https://github.com/apache/iceberg/issues/8151
Re: [PR] REST: Set fresh IDs on create-table [iceberg-python]
rdblue commented on code in PR #327: URL: https://github.com/apache/iceberg-python/pull/327#discussion_r1471795199

## pyiceberg/catalog/rest.py:

@@ -447,13 +447,14 @@ def create_table(
         sort_order: SortOrder = UNSORTED_SORT_ORDER,
         properties: Properties = EMPTY_DICT,
     ) -> Table:
-        schema: Schema = self._convert_schema_if_needed(schema)  # type: ignore
+        iceberg_schema = self._convert_schema_if_needed(schema)
+        iceberg_schema = assign_fresh_schema_ids(iceberg_schema)
         namespace_and_table = self._split_identifier_for_path(identifier)
         request = CreateTableRequest(
             name=namespace_and_table["table"],
             location=location,
-            table_schema=schema,
+            table_schema=iceberg_schema,

Review Comment: Isn't this done on the server side?