Re: [PR] Spark 3.5: Fix flaky TestSparkExecutorCache [iceberg]

2024-01-30 Thread via GitHub


aokolnychyi closed pull request #9583: Spark 3.5: Fix flaky 
TestSparkExecutorCache
URL: https://github.com/apache/iceberg/pull/9583





Re: [PR] Partition Evolution [iceberg-python]

2024-01-30 Thread via GitHub


HonahX commented on code in PR #245:
URL: https://github.com/apache/iceberg-python/pull/245#discussion_r1469034571


##
pyiceberg/table/__init__.py:
##
@@ -533,6 +551,39 @@ def _(update: SetCurrentSchemaUpdate, base_metadata: TableMetadata, context: _Ta
     return base_metadata.model_copy(update={"current_schema_id": new_schema_id})
 
 
+@_apply_table_update.register(AddPartitionSpecUpdate)
+def _(update: AddPartitionSpecUpdate, base_metadata: TableMetadata, context: _TableMetadataUpdateContext) -> TableMetadata:
+    for spec in base_metadata.partition_specs:
+        if spec.spec_id == update.spec_id:
+            raise ValueError(f"Partition spec with id {spec.spec_id} already exists: {spec}")
+
+    context.add_update(update)
+    return base_metadata.model_copy(
+        update={
+            "partition_specs": base_metadata.partition_specs + [update.spec],
+        }
+    )
+
+
+@_apply_table_update.register(SetDefaultSpecUpdate)
+def _(update: SetDefaultSpecUpdate, base_metadata: TableMetadata, context: _TableMetadataUpdateContext) -> TableMetadata:
+    new_spec_id = update.spec_id
+    if new_spec_id == base_metadata.default_spec_id:

Review Comment:
   Shall we add support for the `-1` spec_id, which represents the last spec added in this transaction?
   
   
   
https://github.com/apache/iceberg/blob/6852278d6cf01bec2998be954d7bd6f6cc37cc94/open-api/rest-catalog-open-api.yaml#L2242
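
   For illustration, a self-contained sketch of the "`-1` means the last added spec" convention from the REST spec linked above (the class and helper names here are hypothetical, not pyiceberg's actual code):

   ```python
   # Hypothetical sketch: resolve spec_id == -1 to the spec added last in this
   # transaction; all names are illustrative, not pyiceberg's API.
   from dataclasses import dataclass, field
   from typing import List

   LAST_ADDED = -1

   @dataclass
   class AddPartitionSpecUpdate:
       spec_id: int

   @dataclass
   class UpdateContext:
       updates: List[object] = field(default_factory=list)

       def last_added_spec_id(self) -> int:
           # scan the accumulated updates in reverse for the most recent spec
           for update in reversed(self.updates):
               if isinstance(update, AddPartitionSpecUpdate):
                   return update.spec_id
           raise ValueError("No partition spec was added in this transaction")

   def resolve_default_spec_id(requested: int, context: UpdateContext) -> int:
       return context.last_added_spec_id() if requested == LAST_ADDED else requested

   ctx = UpdateContext(updates=[AddPartitionSpecUpdate(spec_id=2)])
   assert resolve_default_spec_id(-1, ctx) == 2
   assert resolve_default_spec_id(0, ctx) == 0
   ```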



##
pyiceberg/table/__init__.py:
##
@@ -2271,3 +2325,240 @@ def commit(self) -> Snapshot:
         )
 
         return snapshot
+
+
+class UpdateSpec:
+    _table: Table
+    _name_to_field: Dict[str, PartitionField] = {}
+    _name_to_added_field: Dict[str, PartitionField] = {}
+    _transform_to_field: Dict[Tuple[int, str], PartitionField] = {}
+    _transform_to_added_field: Dict[Tuple[int, str], PartitionField] = {}
+    _renames: Dict[str, str] = {}
+    _added_time_fields: Dict[int, PartitionField] = {}
+    _case_sensitive: bool
+    _adds: List[PartitionField]
+    _deletes: Set[int]
+    _last_assigned_partition_id: int
+    _transaction: Optional[Transaction]
+    _unassigned_field_name = 'unassigned_field_name'

Review Comment:
   Thanks for the analysis! I agree that we should keep the validation. I was 
thinking a little bit of refactoring may make things look better. I've left a 
comment at the place where we provide the name for the new field. Please let me 
know what you think.



##
pyiceberg/table/__init__.py:
##
@@ -2271,3 +2325,243 @@ def commit(self) -> Snapshot:
         )
 
         return snapshot
+
+
+class UpdateSpec:
+    _table: Table
+    _name_to_field: Dict[str, PartitionField] = {}
+    _name_to_added_field: Dict[str, PartitionField] = {}
+    _transform_to_field: Dict[Tuple[int, str], PartitionField] = {}
+    _transform_to_added_field: Dict[Tuple[int, str], PartitionField] = {}
+    _renames: Dict[str, str] = {}
+    _added_time_fields: Dict[int, PartitionField] = {}
+    _case_sensitive: bool
+    _adds: List[PartitionField]
+    _deletes: Set[int]
+    _last_assigned_partition_id: int
+    _transaction: Optional[Transaction]
+    _unassigned_field_name = 'unassigned_field_name'
+
+    def __init__(self, table: Table, transaction: Optional[Transaction] = None, case_sensitive: bool = True) -> None:
+        self._table = table
+        self._name_to_field = {field.name: field for field in table.spec().fields}
+        self._name_to_added_field = {}
+        self._transform_to_field = {(field.source_id, repr(field.transform)): field for field in table.spec().fields}
+        self._transform_to_added_field = {}
+        self._adds = []
+        self._deletes = set()
+        if len(table.specs()) == 1:
+            self._last_assigned_partition_id = PARTITION_FIELD_ID_START - 1
+        else:
+            self._last_assigned_partition_id = table.spec().last_assigned_field_id
+        self._renames = {}
+        self._transaction = transaction
+        self._case_sensitive = case_sensitive
+        self._added_time_fields = {}
+
+    def add_field(
+        self,
+        source_column_name: str,
+        transform: Transform[Any, Any],
+        partition_field_name: Optional[str] = _unassigned_field_name,
+    ) -> UpdateSpec:
+        ref = Reference(source_column_name)
+        bound_ref = ref.bind(self._table.schema(), self._case_sensitive)
+        # verify transform can actually bind it
+        output_type = bound_ref.field.field_type
+        if not transform.can_transform(output_type):
+            raise ValueError(f"{transform} cannot transform {output_type} values from {bound_ref.field.name}")
+
+        transform_key = (bound_ref.field.field_id, repr(transform))
+        existing_partition_field = self._transform_to_field.get(transform_key)
+        if existing_partition_field and self._is_duplicate_partition(transform, existing_partition_field):
+            raise ValueError(f"Duplicate partition field for ${ref.name}=${ref}, ${existing_partition_field} already exists")

Re: [PR] Flink: Implement enumerator metrics for pending splits, pending recor… [iceberg]

2024-01-30 Thread via GitHub


pvary merged PR #9524:
URL: https://github.com/apache/iceberg/pull/9524





Re: [PR] Flink: Implement enumerator metrics for pending splits, pending recor… [iceberg]

2024-01-30 Thread via GitHub


pvary commented on PR #9524:
URL: https://github.com/apache/iceberg/pull/9524#issuecomment-1916303375

   @mas-chen: Please backport this to 1.17 and 1.18, and continue with the JUnit5 migration PR.
   Thanks @mas-chen for the PR and @stevenzwu for the review!
   





[PR] Docs: Enhance Java quickstart example [iceberg]

2024-01-30 Thread via GitHub


manuzhang opened a new pull request, #9585:
URL: https://github.com/apache/iceberg/pull/9585

   (no comment)





[I] Cannot drop table with AWS Glue Catalog [iceberg]

2024-01-30 Thread via GitHub


duwanqiebi opened a new issue, #9586:
URL: https://github.com/apache/iceberg/issues/9586

   ### Apache Iceberg version
   
   None
   
   ### Query engine
   
   None
   
   ### Please describe the bug 🐞
   
   - Conf:
     - "spark.sql.catalog.spark_catalog": "org.apache.iceberg.spark.SparkSessionCatalog"
     - "spark.sql.catalog.spark_catalog.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog"
     - "spark.sql.catalog.spark_catalog.io-impl": "org.apache.iceberg.aws.s3.S3FileIO"
     - "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
   
   Creating the table, writing, and reading all work, but dropping the table fails with the following error:
   ```
   spark.sql("drop table $db.test_iceberg_0 purge")
   
   
   pyspark.errors.exceptions.captured.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table test_iceberg_0. StorageDescriptor#InputFormat cannot be null for table: test_iceberg_0 (Service: null; Status Code: 0; Error Code: null; Request ID: null; Proxy: null)
   ```





Re: [PR] Spark 3.5: Add an option not to delete files in ExpireSnapshots [iceberg]

2024-01-30 Thread via GitHub


manuzhang closed pull request #9584: Spark 3.5: Add an option not to delete 
files in ExpireSnapshots
URL: https://github.com/apache/iceberg/pull/9584





[PR] feat: Add user guide for website. [iceberg-rust]

2024-01-30 Thread via GitHub


liurenjie1024 opened a new pull request, #178:
URL: https://github.com/apache/iceberg-rust/pull/178

   This PR adds a user guide for the website.





Re: [PR] feat: Add user guide for website. [iceberg-rust]

2024-01-30 Thread via GitHub


liurenjie1024 commented on PR #178:
URL: https://github.com/apache/iceberg-rust/pull/178#issuecomment-1916332649

   cc @Xuanwo @ZENOTME @Fokko PTAL





Re: [PR] feat: add parquet writer [iceberg-rust]

2024-01-30 Thread via GitHub


liurenjie1024 commented on code in PR #176:
URL: https://github.com/apache/iceberg-rust/pull/176#discussion_r1470792480


##
crates/iceberg/src/spec/manifest.rs:
##
@@ -924,7 +924,7 @@ impl TryFrom for ManifestStatus {
 }
 
 /// Data file carries data file path, partition tuple, metrics, …
-#[derive(Debug, PartialEq, Clone, Eq, TypedBuilder)]
+#[derive(Debug, PartialEq, Clone, Eq, Builder)]

Review Comment:
   Oh, I see, thanks!






Re: [I] Cannot drop table with AWS Glue Catalog [iceberg]

2024-01-30 Thread via GitHub


nastra commented on issue #9586:
URL: https://github.com/apache/iceberg/issues/9586#issuecomment-1916390985

   Might be similar to https://github.com/apache/iceberg/issues/5565. I wonder if you're missing the catalog name in the table identifier.
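
   If that is the cause, a fully qualified identifier would look like this (a hedged example; `my_db` and the session bootstrap are placeholders, not taken from the report):

   ```python
   from pyspark.sql import SparkSession

   spark = SparkSession.builder.getOrCreate()

   # Qualify the table with the catalog name so the DROP resolves through the
   # configured Iceberg catalog rather than an unqualified session lookup.
   spark.sql("DROP TABLE spark_catalog.my_db.test_iceberg_0 PURGE")
   ```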





Re: [I] Remove `unwrap()` in `ManifestListWriter.close()` [iceberg-rust]

2024-01-30 Thread via GitHub


liurenjie1024 commented on issue #177:
URL: https://github.com/apache/iceberg-rust/issues/177#issuecomment-1916433902

   Thanks for reporting!





Re: [PR] feat: add support for catalogs with glue implementation to start [iceberg-go]

2024-01-30 Thread via GitHub


wolfeidau commented on PR #51:
URL: https://github.com/apache/iceberg-go/pull/51#issuecomment-1916574894

   @zeroshade done





Re: [PR] Python: Introduce setters [iceberg-python]

2024-01-30 Thread via GitHub


Fokko closed pull request #304: Python: Introduce setters
URL: https://github.com/apache/iceberg-python/pull/304





Re: [PR] Spark: Rewrite identifier when using Subquery expressions in View [iceberg]

2024-01-30 Thread via GitHub


nastra commented on code in PR #9587:
URL: https://github.com/apache/iceberg/pull/9587#discussion_r1471120727


##
spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java:
##
@@ -877,6 +877,9 @@ public void dropV1View() {
             v1SessionCatalog()
                 .tableExists(new org.apache.spark.sql.catalyst.TableIdentifier(v1View)))
         .isFalse();
+
+    sql("USE spark_catalog");
+    sql("DROP TABLE IF EXISTS %s", tableName);

Review Comment:
   the existence of that table in `spark_catalog` would not cause the expected 
failures in the newly added tests when doing a read against the table






[PR] Use `.read_bytes()` instead [iceberg-python]

2024-01-30 Thread via GitHub


Fokko opened a new pull request, #325:
URL: https://github.com/apache/iceberg-python/pull/325

   Noticed this while profiling. This simplifies the code a bit.
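
   For readers unfamiliar with the pattern, a rough illustration using `pathlib` (the PR itself touches pyiceberg's IO layer; this only shows the general shape of the simplification):

   ```python
   from pathlib import Path

   path = Path("example.bin")
   path.write_bytes(b"\x00\x01\x02")

   # Instead of opening a handle, reading, and closing in separate steps...
   with path.open("rb") as f:
       data = f.read()

   # ...a single read_bytes() call does the open/read/close in one step.
   assert path.read_bytes() == data
   ```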





[I] Performance optimization [iceberg]

2024-01-30 Thread via GitHub


Fokko opened a new issue, #9588:
URL: https://github.com/apache/iceberg/issues/9588

   ### Apache Iceberg version
   
   None
   
   ### Query engine
   
   None
   
   ### Please describe the bug 🐞
   
   Seeing if there is anything we can do to improve performance:
   
   
   
![profile](https://github.com/apache/iceberg/assets/1134248/e562ebe9-6471-487e-87db-b7f730abfdcd)
   





Re: [I] Performance optimization [iceberg]

2024-01-30 Thread via GitHub


liurenjie1024 commented on issue #9588:
URL: https://github.com/apache/iceberg/issues/9588#issuecomment-1916860122

   Seems like a lot of time is spent on decompression.





Re: [I] Performance optimization [iceberg]

2024-01-30 Thread via GitHub


Fokko commented on issue #9588:
URL: https://github.com/apache/iceberg/issues/9588#issuecomment-1916881774

   s3fs:
   
![output](https://github.com/apache/iceberg/assets/1134248/4332cd61-d280-48b5-88de-30574fd728fc)
   
   PyArrow:
   
![output](https://github.com/apache/iceberg/assets/1134248/bb6a73bd-cd63-4dcd-b319-5ba180335883)
   
   Also, the PyArrow import is very slow :(
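
   For reference, one quick way to measure the import cost in isolation (a minimal sketch, not how the profiles above were produced):

   ```python
   import importlib
   import time

   start = time.perf_counter()
   importlib.import_module("pyarrow")  # requires pyarrow to be installed
   print(f"pyarrow import took {time.perf_counter() - start:.2f}s")
   ```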





Re: [PR] InMemory Catalog Implementation [iceberg-python]

2024-01-30 Thread via GitHub


syun64 commented on code in PR #289:
URL: https://github.com/apache/iceberg-python/pull/289#discussion_r1471348272


##
tests/catalog/test_base.py:
##


Review Comment:
   This is a _super-nit_ (please feel free to take it or leave it): should we 
rename this test module to **test_in_memory.py** to honor the fact that 
InMemory Catalog is now a proper catalog implementation?






Re: [I] Support deletion in Apache Flink [iceberg]

2024-01-30 Thread via GitHub


VidakM commented on issue #8718:
URL: https://github.com/apache/iceberg/issues/8718#issuecomment-1917071229

   Is `DELETE FROM` still on the roadmap for Flink, or could we help out in some way?





Re: [PR] Spark 3.5: Add an option not to delete files in ExpireSnapshots [iceberg]

2024-01-30 Thread via GitHub


nastra commented on code in PR #9584:
URL: https://github.com/apache/iceberg/pull/9584#discussion_r1471390452


##
api/src/main/java/org/apache/iceberg/ExpireSnapshots.java:
##
@@ -118,4 +118,9 @@ public interface ExpireSnapshots extends PendingUpdate<List<Snapshot>> {
    * @return this for method chaining
    */
   ExpireSnapshots cleanExpiredFiles(boolean clean);
+
+  /** Returns number of expired snapshots */
+  default long expiredSnapshotsCount() {
+    return 0;

Review Comment:
   ```suggestion
   return 0L;
   ```



##
.palantir/revapi.yml:
##
@@ -873,6 +873,13 @@ acceptedBreaks:
   new: "method void org.apache.iceberg.encryption.Ciphers::()"
   justification: "Static utility class - should not have public 
constructor"
   "1.4.0":
+org.apache.iceberg:iceberg-api:

Review Comment:
   this shouldn't be necessary






Re: [PR] Spark 3.5: Add an option not to delete files in ExpireSnapshots [iceberg]

2024-01-30 Thread via GitHub


nastra commented on code in PR #9584:
URL: https://github.com/apache/iceberg/pull/9584#discussion_r1471391266


##
api/src/main/java/org/apache/iceberg/actions/ExpireSnapshots.java:
##
@@ -99,6 +99,11 @@ public interface ExpireSnapshots extends Action

Re: [PR] Spark 3.5: Add an option not to delete files in ExpireSnapshots [iceberg]

2024-01-30 Thread via GitHub


nastra commented on code in PR #9584:
URL: https://github.com/apache/iceberg/pull/9584#discussion_r1471397337


##
core/src/main/java/org/apache/iceberg/RemoveSnapshots.java:
##
@@ -85,6 +85,8 @@ public void accept(String file) {
   private ExecutorService planExecutorService = ThreadPools.getWorkerPool();
   private Boolean incrementalCleanup;
 
+  private List expiredSnapshots;

Review Comment:
   I think it would be better to keep a Set of snapshot IDs, because removing 
from a list is O(n)
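
   To make the cost concrete, a small self-contained benchmark in Python (illustrative only; the same asymptotics apply to Java's ArrayList vs. HashSet):

   ```python
   import timeit

   def remove_from_list() -> None:
       ids = list(range(10_000))
       for snapshot_id in range(0, 10_000, 2):
           ids.remove(snapshot_id)  # O(n): linear scan plus tail shift

   def remove_from_set() -> None:
       ids = set(range(10_000))
       for snapshot_id in range(0, 10_000, 2):
           ids.discard(snapshot_id)  # O(1) on average: hash removal

   print("list:", timeit.timeit(remove_from_list, number=5))
   print("set: ", timeit.timeit(remove_from_set, number=5))
   ```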






Re: [PR] Spark 3.5: Add an option not to delete files in ExpireSnapshots [iceberg]

2024-01-30 Thread via GitHub


nastra commented on code in PR #9584:
URL: https://github.com/apache/iceberg/pull/9584#discussion_r1471399965


##
spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestExpireSnapshotsProcedure.java:
##
@@ -541,6 +541,29 @@ public void testExpireSnapshotsWithPartitionStatisticFiles() {
     .exists();
   }
 
+  @Test
+  public void testExpireSnapshotsNotDeletingFiles() {
+    sql("CREATE TABLE %s (id bigint NOT NULL, data string) USING iceberg", tableName);
+
+    sql("INSERT INTO TABLE %s VALUES (1, 'a')", tableName);
+    sql("INSERT INTO TABLE %s VALUES (2, 'b')", tableName);
+
+    Table table = validationCatalog.loadTable(tableIdent);
+
+    Assert.assertEquals("Should be 2 snapshots", 2, Iterables.size(table.snapshots()));

Review Comment:
   ```suggestion
   assertThat(table.snapshots()).hasSize(2);
   ```






Re: [PR] Spark 3.5: Add an option not to delete files in ExpireSnapshots [iceberg]

2024-01-30 Thread via GitHub


nastra commented on code in PR #9584:
URL: https://github.com/apache/iceberg/pull/9584#discussion_r1471405013


##
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/ExpireSnapshotsSparkAction.java:
##
@@ -67,6 +67,8 @@ public class ExpireSnapshotsSparkAction extends BaseSparkAction

Re: [PR] Partition Evolution [iceberg-python]

2024-01-30 Thread via GitHub


amogh-jahagirdar commented on code in PR #245:
URL: https://github.com/apache/iceberg-python/pull/245#discussion_r1471453813


##
pyiceberg/table/__init__.py:
##
@@ -2271,3 +2325,243 @@ def commit(self) -> Snapshot:
         )
 
         return snapshot
+
+
+class UpdateSpec:
+    _table: Table
+    _name_to_field: Dict[str, PartitionField] = {}
+    _name_to_added_field: Dict[str, PartitionField] = {}
+    _transform_to_field: Dict[Tuple[int, str], PartitionField] = {}
+    _transform_to_added_field: Dict[Tuple[int, str], PartitionField] = {}
+    _renames: Dict[str, str] = {}
+    _added_time_fields: Dict[int, PartitionField] = {}
+    _case_sensitive: bool
+    _adds: List[PartitionField]
+    _deletes: Set[int]
+    _last_assigned_partition_id: int
+    _transaction: Optional[Transaction]
+    _unassigned_field_name = 'unassigned_field_name'
+
+    def __init__(self, table: Table, transaction: Optional[Transaction] = None, case_sensitive: bool = True) -> None:
+        self._table = table
+        self._name_to_field = {field.name: field for field in table.spec().fields}
+        self._name_to_added_field = {}
+        self._transform_to_field = {(field.source_id, repr(field.transform)): field for field in table.spec().fields}
+        self._transform_to_added_field = {}
+        self._adds = []
+        self._deletes = set()
+        if len(table.specs()) == 1:
+            self._last_assigned_partition_id = PARTITION_FIELD_ID_START - 1
+        else:
+            self._last_assigned_partition_id = table.spec().last_assigned_field_id
+        self._renames = {}
+        self._transaction = transaction
+        self._case_sensitive = case_sensitive
+        self._added_time_fields = {}
+
+    def add_field(
+        self,
+        source_column_name: str,
+        transform: Transform[Any, Any],
+        partition_field_name: Optional[str] = _unassigned_field_name,
+    ) -> UpdateSpec:
+        ref = Reference(source_column_name)
+        bound_ref = ref.bind(self._table.schema(), self._case_sensitive)
+        # verify transform can actually bind it
+        output_type = bound_ref.field.field_type
+        if not transform.can_transform(output_type):
+            raise ValueError(f"{transform} cannot transform {output_type} values from {bound_ref.field.name}")
+
+        transform_key = (bound_ref.field.field_id, repr(transform))
+        existing_partition_field = self._transform_to_field.get(transform_key)
+        if existing_partition_field and self._is_duplicate_partition(transform, existing_partition_field):
+            raise ValueError(f"Duplicate partition field for ${ref.name}=${ref}, ${existing_partition_field} already exists")
+
+        added = self._transform_to_added_field.get(transform_key)
+        if added:
+            raise ValueError(f"Already added partition: {added.name}")
+
+        new_field = self._partition_field((bound_ref.field.field_id, transform), partition_field_name)
+        if new_field.name == self._unassigned_field_name:

Review Comment:
   Ah yeah, good point :) The unassigned name is really only needed temporarily. I'll update to do this.
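
   A sketch of the sentinel-free shape this could take, assuming the default becomes `None` and the name is derived where it is first needed (the helper names below are hypothetical):

   ```python
   from typing import Optional

   def _derived_name(source_name: str, transform_repr: str) -> str:
       # Hypothetical helper: build a visible name such as "ts_day" from the
       # source column and transform, instead of threading a sentinel string.
       return f"{source_name}_{transform_repr}"

   def resolve_field_name(source_name: str, transform_repr: str, partition_field_name: Optional[str] = None) -> str:
       # None marks "no name given"; no 'unassigned_field_name' sentinel needed.
       return partition_field_name or _derived_name(source_name, transform_repr)

   assert resolve_field_name("ts", "day") == "ts_day"
   assert resolve_field_name("ts", "day", "event_day") == "event_day"
   ```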






Re: [PR] add github add to check md link [iceberg-python]

2024-01-30 Thread via GitHub


kevinjqliu commented on PR #324:
URL: https://github.com/apache/iceberg-python/pull/324#issuecomment-1917449075

   Here's the result from the GitHub Action run from my fork
   
https://github.com/kevinjqliu/iceberg-python/actions/runs/7707078539/job/21003669945





Re: [PR] InMemory Catalog Implementation [iceberg-python]

2024-01-30 Thread via GitHub


kevinjqliu commented on code in PR #289:
URL: https://github.com/apache/iceberg-python/pull/289#discussion_r1471604886


##
tests/catalog/test_base.py:
##


Review Comment:
   thanks for the review @syun64. Since the In-Memory catalog is not a 
"production" catalog, I'm inclined to keep `test_base.py` for now. I plan to 
repurpose it later so that we can test the set of behaviors we expect from all 
catalogs.
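
   A self-contained sketch (not pyiceberg's actual API) of what that repurposing could look like, running one behavioral suite against every registered catalog via a parametrized fixture:

   ```python
   import pytest

   class FakeInMemoryCatalog:
       """Stand-in for a catalog implementation; illustrative only."""

       def __init__(self) -> None:
           self._namespaces: set = set()

       def create_namespace(self, name: str) -> None:
           self._namespaces.add((name,))

       def list_namespaces(self) -> list:
           return sorted(self._namespaces)

   CATALOGS = {"in_memory": FakeInMemoryCatalog}

   @pytest.fixture(params=sorted(CATALOGS))
   def catalog(request):
       # each catalog registered above is exercised by the same assertions
       return CATALOGS[request.param]()

   def test_create_namespace(catalog):
       catalog.create_namespace("db")
       assert ("db",) in catalog.list_namespaces()
   ```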






Re: [PR] InMemory Catalog Implementation [iceberg-python]

2024-01-30 Thread via GitHub


syun64 commented on code in PR #289:
URL: https://github.com/apache/iceberg-python/pull/289#discussion_r1471609730


##
tests/catalog/test_base.py:
##


Review Comment:
   Makes sense @kevinjqliu💯 






Re: [PR] Move nightly versioned docs to top-level docs directory [iceberg]

2024-01-30 Thread via GitHub


Fokko merged PR #9578:
URL: https://github.com/apache/iceberg/pull/9578






Re: [PR] Spark 3.4: Fix writing of default values in CoW for rows with NULL columns which are unmatched [iceberg]

2024-01-30 Thread via GitHub


rdblue merged PR #9556:
URL: https://github.com/apache/iceberg/pull/9556





Re: [PR] Spark 3.4: Fix writing of default values in CoW for rows with NULL columns which are unmatched [iceberg]

2024-01-30 Thread via GitHub


rdblue commented on PR #9556:
URL: https://github.com/apache/iceberg/pull/9556#issuecomment-1917527435

   Merging this. Thanks @amogh-jahagirdar!
   
   > It is still not clear how a wrong nullability info in MergeRows led to 
inserting default values.
   
   @aokolnychyi, I think what's happening is that the nullability flag is not 
checked when copying the row from the `MergeRows` output. The "default" values 
are null values coming from looking at unset values in `UnsafeRow`.





Re: [PR] Spark: Add support for describing/showing views [iceberg]

2024-01-30 Thread via GitHub


rdblue commented on code in PR #9513:
URL: https://github.com/apache/iceberg/pull/9513#discussion_r1471656403


##
spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteViewCommands.scala:
##
@@ -60,6 +63,13 @@ case class RewriteViewCommands(spark: SparkSession) extends Rule[LogicalPlan] wi
         properties = properties,
         allowExisting = allowExisting,
         replace = replace)
+
+    case ShowViews(UnresolvedNamespace(Seq()), pattern, output) if isViewCatalog(catalogManager.currentCatalog) =>
+      ShowIcebergViews(ResolvedNamespace(catalogManager.currentCatalog, Seq.empty), pattern, output)

Review Comment:
   Won't this case be handled the same way by running 
`CatalogAndNamespace.unapply(Seq())` in the next rule?






Re: [PR] Spark: Add support for describing/showing views [iceberg]

2024-01-30 Thread via GitHub


rdblue commented on code in PR #9513:
URL: https://github.com/apache/iceberg/pull/9513#discussion_r1471663971


##
spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ExtendedDataSourceV2Strategy.scala:
##
@@ -123,6 +128,18 @@ case class ExtendedDataSourceV2Strategy(spark: SparkSession) extends Strategy wi
         allowExisting = allowExisting,
         replace = replace) :: Nil
 
+    case d@DescribeRelation(ResolvedV2View(catalog, ident), _, isExtended, _) =>

Review Comment:
   I understand not wanting to bother with `output`, but I think the best 
practice is to preserve `output` from the logical plan to the physical plan. 
That's because resolution has occurred with the IDs from the logical plan and 
we don't want to reassign internal IDs in the physical plan. They should always 
match. That's why most Spark plans (like `DescribeTableExec`) copy and reuse 
the output attrs.







Re: [PR] Spark: Add support for describing/showing views [iceberg]

2024-01-30 Thread via GitHub


rdblue commented on code in PR #9513:
URL: https://github.com/apache/iceberg/pull/9513#discussion_r1471666384


##
spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ExtendedDataSourceV2Strategy.scala:
##
@@ -123,6 +128,18 @@ case class ExtendedDataSourceV2Strategy(spark: SparkSession) extends Strategy wi
         allowExisting = allowExisting,
         replace = replace) :: Nil
 
+    case d@DescribeRelation(ResolvedV2View(catalog, ident), _, isExtended, _) =>
+      DescribeV2ViewExec(d.output, catalog.loadView(ident), isExtended) :: Nil
+
+    case show@ShowTableProperties(ResolvedV2View(catalog, ident), propertyKey, _) =>
+      ShowV2ViewPropertiesExec(show.output, catalog.loadView(ident), propertyKey) :: Nil

Review Comment:
   It's a little odd to use `show.output` but then name `propertyKey` in the 
pattern. Why not name `output` in the pattern and reference it directly?






Re: [PR] Spark: Add support for describing/showing views [iceberg]

2024-01-30 Thread via GitHub


rdblue commented on code in PR #9513:
URL: https://github.com/apache/iceberg/pull/9513#discussion_r1471669400


##
spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java:
##
@@ -1149,6 +1149,159 @@ public void createViewWithSubqueryExpressionUsingGlobalTempView() {
             "because it references to the temporary object global_temp.%s", globalTempView));
   }
 
+  @Test
+  public void describeView() {
+    String viewName = "describeView";
+
+    sql("CREATE VIEW %s AS SELECT id, data FROM %s WHERE id <= 3", viewName, tableName);
+    assertThat(sql("DESCRIBE %s", viewName))
+        .containsExactlyInAnyOrder(row("id", "int", ""), row("data", "string", ""));

Review Comment:
   I think `InAnyOrder` is incorrect. Don't we want these to be in the schema 
order?






Re: [PR] Spark: Add support for describing/showing views [iceberg]

2024-01-30 Thread via GitHub


rdblue commented on code in PR #9513:
URL: https://github.com/apache/iceberg/pull/9513#discussion_r1471671329


##
spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DescribeV2ViewExec.scala:
##
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.v2
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.catalyst.util.escapeSingleQuotedString
+import org.apache.spark.sql.connector.catalog.View
+import org.apache.spark.sql.connector.catalog.ViewCatalog
+import org.apache.spark.sql.execution.LeafExecNode
+import scala.collection.JavaConverters._
+
+case class DescribeV2ViewExec(
+  output: Seq[Attribute],
+  view: View,
+  isExtended: Boolean) extends V2CommandExec with LeafExecNode {
+
+  import org.apache.spark.sql.connector.catalog.CatalogV2Implicits._
+
+  override protected def run(): Seq[InternalRow] = {
+    if (isExtended) {
+      (describeSchema :+ emptyRow) ++ describeExtended
+    } else {
+      describeSchema
+    }
+  }
+
+  private def describeSchema: Seq[InternalRow] =
+    view.schema().map { column =>
+      toCatalystRow(
+        column.name,
+        column.dataType.simpleString,
+        column.getComment().getOrElse(""))
+    }
+
+  private def emptyRow: InternalRow = toCatalystRow("", "", "")
+
+  private def describeExtended: Seq[InternalRow] = {
+    val outputColumns = view.queryColumnNames.mkString("[", ", ", "]")
+    val properties: Map[String, String] = view.properties.asScala.toMap -- ViewCatalog.RESERVED_PROPERTIES.asScala
+    val viewCatalogAndNamespace: Seq[String] = view.currentCatalog +: view.currentNamespace.toSeq
+    val viewProperties = properties.toSeq.sortBy(_._1).map {
+      case (key, value) =>
+        s"'${escapeSingleQuotedString(key)}' = '${escapeSingleQuotedString(value)}'"
+    }.mkString("[", ", ", "]")
+
+
+    toCatalystRow("# Detailed View Information", "", "") ::
+      toCatalystRow("Comment", view.properties.getOrDefault(ViewCatalog.PROP_COMMENT, ""), "") ::
+      toCatalystRow("View Text", view.query, "") ::

Review Comment:
   Do we want to include the view text? I think this should be suppressed in 
favor of showing it for `SHOW CREATE`. Otherwise the output will be formatted 
really strangely.






Re: [PR] Spark: Add support for describing/showing views [iceberg]

2024-01-30 Thread via GitHub


rdblue commented on code in PR #9513:
URL: https://github.com/apache/iceberg/pull/9513#discussion_r1471676481


##
spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java:
##
@@ -1149,6 +1149,159 @@ public void createViewWithSubqueryExpressionUsingGlobalTempView() {
             "because it references to the temporary object global_temp.%s", globalTempView));
   }
 
+  @Test
+  public void describeView() {
+    String viewName = "describeView";
+
+    sql("CREATE VIEW %s AS SELECT id, data FROM %s WHERE id <= 3", viewName, tableName);
+    assertThat(sql("DESCRIBE %s", viewName))
+        .containsExactlyInAnyOrder(row("id", "int", ""), row("data", "string", ""));
+  }
+
+  @Test
+  public void describeExtendedView() {
+    String viewName = "describeExtendedView";
+    String sql = String.format("SELECT id, data FROM %s WHERE id <= 3", tableName);
+
+    sql(
+        "CREATE VIEW %s (new_id COMMENT 'ID', new_data COMMENT 'DATA') COMMENT 'view comment' AS %s",
+        viewName, sql);
+    assertThat(sql("DESCRIBE EXTENDED %s", viewName))
+        .contains(
+            row("new_id", "int", "ID"),
+            row("new_data", "string", "DATA"),
+            row("Comment", "view comment", ""),
+            row("View Text", sql, ""),
+            row("View Catalog and Namespace", String.format("%s.%s", catalogName, NAMESPACE), ""),
+            row("View Query Output Columns", "[id, data]", ""),
+            row(
+                "View Properties",
+                String.format(
+                    "['format-version' = '1', 'location' = '/%s/%s', 'provider' = 'iceberg']",
+                    NAMESPACE, viewName),
+                ""));
+  }
+
+  @Test
+  public void showViewProperties() {
+    String viewName = "showViewProps";
+
+    sql(
+        "CREATE VIEW %s TBLPROPERTIES ('key1'='val1', 'key2'='val2') AS SELECT id, data FROM %s WHERE id <= 3",
+        viewName, tableName);
+    assertThat(sql("SHOW TBLPROPERTIES %s", viewName))
+        .contains(row("key1", "val1"), row("key2", "val2"));
+  }
+
+  @Test
+  public void showViewPropertiesByKey() {
+    String viewName = "showViewPropsByKey";
+
+    sql("CREATE VIEW %s AS SELECT id, data FROM %s WHERE id <= 3", viewName, tableName);
+    assertThat(sql("SHOW TBLPROPERTIES %s", viewName)).contains(row("provider", "iceberg"));
+
+    assertThat(sql("SHOW TBLPROPERTIES %s (provider)", viewName))
+        .contains(row("provider", "iceberg"));
+
+    assertThat(sql("SHOW TBLPROPERTIES %s (non.existing)", viewName))
+        .contains(
+            row(
+                "non.existing",
+                String.format(
+                    "View %s.%s.%s does not have property: non.existing",
+                    catalogName, NAMESPACE, viewName)));
+  }
+
+  @Test
+  public void showViews() throws NoSuchTableException {
+    insertRows(6);
+    String sql = String.format("SELECT * from %s", tableName);
+    sql("CREATE VIEW v1 AS %s", sql);
+    sql("CREATE VIEW prefixV2 AS %s", sql);
+    sql("CREATE VIEW prefixV3 AS %s", sql);
+    sql("CREATE GLOBAL TEMPORARY VIEW globalViewForListing AS %s", sql);
+    sql("CREATE TEMPORARY VIEW tempViewForListing AS %s", sql);
+
+    // spark stores temp views case-insensitive by default
+    Object[] tempView = row("", "tempviewforlisting", true);
+    assertThat(sql("SHOW VIEWS"))
+        .contains(
+            row(NAMESPACE.toString(), "prefixV2", false),
+            row(NAMESPACE.toString(), "prefixV3", false),
+            row(NAMESPACE.toString(), "v1", false),
+            tempView);
+
+    assertThat(sql("SHOW VIEWS IN %s", catalogName))
+        .contains(
+            row(NAMESPACE.toString(), "prefixV2", false),
+            row(NAMESPACE.toString(), "prefixV3", false),
+            row(NAMESPACE.toString(), "v1", false),
+            tempView);

Review Comment:
   I trust you that this is Spark's behavior. But this is weird because it 
isn't in the catalog.






Re: [PR] Spark: Add support for describing/showing views [iceberg]

2024-01-30 Thread via GitHub


rdblue commented on code in PR #9513:
URL: https://github.com/apache/iceberg/pull/9513#discussion_r1471676989


##
spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java:
##
@@ -1149,6 +1149,159 @@ public void createViewWithSubqueryExpressionUsingGlobalTempView() {
             "because it references to the temporary object global_temp.%s", globalTempView));
   }
 
+  @Test
+  public void describeView() {
+    String viewName = "describeView";
+
+    sql("CREATE VIEW %s AS SELECT id, data FROM %s WHERE id <= 3", viewName, tableName);
+    assertThat(sql("DESCRIBE %s", viewName))
+        .containsExactlyInAnyOrder(row("id", "int", ""), row("data", "string", ""));
+  }
+
+  @Test
+  public void describeExtendedView() {
+    String viewName = "describeExtendedView";
+    String sql = String.format("SELECT id, data FROM %s WHERE id <= 3", tableName);
+
+    sql(
+        "CREATE VIEW %s (new_id COMMENT 'ID', new_data COMMENT 'DATA') COMMENT 'view comment' AS %s",
+        viewName, sql);
+    assertThat(sql("DESCRIBE EXTENDED %s", viewName))
+        .contains(
+            row("new_id", "int", "ID"),
+            row("new_data", "string", "DATA"),
+            row("Comment", "view comment", ""),
+            row("View Text", sql, ""),
+            row("View Catalog and Namespace", String.format("%s.%s", catalogName, NAMESPACE), ""),
+            row("View Query Output Columns", "[id, data]", ""),
+            row(
+                "View Properties",
+                String.format(
+                    "['format-version' = '1', 'location' = '/%s/%s', 'provider' = 'iceberg']",
+                    NAMESPACE, viewName),
+                ""));
+  }
+
+  @Test
+  public void showViewProperties() {
+    String viewName = "showViewProps";
+
+    sql(
+        "CREATE VIEW %s TBLPROPERTIES ('key1'='val1', 'key2'='val2') AS SELECT id, data FROM %s WHERE id <= 3",
+        viewName, tableName);
+    assertThat(sql("SHOW TBLPROPERTIES %s", viewName))
+        .contains(row("key1", "val1"), row("key2", "val2"));
+  }
+
+  @Test
+  public void showViewPropertiesByKey() {
+    String viewName = "showViewPropsByKey";
+
+    sql("CREATE VIEW %s AS SELECT id, data FROM %s WHERE id <= 3", viewName, tableName);
+    assertThat(sql("SHOW TBLPROPERTIES %s", viewName)).contains(row("provider", "iceberg"));
+
+    assertThat(sql("SHOW TBLPROPERTIES %s (provider)", viewName))
+        .contains(row("provider", "iceberg"));
+
+    assertThat(sql("SHOW TBLPROPERTIES %s (non.existing)", viewName))
+        .contains(
+            row(
+                "non.existing",
+                String.format(
+                    "View %s.%s.%s does not have property: non.existing",
+                    catalogName, NAMESPACE, viewName)));
+  }
+
+  @Test
+  public void showViews() throws NoSuchTableException {
+    insertRows(6);
+    String sql = String.format("SELECT * from %s", tableName);
+    sql("CREATE VIEW v1 AS %s", sql);
+    sql("CREATE VIEW prefixV2 AS %s", sql);
+    sql("CREATE VIEW prefixV3 AS %s", sql);
+    sql("CREATE GLOBAL TEMPORARY VIEW globalViewForListing AS %s", sql);
+    sql("CREATE TEMPORARY VIEW tempViewForListing AS %s", sql);
+
+    // spark stores temp views case-insensitive by default
+    Object[] tempView = row("", "tempviewforlisting", true);
+    assertThat(sql("SHOW VIEWS"))
+        .contains(
+            row(NAMESPACE.toString(), "prefixV2", false),
+            row(NAMESPACE.toString(), "prefixV3", false),
+            row(NAMESPACE.toString(), "v1", false),
+            tempView);
+
+    assertThat(sql("SHOW VIEWS IN %s", catalogName))
+        .contains(
+            row(NAMESPACE.toString(), "prefixV2", false),
+            row(NAMESPACE.toString(), "prefixV3", false),
+            row(NAMESPACE.toString(), "v1", false),
+            tempView);

Review Comment:
   Also weird when filtering by NAMESPACE below.






Re: [PR] Spark: Add support for describing/showing views [iceberg]

2024-01-30 Thread via GitHub


rdblue commented on code in PR #9513:
URL: https://github.com/apache/iceberg/pull/9513#discussion_r1471677391


##
spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java:
##
@@ -1149,6 +1149,159 @@ public void createViewWithSubqueryExpressionUsingGlobalTempView() {
             "because it references to the temporary object global_temp.%s", globalTempView));
   }
 
+  @Test
+  public void describeView() {
+    String viewName = "describeView";
+
+    sql("CREATE VIEW %s AS SELECT id, data FROM %s WHERE id <= 3", viewName, tableName);
+    assertThat(sql("DESCRIBE %s", viewName))
+        .containsExactlyInAnyOrder(row("id", "int", ""), row("data", "string", ""));
+  }
+
+  @Test
+  public void describeExtendedView() {
+    String viewName = "describeExtendedView";
+    String sql = String.format("SELECT id, data FROM %s WHERE id <= 3", tableName);
+
+    sql(
+        "CREATE VIEW %s (new_id COMMENT 'ID', new_data COMMENT 'DATA') COMMENT 'view comment' AS %s",
+        viewName, sql);
+    assertThat(sql("DESCRIBE EXTENDED %s", viewName))
+        .contains(
+            row("new_id", "int", "ID"),
+            row("new_data", "string", "DATA"),
+            row("Comment", "view comment", ""),
+            row("View Text", sql, ""),
+            row("View Catalog and Namespace", String.format("%s.%s", catalogName, NAMESPACE), ""),
+            row("View Query Output Columns", "[id, data]", ""),
+            row(
+                "View Properties",
+                String.format(
+                    "['format-version' = '1', 'location' = '/%s/%s', 'provider' = 'iceberg']",
+                    NAMESPACE, viewName),
+                ""));
+  }
+
+  @Test
+  public void showViewProperties() {
+    String viewName = "showViewProps";
+
+    sql(
+        "CREATE VIEW %s TBLPROPERTIES ('key1'='val1', 'key2'='val2') AS SELECT id, data FROM %s WHERE id <= 3",
+        viewName, tableName);
+    assertThat(sql("SHOW TBLPROPERTIES %s", viewName))
+        .contains(row("key1", "val1"), row("key2", "val2"));
+  }
+
+  @Test
+  public void showViewPropertiesByKey() {
+    String viewName = "showViewPropsByKey";
+
+    sql("CREATE VIEW %s AS SELECT id, data FROM %s WHERE id <= 3", viewName, tableName);
+    assertThat(sql("SHOW TBLPROPERTIES %s", viewName)).contains(row("provider", "iceberg"));
+
+    assertThat(sql("SHOW TBLPROPERTIES %s (provider)", viewName))
+        .contains(row("provider", "iceberg"));
+
+    assertThat(sql("SHOW TBLPROPERTIES %s (non.existing)", viewName))
+        .contains(
+            row(
+                "non.existing",
+                String.format(
+                    "View %s.%s.%s does not have property: non.existing",
+                    catalogName, NAMESPACE, viewName)));
+  }
+
+  @Test
+  public void showViews() throws NoSuchTableException {
+    insertRows(6);
+    String sql = String.format("SELECT * from %s", tableName);
+    sql("CREATE VIEW v1 AS %s", sql);
+    sql("CREATE VIEW prefixV2 AS %s", sql);
+    sql("CREATE VIEW prefixV3 AS %s", sql);
+    sql("CREATE GLOBAL TEMPORARY VIEW globalViewForListing AS %s", sql);
+    sql("CREATE TEMPORARY VIEW tempViewForListing AS %s", sql);
+
+    // spark stores temp views case-insensitive by default
+    Object[] tempView = row("", "tempviewforlisting", true);
+    assertThat(sql("SHOW VIEWS"))
+        .contains(
+            row(NAMESPACE.toString(), "prefixV2", false),
+            row(NAMESPACE.toString(), "prefixV3", false),
+            row(NAMESPACE.toString(), "v1", false),
+            tempView);
+
+    assertThat(sql("SHOW VIEWS IN %s", catalogName))
+        .contains(
+            row(NAMESPACE.toString(), "prefixV2", false),
+            row(NAMESPACE.toString(), "prefixV3", false),
+            row(NAMESPACE.toString(), "v1", false),
+            tempView);
+
+    assertThat(sql("SHOW VIEWS IN %s.%s", catalogName, "default"))

Review Comment:
   Instead of "default", should this use `NAMESPACE.toString()`?






Re: [I] Support compaction [iceberg-python]

2024-01-30 Thread via GitHub


HonahX commented on issue #270:
URL: https://github.com/apache/iceberg-python/issues/270#issuecomment-1917560747

   I am interested in taking this if no one has started working on it.





Re: [PR] Spark: Add support for describing/showing views [iceberg]

2024-01-30 Thread via GitHub


rdblue commented on code in PR #9513:
URL: https://github.com/apache/iceberg/pull/9513#discussion_r1471678490


##
spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java:
##
@@ -1149,6 +1149,159 @@ public void 
createViewWithSubqueryExpressionUsingGlobalTempView() {
 "because it references to the temporary object 
global_temp.%s", globalTempView));
   }
 
+  @Test
+  public void describeView() {
+String viewName = "describeView";
+
+sql("CREATE VIEW %s AS SELECT id, data FROM %s WHERE id <= 3", viewName, 
tableName);
+assertThat(sql("DESCRIBE %s", viewName))
+.containsExactlyInAnyOrder(row("id", "int", ""), row("data", "string", 
""));
+  }
+
+  @Test
+  public void describeExtendedView() {
+String viewName = "describeExtendedView";
+String sql = String.format("SELECT id, data FROM %s WHERE id <= 3", 
tableName);
+
+sql(
+"CREATE VIEW %s (new_id COMMENT 'ID', new_data COMMENT 'DATA') COMMENT 
'view comment' AS %s",
+viewName, sql);
+assertThat(sql("DESCRIBE EXTENDED %s", viewName))
+.contains(
+row("new_id", "int", "ID"),
+row("new_data", "string", "DATA"),
+row("Comment", "view comment", ""),
+row("View Text", sql, ""),
+row("View Catalog and Namespace", String.format("%s.%s", 
catalogName, NAMESPACE), ""),
+row("View Query Output Columns", "[id, data]", ""),
+row(
+"View Properties",
+String.format(
+"['format-version' = '1', 'location' = '/%s/%s', 
'provider' = 'iceberg']",
+NAMESPACE, viewName),
+""));
+  }
+
+  @Test
+  public void showViewProperties() {
+String viewName = "showViewProps";
+
+sql(
+"CREATE VIEW %s TBLPROPERTIES ('key1'='val1', 'key2'='val2') AS SELECT 
id, data FROM %s WHERE id <= 3",
+viewName, tableName);
+assertThat(sql("SHOW TBLPROPERTIES %s", viewName))
+.contains(row("key1", "val1"), row("key2", "val2"));
+  }
+
+  @Test
+  public void showViewPropertiesByKey() {
+String viewName = "showViewPropsByKey";
+
+sql("CREATE VIEW %s AS SELECT id, data FROM %s WHERE id <= 3", viewName, 
tableName);
+assertThat(sql("SHOW TBLPROPERTIES %s", 
viewName)).contains(row("provider", "iceberg"));
+
+assertThat(sql("SHOW TBLPROPERTIES %s (provider)", viewName))
+.contains(row("provider", "iceberg"));
+
+assertThat(sql("SHOW TBLPROPERTIES %s (non.existing)", viewName))
+.contains(
+row(
+"non.existing",
+String.format(
+"View %s.%s.%s does not have property: non.existing",
+catalogName, NAMESPACE, viewName)));
+  }
+
+  @Test
+  public void showViews() throws NoSuchTableException {
+insertRows(6);
+String sql = String.format("SELECT * from %s", tableName);
+sql("CREATE VIEW v1 AS %s", sql);
+sql("CREATE VIEW prefixV2 AS %s", sql);
+sql("CREATE VIEW prefixV3 AS %s", sql);
+sql("CREATE GLOBAL TEMPORARY VIEW globalViewForListing AS %s", sql);
+sql("CREATE TEMPORARY VIEW tempViewForListing AS %s", sql);
+
+// spark stores temp views case-insensitive by default
+Object[] tempView = row("", "tempviewforlisting", true);
+assertThat(sql("SHOW VIEWS"))
+.contains(
+row(NAMESPACE.toString(), "prefixV2", false),
+row(NAMESPACE.toString(), "prefixV3", false),
+row(NAMESPACE.toString(), "v1", false),
+tempView);
+
+assertThat(sql("SHOW VIEWS IN %s", catalogName))
+.contains(
+row(NAMESPACE.toString(), "prefixV2", false),
+row(NAMESPACE.toString(), "prefixV3", false),
+row(NAMESPACE.toString(), "v1", false),
+tempView);
+
+assertThat(sql("SHOW VIEWS IN %s.%s", catalogName, "default"))
+.contains(
+row(NAMESPACE.toString(), "prefixV2", false),
+row(NAMESPACE.toString(), "prefixV3", false),
+row(NAMESPACE.toString(), "v1", false),
+tempView);
+
+assertThat(sql("SHOW VIEWS LIKE 'pref*'"))
+.contains(
+row(NAMESPACE.toString(), "prefixV2", false),
+row(NAMESPACE.toString(), "prefixV3", false));
+
+assertThat(sql("SHOW VIEWS LIKE 'non-existing'")).isEmpty();
+
+assertThat(sql("SHOW VIEWS IN spark_catalog.default")).contains(tempView);
+
+assertThat(sql("SHOW VIEWS IN global_temp"))
+.contains(
+// spark stores temp views case-insensitive by default
+row("global_temp", "globalviewforlisting", true), tempView);
+  }
+
+  @Test
+  public void showCreateSimpleView() {
+String viewName = "showCreateSimpleView";
+String sql = String.format("SELECT id, data FROM %s WHERE id <= 3", 
tableName);
+
+sql("CREATE VIEW %s AS %s", viewName, sql);
+
+String expected =
+

Re: [PR] Spark: Add support for describing/showing views [iceberg]

2024-01-30 Thread via GitHub


rdblue commented on code in PR #9513:
URL: https://github.com/apache/iceberg/pull/9513#discussion_r1471683079


##
spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ShowCreateV2ViewExec.scala:
##
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.v2
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.catalyst.util.escapeSingleQuotedString
+import org.apache.spark.sql.connector.catalog.View
+import org.apache.spark.sql.connector.catalog.ViewCatalog
+import org.apache.spark.sql.execution.LeafExecNode
+import scala.collection.JavaConverters._
+
+case class ShowCreateV2ViewExec(output: Seq[Attribute], view: View)
+  extends V2CommandExec with LeafExecNode {
+
+  override protected def run(): Seq[InternalRow] = {
+val builder = new StringBuilder
+builder ++= s"CREATE VIEW ${view.name} "
+showColumns(view, builder)
+showComment(view, builder)
+showProperties(view, builder)
+builder ++= s"AS\n${view.query}\n"
+
+Seq(toCatalystRow(builder.toString))
+  }
+
+  private def showColumns(view: View, builder: StringBuilder): Unit = {
+val columns = view.schema().fields

Review Comment:
   I think this is fine for now, but we should consider how to handle the 
aliases in the future






Re: [PR] Spark: Add support for describing/showing views [iceberg]

2024-01-30 Thread via GitHub


rdblue commented on PR #9513:
URL: https://github.com/apache/iceberg/pull/9513#issuecomment-1917565679

   I pointed out a few things that I think I would change, but there is nothing 
blocking this. +1 overall. Please fix what you agree with (including the `SHOW 
VIEWS` behavior with temp views) and merge when you're ready!





Re: [PR] Spark: Support altering view properties [iceberg]

2024-01-30 Thread via GitHub


rdblue commented on code in PR #9582:
URL: https://github.com/apache/iceberg/pull/9582#discussion_r1471690098


##
spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckViews.scala:
##
@@ -36,6 +38,13 @@ object CheckViews extends (LogicalPlan => Unit) {
 verifyColumnCount(ident, columnAliases, query)
 SchemaUtils.checkColumnNameDuplication(query.schema.fieldNames, 
SQLConf.get.resolver)
 
+  case UnsetViewProperties(ResolvedV2View(catalog, ident), propertyKeys, 
ifExists) =>

Review Comment:
   Check rules are intended to ensure that SQL is correct and valid, not to 
check the execution. If you have this check here, it would fail when you run 
`EXPLAIN ALTER VIEW ...` because this is doing execution work (checking for the 
property) in the planner. Another way to think about this is that this isn't an 
analysis failure, it is a runtime failure due to data. The SQL is perfectly 
valid.
   
   This should either be done in `AlterV2ViewExec` or not at all. I would lean 
toward not doing this check at all. Shouldn't this be idempotent? Is there 
value in failing to remove a property if it isn't there? I don't see value, so I'd ignore this and let the catalog throw an exception if it chooses to.






Re: [PR] Spark: Support altering view properties [iceberg]

2024-01-30 Thread via GitHub


rdblue commented on code in PR #9582:
URL: https://github.com/apache/iceberg/pull/9582#discussion_r1471691985


##
spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteViewCommands.scala:
##
@@ -21,19 +21,21 @@ package org.apache.spark.sql.catalyst.analysis
 
 import org.apache.spark.sql.AnalysisException
 import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.catalyst.analysis.ViewUtil.IcebergViewHelper
+import org.apache.spark.sql.catalyst.analysis.ViewUtil.isViewCatalog

Review Comment:
   Let's also avoid static imports in Scala. Can you import `ViewUtil` and call 
these methods from there?






[I] Add example of using PyIceberg with minimal external dependencies [iceberg-python]

2024-01-30 Thread via GitHub


kevinjqliu opened a new issue, #326:
URL: https://github.com/apache/iceberg-python/issues/326

   ### Feature Request / Improvement
   
   With #299 resolved, let's add an example of working with Iceberg using minimal external dependencies.
   
   Get started with Iceberg using the local filesystem and a simple catalog.
   `Local Filesystem` + `SqlCatalog`
   
   ## Catalog config
   ```
   from pyiceberg.catalog.sql import SqlCatalog
   
   catalog_path = "/tmp/warehouse/pyiceberg.db"
   warehouse_path = "/tmp/warehouse/"
   catalog = SqlCatalog("default", **{
       "uri": f"sqlite:///{catalog_path}",
       "warehouse": f"file://{warehouse_path}",
   })
   ```
   
   Note that the `/tmp/warehouse/` folder needs to exist first.
   
   ## Create a new table and write to it
   
https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/api.md#write-support
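   
   A minimal sketch of that flow, reusing the `catalog` from the config above (the `default` namespace and the `default.demo` table name are just illustrative):
   
   ```
   import pyarrow as pa
   
   # An in-memory Arrow table to write
   df = pa.table({"id": [1, 2, 3], "data": ["a", "b", "c"]})
   
   catalog.create_namespace("default")
   table = catalog.create_table("default.demo", schema=df.schema)
   
   # Append the Arrow table to the Iceberg table and read it back
   table.append(df)
   print(table.scan().to_arrow())
   ```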
   
   ## Query the table with `pyiceberg` 
   `~/.pyiceberg.yaml` file:
   ```
   catalog:
     default:
       uri: sqlite:////tmp/warehouse/pyiceberg.db
       warehouse: file:///tmp/warehouse/
   ```
   
   ## Run `pyiceberg` CLI
   `pyiceberg list`
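   
   With that config in place, the same catalog can also be loaded programmatically (a sketch; assumes the table created above exists):
   
   ```
   from pyiceberg.catalog import load_catalog
   
   # Resolves the "default" catalog from ~/.pyiceberg.yaml
   catalog = load_catalog("default")
   print(catalog.list_tables("default"))
   ```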
   
   





Re: [I] Add partition evolution [iceberg-python]

2024-01-30 Thread via GitHub


syun64 commented on issue #193:
URL: https://github.com/apache/iceberg-python/issues/193#issuecomment-1917576022

   @Fokko Should we move this to 0.7.0 release Milestone?





Re: [PR] Spark: Support altering view properties [iceberg]

2024-01-30 Thread via GitHub


rdblue commented on code in PR #9582:
URL: https://github.com/apache/iceberg/pull/9582#discussion_r1471693576


##
spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteViewCommands.scala:
##
@@ -60,17 +62,18 @@ case class RewriteViewCommands(spark: SparkSession) extends 
Rule[LogicalPlan] wi
 properties = properties,
 allowExisting = allowExisting,
 replace = replace)
+
+case u@UnresolvedView(ResolvedView(resolved), _, _, _) =>

Review Comment:
   Why does this need to be done here instead of in `ResolveViews`? This is 
just resolving the view. I suspect we have to resolve early to avoid an 
`AnalysisException`, but it would be great to clarify that with a comment.






Re: [PR] Use `.read_bytes()` instead [iceberg-python]

2024-01-30 Thread via GitHub


HonahX merged PR #325:
URL: https://github.com/apache/iceberg-python/pull/325





Re: [PR] Spark: Throw exception on `ALTER VIEW <viewName> AS` [iceberg]

2024-01-30 Thread via GitHub


nastra commented on code in PR #9510:
URL: https://github.com/apache/iceberg/pull/9510#discussion_r1471696587


##
spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckViews.scala:
##
@@ -36,6 +38,9 @@ object CheckViews extends (LogicalPlan => Unit) {
 verifyColumnCount(ident, columnAliases, query)
 SchemaUtils.checkColumnNameDuplication(query.schema.fieldNames, 
SQLConf.get.resolver)
 
+case AlterViewAs(ResolvedV2View(_, _, _), _, query) =>
+  SchemaUtils.checkColumnNameDuplication(query.schema.fieldNames, 
SQLConf.get.resolver)

Review Comment:
   To summarize this topic: we discussed this offline and decided not to 
support `ALTER VIEW ... AS` for now and to throw an exception instead 
(pointing users to `CREATE OR REPLACE VIEW`), since it leads to the surprising 
behavior that column aliases/comments are lost after updating the underlying 
SQL.






Re: [PR] Spark: Support altering view properties [iceberg]

2024-01-30 Thread via GitHub


rdblue commented on code in PR #9582:
URL: https://github.com/apache/iceberg/pull/9582#discussion_r1471696609


##
spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckViews.scala:
##
@@ -36,6 +38,13 @@ object CheckViews extends (LogicalPlan => Unit) {
 verifyColumnCount(ident, columnAliases, query)
 SchemaUtils.checkColumnNameDuplication(query.schema.fieldNames, 
SQLConf.get.resolver)
 
+  case UnsetViewProperties(ResolvedV2View(catalog, ident), propertyKeys, 
ifExists) =>

Review Comment:
   Okay, I missed the `ifExists` part. Looks like this behavior is required by 
Spark.
   
   In that case, this should be moved to the Exec plan.






Re: [PR] Spark: Support altering view properties [iceberg]

2024-01-30 Thread via GitHub


rdblue commented on code in PR #9582:
URL: https://github.com/apache/iceberg/pull/9582#discussion_r1471699613


##
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java:
##
@@ -606,6 +607,29 @@ public View createView(
   @Override
   public View alterView(Identifier ident, ViewChange... changes)
   throws NoSuchViewException, IllegalArgumentException {
+if (null != asViewCatalog) {
+  try {
+org.apache.iceberg.view.View view = 
asViewCatalog.loadView(buildIdentifier(ident));
+UpdateViewProperties updateViewProperties = view.updateProperties();

Review Comment:
   Can you also reject reserved properties? I don't think that you should be 
able to set `format-version`, `location`, or `provider`.






Re: [PR] Spark: Support altering view properties [iceberg]

2024-01-30 Thread via GitHub


nastra commented on code in PR #9582:
URL: https://github.com/apache/iceberg/pull/9582#discussion_r1471703651


##
spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteViewCommands.scala:
##
@@ -60,17 +62,18 @@ case class RewriteViewCommands(spark: SparkSession) extends 
Rule[LogicalPlan] wi
 properties = properties,
 allowExisting = allowExisting,
 replace = replace)
+
+case u@UnresolvedView(ResolvedView(resolved), _, _, _) =>

Review Comment:
   Yes, this needs to be done early because otherwise the Analyzer will resolve 
views and fail (since v2 views aren't supported). I'll add a comment.





Re: [PR] Spark: Rewrite identifier when using Subquery expressions in View [iceberg]

2024-01-30 Thread via GitHub


rdblue commented on PR #9587:
URL: https://github.com/apache/iceberg/pull/9587#issuecomment-1917587797

   Thanks, @nastra! Good to have this fixed.





Re: [PR] Spark: Rewrite identifier when using Subquery expressions in View [iceberg]

2024-01-30 Thread via GitHub


rdblue merged PR #9587:
URL: https://github.com/apache/iceberg/pull/9587





Re: [PR] Build: Bump software.amazon.s3.accessgrants:aws-s3-accessgrants-java-plugin from 1.0.1 to 2.0.0 [iceberg]

2024-01-30 Thread via GitHub


jackye1995 merged PR #9575:
URL: https://github.com/apache/iceberg/pull/9575





Re: [PR] Spark 3.5: Fix flaky TestSparkExecutorCache [iceberg]

2024-01-30 Thread via GitHub


aokolnychyi commented on PR #9583:
URL: https://github.com/apache/iceberg/pull/9583#issuecomment-1917593034

   @amogh-jahagirdar @szehon-ho @nastra, could you check this one? I had 5 
successful CI runs.





Re: [PR] Build: Bump software.amazon.awssdk:bom from 2.23.2 to 2.23.12 [iceberg]

2024-01-30 Thread via GitHub


jackye1995 merged PR #9573:
URL: https://github.com/apache/iceberg/pull/9573





Re: [I] [Proposal] Iceberg Materialized View Spec [iceberg]

2024-01-30 Thread via GitHub


szehon-ho commented on issue #6420:
URL: https://github.com/apache/iceberg/issues/6420#issuecomment-1917629464

   Great, this all makes sense to me except the following points:
   
   > 3. Question: Most people seem to be for Option 1
   
   As we clarified in the Google doc, it seems that, at least locally, we are 
moving to Option 2:  
https://docs.google.com/document/d/1UnhldHhe3Grz8JBngwXPA6ZZord1xMedY5ukEhZYF-A/edit?pli=1&disco=AAABFByy0mU
   
   > 3. As can be seen in the last point of 
https://docs.google.com/document/d/1UnhldHhe3Grz8JBngwXPA6ZZord1xMedY5ukEhZYF-A/edit?pli=1#heading=h.m5kli4l5q7ui,
 we agreed to leave the refresh strategy to the query engine. If available, an 
incremental refresh is always preferable to a full refresh.
   
   Did we consider making this property configurable? At a minimum, we should 
store which refresh strategy was used in the snapshot, for debugging. We can 
discuss on the Google doc.
   
   So I'm open to either: we can make a PR, or fill out the Specification Draft 
part of the doc. The main barrier to more folks reviewing right now is that the 
doc is just a list of options and long threads, and it takes forever to see 
where the consensus is; it took me quite some time to follow. Having a central 
section with the concrete additions we are proposing will greatly streamline 
the review experience. Feel free to take the summarized spec changes in my 
previous comment as a starting point.





Re: [I] Create Iceberg Table from pyarrow Schema with no IDs [iceberg-python]

2024-01-30 Thread via GitHub


syun64 closed issue #278: Create Iceberg Table from pyarrow Schema with no IDs
URL: https://github.com/apache/iceberg-python/issues/278





Re: [PR] Spark: Throw exception on `ALTER VIEW <viewName> AS` [iceberg]

2024-01-30 Thread via GitHub


rdblue commented on code in PR #9510:
URL: https://github.com/apache/iceberg/pull/9510#discussion_r1471786657


##
spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteViewCommands.scala:
##
@@ -60,6 +62,9 @@ case class RewriteViewCommands(spark: SparkSession) extends 
Rule[LogicalPlan] wi
 properties = properties,
 allowExisting = allowExisting,
 replace = replace)
+
+case AlterViewAs(ResolvedV2View(_, _), _, _) =>
+  throw new AnalysisException("ALTER VIEW <viewName> AS is not supported. 
Use CREATE OR REPLACE VIEW instead")

Review Comment:
   Does this need to happen here instead of in `CheckViews` so that Spark 
doesn't throw a different analysis exception?






[PR] REST: Set fresh IDs on create-table [iceberg-python]

2024-01-30 Thread via GitHub


Fokko opened a new pull request, #327:
URL: https://github.com/apache/iceberg-python/pull/327

   For the other implementations we do this when writing the metadata, but for 
the REST catalog we just post the schema itself.
   
   The REST spec does not mention anything about setting fresh IDs, so it is 
best to do it ourselves.
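   
   For illustration, a rough sketch of the effect of `assign_fresh_schema_ids` (the example fields are hypothetical):
   
   ```
   from pyiceberg.schema import Schema, assign_fresh_schema_ids
   from pyiceberg.types import NestedField, StringType
   
   # A client-supplied schema with arbitrary field IDs
   schema = Schema(
       NestedField(field_id=10, name="city", field_type=StringType(), required=False),
       NestedField(field_id=99, name="zip", field_type=StringType(), required=False),
   )
   
   # IDs are reassigned sequentially before the schema is sent to the catalog
   fresh = assign_fresh_schema_ids(schema)
   print([field.field_id for field in fresh.fields])  # [1, 2]
   ```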





Re: [PR] Remove ASF publish property [iceberg-docs]

2024-01-30 Thread via GitHub


rdblue merged PR #308:
URL: https://github.com/apache/iceberg-docs/pull/308





Re: [PR] Move nightly versioned docs to top-level docs directory [iceberg]

2024-01-30 Thread via GitHub


rdblue commented on PR #9578:
URL: https://github.com/apache/iceberg/pull/9578#issuecomment-1917693542

   Thanks for reverting this, @Fokko! I think we need to make sure that the 
docs don't move so that we have history and we can validate the changes from 
merging nightly into these docs.





Re: [PR] New docs switch [iceberg]

2024-01-30 Thread via GitHub


rdblue merged PR #9520:
URL: https://github.com/apache/iceberg/pull/9520





Re: [I] Problems in release doc about versioned documentation [iceberg]

2024-01-30 Thread via GitHub


rdblue closed issue #8151: Problems in release doc about versioned documentation
URL: https://github.com/apache/iceberg/issues/8151





Re: [PR] REST: Set fresh IDs on create-table [iceberg-python]

2024-01-30 Thread via GitHub


rdblue commented on code in PR #327:
URL: https://github.com/apache/iceberg-python/pull/327#discussion_r1471795199


##
pyiceberg/catalog/rest.py:
##
@@ -447,13 +447,14 @@ def create_table(
 sort_order: SortOrder = UNSORTED_SORT_ORDER,
 properties: Properties = EMPTY_DICT,
 ) -> Table:
-schema: Schema = self._convert_schema_if_needed(schema)  # type: ignore
+iceberg_schema = self._convert_schema_if_needed(schema)
+iceberg_schema = assign_fresh_schema_ids(iceberg_schema)
 
 namespace_and_table = self._split_identifier_for_path(identifier)
 request = CreateTableRequest(
 name=namespace_and_table["table"],
 location=location,
-table_schema=schema,
+table_schema=iceberg_schema,

Review Comment:
   Isn't this done on the server side?


