date:20231013

Re: [PR] Flink: flink/*: replaced .size() > 0 with isEmpty() [iceberg]

2023-10-13 Thread via GitHub



nastra commented on PR #8819:
URL: https://github.com/apache/iceberg/pull/8819#issuecomment-1761008540

   @PickBas thanks for working on this. I think it would be great to combine 
all changes in a single PR rather than opening 1-file PRs


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Core: Derive 'operation' from view version [iceberg]

2023-10-13 Thread via GitHub



nastra commented on code in PR #8678:
URL: https://github.com/apache/iceberg/pull/8678#discussion_r1357853053


##
core/src/main/java/org/apache/iceberg/view/BaseMetastoreViewCatalog.java:
##
@@ -161,7 +162,7 @@ private View create(ViewOperations ops) {
   .defaultNamespace(defaultNamespace)
   .defaultCatalog(defaultCatalog)
   .timestampMillis(System.currentTimeMillis())
-  .putSummary("operation", "create")
+  .putAllSummary(EnvironmentContext.get())

Review Comment:
   in https://iceberg.apache.org/view-spec/#summary we mention that the summary 
might contain the engine name/version, so I figured it makes sense to add this 
info from `EnvironmentContext` here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Core: Derive 'operation' from view version [iceberg]

2023-10-13 Thread via GitHub



nastra commented on code in PR #8678:
URL: https://github.com/apache/iceberg/pull/8678#discussion_r1357855073


##
core/src/test/java/org/apache/iceberg/view/ViewCatalogTests.java:
##
@@ -99,7 +100,7 @@ public void basicCreateView() {
 .timestampMillis(view.currentVersion().timestampMillis())
 .versionId(1)
 .schemaId(0)
-.putSummary("operation", "create")
+.summary(view.currentVersion().summary())

Review Comment:
   we could also use `EnvironmentContext.get()` here for comparison, but I'm 
not sure if this might later cause test issues when view support is added for 
other engines



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] feat: suport read/write Manifest [iceberg-rust]

2023-10-13 Thread via GitHub



liurenjie1024 commented on code in PR #79:
URL: https://github.com/apache/iceberg-rust/pull/79#discussion_r1357853771


##
crates/iceberg/src/spec/partition.rs:
##
@@ -54,6 +56,24 @@ impl PartitionSpec {
 pub fn builder() -> PartitionSpecBuilder {
 PartitionSpecBuilder::default()
 }
+
+/// Construct the partition type from schema
+pub fn partition_type(&self, schema: &Schema) -> Result 
{
+let mut fields = Vec::with_capacity(self.fields.len());
+for field in &self.fields {
+let field = schema.field_by_id(field.source_id).ok_or_else(|| {

Review Comment:
   I think this is incorrect? The partition type should be determined by type 
after transformation.



##
crates/iceberg/src/spec/values.rs:
##
@@ -966,6 +978,547 @@ mod timestamptz {
 }
 }
 
+mod serde {

Review Comment:
   I would suggest to move this into another pr and resolve them first, 
otherwise this pr would get too large to review.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Flink: flink/*: replaced .size() > 0 with isEmpty() [iceberg]

2023-10-13 Thread via GitHub



PickBas commented on PR #8819:
URL: https://github.com/apache/iceberg/pull/8819#issuecomment-1761042424

   > @PickBas thanks for working on this. I think it would be great to combine 
all changes in a single PR rather than opening 1-file PRs
   
   @nastra Thank you for your feedback. I've done 1 PR per module so we could 
keep the changes traceable and manageable, as @Fokko requested in PR #8813. 
   
   I may do it another way: keep PRs open for spark and core modules, since 
they include most of the changes, and combine the rest of the modules into a 
single PR.
   
   If that works for both you and @Fokko, I will surely do that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] feat: suport read/write Manifest [iceberg-rust]

2023-10-13 Thread via GitHub



ZENOTME commented on code in PR #79:
URL: https://github.com/apache/iceberg-rust/pull/79#discussion_r1357920808


##
crates/iceberg/src/spec/values.rs:
##
@@ -966,6 +978,547 @@ mod timestamptz {
 }
 }
 
+mod serde {

Review Comment:
   If `RawLiteral` looks well, I can seperate it to another PR. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] feat: suport read/write Manifest [iceberg-rust]

2023-10-13 Thread via GitHub



ZENOTME commented on code in PR #79:
URL: https://github.com/apache/iceberg-rust/pull/79#discussion_r1357921009


##
crates/iceberg/src/spec/partition.rs:
##
@@ -54,6 +56,24 @@ impl PartitionSpec {
 pub fn builder() -> PartitionSpecBuilder {
 PartitionSpecBuilder::default()
 }
+
+/// Construct the partition type from schema
+pub fn partition_type(&self, schema: &Schema) -> Result 
{
+let mut fields = Vec::with_capacity(self.fields.len());
+for field in &self.fields {
+let field = schema.field_by_id(field.source_id).ok_or_else(|| {

Review Comment:
   Yes. My bad.🥵



##
crates/iceberg/src/spec/partition.rs:
##
@@ -54,6 +56,24 @@ impl PartitionSpec {
 pub fn builder() -> PartitionSpecBuilder {
 PartitionSpecBuilder::default()
 }
+
+/// Construct the partition type from schema
+pub fn partition_type(&self, schema: &Schema) -> Result 
{
+let mut fields = Vec::with_capacity(self.fields.len());
+for field in &self.fields {
+let field = schema.field_by_id(field.source_id).ok_or_else(|| {

Review Comment:
   Yes. My bad.🥵



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] feat: add builder to TableMetadata interface [iceberg-rust]

2023-10-13 Thread via GitHub



liurenjie1024 commented on code in PR #62:
URL: https://github.com/apache/iceberg-rust/pull/62#discussion_r1357880510


##
crates/iceberg/src/spec/table_metadata.rs:
##
@@ -93,22 +116,278 @@ pub struct TableMetadata {
 /// previous metadata file location should be added to the list.
 /// Tables can be configured to remove oldest metadata log entries and
 /// keep a fixed-size log of the most recent entries after a commit.
+#[builder(default, setter(custom))]
 metadata_log: Vec,
 
 /// A list of sort orders, stored as full sort order objects.
+#[builder(default, setter(custom))]
 sort_orders: HashMap,
 /// Default sort order id of the table. Note that this could be used by
 /// writers, but is not used when reading because reads use the specs
 /// stored in manifest files.
+#[builder(default = "DEFAULT_SORT_ORDER_ID", setter(custom))]
 default_sort_order_id: i64,
 ///A map of snapshot references. The map keys are the unique snapshot 
reference
 /// names in the table, and the map values are snapshot reference objects.
 /// There is always a main branch reference pointing to the 
current-snapshot-id
 /// even if the refs map is null.
+#[builder(default = "Self::default_ref()", setter(custom))]
 refs: HashMap,
 }
 
+// We define a from implementation from builder Error to Iceberg Error
+impl From for Error {

Review Comment:
   Move this to `error` module?



##
crates/iceberg/src/spec/table_metadata.rs:
##
@@ -93,22 +116,278 @@ pub struct TableMetadata {
 /// previous metadata file location should be added to the list.
 /// Tables can be configured to remove oldest metadata log entries and
 /// keep a fixed-size log of the most recent entries after a commit.
+#[builder(default, setter(custom))]
 metadata_log: Vec,
 
 /// A list of sort orders, stored as full sort order objects.
+#[builder(default, setter(custom))]
 sort_orders: HashMap,
 /// Default sort order id of the table. Note that this could be used by
 /// writers, but is not used when reading because reads use the specs
 /// stored in manifest files.
+#[builder(default = "DEFAULT_SORT_ORDER_ID", setter(custom))]
 default_sort_order_id: i64,
 ///A map of snapshot references. The map keys are the unique snapshot 
reference
 /// names in the table, and the map values are snapshot reference objects.
 /// There is always a main branch reference pointing to the 
current-snapshot-id
 /// even if the refs map is null.
+#[builder(default = "Self::default_ref()", setter(custom))]
 refs: HashMap,
 }
 
+// We define a from implementation from builder Error to Iceberg Error
+impl From for Error {
+fn from(ufe: UninitializedFieldError) -> Error {
+Error::new(ErrorKind::DataInvalid, ufe.to_string())

Review Comment:
   ```suggestion
   Error::new(ErrorKind::DataInvalid, "Some fields of table metadata 
not inited")
   .with_source(ufe)
   ```



##
crates/iceberg/src/spec/table_metadata.rs:
##
@@ -93,22 +116,278 @@ pub struct TableMetadata {
 /// previous metadata file location should be added to the list.
 /// Tables can be configured to remove oldest metadata log entries and
 /// keep a fixed-size log of the most recent entries after a commit.
+#[builder(default, setter(custom))]
 metadata_log: Vec,
 
 /// A list of sort orders, stored as full sort order objects.
+#[builder(default, setter(custom))]
 sort_orders: HashMap,
 /// Default sort order id of the table. Note that this could be used by
 /// writers, but is not used when reading because reads use the specs
 /// stored in manifest files.
+#[builder(default = "DEFAULT_SORT_ORDER_ID", setter(custom))]
 default_sort_order_id: i64,
 ///A map of snapshot references. The map keys are the unique snapshot 
reference
 /// names in the table, and the map values are snapshot reference objects.
 /// There is always a main branch reference pointing to the 
current-snapshot-id
 /// even if the refs map is null.
+#[builder(default = "Self::default_ref()", setter(custom))]
 refs: HashMap,
 }
 
+// We define a from implementation from builder Error to Iceberg Error
+impl From for Error {
+fn from(ufe: UninitializedFieldError) -> Error {
+Error::new(ErrorKind::DataInvalid, ufe.to_string())
+}
+}
+
+impl TableMetadataBuilder {
+/// Get current time in ms
+fn current_time_ms() -> i64 {
+UNIX_EPOCH
+.elapsed()
+.unwrap()
+.as_millis()
+.try_into()
+.unwrap()
+}
+
+fn default_ref() -> HashMap {
+HashMap::from([(
+"main".to_string(),
+SnapshotReference {
+snapshot_id: -1,
+retention: SnapshotRetention::Branch {
+min_snapshots_to_keep: None,
+

Re: [I] Flaky test: TestRemoveOrphanFilesAction3 > orphanedFileRemovedWithParallelTasks [iceberg]

2023-10-13 Thread via GitHub



amogh-jahagirdar commented on issue #8824:
URL: https://github.com/apache/iceberg/issues/8824#issuecomment-1761134284

   Yeah, I'd probably start with making this 
https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/actions/TestRemoveOrphanFilesAction.java#L273
 a `ConcurrentHashMap.newKeySet();`. Even though the implementation inserts 
unique elements (file paths), there are no thread safety guarantees on normal 
hash sets. I think it's possible there could be some weird corruptions of 
internal state on the underlying table/buckets in the set which lead to missing 
elements.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Aws: aws/*: replaced .size() > 0 with isEmpty() [iceberg]

2023-10-13 Thread via GitHub



PickBas closed pull request #8823: Aws: aws/*: replaced .size() > 0 with 
isEmpty()
URL: https://github.com/apache/iceberg/pull/8823


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Mr: mr/*: replaced .size() > 0 with isEmpty() [iceberg]

2023-10-13 Thread via GitHub



PickBas closed pull request #8820: Mr: mr/*: replaced .size() > 0 with isEmpty()
URL: https://github.com/apache/iceberg/pull/8820


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Delta-lake: delta-lake/*: replaced .size() > 0 with isEmpty() [iceberg]

2023-10-13 Thread via GitHub



PickBas closed pull request #8818: Delta-lake: delta-lake/*: replaced .size() > 
0 with isEmpty()
URL: https://github.com/apache/iceberg/pull/8818


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Aliyun: aliyun/*: replaced .size() > 0 with isEmpty() [iceberg]

2023-10-13 Thread via GitHub



PickBas closed pull request #8821: Aliyun: aliyun/*: replaced .size() > 0 with 
isEmpty()
URL: https://github.com/apache/iceberg/pull/8821


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Hive3: hive3/*: replaced .size() > 0 with isEmpty() [iceberg]

2023-10-13 Thread via GitHub



PickBas closed pull request #8817: Hive3: hive3/*: replaced .size() > 0 with 
isEmpty()
URL: https://github.com/apache/iceberg/pull/8817


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Parquet: parquet/*: replaced .size() > 0 with isEmpty() [iceberg]

2023-10-13 Thread via GitHub



PickBas closed pull request #8816: Parquet: parquet/*: replaced .size() > 0 
with isEmpty()
URL: https://github.com/apache/iceberg/pull/8816


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Data: data/*: replaced .size() > 0 with isEmpty() [iceberg]

2023-10-13 Thread via GitHub



PickBas closed pull request #8815: Data: data/*: replaced .size() > 0 with 
isEmpty()
URL: https://github.com/apache/iceberg/pull/8815


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Api: api/*: replaced .size() > 0 with isEmpty() [iceberg]

2023-10-13 Thread via GitHub



PickBas closed pull request #8822: Api: api/*: replaced .size() > 0 with 
isEmpty()
URL: https://github.com/apache/iceberg/pull/8822


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Flink, Aliyun, MR, Delta-lake, Hive3, Parquet, Data: replaced .size() > 0 with isEmpty() [iceberg]

2023-10-13 Thread via GitHub



PickBas commented on PR #8819:
URL: https://github.com/apache/iceberg/pull/8819#issuecomment-1761167713

   @amogh-jahagirdar @nastra  @Fokko Done. Now the rest of the modules are 
included in this pull request.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Flink: Custom partitioner for bucket partitions [iceberg]

2023-10-13 Thread via GitHub



chenwyi2 commented on PR #7161:
URL: https://github.com/apache/iceberg/pull/7161#issuecomment-1761169778

   Hi @stevenzwu @kengtin this PR can be create too many small files when 
parition with dt,hout,minute and bucekt(id), suppose paralisim is 120 and bucke 
number is 8, then 15 writes can write into same one bucket, but there is 
problem, data from the previous few hours can be into one commit because of 
data latency, there can be 15000 and more data files if changed partition is up 
to 1000, can we use complete parition name instead of just bucket?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Flink, Aliyun, MR, Delta-lake, Hive3, Parquet, Data: replaced .size() > 0 with isEmpty() [iceberg]

2023-10-13 Thread via GitHub



Fokko commented on PR #8819:
URL: https://github.com/apache/iceberg/pull/8819#issuecomment-1761169780

   > If that works for both you and @Fokko, I will surely do that.
   
   I'm open to everything. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Flink, Aliyun, MR, Delta-lake, Hive3, Parquet, Data: replaced .size() > 0 with isEmpty() [iceberg]

2023-10-13 Thread via GitHub



Fokko commented on code in PR #8819:
URL: https://github.com/apache/iceberg/pull/8819#discussion_r1357986438


##
aws/src/integration/java/org/apache/iceberg/aws/glue/TestGlueCatalogNamespace.java:
##
@@ -105,7 +105,7 @@ public void testNamespaceExists() {
   public void testListNamespace() {
 String namespace = createNamespace();
 List namespaceList = glueCatalog.listNamespaces();
-Assert.assertTrue(namespaceList.size() > 0);
+Assert.assertFalse(namespaceList.isEmpty());

Review Comment:
   Nice, much easier to read 👍 



##
aws/src/integration/java/org/apache/iceberg/aws/glue/TestGlueCatalogNamespace.java:
##
@@ -105,7 +105,7 @@ public void testNamespaceExists() {
   public void testListNamespace() {
 String namespace = createNamespace();
 List namespaceList = glueCatalog.listNamespaces();
-Assert.assertTrue(namespaceList.size() > 0);
+Assert.assertFalse(namespaceList.isEmpty());

Review Comment:
   Nice, much easier to read 👍 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Flink, Aliyun, MR, Delta-lake, Hive3, Parquet, Data: replaced .size() > 0 with isEmpty() [iceberg]

2023-10-13 Thread via GitHub



nastra commented on code in PR #8819:
URL: https://github.com/apache/iceberg/pull/8819#discussion_r1358006481


##
aws/src/integration/java/org/apache/iceberg/aws/glue/TestGlueCatalogNamespace.java:
##
@@ -105,7 +105,7 @@ public void testNamespaceExists() {
   public void testListNamespace() {
 String namespace = createNamespace();
 List namespaceList = glueCatalog.listNamespaces();
-Assert.assertTrue(namespaceList.size() > 0);
+Assert.assertFalse(namespaceList.isEmpty());

Review Comment:
   nit: we might want to switch all Junit4-style assertions to AssertJ in the 
AWS integration test module (in a separate PR)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Flink: Read parquet BINARY column as String for expected [iceberg]

2023-10-13 Thread via GitHub



nastra commented on PR #8808:
URL: https://github.com/apache/iceberg/pull/8808#issuecomment-1761210176

   @fengjiajie thanks for working on this. Could you please add a test that 
reproduces the issue?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] feat(tables): add basic table implementation [iceberg-go]

2023-10-13 Thread via GitHub



nastra commented on code in PR #11:
URL: https://github.com/apache/iceberg-go/pull/11#discussion_r1358030118


##
table/metadata.go:
##
@@ -0,0 +1,401 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package table
+
+import (
+   "encoding/json"
+   "errors"
+   "fmt"
+   "io"
+
+   "github.com/apache/iceberg-go"
+
+   "github.com/google/uuid"
+)
+
+// Metadata for an iceberg table as specified in the Iceberg spec
+//
+// https://iceberg.apache.org/spec/#iceberg-table-spec
+type Metadata interface {
+   // Version indicates the version of this metadata, 1 for V1, 2 for V2, 
etc.
+   Version() int
+   // TableUUID returns a UUID that identifies the table, generated when 
the
+   // table is created. Implementations must throw an exception if a 
table's
+   // UUID does not match the expected UUID after refreshing metadata.
+   TableUUID() uuid.UUID
+   // Location is the table's base location. This is used by writers to 
determine
+   // where to store data files, manifest files, and table metadata files.
+   Location() string
+   // LastUpdatedMillis is the timestamp in milliseconds from the unix 
epoch when
+   // the table was last updated. Each table metadata file should update 
this
+   // field just before writing.
+   LastUpdatedMillis() int64
+   // LastColumn returns the highest assigned column ID for the table.
+   // This is used to ensure fields are always assigned an unused ID when
+   // evolving schemas.
+   LastColumn() int
+   // Schemas returns the list of schemas, stored as objects with their
+   // schema-id.
+   Schemas() []*iceberg.Schema
+   // CurrentSchema returns the table's current schema.
+   CurrentSchema() *iceberg.Schema
+   // PartitionSpecs returns the list of all partition specs in the table.
+   PartitionSpecs() []iceberg.PartitionSpec
+   // PartitionSpec returns the current partition spec that the table is 
using.
+   PartitionSpec() iceberg.PartitionSpec
+   // DefaultPartitionSpec is the ID of the current spec that writers 
should
+   // use by default.
+   DefaultPartitionSpec() int
+   // LastPartitionSpecID is the highest assigned partition field ID across
+   // all partition specs for the table. This is used to ensure partition
+   // fields are always assigned an unused ID when evolving specs.
+   LastPartitionSpecID() *int
+   // Snapshots returns the list of valid snapshots. Valid snapshots are
+   // snapshots for which all data files exist in the file system. A data
+   // file must not be deleted from the file system until the last snapshot
+   // in which it was listed is garbage collected.
+   Snapshots() []Snapshot
+   // SnapshotByID find and return a specific snapshot by its ID. Returns
+   // nil if the ID is not found in the list of snapshots.
+   SnapshotByID(int64) *Snapshot
+   // SnapshotByName searches the list of snapshots for a snapshot with a 
given
+   // ref name. Returns nil if there's no ref with this name for a 
snapshot.
+   SnapshotByName(name string) *Snapshot

Review Comment:
   it should be either 0 or 1 snapshot that is returned. In my head I was 
thinking about `snapshotsById` that we use in the Java impl, so having singular 
here is fine I think



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] feat(tables): add basic table implementation [iceberg-go]

2023-10-13 Thread via GitHub



nastra commented on code in PR #11:
URL: https://github.com/apache/iceberg-go/pull/11#discussion_r1358030652


##
table/table.go:
##
@@ -0,0 +1,97 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package table
+
+import (
+   "reflect"
+
+   "github.com/apache/iceberg-go"
+   "github.com/apache/iceberg-go/io"
+   "golang.org/x/exp/slices"
+)
+
+type Identifier = []string
+
+type Table struct {
+   identifier   Identifier
+   metadata Metadata
+   metadataLocation string
+   fs   io.IO
+}
+
+func (t Table) Equals(other Table) bool {
+   return slices.Equal(t.identifier, other.identifier) &&
+   t.metadataLocation == other.metadataLocation &&
+   reflect.DeepEqual(t.metadata, other.metadata)
+}
+
+func (t Table) Identifier() Identifier   { return t.identifier }
+func (t Table) Metadata() Metadata   { return t.metadata }
+func (t Table) MetadataLocation() string { return t.metadataLocation }
+func (t Table) FS() io.IO{ return t.fs }
+
+func (t Table) Schema() *iceberg.Schema  { return 
t.metadata.CurrentSchema() }
+func (t Table) Spec() iceberg.PartitionSpec  { return 
t.metadata.PartitionSpec() }
+func (t Table) SortOrder() SortOrder { return 
t.metadata.SortOrder() }
+func (t Table) Properties() iceberg.Properties   { return 
t.metadata.Properties() }
+func (t Table) Location() string { return 
t.metadata.Location() }
+func (t Table) CurrentSnapshot() *Snapshot   { return 
t.metadata.CurrentSnapshot() }
+func (t Table) SnapshotByID(id int64) *Snapshot  { return 
t.metadata.SnapshotByID(id) }
+func (t Table) SnapshotByName(name string) *Snapshot { return 
t.metadata.SnapshotByName(name) }

Review Comment:
   ok thanks for clarifying



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] feat(tables): add basic table implementation [iceberg-go]

2023-10-13 Thread via GitHub



nastra commented on code in PR #11:
URL: https://github.com/apache/iceberg-go/pull/11#discussion_r1358032220


##
table/table.go:
##
@@ -0,0 +1,97 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package table
+
+import (
+   "reflect"
+
+   "github.com/apache/iceberg-go"
+   "github.com/apache/iceberg-go/io"
+   "golang.org/x/exp/slices"
+)
+
+type Identifier = []string
+
+type Table struct {
+   identifier   Identifier
+   metadata Metadata
+   metadataLocation string
+   fs   io.IO
+}
+
+func (t Table) Equals(other Table) bool {
+   return slices.Equal(t.identifier, other.identifier) &&
+   t.metadataLocation == other.metadataLocation &&
+   reflect.DeepEqual(t.metadata, other.metadata)

Review Comment:
   I think we can go ahead and merge this PR, but this is something to keep in 
mind for a follow-up



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] feat(tables): add basic table implementation [iceberg-go]

2023-10-13 Thread via GitHub



nastra merged PR #11:
URL: https://github.com/apache/iceberg-go/pull/11


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Add spec_id back to data file [iceberg-python]

2023-10-13 Thread via GitHub



Fokko commented on code in PR #63:
URL: https://github.com/apache/iceberg-python/pull/63#discussion_r1358033938


##
pyiceberg/manifest.py:
##
@@ -182,6 +182,7 @@ def __repr__(self) -> str:
 doc="Splittable offsets",
 ),
 NestedField(field_id=140, name="sort_order_id", 
field_type=IntegerType(), required=False, doc="Sort order ID"),
+NestedField(field_id=141, name="spec_id", field_type=IntegerType(), 
required=False, doc="Partition spec ID"),

Review Comment:
   I don't think we want to add the fields here since the `spec_id` isn't 
read/written.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [I] [BUG] string row filter ignore 2nd (and onwards) And [iceberg-python]

2023-10-13 Thread via GitHub



amogh-jahagirdar commented on issue #64:
URL: https://github.com/apache/iceberg-python/issues/64#issuecomment-1761229740

   Hm yeah I can repro this with a simple test in `table/test_init.py`
   
   ```
 scan = table.scan(row_filter="x=1 AND y=1 AND z=1")
 assert scan.row_filter == And(EqualTo("x", 1), EqualTo("y", 1), 
EqualTo("z", 1))
   ```
   
   this fails the assertion because the  row filter ends up being 
And(EqualTo("x", 1), EqualTo("y", 1)) for some reason.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [I] [BUG] string row filter ignore 2nd (and onwards) And [iceberg-python]

2023-10-13 Thread via GitHub



amogh-jahagirdar commented on issue #64:
URL: https://github.com/apache/iceberg-python/issues/64#issuecomment-1761230308

   Looking into this


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [I] [BUG] string row filter ignore 2nd (and onwards) And [iceberg-python]

2023-10-13 Thread via GitHub



amogh-jahagirdar commented on issue #64:
URL: https://github.com/apache/iceberg-python/issues/64#issuecomment-1761250090

   
https://github.com/apache/iceberg-python/blob/main/pyiceberg/expressions/parser.py#L236
 this is where it'll skip anything after. I think this needs to combine all 
results in a single And (same applies for Or). But I need to do some testing 
with different combinations, not sure what how PyParse actually passes in the 
ParseResults when it visits the expression


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Flink: Read parquet BINARY column as String for expected [iceberg]

2023-10-13 Thread via GitHub



nastra commented on code in PR #8808:
URL: https://github.com/apache/iceberg/pull/8808#discussion_r1358102450


##
flink/v1.15/flink/src/test/java/org/apache/iceberg/flink/data/TestFlinkParquetReader.java:
##
@@ -98,6 +100,62 @@ public void testTwoLevelList() throws IOException {
 }
   }
 
+  @Test
+  public void testReadBinaryFieldAsString() throws IOException {
+Schema schemaForWriteBinary = new Schema(optional(1, "strbytes", 
Types.BinaryType.get()));
+org.apache.avro.Schema avroSchema = 
AvroSchemaUtil.convert(schemaForWriteBinary.asStruct());
+
+File testFile = temp.newFile();
+Assert.assertTrue(testFile.delete());

Review Comment:
   please use AssertJ-style assertions for newly added tests 
(https://iceberg.apache.org/contribute/#assertj). This makes it easier to 
migrate from JUnit4 to JUnit5



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Build: Fix compiler warnings [iceberg]

2023-10-13 Thread via GitHub



nastra merged PR #8763:
URL: https://github.com/apache/iceberg/pull/8763


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Python: Add support for Python 3.12 [iceberg-python]

2023-10-13 Thread via GitHub



Fokko commented on PR #35:
URL: https://github.com/apache/iceberg-python/pull/35#issuecomment-1761321623

   @steinsgateted can you pull in the main branch? It looks like that 3.12 is 
available: https://github.com/actions/python-versions/releases


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Flink, Aliyun, MR, Delta-lake, Hive3, Parquet, Data: replaced .size() > 0 with isEmpty() [iceberg]

2023-10-13 Thread via GitHub



Fokko merged PR #8819:
URL: https://github.com/apache/iceberg/pull/8819


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Flink, Aliyun, MR, Delta-lake, Hive3, Parquet, Data: replaced .size() > 0 with isEmpty() [iceberg]

2023-10-13 Thread via GitHub



Fokko commented on PR #8819:
URL: https://github.com/apache/iceberg/pull/8819#issuecomment-1761324612

   Thanks @PickBas for picking this up 🙌  and @nastra and @ajantha-bhat for the 
review 👍 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

[PR] Expression: Fix for when multiple and/or expressions are specified via string [iceberg-python]

2023-10-13 Thread via GitHub



amogh-jahagirdar opened a new pull request, #65:
URL: https://github.com/apache/iceberg-python/pull/65

   Fixes #64 . If there are multiple and/or conditions currently, our 
expression parser will ignore anything after the second predicate. This change 
fixes the issue by forwarding the remaining predicates as the argument to 
`*rest: BooleanExpression` for And and Or.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

[PR] Run dependabot daily [iceberg-python]

2023-10-13 Thread via GitHub



Fokko opened a new pull request, #66:
URL: https://github.com/apache/iceberg-python/pull/66

   I would love to run dependabot daily instead of weekly. Now we're in our own 
repository, we'll introduce less noise. This would have helped us to identify 
the issue with Pydantic earlier.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Expression: Fix for when multiple and/or expressions are specified via string [iceberg-python]

2023-10-13 Thread via GitHub



amogh-jahagirdar commented on code in PR #65:
URL: https://github.com/apache/iceberg-python/pull/65#discussion_r1358113965


##
pyiceberg/expressions/parser.py:
##
@@ -233,10 +233,14 @@ def handle_not(result: ParseResults) -> Not:
 
 
 def handle_and(result: ParseResults) -> And:
+if len(result[0]) > 2:
+return And(result[0][0], result[0][1], *result[0][2:])
 return And(result[0][0], result[0][1])

Review Comment:
   Uh maybe not the most pythonic...I guess a ternary which sets the *args 
option if it's > 2 otherwise AlwaysTrue (for And) and then for Or would be 
AlwaysFalse so it just considers the first 2.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Expression: Fix for when multiple and/or expressions are specified via string [iceberg-python]

2023-10-13 Thread via GitHub



Fokko commented on PR #65:
URL: https://github.com/apache/iceberg-python/pull/65#issuecomment-1761329387

   This is a serious one, thanks for reporting @puchengy


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Expression: Fix for when multiple and/or expressions are specified via string [iceberg-python]

2023-10-13 Thread via GitHub



Fokko commented on code in PR #65:
URL: https://github.com/apache/iceberg-python/pull/65#discussion_r1358122959


##
pyiceberg/expressions/parser.py:
##
@@ -233,10 +233,14 @@ def handle_not(result: ParseResults) -> Not:
 
 
 def handle_and(result: ParseResults) -> And:
+if len(result[0]) > 2:
+return And(result[0][0], result[0][1], *result[0][2:])
 return And(result[0][0], result[0][1])
 
 
 def handle_or(result: ParseResults) -> Or:
+if len(result[0]) > 2:
+return Or(result[0][0], result[0][1], *result[0][2:])
 return Or(result[0][0], result[0][1])

Review Comment:
   ```suggestion
   return Or(*result[0])
   ```



##
pyiceberg/expressions/parser.py:
##
@@ -233,10 +233,14 @@ def handle_not(result: ParseResults) -> Not:
 
 
 def handle_and(result: ParseResults) -> And:
+if len(result[0]) > 2:
+return And(result[0][0], result[0][1], *result[0][2:])
 return And(result[0][0], result[0][1])

Review Comment:
   What do you think of:
   ```suggestion
   return And(*result[0])
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] feat: suport read/write Manifest [iceberg-rust]

2023-10-13 Thread via GitHub



liurenjie1024 commented on code in PR #79:
URL: https://github.com/apache/iceberg-rust/pull/79#discussion_r1358133729


##
crates/iceberg/src/spec/values.rs:
##
@@ -966,6 +978,547 @@ mod timestamptz {
 }
 }
 
+mod serde {

Review Comment:
   I'm ok with this approach, cc @JanKaul @Xuanwo 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Expression: Fix for when multiple and/or expressions are specified via string [iceberg-python]

2023-10-13 Thread via GitHub



amogh-jahagirdar commented on code in PR #65:
URL: https://github.com/apache/iceberg-python/pull/65#discussion_r1358137792


##
pyiceberg/expressions/parser.py:
##
@@ -233,10 +233,14 @@ def handle_not(result: ParseResults) -> Not:
 
 
 def handle_and(result: ParseResults) -> And:
+if len(result[0]) > 2:
+return And(result[0][0], result[0][1], *result[0][2:])
 return And(result[0][0], result[0][1])

Review Comment:
   I realized I can just pass in *result[0] 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] feat: First version of rest catalog. [iceberg-rust]

2023-10-13 Thread via GitHub



liurenjie1024 commented on code in PR #78:
URL: https://github.com/apache/iceberg-rust/pull/78#discussion_r1357791341


##
crates/iceberg/Cargo.toml:
##
@@ -41,20 +41,24 @@ either = "1"
 futures = "0.3"
 itertools = "0.11"
 lazy_static = "1"
+log = "^0.4"
 murmur3 = "0.5.2"
 once_cell = "1"
 opendal = "0.40"
 ordered-float = "4.0.0"
+reqwest = { version = "^0.11", features = ["json"] }

Review Comment:
   > The problem with the features is that they are add-only, making it 
difficult for users to disable them.
   
   Sorry, I don't get your point, would you give a concrete example? 
   
   One concern with separate crate approach is that it makes loading catalog 
dynamically like Python difficult.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Expression: Fix for when multiple and/or expressions are specified via string [iceberg-python]

2023-10-13 Thread via GitHub



amogh-jahagirdar commented on code in PR #65:
URL: https://github.com/apache/iceberg-python/pull/65#discussion_r1358138359


##
pyiceberg/expressions/parser.py:
##
@@ -233,10 +233,14 @@ def handle_not(result: ParseResults) -> Not:
 
 
 def handle_and(result: ParseResults) -> And:
+if len(result[0]) > 2:
+return And(result[0][0], result[0][1], *result[0][2:])
 return And(result[0][0], result[0][1])

Review Comment:
   Haha just came to the realization I can do this and don't need to 
differentiate based on the lengthpushed up the change. Way better



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Expression: Fix for when multiple and/or expressions are specified via string [iceberg-python]

2023-10-13 Thread via GitHub



amogh-jahagirdar commented on PR #65:
URL: https://github.com/apache/iceberg-python/pull/65#issuecomment-1761366348

   Yes thanks @puchengy for reporting this, please also take a look a this fix 
when you get a chance!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [I] Make `location` in `TableCreation` optional [iceberg-rust]

2023-10-13 Thread via GitHub



Xuanwo commented on issue #67:
URL: https://github.com/apache/iceberg-rust/issues/67#issuecomment-1761422712

   I'm going to make this change.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Spark: spark/*: replaced .size() > 0 with isEmpty() [iceberg]

2023-10-13 Thread via GitHub



amogh-jahagirdar merged PR #8814:
URL: https://github.com/apache/iceberg/pull/8814


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Core: Add View support for REST catalog [iceberg]

2023-10-13 Thread via GitHub



nastra commented on code in PR #7913:
URL: https://github.com/apache/iceberg/pull/7913#discussion_r1358250961


##
core/src/main/java/org/apache/iceberg/rest/CatalogHandlers.java:
##
@@ -374,4 +385,107 @@ static TableMetadata commit(TableOperations ops, 
UpdateTableRequest request) {
 
 return ops.current();
   }
+
+  public static BaseView baseView(View view) {
+Preconditions.checkArgument(view instanceof BaseView, "View must be a 
BaseView");
+return (BaseView) view;
+  }
+
+  public static ListTablesResponse listViews(ViewCatalog catalog, Namespace 
namespace) {
+return 
ListTablesResponse.builder().addAll(catalog.listViews(namespace)).build();
+  }
+
+  public static LoadViewResponse createView(
+  ViewCatalog catalog, Namespace namespace, CreateViewRequest request) {
+request.validate();
+
+ViewMetadata viewMetadata = request.metadata();
+ViewBuilder viewBuilder =
+catalog
+.buildView(TableIdentifier.of(namespace, request.name()))
+.withSchema(viewMetadata.schema())
+.withProperties(viewMetadata.properties())
+
.withDefaultNamespace(viewMetadata.currentVersion().defaultNamespace())
+
.withDefaultCatalog(viewMetadata.currentVersion().defaultCatalog());
+viewMetadata.currentVersion().representations().stream()
+.filter(r -> r instanceof SQLViewRepresentation)
+.map(r -> (SQLViewRepresentation) r)
+.forEach(r -> viewBuilder.withQuery(r.dialect(), r.sql()));
+View view = viewBuilder.create();
+
+return viewResponse(view);
+  }
+
+  private static LoadViewResponse viewResponse(View view) {
+ViewMetadata metadata = baseView(view).operations().current();
+return ImmutableLoadViewResponse.builder()
+.metadata(metadata)
+.metadataLocation(metadata.metadataFileLocation())
+.build();
+  }
+
+  public static LoadViewResponse loadView(ViewCatalog catalog, TableIdentifier 
viewIdentifier) {
+View view = catalog.loadView(viewIdentifier);
+return viewResponse(view);
+  }
+
+  public static LoadViewResponse updateView(
+  ViewCatalog catalog, TableIdentifier ident, UpdateTableRequest request) {
+View view = catalog.loadView(ident);
+ViewMetadata metadata = commit(baseView(view).operations(), request);
+
+return ImmutableLoadViewResponse.builder()
+.metadata(metadata)
+.metadataLocation(metadata.metadataFileLocation())
+.build();
+  }
+
+  public static void renameView(ViewCatalog catalog, RenameTableRequest 
request) {
+catalog.renameView(request.source(), request.destination());
+  }
+
+  public static void dropView(ViewCatalog catalog, TableIdentifier 
viewIdentifier) {
+boolean dropped = catalog.dropView(viewIdentifier);
+if (!dropped) {
+  throw new NoSuchViewException("View does not exist: %s", viewIdentifier);
+}
+  }
+
+  static ViewMetadata commit(ViewOperations ops, UpdateTableRequest request) {
+AtomicBoolean isRetry = new AtomicBoolean(false);
+try {
+  Tasks.foreach(ops)
+  .retry(COMMIT_NUM_RETRIES_DEFAULT)
+  .exponentialBackoff(
+  COMMIT_MIN_RETRY_WAIT_MS_DEFAULT,
+  COMMIT_MAX_RETRY_WAIT_MS_DEFAULT,
+  COMMIT_TOTAL_RETRY_TIME_MS_DEFAULT,
+  2.0 /* exponential */)
+  .onlyRetryOn(CommitFailedException.class)
+  .run(
+  taskOps -> {
+ViewMetadata base = isRetry.get() ? taskOps.refresh() : 
taskOps.current();
+isRetry.set(true);
+
+// apply changes
+ViewMetadata.Builder metadataBuilder = 
ViewMetadata.buildFrom(base);
+request.updates().forEach(update -> 
update.applyTo(metadataBuilder));

Review Comment:
   > 1.) For creating a view, there probably needs to be a requirement similar 
to table's AssertTableDoesNotExist but for views, AssertViewDoesNotExist. 
Similar principle for ReplaceView and verifying the UUID as expected.
   
   For tables I believe `UpdateRequirement.AssertTableDoesNotExist` is mainly 
used for transactional cases, whereas for views we don't have transaction 
support. I think what would make sense is to have an assertion on the UUID of 
the view.
   
   > 2.) I don't see a metadata update for AddViewRepresentation but I'd 
imagine that should validate that the SQL dialect does not already exist for 
the specified version. I forgot where we landed on that discussion, We do want 
to prevent that right?
   
   We do have `MetadataUpdate.AddViewVersion` for this and we do validate now 
that there's no duplicate dialect (which was fixed as part of #7880), so we 
should be good here.
   
   > 3.) A requirement for the schema ID to make sure that is unchanged from 
the base metadata?
   
   Typically this would be done via `UpdateRequirement.AssertCurrentSchemaID` 
when there's a `MetadataUpdate.SetCurrentSchema` update, but for Views we don't 
send a `Metada

[PR] Build: add gradle configuration to enforce reproducible build [iceberg]

2023-10-13 Thread via GitHub



jbonofre opened a new pull request, #8826:
URL: https://github.com/apache/iceberg/pull/8826

   Close #8825 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Build: Bump jetty from 9.4.43.v20210629 to 11.0.15 [iceberg]

2023-10-13 Thread via GitHub



jbonofre commented on PR #8238:
URL: https://github.com/apache/iceberg/pull/8238#issuecomment-1761561894

   I propose to close this PR in favor of #8830 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

[PR] Upgrade to spring-web 5.3.30 [iceberg]

2023-10-13 Thread via GitHub



jbonofre opened a new pull request, #8828:
URL: https://github.com/apache/iceberg/pull/8828

   Close #8827 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

[I] Build: enforce reproducible build [iceberg]

2023-10-13 Thread via GitHub



jbonofre opened a new issue, #8825:
URL: https://github.com/apache/iceberg/issues/8825

   ### Feature Request / Improvement
   
   Reproducible builds is a development practice that create an 
independently-verifiable path from source to binary code.
   A build is reproducible if given the same source code, build environment and 
build instructions, any party can recreate
   bit-by-bit identical copies of all specified artifacts.
   
   Fortunately, Gradle supports reproducible build with a couple of 
configuration on `AbstractArchiveTask`.
   
   ### Query engine
   
   None


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Flink: Read parquet BINARY column as String for expected [iceberg]

2023-10-13 Thread via GitHub



nastra commented on code in PR #8808:
URL: https://github.com/apache/iceberg/pull/8808#discussion_r1358326874


##
flink/v1.15/flink/src/test/java/org/apache/iceberg/flink/data/TestFlinkParquetReader.java:
##
@@ -90,17 +92,72 @@ public void testTwoLevelList() throws IOException {
 .createReaderFunc(type -> FlinkParquetReaders.buildReader(schema, 
type))
 .build()) {
   Iterator rows = reader.iterator();
-  Assert.assertTrue("Should have at least one row", rows.hasNext());
+  assertThat(rows.hasNext()).as("Should have at least one row").isTrue();
   RowData rowData = rows.next();
-  Assert.assertArrayEquals(rowData.getArray(0).getBinary(0), expectedByte);
-  Assert.assertArrayEquals(rowData.getBinary(1), expectedByte);
-  Assert.assertFalse("Should not have more than one row", rows.hasNext());
+  assertThat(expectedByte).isEqualTo(rowData.getArray(0).getBinary(0));
+  assertThat(expectedByte).isEqualTo(rowData.getBinary(1));
+  assertThat(rows.hasNext()).as("Should not have more than one 
row").isFalse();
+}
+  }
+
+  @Test
+  public void testReadBinaryFieldAsString() throws IOException {
+Schema schemaForWriteBinary = new Schema(optional(1, "strbytes", 
Types.BinaryType.get()));
+org.apache.avro.Schema avroSchema = 
AvroSchemaUtil.convert(schemaForWriteBinary.asStruct());
+
+File testFile = temp.newFile();
+assertThat(testFile.delete()).isTrue();
+
+ParquetWriter writer =
+AvroParquetWriter.builder(new Path(testFile.toURI()))
+.withDataModel(GenericData.get())
+.withSchema(avroSchema)
+.build();
+
+String expectedString = "hello";
+
+GenericRecordBuilder recordBuilder = new GenericRecordBuilder(avroSchema);
+ByteBuffer expectedBinary = 
ByteBuffer.wrap(expectedString.getBytes(StandardCharsets.UTF_8));
+recordBuilder.set("strbytes", expectedBinary);
+GenericData.Record expectedRecord = recordBuilder.build();
+
+writer.write(expectedRecord);
+writer.close();
+
+// read as string
+Schema schemaForReadBinaryAsString =
+new Schema(optional(1, "strbytes", Types.StringType.get()));
+try (CloseableIterable reader =
+Parquet.read(Files.localInput(testFile))
+.project(schemaForReadBinaryAsString)
+.createReaderFunc(
+type -> 
FlinkParquetReaders.buildReader(schemaForReadBinaryAsString, type))
+.build()) {
+  Iterator rows = reader.iterator();
+  assertThat(rows.hasNext()).as("Should have at least one row").isTrue();

Review Comment:
   this should typically be `assertThat(rows).as("...").hasNext()`. The reason 
for that is that AssertJ provides additional context when an assertion fails 
and such fluent-style checks are usually preferred over the 
`.isTrue()`/`.isFalse()` checks



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Flink: Read parquet BINARY column as String for expected [iceberg]

2023-10-13 Thread via GitHub



nastra commented on code in PR #8808:
URL: https://github.com/apache/iceberg/pull/8808#discussion_r1358327260


##
flink/v1.15/flink/src/test/java/org/apache/iceberg/flink/data/TestFlinkParquetReader.java:
##
@@ -90,17 +92,72 @@ public void testTwoLevelList() throws IOException {
 .createReaderFunc(type -> FlinkParquetReaders.buildReader(schema, 
type))
 .build()) {
   Iterator rows = reader.iterator();
-  Assert.assertTrue("Should have at least one row", rows.hasNext());
+  assertThat(rows.hasNext()).as("Should have at least one row").isTrue();
   RowData rowData = rows.next();
-  Assert.assertArrayEquals(rowData.getArray(0).getBinary(0), expectedByte);
-  Assert.assertArrayEquals(rowData.getBinary(1), expectedByte);
-  Assert.assertFalse("Should not have more than one row", rows.hasNext());
+  assertThat(expectedByte).isEqualTo(rowData.getArray(0).getBinary(0));
+  assertThat(expectedByte).isEqualTo(rowData.getBinary(1));
+  assertThat(rows.hasNext()).as("Should not have more than one 
row").isFalse();
+}
+  }
+
+  @Test
+  public void testReadBinaryFieldAsString() throws IOException {
+Schema schemaForWriteBinary = new Schema(optional(1, "strbytes", 
Types.BinaryType.get()));
+org.apache.avro.Schema avroSchema = 
AvroSchemaUtil.convert(schemaForWriteBinary.asStruct());
+
+File testFile = temp.newFile();
+assertThat(testFile.delete()).isTrue();
+
+ParquetWriter writer =
+AvroParquetWriter.builder(new Path(testFile.toURI()))
+.withDataModel(GenericData.get())
+.withSchema(avroSchema)
+.build();
+
+String expectedString = "hello";
+
+GenericRecordBuilder recordBuilder = new GenericRecordBuilder(avroSchema);
+ByteBuffer expectedBinary = 
ByteBuffer.wrap(expectedString.getBytes(StandardCharsets.UTF_8));
+recordBuilder.set("strbytes", expectedBinary);
+GenericData.Record expectedRecord = recordBuilder.build();
+
+writer.write(expectedRecord);
+writer.close();
+
+// read as string
+Schema schemaForReadBinaryAsString =
+new Schema(optional(1, "strbytes", Types.StringType.get()));
+try (CloseableIterable reader =
+Parquet.read(Files.localInput(testFile))
+.project(schemaForReadBinaryAsString)
+.createReaderFunc(
+type -> 
FlinkParquetReaders.buildReader(schemaForReadBinaryAsString, type))
+.build()) {
+  Iterator rows = reader.iterator();
+  assertThat(rows.hasNext()).as("Should have at least one row").isTrue();
+  RowData rowData = rows.next();
+  assertThat(rowData.getString(0)).isInstanceOf(BinaryStringData.class);
+  assertThat(rowData.getString(0).toString()).isEqualTo(expectedString);
+  assertThat(rows.hasNext()).as("Should not have more than one 
row").isFalse();

Review Comment:
   this would be `assertThat(rows).as("...").isExhausted()`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Flink: Read parquet BINARY column as String for expected [iceberg]

2023-10-13 Thread via GitHub



nastra commented on code in PR #8808:
URL: https://github.com/apache/iceberg/pull/8808#discussion_r1358331014


##
flink/v1.15/flink/src/test/java/org/apache/iceberg/flink/data/TestFlinkParquetReader.java:
##
@@ -90,17 +92,72 @@ public void testTwoLevelList() throws IOException {
 .createReaderFunc(type -> FlinkParquetReaders.buildReader(schema, 
type))
 .build()) {
   Iterator rows = reader.iterator();
-  Assert.assertTrue("Should have at least one row", rows.hasNext());
+  assertThat(rows.hasNext()).as("Should have at least one row").isTrue();
   RowData rowData = rows.next();
-  Assert.assertArrayEquals(rowData.getArray(0).getBinary(0), expectedByte);
-  Assert.assertArrayEquals(rowData.getBinary(1), expectedByte);
-  Assert.assertFalse("Should not have more than one row", rows.hasNext());
+  assertThat(expectedByte).isEqualTo(rowData.getArray(0).getBinary(0));
+  assertThat(expectedByte).isEqualTo(rowData.getBinary(1));
+  assertThat(rows.hasNext()).as("Should not have more than one 
row").isFalse();
+}
+  }
+
+  @Test
+  public void testReadBinaryFieldAsString() throws IOException {
+Schema schemaForWriteBinary = new Schema(optional(1, "strbytes", 
Types.BinaryType.get()));
+org.apache.avro.Schema avroSchema = 
AvroSchemaUtil.convert(schemaForWriteBinary.asStruct());
+
+File testFile = temp.newFile();
+assertThat(testFile.delete()).isTrue();
+
+ParquetWriter writer =
+AvroParquetWriter.builder(new Path(testFile.toURI()))
+.withDataModel(GenericData.get())
+.withSchema(avroSchema)
+.build();
+
+String expectedString = "hello";
+
+GenericRecordBuilder recordBuilder = new GenericRecordBuilder(avroSchema);
+ByteBuffer expectedBinary = 
ByteBuffer.wrap(expectedString.getBytes(StandardCharsets.UTF_8));
+recordBuilder.set("strbytes", expectedBinary);
+GenericData.Record expectedRecord = recordBuilder.build();
+
+writer.write(expectedRecord);
+writer.close();

Review Comment:
   rather than calling `close()` manually, I think it would be better to use 
`try-with-resources` here:
   
   ```
   GenericRecordBuilder recordBuilder = new GenericRecordBuilder(avroSchema);
   ByteBuffer expectedBinary = 
ByteBuffer.wrap(expectedString.getBytes(StandardCharsets.UTF_8));
   recordBuilder.set("strbytes", expectedBinary);
   GenericData.Record expectedRecord = recordBuilder.build();
   
   try (ParquetWriter writer =
   AvroParquetWriter.builder(new Path(testFile.toURI()))
   .withDataModel(GenericData.get())
   .withSchema(avroSchema)
   .build()) {
 writer.write(expectedRecord);
   }
   ```



##
flink/v1.15/flink/src/test/java/org/apache/iceberg/flink/data/TestFlinkParquetReader.java:
##
@@ -90,17 +92,72 @@ public void testTwoLevelList() throws IOException {
 .createReaderFunc(type -> FlinkParquetReaders.buildReader(schema, 
type))
 .build()) {
   Iterator rows = reader.iterator();
-  Assert.assertTrue("Should have at least one row", rows.hasNext());
+  assertThat(rows.hasNext()).as("Should have at least one row").isTrue();
   RowData rowData = rows.next();
-  Assert.assertArrayEquals(rowData.getArray(0).getBinary(0), expectedByte);
-  Assert.assertArrayEquals(rowData.getBinary(1), expectedByte);
-  Assert.assertFalse("Should not have more than one row", rows.hasNext());
+  assertThat(expectedByte).isEqualTo(rowData.getArray(0).getBinary(0));
+  assertThat(expectedByte).isEqualTo(rowData.getBinary(1));
+  assertThat(rows.hasNext()).as("Should not have more than one 
row").isFalse();
+}
+  }
+
+  @Test
+  public void testReadBinaryFieldAsString() throws IOException {
+Schema schemaForWriteBinary = new Schema(optional(1, "strbytes", 
Types.BinaryType.get()));
+org.apache.avro.Schema avroSchema = 
AvroSchemaUtil.convert(schemaForWriteBinary.asStruct());
+
+File testFile = temp.newFile();
+assertThat(testFile.delete()).isTrue();
+
+ParquetWriter writer =
+AvroParquetWriter.builder(new Path(testFile.toURI()))
+.withDataModel(GenericData.get())
+.withSchema(avroSchema)
+.build();
+
+String expectedString = "hello";
+
+GenericRecordBuilder recordBuilder = new GenericRecordBuilder(avroSchema);
+ByteBuffer expectedBinary = 
ByteBuffer.wrap(expectedString.getBytes(StandardCharsets.UTF_8));
+recordBuilder.set("strbytes", expectedBinary);
+GenericData.Record expectedRecord = recordBuilder.build();
+
+writer.write(expectedRecord);
+writer.close();

Review Comment:
   rather than calling `close()` manually, I think it would be better to use 
`try-with-resources` here:
   
   ```
   GenericRecordBuilder recordBuilder = new 
GenericRecordBuilder(avroSchema);
   ByteBuffer expectedBinary = 
ByteBuffer.wrap(e

Re: [PR] Build: Bump jetty from 9.4.43.v20210629 to 11.0.17 [iceberg]

2023-10-13 Thread via GitHub



jbonofre commented on PR #8788:
URL: https://github.com/apache/iceberg/pull/8788#issuecomment-1761560686

   I propose to close this PR in favor of #8830 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Build: Bump jetty from 9.4.43.v20210629 to 11.0.15 [iceberg]

2023-10-13 Thread via GitHub



ajantha-bhat closed pull request #8238: Build: Bump jetty from 9.4.43.v20210629 
to 11.0.15
URL: https://github.com/apache/iceberg/pull/8238


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Build: Bump jetty from 9.4.43.v20210629 to 11.0.15 [iceberg]

2023-10-13 Thread via GitHub



ajantha-bhat commented on PR #8238:
URL: https://github.com/apache/iceberg/pull/8238#issuecomment-1761663164

   > I propose to close this PR in favor of 
https://github.com/apache/iceberg/pull/8830
   
   Agree. Thanks for raising the PR to support the last JDK8 version. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Build: Bump jetty from 9.4.43.v20210629 to 11.0.15 [iceberg]

2023-10-13 Thread via GitHub



dependabot[bot] commented on PR #8238:
URL: https://github.com/apache/iceberg/pull/8238#issuecomment-1761663265

   OK, I won't notify you again about this release, but will get in touch when 
a new version is available. You can also ignore all major, minor, or patch 
releases for a dependency by adding an [`ignore` 
condition](https://docs.github.com/en/code-security/supply-chain-security/configuration-options-for-dependency-updates#ignore)
 with the desired `update_types` to your config file.
   
   If you change your mind, just re-open this PR and I'll resolve any conflicts 
on it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Build: Bump jetty from 9.4.43.v20210629 to 11.0.17 [iceberg]

2023-10-13 Thread via GitHub



dependabot[bot] commented on PR #8788:
URL: https://github.com/apache/iceberg/pull/8788#issuecomment-1761667373

   OK, I won't notify you again about this release, but will get in touch when 
a new version is available. You can also ignore all major, minor, or patch 
releases for a dependency by adding an [`ignore` 
condition](https://docs.github.com/en/code-security/supply-chain-security/configuration-options-for-dependency-updates#ignore)
 with the desired `update_types` to your config file.
   
   If you change your mind, just re-open this PR and I'll resolve any conflicts 
on it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Build: Bump jetty from 9.4.43.v20210629 to 11.0.17 [iceberg]

2023-10-13 Thread via GitHub



ajantha-bhat commented on PR #8788:
URL: https://github.com/apache/iceberg/pull/8788#issuecomment-1761667236

   I propose to close this PR in favor of 
https://github.com/apache/iceberg/pull/8830
   
   Agree. Thanks for raising the PR to support the last JDK8 version.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Build: Bump jetty from 9.4.43.v20210629 to 11.0.17 [iceberg]

2023-10-13 Thread via GitHub



ajantha-bhat closed pull request #8788: Build: Bump jetty from 9.4.43.v20210629 
to 11.0.17
URL: https://github.com/apache/iceberg/pull/8788


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Build: Bump org.springframework:spring-web from 5.3.9 to 6.0.13 [iceberg]

2023-10-13 Thread via GitHub



ajantha-bhat closed pull request #8811: Build: Bump 
org.springframework:spring-web from 5.3.9 to 6.0.13
URL: https://github.com/apache/iceberg/pull/8811


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Build: Bump org.springframework:spring-web from 5.3.9 to 6.0.13 [iceberg]

2023-10-13 Thread via GitHub



dependabot[bot] commented on PR #8811:
URL: https://github.com/apache/iceberg/pull/8811#issuecomment-1761669389

   OK, I won't notify you again about this release, but will get in touch when 
a new version is available. If you'd rather skip all updates until the next 
major or minor version, let me know by commenting `@dependabot ignore this 
major version` or `@dependabot ignore this minor version`. You can also ignore 
all major, minor, or patch releases for a dependency by adding an [`ignore` 
condition](https://docs.github.com/en/code-security/supply-chain-security/configuration-options-for-dependency-updates#ignore)
 with the desired `update_types` to your config file.
   
   If you change your mind, just re-open this PR and I'll resolve any conflicts 
on it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Build: Bump org.springframework:spring-web from 5.3.9 to 6.0.13 [iceberg]

2023-10-13 Thread via GitHub



ajantha-bhat commented on PR #8811:
URL: https://github.com/apache/iceberg/pull/8811#issuecomment-1761669307

   Closing in the favour of https://github.com/apache/iceberg/pull/8828


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] push down min/max/count to iceberg [iceberg]

2023-10-13 Thread via GitHub



huaxingao commented on PR #6252:
URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1761680652

   @atifiu File statistics are not accurate and can't be used any more if you 
use filters.
   
   For example, you have table (col int), the max of col is 100, and the min is 
0, so the statistics file is
   ```
   max   min
   1001
   ```
   If you have `SELECT MAX(col) FROM table`, we can check the statistics file 
and simple return 100, but if you have `SELECT MAX(col) FROM table WHERE col < 
70`, we can't use the statistics file any more. We only know that the 
`MAX(col)` is smaller than 70, but we have no idea what value it is, so have to 
compute.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] push down min/max/count to iceberg [iceberg]

2023-10-13 Thread via GitHub



atifiu commented on PR #6252:
URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1761688659

   @huaxingao so you meant to say that with filters whether on partitioned or 
non partitioned column(s), aggregate pushdown will not work ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Expression: Fix for when multiple and/or expressions are specified via string [iceberg-python]

2023-10-13 Thread via GitHub



puchengy commented on PR #65:
URL: https://github.com/apache/iceberg-python/pull/65#issuecomment-1761695595

   @amogh-jahagirdar The test result LGTM. However, I am not familiar with the 
implementation, please feel free to go ahead and merge.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Upgrade to spring-web 5.3.30 [iceberg]

2023-10-13 Thread via GitHub



nastra merged PR #8828:
URL: https://github.com/apache/iceberg/pull/8828


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [I] Upgrade to Jetty 9.4.53.v20231009 [iceberg]

2023-10-13 Thread via GitHub



nastra closed issue #8829: Upgrade to Jetty 9.4.53.v20231009
URL: https://github.com/apache/iceberg/issues/8829


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Upgrade to Jetty 9.4.53.v20231009 [iceberg]

2023-10-13 Thread via GitHub



nastra merged PR #8830:
URL: https://github.com/apache/iceberg/pull/8830


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [I] Upgrade to spring-web 5.3.30 [iceberg]

2023-10-13 Thread via GitHub



nastra closed issue #8827: Upgrade to spring-web 5.3.30
URL: https://github.com/apache/iceberg/issues/8827


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] push down min/max/count to iceberg [iceberg]

2023-10-13 Thread via GitHub



huaxingao commented on PR #6252:
URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1761909655

   If filters are on partitioned columns, aggregate pushdown should work.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] push down min/max/count to iceberg [iceberg]

2023-10-13 Thread via GitHub



atifiu commented on PR #6252:
URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1761916077

   It's not working. Either with between, > or <.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Core: Replace `.size() > 0` with `!.isEmpty()` [iceberg]

2023-10-13 Thread via GitHub



amogh-jahagirdar merged PR #8813:
URL: https://github.com/apache/iceberg/pull/8813


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Core: Add View support for REST catalog [iceberg]

2023-10-13 Thread via GitHub



nastra commented on code in PR #7913:
URL: https://github.com/apache/iceberg/pull/7913#discussion_r1358620356


##
core/src/main/java/org/apache/iceberg/rest/CatalogHandlers.java:
##
@@ -374,4 +385,107 @@ static TableMetadata commit(TableOperations ops, 
UpdateTableRequest request) {
 
 return ops.current();
   }
+
+  public static BaseView baseView(View view) {
+Preconditions.checkArgument(view instanceof BaseView, "View must be a 
BaseView");
+return (BaseView) view;
+  }
+
+  public static ListTablesResponse listViews(ViewCatalog catalog, Namespace 
namespace) {
+return 
ListTablesResponse.builder().addAll(catalog.listViews(namespace)).build();
+  }
+
+  public static LoadViewResponse createView(
+  ViewCatalog catalog, Namespace namespace, CreateViewRequest request) {
+request.validate();
+
+ViewMetadata viewMetadata = request.metadata();
+ViewBuilder viewBuilder =
+catalog
+.buildView(TableIdentifier.of(namespace, request.name()))
+.withSchema(viewMetadata.schema())
+.withProperties(viewMetadata.properties())
+
.withDefaultNamespace(viewMetadata.currentVersion().defaultNamespace())
+
.withDefaultCatalog(viewMetadata.currentVersion().defaultCatalog());
+viewMetadata.currentVersion().representations().stream()
+.filter(r -> r instanceof SQLViewRepresentation)
+.map(r -> (SQLViewRepresentation) r)
+.forEach(r -> viewBuilder.withQuery(r.dialect(), r.sql()));
+View view = viewBuilder.create();
+
+return viewResponse(view);
+  }
+
+  private static LoadViewResponse viewResponse(View view) {
+ViewMetadata metadata = baseView(view).operations().current();
+return ImmutableLoadViewResponse.builder()
+.metadata(metadata)
+.metadataLocation(metadata.metadataFileLocation())
+.build();
+  }
+
+  public static LoadViewResponse loadView(ViewCatalog catalog, TableIdentifier 
viewIdentifier) {
+View view = catalog.loadView(viewIdentifier);
+return viewResponse(view);
+  }
+
+  public static LoadViewResponse updateView(
+  ViewCatalog catalog, TableIdentifier ident, UpdateTableRequest request) {
+View view = catalog.loadView(ident);
+ViewMetadata metadata = commit(baseView(view).operations(), request);
+
+return ImmutableLoadViewResponse.builder()
+.metadata(metadata)
+.metadataLocation(metadata.metadataFileLocation())
+.build();
+  }
+
+  public static void renameView(ViewCatalog catalog, RenameTableRequest 
request) {
+catalog.renameView(request.source(), request.destination());
+  }
+
+  public static void dropView(ViewCatalog catalog, TableIdentifier 
viewIdentifier) {
+boolean dropped = catalog.dropView(viewIdentifier);
+if (!dropped) {
+  throw new NoSuchViewException("View does not exist: %s", viewIdentifier);
+}
+  }
+
+  static ViewMetadata commit(ViewOperations ops, UpdateTableRequest request) {
+AtomicBoolean isRetry = new AtomicBoolean(false);
+try {
+  Tasks.foreach(ops)
+  .retry(COMMIT_NUM_RETRIES_DEFAULT)
+  .exponentialBackoff(
+  COMMIT_MIN_RETRY_WAIT_MS_DEFAULT,
+  COMMIT_MAX_RETRY_WAIT_MS_DEFAULT,
+  COMMIT_TOTAL_RETRY_TIME_MS_DEFAULT,
+  2.0 /* exponential */)
+  .onlyRetryOn(CommitFailedException.class)
+  .run(
+  taskOps -> {
+ViewMetadata base = isRetry.get() ? taskOps.refresh() : 
taskOps.current();
+isRetry.set(true);
+
+// apply changes
+ViewMetadata.Builder metadataBuilder = 
ViewMetadata.buildFrom(base);
+request.updates().forEach(update -> 
update.applyTo(metadataBuilder));

Review Comment:
   I've opened https://github.com/apache/iceberg/pull/8831 to introduce 
`AssertUUID` (and deprecate `AssertTableUUID`) so that we can use it both for 
tables and views. Alternatively, we could also introduce `AssertViewUUID` with 
the same functionality as `AssertTableUUID`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Expression: Fix for when multiple and/or expressions are specified via string [iceberg-python]

2023-10-13 Thread via GitHub



rdblue commented on code in PR #65:
URL: https://github.com/apache/iceberg-python/pull/65#discussion_r1358624936


##
pyiceberg/expressions/parser.py:
##
@@ -233,11 +233,11 @@ def handle_not(result: ParseResults) -> Not:
 
 
 def handle_and(result: ParseResults) -> And:
-return And(result[0][0], result[0][1])
+return And(*result[0])

Review Comment:
   Why does the parser produce more than 2 child expressions?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Expression: Fix for when multiple and/or expressions are specified via string [iceberg-python]

2023-10-13 Thread via GitHub



rdblue commented on code in PR #65:
URL: https://github.com/apache/iceberg-python/pull/65#discussion_r1358624936


##
pyiceberg/expressions/parser.py:
##
@@ -233,11 +233,11 @@ def handle_not(result: ParseResults) -> Not:
 
 
 def handle_and(result: ParseResults) -> And:
-return And(result[0][0], result[0][1])
+return And(*result[0])

Review Comment:
   Why does the parser produce more than 2 child expressions?
   
   Is this a result of using `infix_notation` that I didn't realize at the time?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Expression: Fix for when multiple and/or expressions are specified via string [iceberg-python]

2023-10-13 Thread via GitHub



rdblue commented on PR #65:
URL: https://github.com/apache/iceberg-python/pull/65#issuecomment-1761946877

   Thanks for jumping on the fix, @amogh-jahagirdar!
   
   @Fokko, should we release 0.5.1 with this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Expression: Fix for when multiple and/or expressions are specified via string [iceberg-python]

2023-10-13 Thread via GitHub



Fokko commented on PR #65:
URL: https://github.com/apache/iceberg-python/pull/65#issuecomment-1761997037

   > @Fokko, should we release 0.5.1 with this patch?
   
   Yes, I think that's a good idea


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Expression: Fix for when multiple and/or expressions are specified via string [iceberg-python]

2023-10-13 Thread via GitHub



Fokko merged PR #65:
URL: https://github.com/apache/iceberg-python/pull/65


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [I] [BUG] string row filter ignore 2nd (and onwards) And [iceberg-python]

2023-10-13 Thread via GitHub



Fokko closed issue #64: [BUG] string row filter ignore 2nd (and onwards) And
URL: https://github.com/apache/iceberg-python/issues/64


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Add spec_id back to data file [iceberg-python]

2023-10-13 Thread via GitHub



Fokko commented on PR #63:
URL: https://github.com/apache/iceberg-python/pull/63#issuecomment-1762014542

   @puchengy can you fix the CI? We need to make this part of 0.5.1 since the 
`spec_id `was there before (as you pointed out on Slack :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [I] AvroWriter Issue: Incorrect iceberg_to_avro Schema Conversion for Decimal, Fixed, and UUID [iceberg-python]

2023-10-13 Thread via GitHub



Fokko closed issue #14: AvroWriter Issue: Incorrect iceberg_to_avro Schema 
Conversion for Decimal, Fixed, and UUID
URL: https://github.com/apache/iceberg-python/issues/14


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Fix Iceberg to Avro Schema Conversion: Fixed, Decimal, UUID [iceberg-python]

2023-10-13 Thread via GitHub



Fokko merged PR #53:
URL: https://github.com/apache/iceberg-python/pull/53


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Add spec_id back to data file [iceberg-python]

2023-10-13 Thread via GitHub



puchengy commented on PR #63:
URL: https://github.com/apache/iceberg-python/pull/63#issuecomment-1762024206

   @Fokko Will do that. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Add spec_id back to data file [iceberg-python]

2023-10-13 Thread via GitHub



Fokko commented on PR #63:
URL: https://github.com/apache/iceberg-python/pull/63#issuecomment-1762031697

   @puchengy 
   
   `make install && make lint` should fix it. It also looks like the 
integration test is failing because FastAvro is missing the `spec_id` (and this 
is correct because it isn't written, the test should be updated).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Add spec_id back to data file [iceberg-python]

2023-10-13 Thread via GitHub



puchengy commented on PR #63:
URL: https://github.com/apache/iceberg-python/pull/63#issuecomment-1762044854

   @Fokko thanks I will address that today or tomorrow. If this becomes a 
blocker of the release, feel free to take it over.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Add spec_id back to data file [iceberg-python]

2023-10-13 Thread via GitHub



Fokko commented on PR #63:
URL: https://github.com/apache/iceberg-python/pull/63#issuecomment-1762064930

   @puchengy let me give it a try


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Add spec_id back to data file [iceberg-python]

2023-10-13 Thread via GitHub



Fokko commented on PR #63:
URL: https://github.com/apache/iceberg-python/pull/63#issuecomment-1762074280

   @puchengy I think this fixes it: 
https://github.com/puchengy/iceberg-python/pull/1


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] push down min/max/count to iceberg [iceberg]

2023-10-13 Thread via GitHub



huaxingao commented on PR #6252:
URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1762149585

   @atifiu I suspect somehow your partition filter isn't completely pushed 
down. In this [PR](https://github.com/apache/iceberg/pull/6524), we will 
discard filters that can be completely evaluated using partition metadata. 
Could you check the EXPLAIN and see if there is still filter? If there is 
filter, then somehow the partition filter is not completely pushed down and 
Spark has to filter again, and statistics is not accurate in this case, and we 
can't push down aggregates.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

[PR] Update to the latest version [iceberg-python]

2023-10-13 Thread via GitHub



Fokko opened a new pull request, #67:
URL: https://github.com/apache/iceberg-python/pull/67

   Check if the dependencies are okay


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] allow override env-variables in load_catalog [iceberg-python]

2023-10-13 Thread via GitHub



Fokko merged PR #45:
URL: https://github.com/apache/iceberg-python/pull/45


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

[PR] Bump version to 0.5.1 [iceberg-python]

2023-10-13 Thread via GitHub



Fokko opened a new pull request, #68:
URL: https://github.com/apache/iceberg-python/pull/68

   (no comment)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

[PR] Check for empty responses [iceberg-python]

2023-10-13 Thread via GitHub



Fokko opened a new pull request, #69:
URL: https://github.com/apache/iceberg-python/pull/69

   (no comment)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] Make `next_sequence_number` private [iceberg-python]

2023-10-13 Thread via GitHub



Fokko merged PR #62:
URL: https://github.com/apache/iceberg-python/pull/62


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

[PR] Fix fixed type [iceberg-python]

2023-10-13 Thread via GitHub



Fokko opened a new pull request, #70:
URL: https://github.com/apache/iceberg-python/pull/70

   Should be `FIXED_LEN_BYTE_ARRAY`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] feat: manifest list writer [iceberg-rust]

2023-10-13 Thread via GitHub



barronw commented on code in PR #76:
URL: https://github.com/apache/iceberg-rust/pull/76#discussion_r1358866184


##
crates/iceberg/src/spec/manifest_list.rs:
##
@@ -940,4 +1017,104 @@ mod test {
 
r#"[{"manifest_path":"s3a://icebergdata/demo/s1/t1/metadata/05ffe08b-810f-49b3-a8f4-e88fc99b254a-m0.avro","manifest_length":6926,"partition_spec_id":0,"content":0,"sequence_number":1,"min_sequence_number":1,"added_snapshot_id":377075049360453639,"added_data_files_count":1,"existing_data_files_count":0,"deleted_data_files_count":0,"added_rows_count":3,"existing_rows_count":0,"deleted_rows_count":0,"partitions":[{"contains_null":false,"contains_nan":false,"lower_bound":[1,0,0,0,0,0,0,0],"upper_bound":[1,0,0,0,0,0,0,0]}],"key_metadata":null}]"#
 );
 }
+
+#[tokio::test]
+async fn test_manifest_list_writer_v1() {
+let expected_manifest_list = ManifestList {
+entries: vec![ManifestListEntry {
+manifest_path: 
"/opt/bitnami/spark/warehouse/db/table/metadata/10d28031-9739-484c-92db-cdf2975cead4-m0.avro".to_string(),
+manifest_length: 5806,
+partition_spec_id: 0,
+content: ManifestContentType::Data,
+sequence_number: 0,
+min_sequence_number: 0,
+added_snapshot_id: 1646658105718557341,
+added_data_files_count: Some(3),
+existing_data_files_count: Some(0),
+deleted_data_files_count: Some(0),
+added_rows_count: Some(3),
+existing_rows_count: Some(0),
+deleted_rows_count: Some(0),
+partitions: vec![],
+key_metadata: vec![],
+}]
+};
+
+let temp_dir = TempDir::new("manifest_list_v1").unwrap();
+let path = temp_dir.path().join("manifest_list_v1.avro");
+let io = FileIOBuilder::new_fs_io().build().unwrap();
+let output_file = io.new_output(path.to_str().unwrap()).unwrap();
+
+let mut metadata = HashMap::new();
+metadata.insert(String::from("format-version"), String::from("1"));
+let mut writer =
+ManifestListWriter::new(output_file, 
crate::spec::FormatVersion::V1, metadata);
+writer
+.add_manifests(expected_manifest_list.entries.clone().into_iter())
+.unwrap();
+writer.close().await.unwrap();
+
+let bs = fs::read(path).unwrap();
+let manifest_list = ManifestList::parse_with_version(
+&bs,
+crate::spec::FormatVersion::V1,
+&StructType::new(vec![]),

Review Comment:
   Updated



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Re: [PR] feat: manifest list writer [iceberg-rust]

2023-10-13 Thread via GitHub



barronw commented on code in PR #76:
URL: https://github.com/apache/iceberg-rust/pull/76#discussion_r1358866633


##
crates/iceberg/src/spec/manifest_list.rs:
##
@@ -69,6 +73,25 @@ impl ManifestList {
 &self.entries
 }
 
+/// Get the v1 schema of the manifest list entry.
+pub(crate) fn v1_schema() -> Schema {

Review Comment:
   I moved the schema definitions into `_const_fields`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

1 2 >

1 - 100 of 111 matches

Mail list logo