[GitHub] [iceberg] Fokko merged pull request #6130: Build: Bump mkdocs from 1.4.1 to 1.4.2 in /python

2022-11-06 Thread GitBox


Fokko merged PR #6130:
URL: https://github.com/apache/iceberg/pull/6130


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue commented on a diff in pull request #6127: Python: Add expression evaluator

2022-11-06 Thread GitBox


rdblue commented on code in PR #6127:
URL: https://github.com/apache/iceberg/pull/6127#discussion_r1014889665


##
python/tests/expressions/test_evaluator.py:
##
@@ -0,0 +1,203 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from typing import Any, List, TypeVar
+
+from pyiceberg.expressions import AlwaysTrue, AlwaysFalse, LessThan, 
Reference, LessThanOrEqual, GreaterThan, \
+GreaterThanOrEqual, EqualTo, NotEqualTo, IsNaN, NotNaN, IsNull, NotNull, 
Not, And, Or, In, NotIn
+from pyiceberg.expressions.literals import literal
+from pyiceberg.expressions.visitors import evaluator
+from pyiceberg.files import StructProtocol
+from pyiceberg.schema import Schema
+from pyiceberg.types import NestedField, LongType, StringType, DoubleType
+
+V = TypeVar('V')
+
+
+class Column:

Review Comment:
   I fixed this by adding decorators that will coerce arguments before passing 
to the constructors of the frozen data classes.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue commented on a diff in pull request #6127: Python: Add expression evaluator

2022-11-06 Thread GitBox


rdblue commented on code in PR #6127:
URL: https://github.com/apache/iceberg/pull/6127#discussion_r1014890146


##
python/pyiceberg/expressions/__init__.py:
##
@@ -281,6 +289,30 @@ def __invert__(self) -> BoundIsNull:
 return BoundIsNull(self.term)
 
 
+def coerce_unary_arguments(data_class: Union[type, None]) -> Union[type, 
Callable[[type], type]]:

Review Comment:
   @Fokko, this PR introduces 3 of these decorators to make it easy to 
construct predicates. This decorator replaces `__init__` with a version that 
converts a string passed as `term` into a `Reference`. This makes it much 
easier to create expressions, as you can see from the test cases.
   
   The other decorators handle set predicates and literal predicates.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue commented on a diff in pull request #6123: Python: Support creating a DateLiteral from a date (#6120)

2022-11-06 Thread GitBox


rdblue commented on code in PR #6123:
URL: https://github.com/apache/iceberg/pull/6123#discussion_r1014891415


##
python/pyiceberg/utils/datetime.py:
##
@@ -47,11 +47,16 @@ def micros_to_time(micros: int) -> time:
 return time(hour=hours, minute=minutes, second=seconds, 
microsecond=microseconds)
 
 
-def date_to_days(date_str: str) -> int:
+def date_str_to_days(date_str: str) -> int:

Review Comment:
   Does this need to be renamed? Why not support both with the same name to 
avoid a breaking change?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue commented on pull request #6118: Parquet, Core: Fix collection of Parquet metrics when column names co…

2022-11-06 Thread GitBox


rdblue commented on PR #6118:
URL: https://github.com/apache/iceberg/pull/6118#issuecomment-1304892034

   @zhongyujiang can you explain the fix a bit more clearly?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue merged pull request #6110: API: Hash floats -0.0 and 0.0 to the same bucket

2022-11-06 Thread GitBox


rdblue merged PR #6110:
URL: https://github.com/apache/iceberg/pull/6110


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue commented on pull request #6110: API: Hash floats -0.0 and 0.0 to the same bucket

2022-11-06 Thread GitBox


rdblue commented on PR #6110:
URL: https://github.com/apache/iceberg/pull/6110#issuecomment-1304892916

   Thanks, @fb913bf0de288ba84fe98f7a23d35edfdb22381! Looks great.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue commented on a diff in pull request #6093: Spark-3.0: Remove/update spark-3.0 mention from Docs and Builds

2022-11-06 Thread GitBox


rdblue commented on code in PR #6093:
URL: https://github.com/apache/iceberg/pull/6093#discussion_r1014892191


##
docs/aws.md:
##
@@ -488,7 +488,7 @@ disaster recovery, etc.
 For using cross-region access points, we need to additionally set 
`use-arn-region-enabled` catalog property to
 `true` to enable `S3FileIO` to make cross-region calls, it's not required for 
same / multi-region access points.
 
-For example, to use S3 access-point with Spark 3.0, you can start the Spark 
SQL shell with:
+For example, to use S3 access-point with Spark 3.3, you can start the Spark 
SQL shell with:

Review Comment:
   What about using 3.x? Or maybe just omit the version number?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue merged pull request #6093: Spark-3.0: Remove/update spark-3.0 mention from Docs and Builds

2022-11-06 Thread GitBox


rdblue merged PR #6093:
URL: https://github.com/apache/iceberg/pull/6093


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue commented on pull request #6094: Spark-3.0: Remove spark/v3.0 folder

2022-11-06 Thread GitBox


rdblue commented on PR #6094:
URL: https://github.com/apache/iceberg/pull/6094#issuecomment-1304894017

   @ajantha-bhat, I merged #6093. Can you rebase this one?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue commented on pull request #6108: SparkBatchQueryScan logs too much - #6106

2022-11-06 Thread GitBox


rdblue commented on PR #6108:
URL: https://github.com/apache/iceberg/pull/6108#issuecomment-1304894200

   Running CI again.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue merged pull request #6076: Python: Replace mmh3 with mmhash3

2022-11-06 Thread GitBox


rdblue merged PR #6076:
URL: https://github.com/apache/iceberg/pull/6076


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue commented on pull request #6076: Python: Replace mmh3 with mmhash3

2022-11-06 Thread GitBox


rdblue commented on PR #6076:
URL: https://github.com/apache/iceberg/pull/6076#issuecomment-1304894694

   Thanks, @Fokko!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue commented on pull request #6064: Support 2-level list and maps type in RemoveIds.

2022-11-06 Thread GitBox


rdblue commented on PR #6064:
URL: https://github.com/apache/iceberg/pull/6064#issuecomment-1304895348

   Looks good to me. Merged.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue merged pull request #6064: Support 2-level list and maps type in RemoveIds.

2022-11-06 Thread GitBox


rdblue merged PR #6064:
URL: https://github.com/apache/iceberg/pull/6064


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue merged pull request #6065: Fix TestAggregateBinding

2022-11-06 Thread GitBox


rdblue merged PR #6065:
URL: https://github.com/apache/iceberg/pull/6065


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue commented on pull request #6065: Fix TestAggregateBinding

2022-11-06 Thread GitBox


rdblue commented on PR #6065:
URL: https://github.com/apache/iceberg/pull/6065#issuecomment-1304895651

   Thanks, @huaxingao!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue commented on a diff in pull request #6058: Core,Spark: Add metadata to Scan Report

2022-11-06 Thread GitBox


rdblue commented on code in PR #6058:
URL: https://github.com/apache/iceberg/pull/6058#discussion_r1014893653


##
core/src/main/java/org/apache/iceberg/BaseTableScan.java:
##
@@ -141,6 +142,8 @@ public CloseableIterable planFiles() {
   doPlanFiles(),
   () -> {
 planningDuration.stop();
+Map metadata = 
Maps.newHashMap(context().options());
+metadata.putAll(EnvironmentContext.get());

Review Comment:
   How is it helpful? I see that metadata is generic, but I don't understand 
the value of mixing these together, versus having a separate context map and 
options map.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue commented on a diff in pull request #6058: Core,Spark: Add metadata to Scan Report

2022-11-06 Thread GitBox


rdblue commented on code in PR #6058:
URL: https://github.com/apache/iceberg/pull/6058#discussion_r1014893913


##
core/src/main/java/org/apache/iceberg/EnvironmentContext.java:
##
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg;
+
+import java.util.Map;
+import java.util.function.Supplier;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
+import org.apache.iceberg.relocated.com.google.common.collect.Maps;
+
+public class EnvironmentContext {
+  public static final String ENGINE_NAME = "engine-name";
+  public static final String ENGINE_VERSION = "engine-version";
+
+  private EnvironmentContext() {}
+
+  private static final Map PROPERTIES = 
Maps.newConcurrentMap();
+  private static final ThreadLocal>> 
THREAD_LOCAL_PROPERTIES =
+  ThreadLocal.withInitial(Maps::newConcurrentMap);

Review Comment:
   Why put these inside a `ThreadLocal`? Then these need to be defined for 
every thread. Instead, you could use the fact that the `Supplier` should return 
the correct value for any thread and just use a simple map.
   
   Using a `Supplier` makes it so someone else has to handle the `ThreadLocal`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue commented on pull request #6056: Parquet: Remove the row position since parquet row group has it natively

2022-11-06 Thread GitBox


rdblue commented on PR #6056:
URL: https://github.com/apache/iceberg/pull/6056#issuecomment-1304896473

   Do we trust this value from Parquet?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue commented on a diff in pull request #6073: Core: Pass purgeRequested flag to REST server

2022-11-06 Thread GitBox


rdblue commented on code in PR #6073:
URL: https://github.com/apache/iceberg/pull/6073#discussion_r1014894248


##
core/src/main/java/org/apache/iceberg/rest/CatalogHandlers.java:
##
@@ -222,13 +222,27 @@ public static LoadTableResponse createTable(
 throw new IllegalStateException("Cannot wrap catalog that does not produce 
BaseTable");
   }
 
+  /**
+   * @param catalog The catalog instance
+   * @param ident The table identifier of the table to be dropped.
+   * @deprecated Will be removed in 1.2.0, use {@link 
CatalogHandlers#dropTable(Catalog,
+   * TableIdentifier, boolean)}.
+   */
+  @Deprecated

Review Comment:
   Rather than adding a boolean option, can we have two methods, `dropTable` 
and `purgeTable`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue commented on a diff in pull request #6073: Core: Pass purgeRequested flag to REST server

2022-11-06 Thread GitBox


rdblue commented on code in PR #6073:
URL: https://github.com/apache/iceberg/pull/6073#discussion_r1014894712


##
core/src/test/java/org/apache/iceberg/catalog/CatalogTests.java:
##
@@ -799,6 +799,23 @@ public void testDropTable() {
 Assert.assertFalse("Table should not exist after drop", 
catalog.tableExists(TABLE));
   }
 
+  @Test
+  public void testDropTableWithPurge() {
+C catalog = catalog();
+
+if (requiresNamespaceCreate()) {
+  catalog.createNamespace(NS);
+}
+
+Assertions.assertThat(catalog.tableExists(TABLE)).isFalse();
+
+catalog.buildTable(TABLE, SCHEMA).create();
+Assertions.assertThat(catalog.tableExists(TABLE)).isTrue();
+
+Assertions.assertThat(catalog.dropTable(TABLE, true)).isTrue();
+Assertions.assertThat(catalog.tableExists(TABLE)).isFalse();

Review Comment:
   This looks copied from the test above. Can you copy the assertions with 
context strings?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue commented on a diff in pull request #6073: Core: Pass purgeRequested flag to REST server

2022-11-06 Thread GitBox


rdblue commented on code in PR #6073:
URL: https://github.com/apache/iceberg/pull/6073#discussion_r1014894879


##
core/src/test/java/org/apache/iceberg/rest/RESTCatalogAdapter.java:
##
@@ -320,7 +321,10 @@ public  T handleRequest(
 
   case DROP_TABLE:
 {
-  CatalogHandlers.dropTable(catalog, identFromPathVars(vars));
+  CatalogHandlers.dropTable(
+  catalog,
+  identFromPathVars(vars),
+  PropertyUtil.propertyAsBoolean(vars, "purgeRequested", false));

Review Comment:
   You mean always using `purge=true`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue commented on a diff in pull request #6073: Core: Pass purgeRequested flag to REST server

2022-11-06 Thread GitBox


rdblue commented on code in PR #6073:
URL: https://github.com/apache/iceberg/pull/6073#discussion_r1014894952


##
core/src/test/java/org/apache/iceberg/rest/TestHTTPClient.java:
##
@@ -219,7 +219,7 @@ private static Item doExecuteRequest(
 restClient.head(path, headers, onError);
 return null;
   case DELETE:
-return restClient.delete(path, Item.class, headers, onError);
+return restClient.delete(path, Item.class, () -> headers, onError);

Review Comment:
   Look good to me.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue commented on a diff in pull request #6073: Core: Pass purgeRequested flag to REST server

2022-11-06 Thread GitBox


rdblue commented on code in PR #6073:
URL: https://github.com/apache/iceberg/pull/6073#discussion_r1014895276


##
core/src/main/java/org/apache/iceberg/rest/CatalogHandlers.java:
##
@@ -222,8 +222,8 @@ public static LoadTableResponse createTable(
 throw new IllegalStateException("Cannot wrap catalog that does not produce 
BaseTable");
   }
 
-  public static void dropTable(Catalog catalog, TableIdentifier ident) {
-boolean dropped = catalog.dropTable(ident);
+  public static void dropTable(Catalog catalog, TableIdentifier ident, boolean 
purge) {
+boolean dropped = catalog.dropTable(ident, purge);

Review Comment:
   The `SessionCatalog` API requires specifying purge. But the `RESTCatalog` 
adapter does call `Catalog.dropTable` to have the same behavior.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue commented on a diff in pull request #6073: Core: Pass purgeRequested flag to REST server

2022-11-06 Thread GitBox


rdblue commented on code in PR #6073:
URL: https://github.com/apache/iceberg/pull/6073#discussion_r1014895355


##
core/src/test/java/org/apache/iceberg/rest/RESTCatalogAdapter.java:
##
@@ -320,7 +321,10 @@ public  T handleRequest(
 
   case DROP_TABLE:
 {
-  CatalogHandlers.dropTable(catalog, identFromPathVars(vars));
+  CatalogHandlers.dropTable(
+  catalog,
+  identFromPathVars(vars),
+  PropertyUtil.propertyAsBoolean(vars, "purgeRequested", false));

Review Comment:
   I think this is correct for the REST API. If `purgeRequested` was not set 
then it was not requested. This should be consistent behavior, since purge 
would cause a failure before.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue commented on a diff in pull request #6069: Python: TableScan Plan files API implementation without residual evaluation

2022-11-06 Thread GitBox


rdblue commented on code in PR #6069:
URL: https://github.com/apache/iceberg/pull/6069#discussion_r1014895768


##
python/pyiceberg/expressions/visitors.py:
##
@@ -517,7 +519,7 @@ def visit_equal(self, term: BoundTerm, literal: 
Literal[Any]) -> bool:
 pos = term.ref().accessor.position
 field = self.partition_fields[pos]
 
-if field.lower_bound is None:
+if field.lower_bound is None or field.upper_bound is None:

Review Comment:
   If one is `None` then both should be, so this shouldn't have any effect. But 
it's fine to update this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue commented on a diff in pull request #6069: Python: TableScan Plan files API implementation without residual evaluation

2022-11-06 Thread GitBox


rdblue commented on code in PR #6069:
URL: https://github.com/apache/iceberg/pull/6069#discussion_r1014895867


##
python/pyiceberg/expressions/visitors.py:
##
@@ -526,7 +528,7 @@ def visit_equal(self, term: BoundTerm, literal: 
Literal[Any]) -> bool:
 if lower > literal.value:
 return ROWS_CANNOT_MATCH
 
-upper = _from_byte_buffer(term.ref().field.field_type, 
field.lower_bound)
+upper = _from_byte_buffer(term.ref().field.field_type, 
field.upper_bound)

Review Comment:
   @Fokko, can you check why tests didn't catch this case? We should ensure 
this is tested properly and separate the fix into its own PR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] ddrinka commented on a diff in pull request #6123: Python: Support creating a DateLiteral from a date (#6120)

2022-11-06 Thread GitBox


ddrinka commented on code in PR #6123:
URL: https://github.com/apache/iceberg/pull/6123#discussion_r1014897183


##
python/pyiceberg/utils/datetime.py:
##
@@ -47,11 +47,16 @@ def micros_to_time(micros: int) -> time:
 return time(hour=hours, minute=minutes, second=seconds, 
microsecond=microseconds)
 
 
-def date_to_days(date_str: str) -> int:
+def date_str_to_days(date_str: str) -> int:

Review Comment:
   My thought process was that this API is still pre 1.0 and the existing name 
was misleading, so better to improve it now.  I have no preference either way 
though, so happy to adjust.  I'm not sure how to support both with the same 
name though?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] jzhuge commented on a diff in pull request #4925: API: Add view interfaces

2022-11-06 Thread GitBox


jzhuge commented on code in PR #4925:
URL: https://github.com/apache/iceberg/pull/4925#discussion_r1014897964


##
api/src/main/java/org/apache/iceberg/view/ViewBuilder.java:
##
@@ -0,0 +1,144 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.view;
+
+import java.util.List;
+import java.util.Map;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.catalog.ViewCatalog;
+
+/**
+ * A builder used to create or replace a SQL {@link View}.
+ *
+ * Call {@link ViewCatalog#buildView} to create a new builder.
+ */
+public interface ViewBuilder {
+  /**
+   * Set the view schema.
+   *
+   * @param schema view schema
+   * @return this for method chaining
+   */
+  ViewBuilder withSchema(Schema schema);
+
+  /**
+   * Set the view query.
+   *
+   * @param query view query
+   * @return this for method chaining
+   */
+  ViewBuilder withQuery(String query);
+
+  /**
+   * Set the view SQL dialect.
+   *
+   * @param dialect view SQL dialect
+   * @return this for method chaining
+   */
+  ViewBuilder withDialect(String dialect);
+
+  /**
+   * Set the view default catalog.
+   *
+   * @param defaultCatalog view default catalog
+   * @return this for method chaining
+   */
+  ViewBuilder withDefaultCatalog(String defaultCatalog);
+
+  /**
+   * Set the view default namespaces.
+   *
+   * @param defaultNamespaces view default namespaces
+   * @return this for method chaining
+   */
+  ViewBuilder withDefaultNamespace(List defaultNamespaces);

Review Comment:
   Changed to use Namespace class and use singular



##
api/src/main/java/org/apache/iceberg/view/SQLViewRepresentation.java:
##
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.view;
+
+import java.util.List;
+
+public interface SQLViewRepresentation extends ViewRepresentation {
+
+  @Override
+  default Type type() {
+return Type.SQL;
+  }
+
+  String query();
+
+  String dialect();
+
+  String defaultCatalog();
+
+  List defaultNamespace();

Review Comment:
   Changed to use Namespace class and use singular



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] jzhuge commented on a diff in pull request #4925: API: Add view interfaces

2022-11-06 Thread GitBox


jzhuge commented on code in PR #4925:
URL: https://github.com/apache/iceberg/pull/4925#discussion_r1014898267


##
api/src/main/java/org/apache/iceberg/view/ViewBuilder.java:
##
@@ -0,0 +1,144 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.view;
+
+import java.util.List;
+import java.util.Map;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.catalog.ViewCatalog;
+
+/**
+ * A builder used to create or replace a SQL {@link View}.
+ *
+ * Call {@link ViewCatalog#buildView} to create a new builder.
+ */
+public interface ViewBuilder {
+  /**
+   * Set the view schema.
+   *
+   * @param schema view schema
+   * @return this for method chaining
+   */
+  ViewBuilder withSchema(Schema schema);
+
+  /**
+   * Set the view query.
+   *
+   * @param query view query
+   * @return this for method chaining
+   */
+  ViewBuilder withQuery(String query);
+
+  /**
+   * Set the view SQL dialect.
+   *
+   * @param dialect view SQL dialect
+   * @return this for method chaining
+   */
+  ViewBuilder withDialect(String dialect);
+
+  /**
+   * Set the view default catalog.
+   *
+   * @param defaultCatalog view default catalog
+   * @return this for method chaining
+   */
+  ViewBuilder withDefaultCatalog(String defaultCatalog);
+
+  /**
+   * Set the view default namespaces.
+   *
+   * @param defaultNamespaces view default namespaces
+   * @return this for method chaining
+   */
+  ViewBuilder withDefaultNamespace(List defaultNamespaces);
+
+  /**
+   * Set the view field aliases.
+   *
+   * @param fieldAliases view field aliases
+   * @return this for method chaining
+   */
+  ViewBuilder withFieldAliases(List fieldAliases);
+
+  /**
+   * Set the view field comments.
+   *
+   * @param fieldComments view field comments
+   * @return this for method chaining
+   */
+  ViewBuilder withFieldComments(List fieldComments);
+
+  /**
+   * Add key/value properties to the view.
+   *
+   * @param properties key/value properties
+   * @return this for method chaining
+   */
+  ViewBuilder withProperties(Map properties);
+
+  /**
+   * Add a key/value property to the view.
+   *
+   * @param key a key
+   * @param value a value
+   * @return this for method chaining
+   */
+  ViewBuilder withProperty(String key, String value);
+
+  /**
+   * Set a location for the view.
+   *
+   * @param location a location
+   * @return this for method chaining
+   */
+  ViewBuilder withLocation(String location);
+
+  /**
+   * Add a SQL {@link View} for a different dialect.
+   *
+   * If its dialect is the same as the SQL view being built, {@link 
IllegalArgumentException}

Review Comment:
   Fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] stevenzwu commented on a diff in pull request #4925: API: Add view interfaces

2022-11-06 Thread GitBox


stevenzwu commented on code in PR #4925:
URL: https://github.com/apache/iceberg/pull/4925#discussion_r1014899332


##
api/src/main/java/org/apache/iceberg/view/SQLViewRepresentation.java:
##
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.view;
+
+import java.util.List;
+import org.apache.iceberg.catalog.Namespace;
+
+public interface SQLViewRepresentation extends ViewRepresentation {
+
+  @Override
+  default Type type() {
+return Type.SQL;
+  }
+
+  /** The view query SQL text. */
+  String query();
+
+  /** The view query SQL dialect. */
+  String dialect();
+
+  /** The default catalog when the view is created. */
+  String defaultCatalog();

Review Comment:
   this is a getter. wondering why `default...`? should it just be `catalog()`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] stevenzwu commented on a diff in pull request #4925: API: Add view interfaces

2022-11-06 Thread GitBox


stevenzwu commented on code in PR #4925:
URL: https://github.com/apache/iceberg/pull/4925#discussion_r1014899413


##
api/src/main/java/org/apache/iceberg/view/SQLViewRepresentation.java:
##
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.view;
+
+import java.util.List;
+import org.apache.iceberg.catalog.Namespace;
+
+public interface SQLViewRepresentation extends ViewRepresentation {
+
+  @Override
+  default Type type() {
+return Type.SQL;
+  }
+
+  /** The view query SQL text. */
+  String query();
+
+  /** The view query SQL dialect. */
+  String dialect();
+
+  /** The default catalog when the view is created. */
+  String defaultCatalog();
+
+  /** The default namespace when the view is created. */
+  Namespace defaultNamespace();

Review Comment:
   similar question, should this just be `namespace()`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] jzhuge commented on a diff in pull request #4925: API: Add view interfaces

2022-11-06 Thread GitBox


jzhuge commented on code in PR #4925:
URL: https://github.com/apache/iceberg/pull/4925#discussion_r1014903529


##
api/src/main/java/org/apache/iceberg/view/SQLViewRepresentation.java:
##
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.view;
+
+import java.util.List;
+import org.apache.iceberg.catalog.Namespace;
+
+public interface SQLViewRepresentation extends ViewRepresentation {
+
+  @Override
+  default Type type() {
+return Type.SQL;
+  }
+
+  /** The view query SQL text. */
+  String query();
+
+  /** The view query SQL dialect. */
+  String dialect();
+
+  /** The default catalog when the view is created. */
+  String defaultCatalog();

Review Comment:
   In a Spark/Presto SQL session, you can run `USE` to set default catalog and 
namespace, thus the "default".



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue commented on a diff in pull request #6069: Python: TableScan Plan files API implementation without residual evaluation

2022-11-06 Thread GitBox


rdblue commented on code in PR #6069:
URL: https://github.com/apache/iceberg/pull/6069#discussion_r1014903669


##
python/tests/table/test_scan.py:
##
@@ -0,0 +1,177 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+# pylint:disable=redefined-outer-name
+import glob
+import json
+import os
+import tempfile
+from typing import Tuple
+
+import pytest
+from fastavro import reader, writer
+
+from pyiceberg.expressions import (
+BooleanExpression,
+EqualTo,
+IsNull,
+Reference,
+)
+from pyiceberg.expressions.literals import literal
+from pyiceberg.serializers import FromInputFile
+from pyiceberg.table import Table
+from tests.conftest import LocalFileIO
+from tests.io.test_io import LocalInputFile
+
+
+@pytest.fixture
+def temperatures_table() -> Table:

Review Comment:
   Yes, this should generate a sample table rather than putting binaries in the 
repo.
   
   Initially, we can build all of this based on just Avro and add Parquet tests 
later.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue commented on a diff in pull request #6069: Python: TableScan Plan files API implementation without residual evaluation

2022-11-06 Thread GitBox


rdblue commented on code in PR #6069:
URL: https://github.com/apache/iceberg/pull/6069#discussion_r1014903835


##
python/pyiceberg/table/scan.py:
##
@@ -0,0 +1,103 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+import itertools
+from abc import ABC, abstractmethod
+from typing import Iterable, List, Optional
+
+from pydantic import Field
+
+from pyiceberg.expressions import AlwaysTrue, BooleanExpression
+from pyiceberg.expressions.visitors import manifest_evaluator
+from pyiceberg.io import FileIO
+from pyiceberg.manifest import DataFile, ManifestFile
+from pyiceberg.table import PartitionSpec, Snapshot, Table
+from pyiceberg.utils.iceberg_base_model import IcebergBaseModel
+
+ALWAYS_TRUE = AlwaysTrue()
+
+
+class FileScanTask(IcebergBaseModel):
+"""A scan task over a range of bytes in a single data file."""
+
+manifest: ManifestFile = Field()
+data_file: DataFile = Field()
+_residual: BooleanExpression = Field()
+spec: PartitionSpec = Field()
+start: int = Field(default=0)
+
+@property
+def length(self) -> int:
+return self.data_file.file_size_in_bytes
+
+
+class TableScan(ABC):

Review Comment:
   Why does this API not match the Java API? Is there something more pythonic 
about this?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue commented on a diff in pull request #6069: Python: TableScan Plan files API implementation without residual evaluation

2022-11-06 Thread GitBox


rdblue commented on code in PR #6069:
URL: https://github.com/apache/iceberg/pull/6069#discussion_r1014903928


##
python/pyiceberg/table/scan.py:
##
@@ -0,0 +1,103 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+import itertools
+from abc import ABC, abstractmethod
+from typing import Iterable, List, Optional
+
+from pydantic import Field
+
+from pyiceberg.expressions import AlwaysTrue, BooleanExpression
+from pyiceberg.expressions.visitors import manifest_evaluator
+from pyiceberg.io import FileIO
+from pyiceberg.manifest import DataFile, ManifestFile
+from pyiceberg.table import PartitionSpec, Snapshot, Table
+from pyiceberg.utils.iceberg_base_model import IcebergBaseModel
+
+ALWAYS_TRUE = AlwaysTrue()

Review Comment:
   I'd prefer not adding these constants everywhere. Just use `AlwaysTrue()` in 
place of this. It's a singleton.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue commented on a diff in pull request #6069: Python: TableScan Plan files API implementation without residual evaluation

2022-11-06 Thread GitBox


rdblue commented on code in PR #6069:
URL: https://github.com/apache/iceberg/pull/6069#discussion_r1014904027


##
python/pyiceberg/table/scan.py:
##
@@ -0,0 +1,103 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+import itertools
+from abc import ABC, abstractmethod
+from typing import Iterable, List, Optional
+
+from pydantic import Field
+
+from pyiceberg.expressions import AlwaysTrue, BooleanExpression
+from pyiceberg.expressions.visitors import manifest_evaluator
+from pyiceberg.io import FileIO
+from pyiceberg.manifest import DataFile, ManifestFile
+from pyiceberg.table import PartitionSpec, Snapshot, Table
+from pyiceberg.utils.iceberg_base_model import IcebergBaseModel
+
+ALWAYS_TRUE = AlwaysTrue()
+
+
+class FileScanTask(IcebergBaseModel):
+"""A scan task over a range of bytes in a single data file."""
+
+manifest: ManifestFile = Field()
+data_file: DataFile = Field()
+_residual: BooleanExpression = Field()
+spec: PartitionSpec = Field()
+start: int = Field(default=0)
+
+@property
+def length(self) -> int:
+return self.data_file.file_size_in_bytes
+
+
+class TableScan(ABC):
+"""API for configuring a table scan."""
+
+table: Table
+snapshot: Snapshot
+expression: BooleanExpression
+
+def __init__(self, table: Table, snapshot: Optional[Snapshot] = None, 
expression: BooleanExpression = ALWAYS_TRUE):
+self.table = table
+self.expression = expression or ALWAYS_TRUE
+if resolved_snapshot := snapshot or table.current_snapshot():
+self.snapshot = resolved_snapshot
+else:
+raise ValueError("Unable to resolve to a Snapshot to use for the 
table scan.")
+
+@abstractmethod
+def plan_files(self) -> Iterable[FileScanTask]:

Review Comment:
   In Java, there are several different types of tasks. I'd recommend using the 
same pattern and not requiring this to be `FileScanTask`. Introduce an abstract 
`ScanTask` and return an iterable of that instead.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] huaxingao commented on pull request #6065: Fix TestAggregateBinding

2022-11-06 Thread GitBox


huaxingao commented on PR #6065:
URL: https://github.com/apache/iceberg/pull/6065#issuecomment-1304926068

   Thanks! @rdblue @nastra @Fokko 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue opened a new pull request, #6131: Python: Add initial TableScan implementation

2022-11-06 Thread GitBox


rdblue opened a new pull request, #6131:
URL: https://github.com/apache/iceberg/pull/6131

   This adds an implementation of `TableScan` that is an alternative to the one 
in #6069. This doesn't implement `plan_files`, it is just to demonstrate a 
possible scan API.
   
   This scan API works like the Java scan API, but also allows passing scan 
options when creating an initial scan. Both of these are supported:
   
   ```python
   scan = table.scan(
   row_filter=In("id", [5, 6, 7]),
   selected_fields=("id", "data"),
   snapshot_id=1234567890
 )
   # OR
   scan = table.scan() \
   .filter_rows(In("id", [5, 6, 7]))
   .select("id", "data")
   .use_snapshot(1234567890)
   ```
   
   I think this is a reasonable way to get more pythonic (by passing optional 
arguments) and also mostly match the API and behavior in the JVM implementation.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue commented on pull request #6069: Python: TableScan Plan files API implementation without residual evaluation

2022-11-06 Thread GitBox


rdblue commented on PR #6069:
URL: https://github.com/apache/iceberg/pull/6069#issuecomment-1304927324

   @Fokko, @dhruv-pratap, I posted an alternative scan API in a draft as #6131. 
Please take a look. That behaves like this one and allows you to specify 
optional arguments when creating a scan, but also allows the same syntax that 
Java uses.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue commented on a diff in pull request #6131: Python: Add initial TableScan implementation

2022-11-06 Thread GitBox


rdblue commented on code in PR #6131:
URL: https://github.com/apache/iceberg/pull/6131#discussion_r1014914176


##
python/pyiceberg/table/__init__.py:
##
@@ -14,30 +14,43 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
-
-
-from typing import Dict, List, Optional
-
-from pydantic import Field
-
+from typing import (
+ClassVar,
+Dict,
+List,
+Optional,
+)
+
+from pyiceberg.expressions import AlwaysTrue, And, BooleanExpression
 from pyiceberg.schema import Schema
 from pyiceberg.table.metadata import TableMetadata
 from pyiceberg.table.partitioning import PartitionSpec
 from pyiceberg.table.snapshots import Snapshot, SnapshotLogEntry
 from pyiceberg.table.sorting import SortOrder
-from pyiceberg.typedef import Identifier
-from pyiceberg.utils.iceberg_base_model import IcebergBaseModel
+from pyiceberg.typedef import EMPTY_DICT, Identifier, Properties
+
 
+class Table:

Review Comment:
   @Fokko, I removed Pydantic from this because it was conflicting with the 
`schema` method. I think we should probably make these changes to the `Table` 
class either way.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue commented on a diff in pull request #6131: Python: Add initial TableScan implementation

2022-11-06 Thread GitBox


rdblue commented on code in PR #6131:
URL: https://github.com/apache/iceberg/pull/6131#discussion_r1014914272


##
python/pyiceberg/table/__init__.py:
##
@@ -90,3 +103,90 @@ def snapshot_by_name(self, name: str) -> Optional[Snapshot]:
 def history(self) -> List[SnapshotLogEntry]:
 """Get the snapshot history of this table."""
 return self.metadata.snapshot_log
+
+
+class TableScan:
+_always_true: ClassVar[BooleanExpression] = AlwaysTrue()

Review Comment:
   This was needed to avoid calling functions in arg defaults.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue commented on a diff in pull request #6128: Python: Projection

2022-11-06 Thread GitBox


rdblue commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1014915514


##
python/pyiceberg/expressions/__init__.py:
##
@@ -68,7 +72,6 @@ def eval(self, struct: StructProtocol) -> T:  # pylint: 
disable=W0613
 """Returns the value at the referenced field's position in an object 
that abides by the StructProtocol"""
 
 
-@dataclass(frozen=True)

Review Comment:
   Why are the changes to these classes necessary? I'd prefer to keep these as 
frozen data classes if possible.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue commented on a diff in pull request #6128: Python: Projection

2022-11-06 Thread GitBox


rdblue commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1014915923


##
python/pyiceberg/expressions/literals.py:
##
@@ -213,6 +217,26 @@ class LongLiteral(Literal[int]):
 def __init__(self, value: int):
 super().__init__(value, int)
 
+def __add__(self, other: Any) -> LongLiteral:

Review Comment:
   What about `increment` and `decrement` methods since we only need to add or 
subtract 1?
   
   I think I like the idea of implementing this on the literal. The main issue 
that I have with this is that it modifies the value of the literal. I think 
introducing mutability is going to cause problems because this will modify the 
original expression when we project, which should not happen. (That may also be 
why expressions are no longer immutable?)
   
   I think this should just return a new literal rather than `self`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue commented on a diff in pull request #6128: Python: Projection

2022-11-06 Thread GitBox


rdblue commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1014916040


##
python/pyiceberg/transforms.py:
##
@@ -173,6 +201,23 @@ def apply(self, value: Optional[S]) -> Optional[int]:
 def result_type(self, source: IcebergType) -> IcebergType:
 return IntegerType()
 
+def project(self, name: str, pred: BoundPredicate) -> 
Optional[UnboundPredicate]:
+func = self.transform(pred.term.ref().field.field_type)

Review Comment:
   Is there a better name than `func`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue commented on a diff in pull request #6128: Python: Projection

2022-11-06 Thread GitBox


rdblue commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1014916582


##
python/pyiceberg/transforms.py:
##
@@ -427,6 +486,20 @@ def can_transform(self, source: IcebergType) -> bool:
 TimestamptzType,
 }
 
+def project(self, name: str, pred: BoundPredicate) -> 
Optional[UnboundPredicate]:

Review Comment:
   Is this different than the other `DatesTransform` implementation? I might be 
missing it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue commented on a diff in pull request #6128: Python: Projection

2022-11-06 Thread GitBox


rdblue commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1014917102


##
python/pyiceberg/transforms.py:
##
@@ -249,6 +294,20 @@ def satisfies_order_of(self, other: Transform) -> bool:
 def result_type(self, source: IcebergType) -> IcebergType:
 return IntegerType()
 
+def project(self, name: str, pred: BoundPredicate) -> 
Optional[UnboundPredicate]:

Review Comment:
   I think the date/time transforms need to use the same truncation logic as 
the `truncate` function. That's because the operation may need to change. For 
example: `ts < '2022-11-06T15:56'` with a `day` transform should be `ts_day <= 
'2022-11-06'`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue commented on a diff in pull request #6128: Python: Projection

2022-11-06 Thread GitBox


rdblue commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1014917239


##
python/tests/expressions/test_expressions.py:
##
@@ -269,23 +283,23 @@ def test_bind_not_in_equal_term(table_schema_simple: 
Schema):
 
 
 def test_in_empty():
-assert In(Reference("foo"), ()) == AlwaysFalse()
+assert In(Reference("foo"), set()) == AlwaysFalse()

Review Comment:
   I simplified unbound expression creation in a recent PR. That might help 
here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue commented on pull request #6128: Python: Projection

2022-11-06 Thread GitBox


rdblue commented on PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#issuecomment-1304935137

   This is looking great. There are just two issues:
   1. The date/time transforms should use the same logic as truncate for 
numbers since they're basically truncating
   2. I don't think that the changes to expressions are necessary
   
   Thanks, @Fokko!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue commented on pull request #6128: Python: Projection

2022-11-06 Thread GitBox


rdblue commented on PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#issuecomment-1304935954

   > Removes the dataclasses from the expressions
   
   I hit the same issue in #6127 and fixed it a different way. We should talk 
about how to do this separately, but I think my update might be a reasonable 
fix.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue merged pull request #6108: SparkBatchQueryScan logs too much - #6106

2022-11-06 Thread GitBox


rdblue merged PR #6108:
URL: https://github.com/apache/iceberg/pull/6108


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue closed issue #6106: SparkBatchQueryScan logs too much

2022-11-06 Thread GitBox


rdblue closed issue #6106: SparkBatchQueryScan logs too much
URL: https://github.com/apache/iceberg/issues/6106


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue commented on pull request #6108: SparkBatchQueryScan logs too much - #6106

2022-11-06 Thread GitBox


rdblue commented on PR #6108:
URL: https://github.com/apache/iceberg/pull/6108#issuecomment-1304936852

   Thanks, @Omega359!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] github-actions[bot] closed issue #3825: Need an exmple of java code that uses hive as meta store and s3.

2022-11-06 Thread GitBox


github-actions[bot] closed issue #3825: Need an exmple of java code that uses 
hive as meta store and s3.
URL: https://github.com/apache/iceberg/issues/3825


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] github-actions[bot] commented on issue #3788: flink iceberg catalog support hadoop-conf-dir option to read hadoop conf?

2022-11-06 Thread GitBox


github-actions[bot] commented on issue #3788:
URL: https://github.com/apache/iceberg/issues/3788#issuecomment-1304938610

   This issue has been closed because it has not received any activity in the 
last 14 days since being marked as 'stale'


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] github-actions[bot] closed issue #3788: flink iceberg catalog support hadoop-conf-dir option to read hadoop conf?

2022-11-06 Thread GitBox


github-actions[bot] closed issue #3788: flink iceberg catalog support 
hadoop-conf-dir option to read hadoop conf?
URL: https://github.com/apache/iceberg/issues/3788


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] github-actions[bot] commented on issue #3825: Need an exmple of java code that uses hive as meta store and s3.

2022-11-06 Thread GitBox


github-actions[bot] commented on issue #3825:
URL: https://github.com/apache/iceberg/issues/3825#issuecomment-1304938592

   This issue has been closed because it has not received any activity in the 
last 14 days since being marked as 'stale'


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue commented on a diff in pull request #6123: Python: Support creating a DateLiteral from a date (#6120)

2022-11-06 Thread GitBox


rdblue commented on code in PR #6123:
URL: https://github.com/apache/iceberg/pull/6123#discussion_r1014922674


##
python/pyiceberg/utils/datetime.py:
##
@@ -47,11 +47,16 @@ def micros_to_time(micros: int) -> time:
 return time(hour=hours, minute=minutes, second=seconds, 
microsecond=microseconds)
 
 
-def date_to_days(date_str: str) -> int:
+def date_str_to_days(date_str: str) -> int:

Review Comment:
   Yeah, it probably isn't used anywhere. You're right: better to clean it up.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue closed issue #6120: [Python] The structure of a partition definition and partition instance should be consistent

2022-11-06 Thread GitBox


rdblue closed issue #6120: [Python] The structure of a partition definition and 
partition instance should be consistent
URL: https://github.com/apache/iceberg/issues/6120


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue merged pull request #6123: Python: Support creating a DateLiteral from a date (#6120)

2022-11-06 Thread GitBox


rdblue merged PR #6123:
URL: https://github.com/apache/iceberg/pull/6123


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue commented on pull request #6123: Python: Support creating a DateLiteral from a date (#6120)

2022-11-06 Thread GitBox


rdblue commented on PR #6123:
URL: https://github.com/apache/iceberg/pull/6123#issuecomment-1304946218

   Thanks, @ddrinka! I merged this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue commented on pull request #6117: Fix typo in `_ManifestEvalVisitor.visit_equal`

2022-11-06 Thread GitBox


rdblue commented on PR #6117:
URL: https://github.com/apache/iceberg/pull/6117#issuecomment-1304947082

   I reopened this because I think it's a good idea to get it in independently. 
Thanks for finding this, @ddrinka!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue commented on pull request #5150: Spark Integration to read from Snapshot ref

2022-11-06 Thread GitBox


rdblue commented on PR #5150:
URL: https://github.com/apache/iceberg/pull/5150#issuecomment-1304951287

   > I am unsure of how to proceed for branches and tags usecase. Just making 
changes to read from branch/tag in SparkScanBuilder worked before for previous 
versions of spark - 3.1, 3.2. But for 3.3 , properties get overridden in 
SparkCatalog.java before reaching SparkScanBuilder.java
   
   The current implementation looks fine to me. If those code paths are taken, 
it indicates that Spark was passed syntax like `TIMESTAMP AS OF '...'`, which 
should be incompatible with `branch` or `tag` options. This PR already 
implements that because the snapshot passed to `SparkTable` is added to options.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue commented on pull request #5150: Spark Integration to read from Snapshot ref

2022-11-06 Thread GitBox


rdblue commented on PR #5150:
URL: https://github.com/apache/iceberg/pull/5150#issuecomment-1304951951

   This looks good to me. I'm rerunning CI since the failures don't look 
related to this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue commented on pull request #6017: Core, API: Field metadata support

2022-11-06 Thread GitBox


rdblue commented on PR #6017:
URL: https://github.com/apache/iceberg/pull/6017#issuecomment-1304952593

   Closing this since there is discussion on the issue about whether it should 
be done.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue closed pull request #6017: Core, API: Field metadata support

2022-11-06 Thread GitBox


rdblue closed pull request #6017: Core, API: Field metadata support
URL: https://github.com/apache/iceberg/pull/6017


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue commented on a diff in pull request #5984: Core, API: Support incremental scanning with branch

2022-11-06 Thread GitBox


rdblue commented on code in PR #5984:
URL: https://github.com/apache/iceberg/pull/5984#discussion_r1014926208


##
api/src/main/java/org/apache/iceberg/IncrementalScan.java:
##
@@ -21,6 +21,23 @@
 /** API for configuring an incremental scan. */
 public interface IncrementalScan>
 extends Scan {
+
+  /**
+   * Instructs this scan to look for changes starting from a particular 
snapshot (inclusive).
+   *
+   * If the start snapshot is not configured, it is defaulted to the oldest 
ancestor of the end
+   * snapshot (inclusive).
+   *
+   * @param fromSnapshotId the start snapshot ID (inclusive)
+   * @param referenceName the ref used
+   * @return this for method chaining
+   * @throws IllegalArgumentException if the start snapshot is not an ancestor 
of the end snapshot
+   */
+  default ThisT fromSnapshotInclusive(long fromSnapshotId, String 
referenceName) {

Review Comment:
   @nastra, the distinction between a branch and a tag here is that a tag does 
not change. It doesn't make sense to read incremental changes from a tag 
because it is a fixed point in time.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue commented on a diff in pull request #5984: Core, API: Support incremental scanning with branch

2022-11-06 Thread GitBox


rdblue commented on code in PR #5984:
URL: https://github.com/apache/iceberg/pull/5984#discussion_r1014926208


##
api/src/main/java/org/apache/iceberg/IncrementalScan.java:
##
@@ -21,6 +21,23 @@
 /** API for configuring an incremental scan. */
 public interface IncrementalScan>
 extends Scan {
+
+  /**
+   * Instructs this scan to look for changes starting from a particular 
snapshot (inclusive).
+   *
+   * If the start snapshot is not configured, it is defaulted to the oldest 
ancestor of the end
+   * snapshot (inclusive).
+   *
+   * @param fromSnapshotId the start snapshot ID (inclusive)
+   * @param referenceName the ref used
+   * @return this for method chaining
+   * @throws IllegalArgumentException if the start snapshot is not an ancestor 
of the end snapshot
+   */
+  default ThisT fromSnapshotInclusive(long fromSnapshotId, String 
referenceName) {

Review Comment:
   @nastra, the distinction between a branch and a tag here is that a tag does 
not change. It doesn't make sense to read incremental changes from a tag 
because it is a fixed point in time. So I think it's okay to use `toBranch` 
instead of `toRef`, although I'm not sure it's actually worth the distinction 
since the other API uses `ref`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] lvyanquan commented on a diff in pull request #6111: Flink: Add 'cache.expiration-interval-ms' option to FlinkCatalog

2022-11-06 Thread GitBox


lvyanquan commented on code in PR #6111:
URL: https://github.com/apache/iceberg/pull/6111#discussion_r1014937530


##
flink/v1.14/flink/src/main/java/org/apache/iceberg/flink/FlinkCatalogFactory.java:
##
@@ -145,8 +145,27 @@ protected Catalog createCatalog(
   baseNamespace = 
Namespace.of(properties.get(BASE_NAMESPACE).split("\\."));
 }
 
-boolean cacheEnabled = 
Boolean.parseBoolean(properties.getOrDefault(CACHE_ENABLED, "true"));
-return new FlinkCatalog(name, defaultDatabase, baseNamespace, 
catalogLoader, cacheEnabled);
+boolean cacheEnabled = CatalogProperties.CACHE_ENABLED_DEFAULT;

Review Comment:
   address it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] ajantha-bhat commented on pull request #6094: Spark-3.0: Remove spark/v3.0 folder

2022-11-06 Thread GitBox


ajantha-bhat commented on PR #6094:
URL: https://github.com/apache/iceberg/pull/6094#issuecomment-1304974804

   @rdblue: I have rebased this PR now. Thanks. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] lvyanquan commented on a diff in pull request #6111: Flink: Add 'cache.expiration-interval-ms' option to FlinkCatalog

2022-11-06 Thread GitBox


lvyanquan commented on code in PR #6111:
URL: https://github.com/apache/iceberg/pull/6111#discussion_r1014938540


##
flink/v1.14/flink/src/main/java/org/apache/iceberg/flink/FlinkCatalogFactory.java:
##
@@ -145,8 +145,27 @@ protected Catalog createCatalog(
   baseNamespace = 
Namespace.of(properties.get(BASE_NAMESPACE).split("\\."));
 }
 
-boolean cacheEnabled = 
Boolean.parseBoolean(properties.getOrDefault(CACHE_ENABLED, "true"));
-return new FlinkCatalog(name, defaultDatabase, baseNamespace, 
catalogLoader, cacheEnabled);
+boolean cacheEnabled = CatalogProperties.CACHE_ENABLED_DEFAULT;
+if (properties.containsKey(CatalogProperties.CACHE_ENABLED)) {
+  cacheEnabled = 
Boolean.parseBoolean(properties.get(CatalogProperties.CACHE_ENABLED));
+}
+
+long cacheExpirationIntervalMs =
+PropertyUtil.propertyAsLong(
+properties,
+CatalogProperties.CACHE_EXPIRATION_INTERVAL_MS,
+CatalogProperties.CACHE_EXPIRATION_INTERVAL_MS_DEFAULT);
+if (cacheExpirationIntervalMs == 0) {

Review Comment:
   Not sure why limit to positive numbers, since negative values mean "cache 
expiration is turned off and entries expire only on refresh etc", users can set 
a negative values.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] luoyuxia commented on issue #3009: How do I realize the upsert of flink sql through setting, in iceberg 0.12.0

2022-11-06 Thread GitBox


luoyuxia commented on issue #3009:
URL: https://github.com/apache/iceberg/issues/3009#issuecomment-1305001100

   Hi, you can refer to here for 
[upsert](https://iceberg.apache.org/docs/latest/flink/#upsert)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] luoyuxia commented on issue #3156: Flink reads iceberg in real time and reports errors

2022-11-06 Thread GitBox


luoyuxia commented on issue #3156:
URL: https://github.com/apache/iceberg/issues/3156#issuecomment-1305008165

   It fail when try to serialize `BaseCombinedScanTask`, I guss it may be fixed 
by #1285


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] wang-x-xia opened a new pull request, #6132: [0.14] Dell: Fix client serialization bug.

2022-11-06 Thread GitBox


wang-x-xia opened a new pull request, #6132:
URL: https://github.com/apache/iceberg/pull/6132

   From https://github.com/apache/iceberg/pull/5059.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] wang-x-xia opened a new pull request, #6133: [1.0] Dell: Fix client serialization bug.

2022-11-06 Thread GitBox


wang-x-xia opened a new pull request, #6133:
URL: https://github.com/apache/iceberg/pull/6133

   From https://github.com/apache/iceberg/pull/5059.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] luoyuxia commented on issue #3124: When writing data to S3 using Glue Catalog, current snapshot ID is -1 and not updated in the metadata file generated

2022-11-06 Thread GitBox


luoyuxia commented on issue #3124:
URL: https://github.com/apache/iceberg/issues/3124#issuecomment-1305028751

   Have ever a checkpoint been done successfully?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] jzhuge opened a new pull request, #6134: Spec: Add query-column-names to SQL view representation in view spec

2022-11-06 Thread GitBox


jzhuge opened a new pull request, #6134:
URL: https://github.com/apache/iceberg/pull/6134

   Current view spec misses the field `query-column-names` in SQL view 
representation.
   
   For SELECT star view queries, the schema for the underlying table or view 
may change after the view has been created.
   Thus, we need to store the column names of the view query, because when 
using the view, it is better to pick the columns
   according to the name and order when the view was created and omit the extra 
columns we don't require.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] jzhuge commented on pull request #4925: API: Add view interfaces

2022-11-06 Thread GitBox


jzhuge commented on PR #4925:
URL: https://github.com/apache/iceberg/pull/4925#issuecomment-1305068781

   Created #6134 to add the missing field `query-column-names` to SQL view 
representation in the view spec.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] zhongyujiang commented on pull request #6118: Parquet, Core: Fix collection of Parquet metrics when column names co…

2022-11-06 Thread GitBox


zhongyujiang commented on PR #6118:
URL: https://github.com/apache/iceberg/pull/6118#issuecomment-1305081026

   @rdblue sure.
   When collecting metrics from Parquet footer, Iceberg 
[converts](https://github.com/apache/iceberg/blob/167a8ccd7c578296c40f8fc61c90135e71cf1183/parquet/src/main/java/org/apache/iceberg/parquet/ParquetUtil.java#L107)
 the file MessageType to an Iceberg Schema and 
[uses](https://github.com/apache/iceberg/blob/167a8ccd7c578296c40f8fc61c90135e71cf1183/core/src/main/java/org/apache/iceberg/MetricsUtil.java#L56)
 this schema to get the column name of an field id it mapping, and then uses 
the obtained field name to get its corresponding metric mode. 
   
   However Iceberg will escape special characters in field names when 
converting an Iceberg Schema to an Parquet MessageType, and those escaped names 
cannot be restored when converting an Parquet MessageType back to an Iceberg 
Schema, that is to say, we are now using those escaped column names to get 
their corresponding metric modes, which may resulted in incorrect results since 
those escaped names cannot be recognized by MetricsConfig. 
   
   The ORC path does not have this problem because special characters are not 
escaped when converting to ORC schema and ORC 
[itself](https://lists.apache.org/thread/93xbnbs0mr0zxx4fzvrz10m5mmd4qb5w) can 
handle any UTF-8 characters in the column names.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] zhongyujiang commented on pull request #6118: Parquet, Core: Fix collection of Parquet metrics when column names co…

2022-11-06 Thread GitBox


zhongyujiang commented on PR #6118:
URL: https://github.com/apache/iceberg/pull/6118#issuecomment-1305083278

   The failure seems unrelated to this PR:
   >* What went wrong:
   >Could not determine the dependencies of task 
':iceberg-flink:iceberg-flink-runtime-1.16:shadowJar'.
   >See 
https://docs.gradle.org/7.5.1/userguide/command_line_interface.html#sec:command_line_warnings
   > Could not resolve all dependencies for configuration 
':iceberg-flink:iceberg-flink-runtime-1.16:runtimeClasspath'.
  > Could not resolve org.apache.flink:flink-metrics-dropwizard:1.16.0.
Required by:
project :iceberg-flink:iceberg-flink-runtime-1.16
 > Could not resolve org.apache.flink:flink-metrics-dropwizard:1.16.0.
> Could not get resource 
'https://repo.maven.apache.org/maven2/org/apache/flink/flink-metrics-dropwizard/1.16.0/flink-metrics-dropwizard-1.16.0.pom'.
   > Could not GET 
'https://repo.maven.apache.org/maven2/org/apache/flink/flink-metrics-dropwizard/1.16.0/flink-metrics-dropwizard-1.16.0.pom'.
  > Connect to 
repo.maven.apache.org:4[43](https://github.com/apache/iceberg/actions/runs/3407136353/jobs/5666470632#step:4:44)
 [repo.maven.apache.org/146.75.28.215] failed: connect timed out 
   >


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] ajantha-bhat commented on pull request #6094: Spark-3.0: Remove spark/v3.0 folder

2022-11-06 Thread GitBox


ajantha-bhat commented on PR #6094:
URL: https://github.com/apache/iceberg/pull/6094#issuecomment-1305136186

   cc: @Fokko 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] zhongyujiang commented on a diff in pull request #6118: Parquet, Core: Fix collection of Parquet metrics when column names co…

2022-11-06 Thread GitBox


zhongyujiang commented on code in PR #6118:
URL: https://github.com/apache/iceberg/pull/6118#discussion_r1015035815


##
parquet/src/main/java/org/apache/iceberg/parquet/ParquetUtil.java:
##
@@ -75,23 +76,27 @@ public static Metrics fileMetrics(InputFile file, 
MetricsConfig metricsConfig) {
   public static Metrics fileMetrics(
   InputFile file, MetricsConfig metricsConfig, NameMapping nameMapping) {
 try (ParquetFileReader reader = 
ParquetFileReader.open(ParquetIO.file(file))) {
-  return footerMetrics(reader.getFooter(), Stream.empty(), metricsConfig, 
nameMapping);
+  return footerMetrics(reader.getFooter(), Stream.empty(), metricsConfig, 
nameMapping, null);

Review Comment:
   This is used to migrate external files to Iceberg tables,  I am thinking if 
those files are generated by Iceberg, we may still not be able to collect 
metrics according to the correct metric mode with current change. 
   
   To solve this , I think we can change TableMigrationUtil to pass Iceberg 
schema also; 
   Or we can update MetricsConfig to make it tracks mapping between column ids 
metric modes, and use column ids to get metric modes directly when collecting 
parquet metrics, but that will need all writers to update from 
`MetricsConfig.fromProperties(Map props)` to `MetricsConfig 
from(Map props, Schema schema, SortOrder order)` (this is a 
private method now). 
   
   I prefer the second way and I can update the releated code in this repo, but 
I don't know if any external projects uses 
`MetricsConfig.fromProperties(Map props)` too. What do you 
think about it? @rdblue 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] ajantha-bhat commented on pull request #6132: [0.14] Dell: Fix client serialization bug.

2022-11-06 Thread GitBox


ajantha-bhat commented on PR #6132:
URL: https://github.com/apache/iceberg/pull/6132#issuecomment-1305140708

   I think fixing only in the master branch is enough. As per my knowledge, 
this porting is unnecessary as Iceberg will not do a release for those 
branches. 
   
   The next release is 1.1.0, So, if that branch is already made from the 
master, maybe we need to raise the fix for this branch. 
   cc: @gaborkaszab , @Fokko 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] ajantha-bhat commented on pull request #6133: [1.0] Dell: Fix client serialization bug.

2022-11-06 Thread GitBox


ajantha-bhat commented on PR #6133:
URL: https://github.com/apache/iceberg/pull/6133#issuecomment-1305140853

   I think fixing only in the master branch is enough. As per my knowledge, 
this porting is unnecessary as Iceberg will not do a release for those 
branches. 
   
   The next release is 1.1.0, So, if that branch is already made from the 
master, maybe we need to raise the fix for this branch. 
   cc: @gaborkaszab , @Fokko 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] wang-x-xia commented on pull request #6132: [0.14] Dell: Fix client serialization bug.

2022-11-06 Thread GitBox


wang-x-xia commented on PR #6132:
URL: https://github.com/apache/iceberg/pull/6132#issuecomment-1305142658

   @ajantha-bhat Thanks! I'll close this PR~


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] wang-x-xia closed pull request #6132: [0.14] Dell: Fix client serialization bug.

2022-11-06 Thread GitBox


wang-x-xia closed pull request #6132: [0.14] Dell: Fix client serialization bug.
URL: https://github.com/apache/iceberg/pull/6132


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] ajantha-bhat commented on pull request #6090: Core: Handle statistics file clean up from expireSnapshots

2022-11-06 Thread GitBox


ajantha-bhat commented on PR #6090:
URL: https://github.com/apache/iceberg/pull/6090#issuecomment-1305146273

   > @findepi: Thinking more about this, As the TableMetadata has just the list 
of StatisticsFile. And you have mentioned, statisticsFile.snapshotId() is "ID 
of the Iceberg table's snapshot the statistics were computed from"
   So, how will the query knows which statistics file to use for the current 
snapshot (Incase of rewrite data files, the current snapshot id may not be 
present in that list of statistics file?)
   
   @rdblue, @findepi: Please help in clearing my above doubt. 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] hendrikmakait opened a new pull request, #6135: Use pythonic `len()` built-in instead of `length` property

2022-11-06 Thread GitBox


hendrikmakait opened a new pull request, #6135:
URL: https://github.com/apache/iceberg/pull/6135

   * Replaces `.length` property with the pythonic `__len__` method on 
`FixedReader` and `FixedType` to enable use of `len()` built-in.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] Fokko commented on a diff in pull request #6128: Python: Projection

2022-11-06 Thread GitBox


Fokko commented on code in PR #6128:
URL: https://github.com/apache/iceberg/pull/6128#discussion_r1015064168


##
python/pyiceberg/expressions/__init__.py:
##
@@ -68,7 +72,6 @@ def eval(self, struct: StructProtocol) -> T:  # pylint: 
disable=W0613
 """Returns the value at the referenced field's position in an object 
that abides by the StructProtocol"""
 
 
-@dataclass(frozen=True)

Review Comment:
   The rationale is in the PR description. Mostly because my IDE lights up as a 
Christmas tree, and it is a bit early. It seems that the dataclasses + IDE + 
mypy don't really know what to do with the inheritance of the different classes.
   
   
![image](https://user-images.githubusercontent.com/1134248/200243609-26f1a3c1-378b-49a7-a1fd-94f747cc16c2.png)
   
   A very slimmed-down example is given in the post around `Reference`:
   
   
![image](https://user-images.githubusercontent.com/1134248/200150958-3634a40c-a530-424c-956d-94b843bb5cb9.png)
   
   PyCharm is unable to figure out the argument:
   
   
![image](https://user-images.githubusercontent.com/1134248/200245808-a5db142c-98dc-4e8b-9f66-94296f8bcde7.png)
   
   It turns out that you shouldn't use positional arguments with data classes: 
https://medium.com/@aniscampos/python-dataclass-inheritance-finally-686eaf60fbb5



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] nastra commented on a diff in pull request #6113: Core: Reduce code duplication around writing JSON collections

2022-11-06 Thread GitBox


nastra commented on code in PR #6113:
URL: https://github.com/apache/iceberg/pull/6113#discussion_r1015076718


##
core/src/main/java/org/apache/iceberg/util/JsonUtil.java:
##
@@ -374,4 +380,40 @@ void validate(JsonNode element) {
   element);
 }
   }
+
+  public static void writeIntegerArray(
+  String property, Collection items, JsonGenerator gen) throws 
IOException {

Review Comment:
   makes sense, updated



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] nastra commented on a diff in pull request #6113: Core: Reduce code duplication around writing JSON collections

2022-11-06 Thread GitBox


nastra commented on code in PR #6113:
URL: https://github.com/apache/iceberg/pull/6113#discussion_r1015078831


##
core/src/main/java/org/apache/iceberg/util/JsonUtil.java:
##
@@ -251,6 +252,11 @@ public static Set getIntegerSet(String property, 
JsonNode node) {
 .build();
   }
 
+  public static List getLongList(String property, JsonNode node) {

Review Comment:
   currently it's only used in the test itself



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] nastra commented on a diff in pull request #5984: Core, API: Support incremental scanning with branch

2022-11-06 Thread GitBox


nastra commented on code in PR #5984:
URL: https://github.com/apache/iceberg/pull/5984#discussion_r1015080916


##
api/src/main/java/org/apache/iceberg/IncrementalScan.java:
##
@@ -21,6 +21,23 @@
 /** API for configuring an incremental scan. */
 public interface IncrementalScan>
 extends Scan {
+
+  /**
+   * Instructs this scan to look for changes starting from a particular 
snapshot (inclusive).
+   *
+   * If the start snapshot is not configured, it is defaulted to the oldest 
ancestor of the end
+   * snapshot (inclusive).
+   *
+   * @param fromSnapshotId the start snapshot ID (inclusive)
+   * @param referenceName the ref used
+   * @return this for method chaining
+   * @throws IllegalArgumentException if the start snapshot is not an ancestor 
of the end snapshot
+   */
+  default ThisT fromSnapshotInclusive(long fromSnapshotId, String 
referenceName) {

Review Comment:
   > @nastra, the distinction between a branch and a tag here is that a tag 
does not change. It doesn't make sense to read incremental changes from a tag 
because it is a fixed point in time. So I think it's okay to use `toBranch` 
instead of `toRef`, although I'm not sure it's actually worth the distinction 
since the other API uses `ref`.
   
   my main point was purely on the naming of the API (`toBranch` vs `toRef`). 
It seems to be we could have `toRef()`, which would perform a check to make 
sure the given ref is an actual branch



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] hililiwei commented on a diff in pull request #6111: Flink: Add 'cache.expiration-interval-ms' option to FlinkCatalog

2022-11-06 Thread GitBox


hililiwei commented on code in PR #6111:
URL: https://github.com/apache/iceberg/pull/6111#discussion_r1015087242


##
flink/v1.14/flink/src/main/java/org/apache/iceberg/flink/FlinkCatalogFactory.java:
##
@@ -145,8 +145,27 @@ protected Catalog createCatalog(
   baseNamespace = 
Namespace.of(properties.get(BASE_NAMESPACE).split("\\."));
 }
 
-boolean cacheEnabled = 
Boolean.parseBoolean(properties.getOrDefault(CACHE_ENABLED, "true"));
-return new FlinkCatalog(name, defaultDatabase, baseNamespace, 
catalogLoader, cacheEnabled);
+boolean cacheEnabled = CatalogProperties.CACHE_ENABLED_DEFAULT;
+if (properties.containsKey(CatalogProperties.CACHE_ENABLED)) {
+  cacheEnabled = 
Boolean.parseBoolean(properties.get(CatalogProperties.CACHE_ENABLED));
+}
+
+long cacheExpirationIntervalMs =
+PropertyUtil.propertyAsLong(
+properties,
+CatalogProperties.CACHE_EXPIRATION_INTERVAL_MS,
+CatalogProperties.CACHE_EXPIRATION_INTERVAL_MS_DEFAULT);
+if (cacheExpirationIntervalMs == 0) {

Review Comment:
   > * If cache.expiration-interval-ms is zero, caching will be turned off 
entirely (irrespective of the cache-enabled flag)
   > * If cache.expiration-interval-ms is positive (with cache-enabled=true), 
then caching will be on and entries will expire if not accessed in that period 
of time.
   > * If users pass in a value <= -1, (with cache-enabled=true), then caching 
will be enabled and cache expiration will never happen (i.e. entries will 
persist unless explicitly refreshed, original behavior).
   
   detail ref:  https://github.com/apache/iceberg/pull/3543#issue-1052281636



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] Fokko commented on a diff in pull request #6069: Python: TableScan Plan files API implementation without residual evaluation

2022-11-06 Thread GitBox


Fokko commented on code in PR #6069:
URL: https://github.com/apache/iceberg/pull/6069#discussion_r1015087384


##
python/pyiceberg/expressions/visitors.py:
##
@@ -517,7 +519,7 @@ def visit_equal(self, term: BoundTerm, literal: 
Literal[Any]) -> bool:
 pos = term.ref().accessor.position
 field = self.partition_fields[pos]
 
-if field.lower_bound is None:
+if field.lower_bound is None or field.upper_bound is None:

Review Comment:
   This is required for the linter since we're the variable below.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] hililiwei commented on pull request #6075: Flink 1.15: Support change log scan task

2022-11-06 Thread GitBox


hililiwei commented on PR #6075:
URL: https://github.com/apache/iceberg/pull/6075#issuecomment-1305209915

   @stevenzwu could you please take a look at it when you get a chance? thx.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] chenwyi2 opened a new issue, #6136: if i lost a metadata file, how to recover

2022-11-06 Thread GitBox


chenwyi2 opened a new issue, #6136:
URL: https://github.com/apache/iceberg/issues/6136

   ### Query engine
   
   spark: 3.1
   iceberg: 0.14.1
   
   ### Question
   
   the situation is, i had a table partition by date, and  i deleted a old 
manifest file from hdfs without moving to trash, and i want to  delete some 
partitions which are older than 10 days, but i got error: 
filenotfoundexcaption(that deleted old manifest file), how can i solve this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] nastra commented on a diff in pull request #6073: Core: Pass purgeRequested flag to REST server

2022-11-06 Thread GitBox


nastra commented on code in PR #6073:
URL: https://github.com/apache/iceberg/pull/6073#discussion_r1015101336


##
core/src/test/java/org/apache/iceberg/catalog/CatalogTests.java:
##
@@ -799,6 +799,23 @@ public void testDropTable() {
 Assert.assertFalse("Table should not exist after drop", 
catalog.tableExists(TABLE));
   }
 
+  @Test
+  public void testDropTableWithPurge() {
+C catalog = catalog();
+
+if (requiresNamespaceCreate()) {
+  catalog.createNamespace(NS);
+}
+
+Assertions.assertThat(catalog.tableExists(TABLE)).isFalse();
+
+catalog.buildTable(TABLE, SCHEMA).create();
+Assertions.assertThat(catalog.tableExists(TABLE)).isTrue();
+
+Assertions.assertThat(catalog.dropTable(TABLE, true)).isTrue();
+Assertions.assertThat(catalog.tableExists(TABLE)).isFalse();

Review Comment:
   done



##
core/src/main/java/org/apache/iceberg/rest/CatalogHandlers.java:
##
@@ -222,13 +222,27 @@ public static LoadTableResponse createTable(
 throw new IllegalStateException("Cannot wrap catalog that does not produce 
BaseTable");
   }
 
+  /**
+   * @param catalog The catalog instance
+   * @param ident The table identifier of the table to be dropped.
+   * @deprecated Will be removed in 1.2.0, use {@link 
CatalogHandlers#dropTable(Catalog,
+   * TableIdentifier, boolean)}.
+   */
+  @Deprecated

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org