[GitHub] [iceberg] Gboyka commented on issue #6235: There is no data in the table, when insert data using Hive on Tez.

2022-12-29 Thread GitBox


Gboyka commented on issue #6235:
URL: https://github.com/apache/iceberg/issues/6235#issuecomment-1367283307

   Did you find a root cause or resolution for this? I'm facing the same issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] Fokko commented on pull request #6497: Python: Move `adlfs` import inline

2022-12-29 Thread GitBox


Fokko commented on PR #6497:
URL: https://github.com/apache/iceberg/pull/6497#issuecomment-1367441531

   @cccs-eric thanks for taking a look at the PR, much appreciated. You're 
completely right here, and thanks for pointing that out. Previously s3 was the 
only implementation, so this wasn't an issue, but now with adlfs we want to be 
able to use it without having to depend on s3.
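
   A minimal sketch of the inline-import pattern discussed in this comment, assuming a hypothetical scheme-to-loader registry. The names `_load_local`, `_load_s3`, `_load_adlfs`, `SCHEME_TO_LOADER` and `load_fs_class` are illustrative and not taken from the PR. Each optional package is imported only inside the function that needs it, so importing the module itself does not require `s3fs` or `adlfs` to be installed.

```python
from typing import Any, Callable, Dict


def _load_local() -> Any:
    # Stand-in for a local filesystem; uses the standard library only.
    import pathlib

    return pathlib.Path


def _load_s3() -> Any:
    # Inline import: s3fs is needed only once an s3:// location is used.
    from s3fs import S3FileSystem

    return S3FileSystem


def _load_adlfs() -> Any:
    # Inline import: adlfs is needed only once an abfs:// location is used.
    from adlfs import AzureBlobFileSystem

    return AzureBlobFileSystem


# Hypothetical registry; the real mapping in the PR may differ.
SCHEME_TO_LOADER: Dict[str, Callable[[], Any]] = {
    "file": _load_local,
    "s3": _load_s3,
    "abfs": _load_adlfs,
}


def load_fs_class(scheme: str) -> Any:
    """Return the filesystem class for a scheme, importing its package lazily."""
    loader = SCHEME_TO_LOADER.get(scheme)
    if loader is None:
        raise ValueError(f"Unrecognized filesystem type in URI: {scheme}")
    return loader()
```

   With this layout, `load_fs_class("abfs")` only needs `adlfs` installed; a missing `s3fs` surfaces as an `ImportError` only when an `s3` location is actually requested.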





[GitHub] [iceberg] Fokko commented on a diff in pull request #6501: Python: Use PyArrow buffer

2022-12-29 Thread GitBox


Fokko commented on code in PR #6501:
URL: https://github.com/apache/iceberg/pull/6501#discussion_r1059047328


##
python/pyiceberg/io/pyarrow.py:
##
@@ -230,6 +240,14 @@ def _get_fs(self, scheme: str) -> FileSystem:
         else:
             raise ValueError(f"Unrecognized filesystem type in URI: {scheme}")
 
+    def _get_file(self, location: str) -> PyArrowFile:
+        scheme, path = self.parse_location(location)
+        fs = self._get_fs(scheme)
+
+        buffer_size = self.properties.get(BUFFER_SIZE)

Review Comment:
   Makes a lot of sense, let me update that! 👍🏻 






[GitHub] [iceberg] Fokko opened a new pull request, #6504: Python: Add tests

2022-12-29 Thread GitBox


Fokko opened a new pull request, #6504:
URL: https://github.com/apache/iceberg/pull/6504

   To maintain 90% code coverage





[GitHub] [iceberg] Fokko commented on a diff in pull request #6504: Python: Add tests

2022-12-29 Thread GitBox


Fokko commented on code in PR #6504:
URL: https://github.com/apache/iceberg/pull/6504#discussion_r1059129800


##
python/pyiceberg/typedef.py:
##
@@ -54,12 +54,9 @@ def __init__(self, default_factory: Callable[[K], V]):
         self.default_factory = default_factory
 
     def __missing__(self, key: K) -> V:
-        if self.default_factory is None:

Review Comment:
   `self.default_factory` cannot be `None` because of the type signature
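
   For context, a minimal sketch of a dict whose factory receives the missing key, following the signatures visible in the hunk above. The class name `KeyDefaultDict` and the body of `__missing__` are assumptions for illustration, not the PR's actual code. Because `default_factory` is typed as a required `Callable[[K], V]`, no `None` check is needed before calling it.

```python
from typing import Callable, Dict, TypeVar

K = TypeVar("K")
V = TypeVar("V")


class KeyDefaultDict(Dict[K, V]):
    """A dict that builds missing values from the key itself."""

    def __init__(self, default_factory: Callable[[K], V]):
        super().__init__()
        # Required, non-optional callable: the type signature rules out None.
        self.default_factory = default_factory

    def __missing__(self, key: K) -> V:
        # Called by dict.__getitem__ when the key is absent.
        value = self.default_factory(key)
        self[key] = value  # cache so the factory runs once per key
        return value
```

   A lookup such as `KeyDefaultDict(len)["abc"]` calls the factory with the key and stores the result, so later lookups of the same key do not call the factory again.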






[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6449: WIP: Delta, Spark: Adding support for Migrating Delta Lake Table to Iceberg Table

2022-12-29 Thread GitBox


jackye1995 commented on code in PR #6449:
URL: https://github.com/apache/iceberg/pull/6449#discussion_r1059182643


##
delta-lake/src/main/java/org/apache/iceberg/delta/SupportMigrateDeltaLake.java:
##
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.delta;
+
+/**
+ * An API that should be implemented by query engine integrations that want to support migration
+ * from Delta Lake table to Iceberg table.
+ */
+public interface SupportMigrateDeltaLake {

Review Comment:
   nit: maybe `SupportsMigrationFromDeltaLake`?






[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6449: WIP: Delta, Spark: Adding support for Migrating Delta Lake Table to Iceberg Table

2022-12-29 Thread GitBox


jackye1995 commented on code in PR #6449:
URL: https://github.com/apache/iceberg/pull/6449#discussion_r1059183754


##
delta-lake/src/main/java/org/apache/iceberg/delta/BaseMigrateDeltaLakeTableAction.java:
##
@@ -0,0 +1,364 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.delta;
+
+import io.delta.standalone.DeltaLog;
+import io.delta.standalone.VersionLog;
+import io.delta.standalone.actions.Action;
+import io.delta.standalone.actions.AddFile;
+import io.delta.standalone.actions.RemoveFile;
+import java.io.File;
+import java.io.UncheckedIOException;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.iceberg.AppendFiles;
+import org.apache.iceberg.DataFile;
+import org.apache.iceberg.DataFiles;
+import org.apache.iceberg.DeleteFiles;
+import org.apache.iceberg.FileFormat;
+import org.apache.iceberg.Metrics;
+import org.apache.iceberg.MetricsConfig;
+import org.apache.iceberg.OverwriteFiles;
+import org.apache.iceberg.PartitionField;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.Snapshot;
+import org.apache.iceberg.SnapshotSummary;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.TableProperties;
+import org.apache.iceberg.Transaction;
+import org.apache.iceberg.avro.Avro;
+import org.apache.iceberg.catalog.Catalog;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.delta.utils.DeltaLakeDataTypeVisitor;
+import org.apache.iceberg.delta.utils.DeltaLakeTypeToType;
+import org.apache.iceberg.exceptions.ValidationException;
+import org.apache.iceberg.hadoop.HadoopInputFile;
+import org.apache.iceberg.io.FileIO;
+import org.apache.iceberg.io.InputFile;
+import org.apache.iceberg.mapping.NameMapping;
+import org.apache.iceberg.mapping.NameMappingParser;
+import org.apache.iceberg.orc.OrcMetrics;
+import org.apache.iceberg.parquet.ParquetUtil;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
+import org.apache.iceberg.relocated.com.google.common.collect.Iterables;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.relocated.com.google.common.collect.Maps;
+import org.apache.iceberg.types.Type;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Takes a Delta Lake table's location and attempts to transform it into an Iceberg table in an
+ * optional user-specified location (default to the Delta Lake table's location) with a different
+ * identifier.
+ */
+public class BaseMigrateDeltaLakeTableAction implements MigrateDeltaLakeTable {
+
+  private static final Logger LOG = LoggerFactory.getLogger(BaseMigrateDeltaLakeTableAction.class);
+
+  private static final String MIGRATION_SOURCE_PROP = "migration_source";
+  private static final String DELTA_SOURCE_VALUE = "delta";
+  private static final String ORIGINAL_LOCATION_PROP = "original_location";
+  private static final String PARQUET_SUFFIX = ".parquet";
+  private static final String AVRO_SUFFIX = ".avro";
+  private static final String ORC_SUFFIX = ".orc";
+  private final Map additionalProperties = Maps.newHashMap();
+  private final DeltaLog deltaLog;
+  private final Catalog icebergCatalog;
+  private final String deltaTableLocation;
+  private final TableIdentifier newTableIdentifier;
+  private final Configuration hadoopConfiguration;
+  private final String newTableLocation;
+
+  public BaseMigrateDeltaLakeTableAction(
+      Catalog icebergCatalog,
+      String deltaTableLocation,
+      TableIdentifier newTableIdentifier,
+      Configuration hadoopConfiguration) {
+    this.icebergCatalog = icebergCatalog;
+    this.deltaTableLocation = deltaTableLocation;
+    this.newTableIdentifier = newTableIdentifier;
+    this.hadoopConfiguration = hadoopConfiguration;
+    this.newTableLocation = deltaTableLocation;
+    this.deltaLog = DeltaLog.forTable(this.hadoopConfiguration, this.deltaTableLocation);
+  }
+
+  public BaseMigrateDeltaLakeTableAction(
+      Catalog icebergCatalog,
+      String delta

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6449: WIP: Delta, Spark: Adding support for Migrating Delta Lake Table to Iceberg Table

2022-12-29 Thread GitBox


jackye1995 commented on code in PR #6449:
URL: https://github.com/apache/iceberg/pull/6449#discussion_r1059184671


##
delta-lake/src/main/java/org/apache/iceberg/delta/BaseMigrateDeltaLakeTableAction.java:
##
@@ -0,0 +1,364 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.delta;
+
+import io.delta.standalone.DeltaLog;
+import io.delta.standalone.VersionLog;
+import io.delta.standalone.actions.Action;
+import io.delta.standalone.actions.AddFile;
+import io.delta.standalone.actions.RemoveFile;
+import java.io.File;
+import java.io.UncheckedIOException;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.iceberg.AppendFiles;
+import org.apache.iceberg.DataFile;
+import org.apache.iceberg.DataFiles;
+import org.apache.iceberg.DeleteFiles;
+import org.apache.iceberg.FileFormat;
+import org.apache.iceberg.Metrics;
+import org.apache.iceberg.MetricsConfig;
+import org.apache.iceberg.OverwriteFiles;
+import org.apache.iceberg.PartitionField;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.Snapshot;
+import org.apache.iceberg.SnapshotSummary;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.TableProperties;
+import org.apache.iceberg.Transaction;
+import org.apache.iceberg.avro.Avro;
+import org.apache.iceberg.catalog.Catalog;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.delta.utils.DeltaLakeDataTypeVisitor;
+import org.apache.iceberg.delta.utils.DeltaLakeTypeToType;
+import org.apache.iceberg.exceptions.ValidationException;
+import org.apache.iceberg.hadoop.HadoopInputFile;
+import org.apache.iceberg.io.FileIO;
+import org.apache.iceberg.io.InputFile;
+import org.apache.iceberg.mapping.NameMapping;
+import org.apache.iceberg.mapping.NameMappingParser;
+import org.apache.iceberg.orc.OrcMetrics;
+import org.apache.iceberg.parquet.ParquetUtil;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
+import org.apache.iceberg.relocated.com.google.common.collect.Iterables;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
+import org.apache.iceberg.relocated.com.google.common.collect.Maps;
+import org.apache.iceberg.types.Type;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Takes a Delta Lake table's location and attempts to transform it into an Iceberg table in an
+ * optional user-specified location (default to the Delta Lake table's location) with a different
+ * identifier.
+ */
+public class BaseMigrateDeltaLakeTableAction implements MigrateDeltaLakeTable {
+
+  private static final Logger LOG = LoggerFactory.getLogger(BaseMigrateDeltaLakeTableAction.class);
+
+  private static final String MIGRATION_SOURCE_PROP = "migration_source";
+  private static final String DELTA_SOURCE_VALUE = "delta";
+  private static final String ORIGINAL_LOCATION_PROP = "original_location";
+  private static final String PARQUET_SUFFIX = ".parquet";
+  private static final String AVRO_SUFFIX = ".avro";
+  private static final String ORC_SUFFIX = ".orc";
+  private final Map additionalProperties = Maps.newHashMap();
+  private final DeltaLog deltaLog;
+  private final Catalog icebergCatalog;
+  private final String deltaTableLocation;
+  private final TableIdentifier newTableIdentifier;
+  private final Configuration hadoopConfiguration;
+  private final String newTableLocation;
+
+  public BaseMigrateDeltaLakeTableAction(
+      Catalog icebergCatalog,
+      String deltaTableLocation,
+      TableIdentifier newTableIdentifier,
+      Configuration hadoopConfiguration) {
+    this.icebergCatalog = icebergCatalog;
+    this.deltaTableLocation = deltaTableLocation;
+    this.newTableIdentifier = newTableIdentifier;
+    this.hadoopConfiguration = hadoopConfiguration;
+    this.newTableLocation = deltaTableLocation;
+    this.deltaLog = DeltaLog.forTable(this.hadoopConfiguration, this.deltaTableLocation);
+  }
+
+  public BaseMigrateDeltaLakeTableAction(
+      Catalog icebergCatalog,
+      String delta

[GitHub] [iceberg] github-actions[bot] commented on issue #5182: Feature: adding constraint validation

2022-12-29 Thread GitBox


github-actions[bot] commented on issue #5182:
URL: https://github.com/apache/iceberg/issues/5182#issuecomment-1367649865

   This issue has been automatically marked as stale because it has been open 
for 180 days with no activity. It will be closed in the next 14 days if no 
further activity occurs. To permanently prevent this issue from being considered 
stale, add the label 'not-stale', but commenting on the issue is preferred when 
possible.





[GitHub] [iceberg] github-actions[bot] closed issue #5071: Support UNSET of sortOrder from the SQL

2022-12-29 Thread GitBox


github-actions[bot] closed issue #5071: Support UNSET of sortOrder from the SQL
URL: https://github.com/apache/iceberg/issues/5071





[GitHub] [iceberg] github-actions[bot] commented on issue #5071: Support UNSET of sortOrder from the SQL

2022-12-29 Thread GitBox


github-actions[bot] commented on issue #5071:
URL: https://github.com/apache/iceberg/issues/5071#issuecomment-1367649913

   This issue has been closed because it has not received any activity in the 
last 14 days since being marked as 'stale'





[GitHub] [iceberg] github-actions[bot] closed issue #5025: Reduce the number of equity-deletes using bloom filter

2022-12-29 Thread GitBox


github-actions[bot] closed issue #5025: Reduce the number of equity-deletes 
using bloom filter
URL: https://github.com/apache/iceberg/issues/5025





[GitHub] [iceberg] github-actions[bot] commented on issue #5025: Reduce the number of equity-deletes using bloom filter

2022-12-29 Thread GitBox


github-actions[bot] commented on issue #5025:
URL: https://github.com/apache/iceberg/issues/5025#issuecomment-1367649950

   This issue has been closed because it has not received any activity in the 
last 14 days since being marked as 'stale'





[GitHub] [iceberg] singhpk234 commented on a diff in pull request #6480: Spark: Fail streaming planning when snapshot not found

2022-12-29 Thread GitBox


singhpk234 commented on code in PR #6480:
URL: https://github.com/apache/iceberg/pull/6480#discussion_r1059091796


##
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkMicroBatchStream.java:
##
@@ -207,7 +207,14 @@ private List planFiles(StreamingOffset startOffset, StreamingOffse
         currentOffset = new StreamingOffset(snapshotAfter.snapshotId(), 0L, false);
       }
 
-      if (!shouldProcess(table.snapshot(currentOffset.snapshotId()))) {
+      Snapshot snapshot = table.snapshot(currentOffset.snapshotId());
+
+      if (snapshot == null) {
+        throw new IllegalStateException(
+            String.format("Failed to find expected snapshot %d", snapshot.snapshotId()));
+      }
+

Review Comment:
   As I understand it, returning `false` from `shouldProcess` makes the stream 
skip the expired snapshot, which might not be what we expect if the expired 
snapshot was of type `APPEND`.
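
   A rough Python sketch of the distinction raised here, under stated assumptions: the `Snapshot` dataclass, `should_process`, and `snapshot_to_process` are hypothetical stand-ins and not the Spark source code. A snapshot that cannot be found raises an error instead of being treated like one that should be skipped, so an expired `append` snapshot is not silently dropped. The error message is built from the requested id, since no snapshot object exists in that branch.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    operation: str  # e.g. "append", "replace", "delete"


def should_process(snapshot: Snapshot) -> bool:
    # Assumption for this sketch: only append snapshots carry new rows to read.
    return snapshot.operation == "append"


def snapshot_to_process(
    snapshots: Dict[int, Snapshot], snapshot_id: int
) -> Optional[Snapshot]:
    """Return the snapshot to read, None to skip it, or raise if it is missing."""
    snapshot = snapshots.get(snapshot_id)
    if snapshot is None:
        # Missing (for example expired) snapshot: fail loudly rather than skip,
        # and report the id that was asked for.
        raise RuntimeError(f"Failed to find expected snapshot {snapshot_id}")
    return snapshot if should_process(snapshot) else None
```

   Skipping is then reserved for snapshots that exist but carry nothing to stream, while a missing snapshot stops planning.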


