[GitHub] [iceberg] Gboyka commented on issue #6235: There is no data in the table, when insert data using Hive on Tez.
Gboyka commented on issue #6235:
URL: https://github.com/apache/iceberg/issues/6235#issuecomment-1367283307

Did you find a root cause or resolution for this? I'm facing the same issue.
[GitHub] [iceberg] Fokko commented on pull request #6497: Python: Move `adlfs` import inline
Fokko commented on PR #6497:
URL: https://github.com/apache/iceberg/pull/6497#issuecomment-1367441531

@cccs-eric thanks for taking a look at the PR, much appreciated. You're completely right here, and thanks for pointing it out. Previously we only had s3 as an implementation, so it wasn't an issue, but now with adlfs we want to be able to use that without having to depend on s3.
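For context, a minimal sketch of the inline-import pattern being discussed, assuming a hypothetical helper (`open_adls_file` and its behavior are illustrative, not pyiceberg's actual API): the optional adlfs dependency is imported only on the code path that needs it, so other backends can be used without it being installed.

```python
from typing import Any


def open_adls_file(path: str, **options: Any):
    # Import adlfs lazily so environments without the package can still use
    # other filesystems without hitting an ImportError at module import time.
    try:
        from adlfs import AzureBlobFileSystem
    except ImportError as exc:
        raise ModuleNotFoundError(
            "adlfs is required for abfs:// paths; install it with `pip install adlfs`"
        ) from exc

    fs = AzureBlobFileSystem(**options)
    return fs.open(path, "rb")


# Example (hypothetical account/path):
# stream = open_adls_file("container/path/file.parquet", account_name="myaccount")
```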
[GitHub] [iceberg] Fokko commented on a diff in pull request #6501: Python: Use PyArrow buffer
Fokko commented on code in PR #6501:
URL: https://github.com/apache/iceberg/pull/6501#discussion_r1059047328

## python/pyiceberg/io/pyarrow.py:
@@ -230,6 +240,14 @@ def _get_fs(self, scheme: str) -> FileSystem:
         else:
             raise ValueError(f"Unrecognized filesystem type in URI: {scheme}")

+    def _get_file(self, location: str) -> PyArrowFile:
+        scheme, path = self.parse_location(location)
+        fs = self._get_fs(scheme)
+
+        buffer_size = self.properties.get(BUFFER_SIZE)

Review Comment:
   Makes a lot of sense, let me update that! 👍🏻
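A minimal sketch of how such a buffer-size property might be threaded through to PyArrow; the property key string and default value here are assumptions for illustration, not pyiceberg's actual constants.

```python
from pyarrow.fs import FileSystem, LocalFileSystem

BUFFER_SIZE = "buffer-size"             # assumed property key, for illustration
DEFAULT_BUFFER_SIZE = 8 * 1024 * 1024   # assumed default, for illustration


def open_input_stream(fs: FileSystem, path: str, properties: dict):
    # Property values usually arrive as strings, so convert explicitly.
    buffer_size = int(properties.get(BUFFER_SIZE, DEFAULT_BUFFER_SIZE))
    # PyArrow's open_input_stream accepts a buffer_size for buffered reads.
    return fs.open_input_stream(path, buffer_size=buffer_size)


# Example (assumes the local file exists):
# stream = open_input_stream(LocalFileSystem(), "/tmp/example.txt", {"buffer-size": "1048576"})
```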
[GitHub] [iceberg] Fokko opened a new pull request, #6504: Python: Add tests
Fokko opened a new pull request, #6504:
URL: https://github.com/apache/iceberg/pull/6504

To maintain 90% code coverage
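As an aside, a common way to enforce such a threshold is a fail-under setting in the coverage configuration; the snippet below is a generic illustration, not necessarily how this repository's CI is configured.

```toml
# Hypothetical coverage configuration in pyproject.toml (illustrative only).
[tool.coverage.report]
fail_under = 90        # fail the coverage check if total coverage drops below 90%
show_missing = true    # list uncovered line numbers in the report
```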
[GitHub] [iceberg] Fokko commented on a diff in pull request #6504: Python: Add tests
Fokko commented on code in PR #6504:
URL: https://github.com/apache/iceberg/pull/6504#discussion_r1059129800

## python/pyiceberg/typedef.py:
@@ -54,12 +54,9 @@ def __init__(self, default_factory: Callable[[K], V]):
         self.default_factory = default_factory

     def __missing__(self, key: K) -> V:
-        if self.default_factory is None:

Review Comment:
   `self.default_factory` cannot be `None` because of the type signature.
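A minimal sketch of the pattern being discussed (simplified, not pyiceberg's exact class): because `default_factory` is a required, non-Optional constructor argument, `__missing__` can call it directly without an `is None` guard.

```python
from typing import Callable, Dict, TypeVar

K = TypeVar("K")
V = TypeVar("V")


class KeyDefaultDict(Dict[K, V]):
    def __init__(self, default_factory: Callable[[K], V]):
        super().__init__()
        self.default_factory = default_factory

    def __missing__(self, key: K) -> V:
        # The type signature guarantees default_factory is set,
        # so no None check is needed here.
        value = self.default_factory(key)
        self[key] = value
        return value


# Usage: defaults are computed from the missing key itself.
lengths = KeyDefaultDict(lambda key: len(key))
print(lengths["iceberg"])  # 7
```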
[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6449: WIP: Delta, Spark: Adding support for Migrating Delta Lake Table to Iceberg Table
jackye1995 commented on code in PR #6449:
URL: https://github.com/apache/iceberg/pull/6449#discussion_r1059182643

## delta-lake/src/main/java/org/apache/iceberg/delta/SupportMigrateDeltaLake.java:
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.delta;
+
+/**
+ * An API that should be implemented by query engine integrations that want to support migration
+ * from Delta Lake table to Iceberg table.
+ */
+public interface SupportMigrateDeltaLake {

Review Comment:
   nit: maybe `SupportsMigrationFromDeltaLake`?
[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6449: WIP: Delta, Spark: Adding support for Migrating Delta Lake Table to Iceberg Table
jackye1995 commented on code in PR #6449:
URL: https://github.com/apache/iceberg/pull/6449#discussion_r1059183754

## delta-lake/src/main/java/org/apache/iceberg/delta/BaseMigrateDeltaLakeTableAction.java:
@@ -0,0 +1,364 @@
[Quoted diff context: the new BaseMigrateDeltaLakeTableAction class, comprising the Apache license header, imports, the class Javadoc ("Takes a Delta Lake table's location and attempts to transform it into an Iceberg table in an optional user-specified location (default to the Delta Lake table's location) with a different identifier."), constants such as MIGRATION_SOURCE_PROP and DELTA_SOURCE_VALUE, fields, and constructors; the quoted diff and the review comment are truncated.]
[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6449: WIP: Delta, Spark: Adding support for Migrating Delta Lake Table to Iceberg Table
jackye1995 commented on code in PR #6449:
URL: https://github.com/apache/iceberg/pull/6449#discussion_r1059184671

## delta-lake/src/main/java/org/apache/iceberg/delta/BaseMigrateDeltaLakeTableAction.java:
@@ -0,0 +1,364 @@
[Quoted diff context: the same BaseMigrateDeltaLakeTableAction class as above (license header, imports, class Javadoc, constants, fields, and constructors); the quoted diff and the review comment are truncated.]
[GitHub] [iceberg] github-actions[bot] commented on issue #5182: Feature: adding constraint validation
github-actions[bot] commented on issue #5182:
URL: https://github.com/apache/iceberg/issues/5182#issuecomment-1367649865

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the 'not-stale' label, but commenting on the issue is preferred when possible.
[GitHub] [iceberg] github-actions[bot] closed issue #5071: Support UNSET of sortOrder from the SQL
github-actions[bot] closed issue #5071: Support UNSET of sortOrder from the SQL
URL: https://github.com/apache/iceberg/issues/5071
[GitHub] [iceberg] github-actions[bot] commented on issue #5071: Support UNSET of sortOrder from the SQL
github-actions[bot] commented on issue #5071:
URL: https://github.com/apache/iceberg/issues/5071#issuecomment-1367649913

This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'
[GitHub] [iceberg] github-actions[bot] closed issue #5025: Reduce the number of equality deletes using bloom filter
github-actions[bot] closed issue #5025: Reduce the number of equality deletes using bloom filter
URL: https://github.com/apache/iceberg/issues/5025
[GitHub] [iceberg] github-actions[bot] commented on issue #5025: Reduce the number of equality deletes using bloom filter
github-actions[bot] commented on issue #5025:
URL: https://github.com/apache/iceberg/issues/5025#issuecomment-1367649950

This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'
[GitHub] [iceberg] singhpk234 commented on a diff in pull request #6480: Spark: Fail streaming planning when snapshot not found
singhpk234 commented on code in PR #6480:
URL: https://github.com/apache/iceberg/pull/6480#discussion_r1059091796

## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkMicroBatchStream.java:
@@ -207,7 +207,14 @@ private List<FileScanTask> planFiles(StreamingOffset startOffset, StreamingOffset endOffset) {
       currentOffset = new StreamingOffset(snapshotAfter.snapshotId(), 0L, false);
     }

-    if (!shouldProcess(table.snapshot(currentOffset.snapshotId()))) {
+    Snapshot snapshot = table.snapshot(currentOffset.snapshotId());
+
+    if (snapshot == null) {
+      throw new IllegalStateException(
+          String.format("Failed to find expected snapshot %d", currentOffset.snapshotId()));
+    }
+

Review Comment:
   As per my understanding, returning `false` from `shouldProcess` will make it skip the expired snapshot, which might not be expected if the expired snapshot was of type `APPEND`.