Re: [I] PyIceberg appending data creates snapshots incompatible with Athena/Spark [iceberg-python]
Fokko commented on issue #1424: URL: https://github.com/apache/iceberg-python/issues/1424#issuecomment-2544900271

I'm also not seeing how this could happen. To test this, I ran this script:

```
Python 3.10.14 (main, Mar 19 2024, 21:46:16) [Clang 15.0.0 (clang-1500.3.9.4)]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.26.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import uuid
   ...:
   ...: while True:
   ...:     rnd_uuid = uuid.uuid4()
   ...:     snapshot_id = int.from_bytes(
   ...:         bytes(lhs ^ rhs for lhs, rhs in zip(rnd_uuid.bytes[0:8], rnd_uuid.bytes[8:16])), byteorder="little", signed=True
   ...:     )
   ...:     snapshot_id = snapshot_id if snapshot_id >= 0 else snapshot_id * -1
   ...:     if snapshot_id > 9223372036854775807:
   ...:         print("Boom")
   ...:
```

I think we should add a check to ensure that the `snapshot_id` is in the range `[0, 9223372036854775807]`, and raise a `ValueError` otherwise. Would you be interested in contributing that, @Samreay? I think this should happen in the `SnapshotProducer`.
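For illustration, a minimal sketch of the suggested range check; the helper name and placement are hypothetical, not actual PyIceberg code (the real change would live in the `SnapshotProducer`):

```python
MAX_SNAPSHOT_ID = 9223372036854775807  # 2**63 - 1, the largest valid snapshot ID

def validate_snapshot_id(snapshot_id: int) -> int:
    """Hypothetical helper: reject snapshot IDs outside the valid signed 64-bit range."""
    if not 0 <= snapshot_id <= MAX_SNAPSHOT_ID:
        raise ValueError(f"Snapshot ID out of range: {snapshot_id}")
    return snapshot_id
```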
Re: [PR] Doc: Do Not Modify the Source Data Table During MergeIntoCommand Exec… [iceberg]
BsoBird commented on code in PR #11787: URL: https://github.com/apache/iceberg/pull/11787#discussion_r1886334243

## docs/docs/spark-writes.md:

```diff
@@ -101,6 +101,9 @@ Spark 3.5 added support for `WHEN NOT MATCHED BY SOURCE ... THEN ...` to update

 WHEN NOT MATCHED BY SOURCE THEN UPDATE SET status = 'invalid'

+!!! danger
+    Note: For copy-on-write tables, do not modify the source data table during execution. Because Spark reads the source table twice during the computation, the relation created from the source data must remain constant across the two passes. If the source data query returns different results, users will see odd behavior.
```

Review Comment:

> Would it be possible to be more specific? What kind of odd behavior will users experience?

@Fokko Hello, sir. The most common situation we encounter in our production environment is data loss.
Re: [PR] Add plan tasks for TableScan [iceberg-python]
Fokko commented on code in PR #1427: URL: https://github.com/apache/iceberg-python/pull/1427#discussion_r1886483368

## pyiceberg/table/__init__.py:

```diff
@@ -1423,6 +1451,66 @@ def plan_files(self) -> Iterable[FileScanTask]:
             for data_entry in data_entries
         ]

+    def _target_split_size(self) -> int:
+        table_value = property_as_int(
+            self.table_metadata.properties, TableProperties.READ_SPLIT_SIZE, TableProperties.READ_SPLIT_SIZE_DEFAULT
+        )
+        return property_as_int(self.options, TableProperties.READ_SPLIT_SIZE, table_value)  # type: ignore
+
+    def _loop_back(self) -> int:
+        table_value = property_as_int(
+            self.table_metadata.properties, TableProperties.READ_SPLIT_LOOKBACK, TableProperties.READ_SPLIT_LOOKBACK_DEFAULT
+        )
+        return property_as_int(self.options, TableProperties.READ_SPLIT_LOOKBACK, table_value)  # type: ignore
+
+    def _split_open_file_cost(self) -> int:
+        table_value = property_as_int(
+            self.table_metadata.properties,
+            TableProperties.READ_SPLIT_OPEN_FILE_COST,
+            TableProperties.READ_SPLIT_OPEN_FILE_COST_DEFAULT,
+        )
+        return property_as_int(self.options, TableProperties.READ_SPLIT_OPEN_FILE_COST, table_value)  # type: ignore
+
+    def plan_task(self) -> Iterable[CombinedFileScanTask]:
```

Review Comment:

I have the same concern as @corleyma. For example, Java splits on row groups to make full use of the parallelism of Spark, Trino, Hive, etc. On a local machine, it makes more sense to just plow through the file itself, preferably using a native reader like PyArrow or Iceberg-Rust in the future.

There are a lot of details around this API. For example, a task might point to a row group that doesn't contain any relevant information, and we don't know that until we open the file itself.
Re: [PR] Add plan tasks for TableScan [iceberg-python]
Fokko commented on code in PR #1427: URL: https://github.com/apache/iceberg-python/pull/1427#discussion_r1886487094

## pyiceberg/table/__init__.py:

```diff
+    def plan_task(self) -> Iterable[CombinedFileScanTask]:
```

Review Comment:

Sorry, I think we had a race condition. Thoughts, @jaychia @samster25?
Re: [PR] Add license checker [iceberg-cpp]
Fokko commented on code in PR #10: URL: https://github.com/apache/iceberg-cpp/pull/10#discussion_r1886489090

## .github/workflows/license_check.yml:

```diff
+name: "Run License Check"
```

Review Comment:

That's fancy, let me change to Skywalking!
Re: [PR] Add plan tasks for TableScan [iceberg-python]
ConeyLiu commented on code in PR #1427: URL: https://github.com/apache/iceberg-python/pull/1427#discussion_r1886465895

## tests/integration/test_reads.py:

```diff
@@ -873,3 +874,76 @@ def test_table_scan_empty_table(catalog: Catalog) -> None:
     result_table = tbl.scan().to_arrow()

     assert len(result_table) == 0
+
+
+@pytest.mark.integration
+def test_plan_tasks(session_catalog: Catalog) -> None:
+    from pyiceberg.table import TableProperties
+
+    table_name = "default.test_plan_tasks"
+    try:
+        session_catalog.drop_table(table_name)
+    except NoSuchTableError:
+        pass  # Just to make sure that the table doesn't exist
+
+    tbl = session_catalog.create_table(
+        table_name,
+        Schema(
+            NestedField(1, "number", LongType()),
+        ),
+        properties={TableProperties.PARQUET_ROW_GROUP_LIMIT: "1"},
+    )
+
+    # Write 10 row groups, that should end up as 10 batches
+    entries = 10
+    tbl.append(
+        pa.Table.from_pylist(
+            [
+                {
+                    "number": number,
+                }
+                for number in range(entries)
+            ],
+        )
+    )
+
+    assert len(tbl.inspect.files()) == 1
+
+    plan_files = list(tbl.scan().plan_files())
+    assert len(plan_files) == 1
+    data_file = plan_files[0].file
+    assert data_file.split_offsets is not None and len(data_file.split_offsets) == 10
+
+    plan_tasks = list(tbl.scan(options={TableProperties.READ_SPLIT_SIZE: 1}).plan_task())
+    assert len(plan_tasks) == 10
+
+    split_offsets = []
+    for task in plan_tasks:
+        assert len(task.tasks) == 1
+        split_offsets.append(task.tasks[0].start)
+
+    assert split_offsets == plan_files[0].file.split_offsets
+
+    split_sizes = []
+    for i in range(1, len(data_file.split_offsets)):
+        split_sizes.append(data_file.split_offsets[i] - data_file.split_offsets[i - 1])
+
+    split_sizes.append(data_file.file_size_in_bytes - data_file.split_offsets[-1])
+
+    read_split_size = int(data_file.file_size_in_bytes / 4)
+    read_split_open_file_cost = 1
+    read_split_lookback = 5
+
+    plan_tasks = list(
```

Review Comment:

@kevinjqliu added more tests here, please give more advice.
Re: [PR] Add plan tasks for TableScan [iceberg-python]
ConeyLiu commented on code in PR #1427: URL: https://github.com/apache/iceberg-python/pull/1427#discussion_r1886464171

## pyiceberg/table/__init__.py:

```diff
+    def plan_task(self) -> Iterable[CombinedFileScanTask]:
```

Review Comment:

> I assume it's intentional that we're not actually calling this in any of the methods (like to_arrow, etc) that actually execute a scan?

Yes. I am not sure of the original intention behind reading the full file, so I kept that behavior unchanged.

> If there is no plan to call this directly in pyiceberg, I wonder who the intended consumers of this API would be? I would expect most query engines -- distributed or otherwise -- to have their own notions for how to optimize scan planning.

I recently ran into OOM errors when reading an Iceberg table with Ray in a distributed setting. Also, when planning with files, parallelism is limited by the number of files (files are often rewritten into larger ones, e.g. 512 MB or 1 GB). As with Spark/Flink, I think `plan_tasks` should be useful for distributed reading.

> In the case that this is primarily intended to be for the benefit of pyiceberg's own scan execution I would consider making this a private API.

Distributed engines such as Ray/Daft would have to implement this themselves if they want `plan_tasks`. Wouldn't it be better to put it into PyIceberg?
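For illustration, a rough sketch of how a distributed consumer might fan out the combined tasks produced by the `plan_task` API under review. The catalog name, table name, and the `submit_to_worker` placeholder are hypothetical; the `READ_SPLIT_SIZE` option name comes from the PR's test:

```python
from pyiceberg.catalog import load_catalog
from pyiceberg.table import TableProperties


def submit_to_worker(task) -> None:
    # Placeholder for handing the task to a remote worker (e.g. a Ray task).
    print(task)


catalog = load_catalog("default")  # assumes a catalog named "default" is configured
tbl = catalog.load_table("default.test_plan_tasks")  # hypothetical table name

# Ask for combined tasks of roughly 128 MiB each instead of one task per data file.
scan = tbl.scan(options={TableProperties.READ_SPLIT_SIZE: 128 * 1024 * 1024})
for combined_task in scan.plan_task():
    submit_to_worker(combined_task)
```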
Re: [PR] Add plan tasks for TableScan [iceberg-python]
ConeyLiu commented on PR #1427: URL: https://github.com/apache/iceberg-python/pull/1427#issuecomment-2545047787

Thanks @kevinjqliu @corleyma for your review. Please take another look, thanks a lot.
Re: [PR] Add license checker [iceberg-cpp]
Fokko commented on code in PR #10: URL: https://github.com/apache/iceberg-cpp/pull/10#discussion_r1886494679

## .github/workflows/license_check.yml:

```diff
+name: "Run License Check"
```

Review Comment:

Done 👍 Thanks @wgtmac and @raulcd
Re: [PR] feat: Add RemovePartitionSpecs table update [iceberg-rust]
Fokko commented on code in PR #804: URL: https://github.com/apache/iceberg-rust/pull/804#discussion_r1886516435

## crates/iceberg/src/spec/table_metadata_builder.rs:

```diff
@@ -740,6 +740,32 @@ impl TableMetadataBuilder {
         .set_default_partition_spec(Self::LAST_ADDED)
     }

+    /// Remove partition specs by their ids from the table metadata.
+    /// Does nothing if a spec id is not present.
+    ///
+    /// If the default partition spec is removed, it is re-added
+    /// upon build.
+    pub fn remove_partition_specs(mut self, spec_ids: &[i32]) -> Self {
+        let mut removed_specs = Vec::with_capacity(spec_ids.len());
+
+        self.metadata.partition_specs.retain(|k, _| {
+            if spec_ids.contains(k) {
```

Review Comment:

Why not turn the `spec_ids` into a `HashSet`? :)
Re: [I] Generate release notes on website. [iceberg-rust]
Fokko commented on issue #593: URL: https://github.com/apache/iceberg-rust/issues/593#issuecomment-2545128114

I'm going to remove the milestone to unblock the 0.4.0 release. I'm happy to help here. In general, I think it would be great to add more to the website (getting started, etc.), since it is still pretty empty.
Re: [I] Kafka Connect Sporadic Commit Delay [iceberg]
trolle4 commented on issue #11796: URL: https://github.com/apache/iceberg/issues/11796#issuecomment-2545131515

Also added some debug logging inside the class `org.apache.iceberg.connect.channel.CommitterImpl`:

```java
@Override
public void save(Collection<SinkRecord> sinkRecords) {
  if (sinkRecords != null && !sinkRecords.isEmpty()) {
    worker.save(sinkRecords);
    if (sinkRecords.iterator().next().topic().contains("iceberg")) {
      LOG.info(
          "Saving {} records. Last offset {}",
          sinkRecords.size(),
          sinkRecords.stream()
              .reduce((first, second) -> second)
              .map(SinkRecord::kafkaOffset)
              .orElse(-1L));
    }
  }
  processControlEvents();
}
```

Which produced the following logs for the above problem:

```
2024-12-16 06:40:15,390 INFO Saving 1 records. Last offset 151629 [org.apache.iceberg.connect.channel.CommitterImpl]
2024-12-16 06:40:15,661 INFO Saving 1 records. Last offset 151630 [org.apache.iceberg.connect.channel.CommitterImpl]
```

So the message is read from the source Kafka topic when it should be.
Re: [PR] feat: Store file io props to allow re-build it [iceberg-rust]
Xuanwo commented on code in PR #802: URL: https://github.com/apache/iceberg-rust/pull/802#discussion_r1887055534

## crates/iceberg/src/io/file_io.rs:

```diff
@@ -165,7 +175,7 @@ impl FileIOBuilder {
     /// Fetch the scheme string.
     ///
     /// The scheme_str will be empty if it's None.
-    pub(crate) fn into_parts(self) -> (String, HashMap<String, String>) {
+    pub fn into_parts(self) -> (String, HashMap<String, String>) {
```

Review Comment:

> What's the goal of the `scheme_str`? If we want to have a specific impl, we could also pass that into the configuration itself. This is of course bound to a language, [for example `py-io-impl`](https://py.iceberg.apache.org/configuration/#fileio) and in Java it is just `io-impl`.

Makes sense to me. I exposed the necessary elements as they are to build `FileIO` for now. I can address this in another PR that removes the requirement for `scheme` to build `FileIO`.
Re: [PR] feat: Add RemovePartitionSpecs table update [iceberg-rust]
c-thiel commented on PR #804: URL: https://github.com/apache/iceberg-rust/pull/804#issuecomment-2546189749

@Fokko resolved merge conflicts. Could you re-approve?
Re: [PR] Core: Fix numeric overflow of timestamp nano literal [iceberg]
amogh-jahagirdar commented on code in PR #11775: URL: https://github.com/apache/iceberg/pull/11775#discussion_r1887202656

## api/src/main/java/org/apache/iceberg/expressions/Literals.java:

```diff
@@ -300,8 +300,7 @@ public Literal to(Type type) {
       case TIMESTAMP:
         return (Literal) new TimestampLiteral(value());
       case TIMESTAMP_NANO:
-        // assume micros and convert to nanos to match the behavior in the timestamp case above
-        return new TimestampLiteral(value()).to(type);
+        return (Literal) new TimestampNanoLiteral(value());
```

Review Comment:

This change seems correct to me. The previous behavior was to assume the value was in microseconds and then pass it through to `TimestampLiteral`, but that can overflow and does not actually represent a nanosecond timestamp!

## api/src/test/java/org/apache/iceberg/expressions/TestTimestampLiteralConversions.java:

```diff
@@ -107,6 +108,28 @@ public void testTimestampMicrosToDateConversion() {
     assertThat(dateOrdinal).isEqualTo(-1);
   }

+  @Test
+  public void testTimestampNanoWithLongLiteral() {
```

Review Comment:

Could we also add a case for `withZone`?

## api/src/test/java/org/apache/iceberg/types/TestConversions.java:

```diff
@@ -111,9 +111,9 @@ public void testByteBufferConversions() {
     assertConversion(
         4L, TimestampNanoType.withZone(), new byte[] {0, -124, -41, 23, 0, 0, 0, 0});
     assertThat(Literal.of(40L).to(TimestampNanoType.withoutZone()).toByteBuffer().array())
-        .isEqualTo(new byte[] {0, -124, -41, 23, 0, 0, 0, 0});
+        .isEqualTo(new byte[] {-128, 26, 6, 0, 0, 0, 0, 0});
     assertThat(Literal.of(40L).to(TimestampNanoType.withZone()).toByteBuffer().array())
```

Review Comment:

I'm a bit confused how the original assertion was passing? Shouldn't this always have been equivalent to `{-128, 26, 6, 0, 0, 0, 0, 0}`?
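For context, a small arithmetic sketch (not the Java code under review) of why interpreting the long as microseconds and then scaling it to nanoseconds can overflow a signed 64-bit value:

```python
MAX_INT64 = 2**63 - 1

# Scaling microseconds to nanoseconds multiplies by 1000, so any microsecond
# value above MAX_INT64 // 1000 no longer fits in a signed 64-bit long.
micros = MAX_INT64 // 1000 + 1
assert micros * 1000 > MAX_INT64
```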
Re: [PR] Auth Manager API part 1: HTTPRequest, HTTPHeader [iceberg]
danielcweeks commented on code in PR #11769: URL: https://github.com/apache/iceberg/pull/11769#discussion_r1887271043

## core/src/main/java/org/apache/iceberg/rest/HTTPHeaders.java:

```java
package org.apache.iceberg.rest;

import java.util.List;
import java.util.stream.Collectors;
import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
import org.immutables.value.Value;

/**
 * Represents a group of HTTP headers.
 *
 * Header name comparison in this class is always case-insensitive, in accordance with RFC 2616.
 *
 * This class exposes methods to convert to and from different representations such as maps and
 * multimap, for easier access and manipulation – especially when dealing with multiple headers with
 * the same name.
 */
@Value.Style(depluralize = true)
@Value.Immutable
@SuppressWarnings({"ImmutablesStyle", "SafeLoggingPropagation"})
public interface HTTPHeaders {

  HTTPHeaders EMPTY = of();

  /** Returns all the header entries in this group. */
  List<HTTPHeader> entries();
```

Review Comment:

Nit: shouldn't this be `Set`?
Re: [PR] Auth Manager API part 1: HTTPRequest, HTTPHeader [iceberg]
danielcweeks commented on code in PR #11769: URL: https://github.com/apache/iceberg/pull/11769#discussion_r1887271278

## core/src/main/java/org/apache/iceberg/rest/HTTPHeaders.java:

```java
  /** Returns all the entries in this group for the given name (case-insensitive). */
  default List<HTTPHeader> entries(String name) {
```

Review Comment:

Same here, set
Re: [PR] Core: Add Variant implementation to read serialized objects [iceberg]
rdblue commented on code in PR #11415: URL: https://github.com/apache/iceberg/pull/11415#discussion_r1887270705

## core/src/main/java/org/apache/iceberg/variants/Variants.java:

```java
package org.apache.iceberg.variants;

import java.math.BigDecimal;
import java.nio.ByteBuffer;
import java.util.List;
import java.util.Map;
import org.apache.iceberg.util.DateTimeUtil;

public class Variants {
  private Variants() {}

  enum LogicalType {
```

Review Comment:

I've left these package-private for now. That gives us the flexibility to remove them later if needed.
Re: [PR] Bump `pyiceberg_core` to 0.4.0 [iceberg-rust]
Xuanwo merged PR #808: URL: https://github.com/apache/iceberg-rust/pull/808
Re: [PR] Implementing namespace_exists function on the REST Catalog [iceberg-python]
kevinjqliu commented on code in PR #1434: URL: https://github.com/apache/iceberg-python/pull/1434#discussion_r1887152140

## tests/catalog/test_rest.py:

```diff
@@ -681,6 +681,51 @@ def test_update_namespace_properties_200(rest_mock: Mocker) -> None:
     assert response == PropertiesUpdateSummary(removed=[], updated=["prop"], missing=["abc"])

+
+def test_namespace_exists_200(rest_mock: Mocker) -> None:
+    rest_mock.head(
+        f"{TEST_URI}v1/namespaces/fokko",
+        status_code=200,
+        request_headers=TEST_HEADERS,
+    )
+    catalog = RestCatalog("rest", uri=TEST_URI, token=TEST_TOKEN)
+
+    assert catalog.namespace_exists("fokko")
```

Review Comment:

nit: the tests here only verify against the mock response. It would be great to also test against the actual REST catalog in integration tests. I noticed that does not exist right now; perhaps we can start a `tests/integration/test_rest_catalog.py`.
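For illustration, a rough sketch of what such an integration test might look like, assuming the `session_catalog` REST fixture from `conftest.py`; the module name follows the suggestion above and the namespace names are hypothetical:

```python
# tests/integration/test_rest_catalog.py (suggested new module)
import pytest

from pyiceberg.catalog import Catalog
from pyiceberg.exceptions import NamespaceAlreadyExistsError


@pytest.mark.integration
def test_namespace_exists(session_catalog: Catalog) -> None:
    try:
        session_catalog.create_namespace("default")
    except NamespaceAlreadyExistsError:
        pass  # the namespace may already exist from other tests

    assert session_catalog.namespace_exists("default")
    assert not session_catalog.namespace_exists("nonexistent_namespace")
```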
Re: [I] Tracking issues of iceberg rust v0.4.0 Release [iceberg-rust]
Xuanwo commented on issue #739: URL: https://github.com/apache/iceberg-rust/issues/739#issuecomment-2546101308

> Could someone help review this PR so we can get the versions synced to 0.4.0? #808

Merged. Let's move!
Re: [PR] Deserialize NestedField initial-default and write-default Attributes [iceberg-python]
kevinjqliu commented on code in PR #1432: URL: https://github.com/apache/iceberg-python/pull/1432#discussion_r1887143738

## pyiceberg/types.py:

```diff
@@ -328,8 +328,8 @@ def __init__(
         data["type"] = data["type"] if "type" in data else field_type
         data["required"] = required
         data["doc"] = doc
```

Review Comment:

I went down a rabbit hole, here's what I learned.

1. Using `__init__` with a pydantic model is kind of an anti-pattern; the pydantic model typically handles all the initialization/validation.
2. `__init__` here is to provide the ability to initialize `NestedField` with positional args, i.e. `NestedField(1, "blah")`.

For backward compatibility, we can't change `NestedField` to not take positional args. But we can make this `__init__` (and other `__init__`s) more resilient to bugs with something like:

```python
def __init__(self, *args, **kwargs):
    # implements `__init__` to support initialization with positional arguments
    if args:
        field_names = list(self.model_fields.keys())  # gets all field names defined on the model
        if len(args) > len(field_names):
            raise TypeError(f"Too many positional arguments. Expected at most {len(field_names)}")
        kwargs.update(dict(zip(field_names, args)))  # maps positional args to field names

    # Let Pydantic handle aliases and validation
    super().__init__(**kwargs)
```

This doesn't check for when both positional and keyword args are used, i.e. `NestedField(1, field_id=10)`.
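As a quick usage check of the sketched `__init__` above (assuming it is applied to `NestedField`), positional and keyword construction should resolve to the same field:

```python
from pyiceberg.types import LongType, NestedField

# Both spellings should produce equal models under the sketched positional mapping.
positional = NestedField(1, "number", LongType(), True)
keyword = NestedField(field_id=1, name="number", field_type=LongType(), required=True)
assert positional == keyword
```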
Re: [PR] fix: gurantee the deserialize order of struct is same as the struct type [iceberg-rust]
Xuanwo commented on code in PR #795: URL: https://github.com/apache/iceberg-rust/pull/795#discussion_r1887090289

## crates/iceberg/src/spec/values.rs:

```diff
@@ -3604,4 +3608,29 @@ mod tests {
         assert_eq!(result, expected);
     }
+
+    #[test]
+    fn test_record_ser_de() {
+        let expected_literal = Literal::Struct(Struct::from_iter(vec![
+            Some(Literal::Primitive(PrimitiveLiteral::Int(1))),
+            Some(Literal::Primitive(PrimitiveLiteral::String(
+                "bar".to_string(),
+            ))),
+            None,
+            Some(Literal::Primitive(PrimitiveLiteral::Int(1000))),
+        ]));
+        let fields = Type::Struct(StructType::new(vec![
+            NestedField::required(2, "id", Type::Primitive(PrimitiveType::Int)).into(),
+            NestedField::optional(3, "name", Type::Primitive(PrimitiveType::String)).into(),
+            NestedField::optional(4, "address", Type::Primitive(PrimitiveType::String)).into(),
+            NestedField::required(5, "extra", Type::Primitive(PrimitiveType::Int)).into(),
+        ]));
+
+        let raw_literal = RawLiteral::try_from(expected_literal.clone(), &fields).unwrap();
+        let serialized = serde_json::to_string(&raw_literal).unwrap();
+        let deserialized: RawLiteral = serde_json::from_str(&serialized).unwrap();
+        let deserialized_literal = deserialized.try_into(&fields).unwrap().unwrap();
+
+        assert_eq!(expected_literal, deserialized_literal);
```

Review Comment:

Marking this unresolved for my newly added comment: https://github.com/apache/iceberg-rust/pull/795#discussion_r1887077718

Sorry @Fokko :smile:
Re: [PR] fix: gurantee the deserialize order of struct is same as the struct type [iceberg-rust]
Fokko commented on code in PR #795: URL: https://github.com/apache/iceberg-rust/pull/795#discussion_r1887083904

## crates/iceberg/src/spec/values.rs:

```diff
+        assert_eq!(expected_literal, deserialized_literal);
```

Review Comment:

Ah, missed that. Let me unblock this for now 👍
Re: [I] Failed to read iceberg TPCH generated by snowflake [iceberg-rust]
Xuanwo commented on issue #790: URL: https://github.com/apache/iceberg-rust/issues/790#issuecomment-2546018588

> @Xuanwo are you running into anything else? Curious to learn if it works :)

Hi, it works great! Thank you for the fix (just in time, I needed it!). Let's close.
Re: [I] Failed to read iceberg TPCH generated by snowflake [iceberg-rust]
Xuanwo closed issue #790: Failed to read iceberg TPCH generated by snowflake URL: https://github.com/apache/iceberg-rust/issues/790
[PR] Bump `pyiceberg_core` to 0.4.0 [iceberg-rust]
sungwy opened a new pull request, #808: URL: https://github.com/apache/iceberg-rust/pull/808

We are releasing the bindings simultaneously with the core rust crates. Bump the version to be consistent with the core rust crates for simplicity.
Re: [PR] Core: Add Variant implementation to read serialized objects [iceberg]
rdblue commented on code in PR #11415: URL: https://github.com/apache/iceberg/pull/11415#discussion_r1887268226

## core/src/main/java/org/apache/iceberg/variants/PrimitiveWrapper.java:

```java
package org.apache.iceberg.variants;

class PrimitiveWrapper<T> implements VariantPrimitive<T> {

  private final Variants.PhysicalType type;
  private final T value;
  private ByteBuffer buffer = null;

  PrimitiveWrapper(Variants.PhysicalType type, T value) {
    this.type = type;
    this.value = value;
  }

  @Override
  public Variants.PhysicalType type() {
    return type;
  }

  @Override
  public T get() {
    return value;
  }

  @Override
  public int sizeInBytes() {
    switch (type()) {
      case NULL:
      case BOOLEAN_TRUE:
      case BOOLEAN_FALSE:
        return 1; // 1 header only
      case INT8:
        return 2; // 1 header + 1 value
      case INT16:
        return 3; // 1 header + 2 value
      case INT32:
      case DATE:
      case FLOAT:
        return 5; // 1 header + 4 value
      case INT64:
      case DOUBLE:
      case TIMESTAMPTZ:
      case TIMESTAMPNTZ:
        return 9; // 1 header + 8 value
      case DECIMAL4:
        return 6; // 1 header + 1 scale + 4 unscaled value
      case DECIMAL8:
        return 10; // 1 header + 1 scale + 8 unscaled value
      case DECIMAL16:
        return 18; // 1 header + 1 scale + 16 unscaled value
      case BINARY:
        return 5 + ((ByteBuffer) value).remaining(); // 1 header + 4 length + value length
```

Review Comment:

@aihuaxu, if you modify the buffer then it will change the value. What you're suggesting is equivalent to other modifications, like this:

```java
byte[] arr = new byte[] { 0x0A, 0x0B, 0x0C, 0x0D };
VariantPrimitive<ByteBuffer> primitive = Variants.of(ByteBuffer.wrap(arr));
// modify the primitive value
arr[0] = 0;
...
```

If you modify the value then it is going to be different. We could duplicate the incoming buffer to avoid the case where `position` is modified, but not the case where the backing byte array is modified. That would be okay, but I don't see a lot of value in w
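As a language-neutral illustration of the aliasing point above (a Python stand-in, not the Java API under review): wrapping a buffer does not copy it, so mutating the backing storage changes the value seen through the wrapper.

```python
arr = bytearray(b"\x0a\x0b\x0c\x0d")
view = memoryview(arr)   # analogous to ByteBuffer.wrap(arr): no copy is made
arr[0] = 0x00            # mutate the backing storage
assert view[0] == 0x00   # the wrapped value observes the change
```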
Re: [PR] Spark 3.4: Add REST catalog to Spark integration tests [iceberg]
danielcweeks merged PR #11698: URL: https://github.com/apache/iceberg/pull/11698
Re: [PR] Feat: support aliyun oss backend. [iceberg-go]
zeroshade commented on PR #216: URL: https://github.com/apache/iceberg-go/pull/216#issuecomment-2546599302

The Java Iceberg implementation has some mocking and test setups for Aliyun, as seen [here](https://github.com/apache/iceberg/tree/main/aliyun/src/test/java/org/apache/iceberg/aliyun/oss). Would it make sense to create a similar utility in Go, or is it not worth the effort there? Is there *any* sort of testing we could add here for this?
Re: [PR] Prep 0.4.0 release [iceberg-rust]
kevinjqliu commented on code in PR #809: URL: https://github.com/apache/iceberg-rust/pull/809#discussion_r1887538305

## crates/iceberg/src/spec/snapshot.rs:

```diff
@@ -192,13 +191,6 @@ impl Snapshot {
             partition_type_provider,
         )
     }
-
-    pub(crate) fn log(&self) -> SnapshotLog {
```

Review Comment:

Is this intentional?
Re: [PR] Prep 0.4.0 release [iceberg-rust]
kevinjqliu commented on code in PR #809: URL: https://github.com/apache/iceberg-rust/pull/809#discussion_r1887539719

## crates/iceberg/src/spec/table_metadata.rs:

```diff
@@ -398,33 +397,6 @@ impl TableMetadata {
         self.partition_statistics.get(&snapshot_id)
     }
-    /// Append snapshot to table
-    #[deprecated(
-        since = "0.4.0",
-        note = "please use `TableMetadataBuilder.set_branch_snapshot` instead"
-    )]
```

Review Comment:

Is this deprecated since `0.4.0`? If that's the case, we shouldn't remove it right now.
Re: [PR] Docs: add note for `day` transform [iceberg]
kevinjqliu commented on code in PR #11749: URL: https://github.com/apache/iceberg/pull/11749#discussion_r1887580518

## format/spec.md:

```diff
@@ -454,7 +454,7 @@ Partition field IDs must be reused if an existing partition spec contains an equ
 | **`truncate[W]`** | Value truncated to width `W` (see below) | `int`, `long`, `decimal`, `string`, `binary` | Source type |
 | **`year`**        | Extract a date or timestamp year, as years from 1970 | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns` | `int` |
 | **`month`**       | Extract a date or timestamp month, as months from 1970-01-01 | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns` | `int` |
-| **`day`**         | Extract a date or timestamp day, as days from 1970-01-01 | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns` | `int` |
+| **`day`**         | Extract a date or timestamp day, as days from 1970-01-01 | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns` | `int` (the physical type should be an `int`, but the logical type should be a `date`) |
```

Review Comment:

I also encountered this before in [apache/iceberg-python#1208](https://github.com/apache/iceberg-python/pull/1208) and [apache/iceberg-python#1211](https://github.com/apache/iceberg-python/pull/1211). There's also this accompanying dev list thread: https://lists.apache.org/thread/2gq7b54nvc9q6f1j08l9lnzgm5onkmx5
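For illustration, a small sketch (with an assumed value, not taken from the PR) of the distinction being documented: the `day` transform stores days since the epoch as an `int`, which maps one-to-one to a logical `date`:

```python
from datetime import date, timedelta

days_from_epoch = 19700  # physical partition value: an int counting days since 1970-01-01
logical_date = date(1970, 1, 1) + timedelta(days=days_from_epoch)
assert logical_date == date(2023, 12, 9)  # the logical date readers should surface
```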
Re: [PR] Implementing namespace_exists function on the REST Catalog [iceberg-python]
sungwy commented on PR #1434: URL: https://github.com/apache/iceberg-python/pull/1434#issuecomment-2546756342

Hi @AhmedNader42 - thank you very much for picking up this issue and getting a working solution up already! I'm in agreement with @kevinjqliu 's comment, that it would be great to have an integration test as well, and I think it is well within the scope of this PR. https://github.com/apache/iceberg-python/blob/main/tests/integration/test_reads.py already uses `session_catalog` in its tests, which is a rest catalog fixture defined in the `conftest.py`, and I think it would be a good place to add this test case.
[PR] Fix comment on `WRITE_OBJECT_STORE_PARTITIONED_PATHS` table property [iceberg]
smaheshwar-pltr opened a new pull request, #11798: URL: https://github.com/apache/iceberg/pull/11798 The code comment above the `WRITE_OBJECT_STORE_PARTITIONED_PATHS` constant in `TableProperties` was incorrect: as stated in the [docs](https://iceberg.apache.org/docs/1.7.0/aws/#object-store-file-layout), partition values are excluded when object storage is enabled and this property is set to *false*, not *true*. This PR fixes the comment to be consistent with that behavior. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Avro: Support default values for generic data [iceberg]
Fokko commented on code in PR #11786: URL: https://github.com/apache/iceberg/pull/11786#discussion_r1887585686 ## core/src/main/java/org/apache/iceberg/data/avro/PlannedDataReader.java: ## @@ -0,0 +1,181 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.data.avro; + +import java.io.IOException; +import java.util.List; +import java.util.Map; +import java.util.function.Supplier; +import org.apache.avro.LogicalType; +import org.apache.avro.LogicalTypes; +import org.apache.avro.Schema; +import org.apache.avro.io.DatumReader; +import org.apache.avro.io.Decoder; +import org.apache.iceberg.avro.AvroSchemaUtil; +import org.apache.iceberg.avro.AvroWithPartnerVisitor; +import org.apache.iceberg.avro.SupportsRowPosition; +import org.apache.iceberg.avro.ValueReader; +import org.apache.iceberg.avro.ValueReaders; +import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap; +import org.apache.iceberg.types.Type; +import org.apache.iceberg.types.Types; +import org.apache.iceberg.util.Pair; + +public class PlannedDataReader implements DatumReader, SupportsRowPosition { + + public static PlannedDataReader create(org.apache.iceberg.Schema expectedSchema) { +return create(expectedSchema, ImmutableMap.of()); + } + + public static PlannedDataReader create( + org.apache.iceberg.Schema expectedSchema, Map idToConstant) { +return new PlannedDataReader<>(expectedSchema, idToConstant); + } + + private final org.apache.iceberg.Schema expectedSchema; + private final Map idToConstant; + private ValueReader reader; + + protected PlannedDataReader( + org.apache.iceberg.Schema expectedSchema, Map idToConstant) { +this.expectedSchema = expectedSchema; +this.idToConstant = idToConstant; + } + + @Override + @SuppressWarnings("unchecked") + public void setSchema(Schema fileSchema) { +this.reader = +(ValueReader) +AvroWithPartnerVisitor.visit( +expectedSchema.asStruct(), +fileSchema, +new ReadBuilder(idToConstant), +AvroWithPartnerVisitor.FieldIDAccessors.get()); + } + + @Override + public T read(T reuse, Decoder decoder) throws IOException { +return reader.read(decoder, reuse); + } + + @Override + public void setRowPositionSupplier(Supplier posSupplier) { +if (reader instanceof SupportsRowPosition) { + ((SupportsRowPosition) reader).setRowPositionSupplier(posSupplier); +} + } + + private static class ReadBuilder extends AvroWithPartnerVisitor> { +private final Map idToConstant; + +private ReadBuilder(Map idToConstant) { + this.idToConstant = idToConstant; +} + +@Override +public ValueReader record(Type partner, Schema record, List> fieldReaders) { + if (partner == null) { +return ValueReaders.skipStruct(fieldReaders); + } + + Types.StructType expected = partner.asStructType(); + List>> readPlan = + 
ValueReaders.buildReadPlan(expected, record, fieldReaders, idToConstant); + + return GenericReaders.struct(readPlan, expected); +} + +@Override +public ValueReader union(Type partner, Schema union, List> options) { + return ValueReaders.union(options); +} + +@Override +public ValueReader array(Type ignored, Schema array, ValueReader elementReader) { + return ValueReaders.array(elementReader); +} + +@Override +public ValueReader arrayMap( +Type ignored, Schema map, ValueReader keyReader, ValueReader valueReader) { + return ValueReaders.arrayMap(keyReader, valueReader); +} + +@Override +public ValueReader map(Type ignored, Schema map, ValueReader valueReader) { + return ValueReaders.map(ValueReaders.strings(), valueReader); +} + +@Override +public ValueReader primitive(Type ignored, Schema primitive) { + LogicalType logicalType = primitive.getLogicalType(); + if (logicalType != null) { +switch (logicalType.getName()) { + case "date": +return GenericReaders.dates(); + + case "time-micros": +return GenericReaders.times(); + + case "timestamp-micros
Re: [PR] Spec: Support geo type [iceberg]
jiayuasu commented on code in PR #10981: URL: https://github.com/apache/iceberg/pull/10981#discussion_r1887549063 ## format/spec.md: ## @@ -584,8 +589,8 @@ The schema of a manifest file is a struct called `manifest_entry` with the follo | _optional_ | _optional_ | _optional_ | **`110 null_value_counts`** | `map<121: int, 122: long>` | Map from column id to number of null values in the column | | _optional_ | _optional_ | _optional_ | **`137 nan_value_counts`** | `map<138: int, 139: long>` | Map from column id to number of NaN values in the column | | _optional_ | _optional_ | _optional_ | **`111 distinct_counts`**| `map<123: int, 124: long>` | Map from column id to number of distinct values in the column; distinct counts must be derived using values in the file by counting or using sketches, but not using methods like merging existing distinct counts | -| _optional_ | _optional_ | _optional_ | **`125 lower_bounds`** | `map<126: int, 127: binary>`| Map from column id to lower bound in the column serialized as binary [1]. Each value must be less than or equal to all non-null, non-NaN values in the column for the file [2] | -| _optional_ | _optional_ | _optional_ | **`128 upper_bounds`** | `map<129: int, 130: binary>`| Map from column id to upper bound in the column serialized as binary [1]. Each value must be greater than or equal to all non-null, non-Nan values in the column for the file [2] | +| _optional_ | _optional_ | _optional_ | **`125 lower_bounds`** | `map<126: int, 127: binary>`| Map from column id to lower bound in the column serialized as binary [1]. Each value must be less than or equal to all non-null, non-NaN values in the column for the file [2]. See [7] for`geometry` and [8] for `geography`. | +| _optional_ | _optional_ | _optional_ | **`128 upper_bounds`** | `map<129: int, 130: binary>`| Map from column id to upper bound in the column serialized as binary [1]. Each value must be greater than or equal to all non-null, non-Nan values in the column for the file [2]. See [9] for `geometry` and [10] for `geography`. | Review Comment: @szehon-ho I found it from my PR: https://github.com/szehon-ho/iceberg/pull/2/files Can you figure out a way to add it back? > The concepts of westernmost and easternmost values are explicitly introduced to address cases involving antimeridian crossing, where the `lower_bound` may be greater than `upper_bound`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] fix: gurantee the deserialize order of struct is same as the struct type [iceberg-rust]
Fokko commented on code in PR #795: URL: https://github.com/apache/iceberg-rust/pull/795#discussion_r1887570509 ## crates/iceberg/src/spec/values.rs: ## @@ -3604,4 +3608,29 @@ mod tests { assert_eq!(result, expected); } + +#[test] +fn test_record_ser_de() { +let expected_literal = Literal::Struct(Struct::from_iter(vec![ +Some(Literal::Primitive(PrimitiveLiteral::Int(1))), +Some(Literal::Primitive(PrimitiveLiteral::String( +"bar".to_string(), +))), +None, +Some(Literal::Primitive(PrimitiveLiteral::Int(1000))), +])); +let fields = Type::Struct(StructType::new(vec![ +NestedField::required(2, "id", Type::Primitive(PrimitiveType::Int)).into(), +NestedField::optional(3, "name", Type::Primitive(PrimitiveType::String)).into(), +NestedField::optional(4, "address", Type::Primitive(PrimitiveType::String)).into(), +NestedField::required(5, "extra", Type::Primitive(PrimitiveType::Int)).into(), +])); + +let raw_literal = RawLiteral::try_from(expected_literal.clone(), &fields).unwrap(); +let serialized = serde_json::to_string(&raw_literal).unwrap(); +let deserialized: RawLiteral = serde_json::from_str(&serialized).unwrap(); +let deserialized_literal = deserialized.try_into(&fields).unwrap().unwrap(); + +assert_eq!(expected_literal, deserialized_literal); Review Comment: Ah, around the same time! I share your concern, and I would like to check later on if we do the field-ID projection properly, but I didn't want to block the release 👍 For Avro it is very simple, it will always be decoded in the same order as the schema (otherwise it will just break). That said, we can rely on the order for V1, but use field-ID-based projection for V2 tables. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Parquet: Implement defaults for generic data [iceberg]
rdblue merged PR #11785: URL: https://github.com/apache/iceberg/pull/11785 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Parquet: Implement defaults for generic data [iceberg]
rdblue commented on PR #11785: URL: https://github.com/apache/iceberg/pull/11785#issuecomment-2546711840 Merging this. Thanks for the review, @Fokko! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Avro: Support default values for generic data [iceberg]
rdblue commented on code in PR #11786: URL: https://github.com/apache/iceberg/pull/11786#discussion_r1887573521 ## core/src/main/java/org/apache/iceberg/data/avro/DataReader.java: ## @@ -36,6 +36,10 @@ import org.apache.iceberg.types.Type; import org.apache.iceberg.types.Types; +/** + * @deprecated will be removed in 2.0.0; use {@link PlannedDataReader} instead. + */ +@Deprecated Review Comment: Most of the changed files in this PR are to avoid using this class now that it is deprecated. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Prep 0.4.0 release [iceberg-rust]
sungwy commented on code in PR #809: URL: https://github.com/apache/iceberg-rust/pull/809#discussion_r1887569858 ## crates/iceberg/src/spec/table_metadata.rs: ## @@ -398,33 +397,6 @@ impl TableMetadata { self.partition_statistics.get(&snapshot_id) } -/// Append snapshot to table -#[deprecated( -since = "0.4.0", -note = "please use `TableMetadataBuilder.set_branch_snapshot` instead" -)] Review Comment: Yes, my bad - I read this as having to remove it by `0.4.0`, reverting ## crates/iceberg/src/spec/snapshot.rs: ## @@ -192,13 +191,6 @@ impl Snapshot { partition_type_provider, ) } - -pub(crate) fn log(&self) -> SnapshotLog { Review Comment: Yes, it's because deprecated function `append_snapshot` is the only function that calls it. Since we are reverting that change, I will revert this as well -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Spec: Support geo type [iceberg]
szehon-ho commented on code in PR #10981: URL: https://github.com/apache/iceberg/pull/10981#discussion_r1887625905 ## format/spec.md: ## @@ -584,8 +589,8 @@ The schema of a manifest file is a struct called `manifest_entry` with the follo | _optional_ | _optional_ | _optional_ | **`110 null_value_counts`** | `map<121: int, 122: long>` | Map from column id to number of null values in the column | | _optional_ | _optional_ | _optional_ | **`137 nan_value_counts`** | `map<138: int, 139: long>` | Map from column id to number of NaN values in the column | | _optional_ | _optional_ | _optional_ | **`111 distinct_counts`**| `map<123: int, 124: long>` | Map from column id to number of distinct values in the column; distinct counts must be derived using values in the file by counting or using sketches, but not using methods like merging existing distinct counts | -| _optional_ | _optional_ | _optional_ | **`125 lower_bounds`** | `map<126: int, 127: binary>`| Map from column id to lower bound in the column serialized as binary [1]. Each value must be less than or equal to all non-null, non-NaN values in the column for the file [2] | -| _optional_ | _optional_ | _optional_ | **`128 upper_bounds`** | `map<129: int, 130: binary>`| Map from column id to upper bound in the column serialized as binary [1]. Each value must be greater than or equal to all non-null, non-Nan values in the column for the file [2] | +| _optional_ | _optional_ | _optional_ | **`125 lower_bounds`** | `map<126: int, 127: binary>`| Map from column id to lower bound in the column serialized as binary [1]. Each value must be less than or equal to all non-null, non-NaN values in the column for the file [2]. See [7] for`geometry` and [8] for `geography`. | +| _optional_ | _optional_ | _optional_ | **`128 upper_bounds`** | `map<129: int, 130: binary>`| Map from column id to upper bound in the column serialized as binary [1]. Each value must be greater than or equal to all non-null, non-Nan values in the column for the file [2]. See [9] for `geometry` and [10] for `geography`. | Review Comment: whoops, must have accidentally removed it, was dealing with too many versions :). Put it back, and moved the other common note about 180/90 bound to this point as well. This is for geography only right? (the new point is for geography type) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
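To make the antimeridian point above concrete, here is a small illustrative example (values chosen only for illustration, not taken from the spec text): for a geography whose longitude range crosses the antimeridian, the westernmost value stored as the lower bound can be numerically greater than the easternmost value stored as the upper bound.

```python
# Longitudes for a region spanning 170°E .. 170°W, i.e. crossing the antimeridian.
westernmost = 170.0   # stored as the x lower bound
easternmost = -170.0  # stored as the x upper bound

# Unlike ordinary column bounds, lower > upper is expected in this case.
assert westernmost > easternmost
```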
Re: [PR] Add plan tasks for TableScan [iceberg-python]
ConeyLiu commented on code in PR #1427: URL: https://github.com/apache/iceberg-python/pull/1427#discussion_r1886441268 ## pyiceberg/manifest.py: ## @@ -105,6 +105,9 @@ def _missing_(cls, value: object) -> Union[None, str]: return member return None +def is_splittable(self) -> bool: +return self.name == "AVRO" or self.name == "PARQUET" or self.name == "ORC" Review Comment: changed to enum -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Add plan tasks for TableScan [iceberg-python]
ConeyLiu commented on code in PR #1427: URL: https://github.com/apache/iceberg-python/pull/1427#discussion_r1886445512 ## tests/integration/test_reads.py: ## @@ -873,3 +873,48 @@ def test_table_scan_empty_table(catalog: Catalog) -> None: result_table = tbl.scan().to_arrow() assert len(result_table) == 0 + + +@pytest.mark.integration +def test_plan_tasks(session_catalog: Catalog) -> None: +from pyiceberg.table import TableProperties + +table_name = "default.test_plan_tasks" +try: +session_catalog.drop_table(table_name) +except NoSuchTableError: +pass # Just to make sure that the table doesn't exist + +tbl = session_catalog.create_table( +table_name, +Schema( +NestedField(1, "number", LongType()), +), +properties={TableProperties.PARQUET_ROW_GROUP_LIMIT: "1"}, +) + +# Write 10 row groups, that should end up as 10 batches +entries = 10 +tbl.append( +pa.Table.from_pylist( +[ +{ +"number": number, +} +for number in range(entries) +], +) +) + Review Comment: added an assert. Here it should only have one file but 10 row groups. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Add plan tasks for TableScan [iceberg-python]
ConeyLiu commented on code in PR #1427: URL: https://github.com/apache/iceberg-python/pull/1427#discussion_r1886444111 ## pyiceberg/table/__init__.py: ## @@ -1423,6 +1451,66 @@ def plan_files(self) -> Iterable[FileScanTask]: for data_entry in data_entries ] +def _target_split_size(self) -> int: +table_value = property_as_int( +self.table_metadata.properties, TableProperties.READ_SPLIT_SIZE, TableProperties.READ_SPLIT_SIZE_DEFAULT +) +return property_as_int(self.options, TableProperties.READ_SPLIT_SIZE, table_value) # type: ignore + +def _loop_back(self) -> int: +table_value = property_as_int( +self.table_metadata.properties, TableProperties.READ_SPLIT_LOOKBACK, TableProperties.READ_SPLIT_LOOKBACK_DEFAULT +) +return property_as_int(self.options, TableProperties.READ_SPLIT_LOOKBACK, table_value) # type: ignore + +def _split_open_file_cost(self) -> int: +table_value = property_as_int( +self.table_metadata.properties, +TableProperties.READ_SPLIT_OPEN_FILE_COST, +TableProperties.READ_SPLIT_OPEN_FILE_COST_DEFAULT, +) +return property_as_int(self.options, TableProperties.READ_SPLIT_OPEN_FILE_COST, table_value) # type: ignore + +def plan_task(self) -> Iterable[CombinedFileScanTask]: +"""Plan balanced combined tasks for this scan by splitting large and combining small tasks. + +Returns: +List of CombinedFileScanTasks +""" +split_size = self._target_split_size() +loop_back = self._loop_back() +open_file_cost = self._split_open_file_cost() + +def split(task: FileScanTask) -> List[FileScanTask]: +data_file = task.file +if not data_file.split_offsets: +return [task] + +split_offsets = data_file.split_offsets +if not all(split_offsets[i] <= split_offsets[i + 1] for i in range(len(split_offsets) - 1)): +# split offsets are strictly ascending +return [task] + +all_tasks = [] +for i in range(len(split_offsets) - 1): +all_tasks.append( +FileScanTask(data_file, task.delete_files, split_offsets[i], split_offsets[i + 1] - split_offsets[i]) +) + +all_tasks.append( +FileScanTask(data_file, task.delete_files, split_offsets[-1], data_file.file_size_in_bytes - split_offsets[-1]) +) + +return all_tasks + +def weight_func(task: FileScanTask) -> int: +return max(task.size_in_bytes(), (1 + len(task.delete_files)) * open_file_cost) Review Comment: yes -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
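As context for the weight function discussed above, here is a small, self-contained sketch of the greedy packing idea this PR describes (combining tasks until a target size is reached). It is illustrative only: it omits the lookback window and the open-file cost that the actual `plan_task` implementation uses, and the sizes are made up.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def pack(items: Iterable[T], weight: Callable[[T], int], target: int) -> List[List[T]]:
    """Greedily group items so each group's total weight stays near `target`."""
    groups: List[List[T]] = []
    current: List[T] = []
    current_weight = 0
    for item in items:
        item_weight = weight(item)
        if current and current_weight + item_weight > target:
            groups.append(current)
            current, current_weight = [], 0
        current.append(item)
        current_weight += item_weight
    if current:
        groups.append(current)
    return groups


# Example: pack per-file byte sizes into roughly 128 MB combined groups.
MB = 1024 * 1024
sizes = [64 * MB, 32 * MB, 200 * MB, 8 * MB, 16 * MB]
print(pack(sizes, weight=lambda s: s, target=128 * MB))
```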
Re: [PR] Add plan tasks for TableScan [iceberg-python]
ConeyLiu commented on code in PR #1427: URL: https://github.com/apache/iceberg-python/pull/1427#discussion_r1886443611 ## pyiceberg/table/__init__.py: ## @@ -191,6 +193,15 @@ class TableProperties: DELETE_MODE_MERGE_ON_READ = "merge-on-read" DELETE_MODE_DEFAULT = DELETE_MODE_COPY_ON_WRITE +READ_SPLIT_SIZE = "read.split.target-size" Review Comment: Added to the docs; those options also exist as table properties: https://iceberg.apache.org/docs/latest/configuration/#read-properties -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
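For readers unfamiliar with how these read properties are resolved, here is a minimal sketch of the precedence shown in the diff above: a value passed in the scan options overrides the table property, which in turn falls back to the documented default (128 MB for `read.split.target-size`). The helper below is a simplified stand-in for PyIceberg's `property_as_int`, and the values are illustrative.

```python
from typing import Dict, Optional


def property_as_int(properties: Dict[str, str], name: str, default: Optional[int] = None) -> Optional[int]:
    # Simplified stand-in for the PyIceberg helper used in the diff above.
    value = properties.get(name)
    return int(value) if value is not None else default


READ_SPLIT_SIZE = "read.split.target-size"
READ_SPLIT_SIZE_DEFAULT = 128 * 1024 * 1024  # documented default of 128 MB

table_properties = {READ_SPLIT_SIZE: str(64 * 1024 * 1024)}  # 64 MB set on the table
scan_options: Dict[str, str] = {}                            # nothing passed to the scan

table_value = property_as_int(table_properties, READ_SPLIT_SIZE, READ_SPLIT_SIZE_DEFAULT)
split_size = property_as_int(scan_options, READ_SPLIT_SIZE, table_value)
assert split_size == 64 * 1024 * 1024  # table property wins over the default
```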
Re: [PR] Add plan tasks for TableScan [iceberg-python]
ConeyLiu commented on code in PR #1427: URL: https://github.com/apache/iceberg-python/pull/1427#discussion_r1886446277 ## pyiceberg/table/__init__.py: ## @@ -1229,7 +1240,8 @@ def with_case_sensitive(self: S, case_sensitive: bool = True) -> S: class ScanTask(ABC): -pass +def size_in_bytes(self) -> int: Review Comment: sure, marked as abstract -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Add plan tasks for TableScan [iceberg-python]
ConeyLiu commented on code in PR #1427: URL: https://github.com/apache/iceberg-python/pull/1427#discussion_r1886446918 ## pyiceberg/table/__init__.py: ## @@ -1253,6 +1265,22 @@ def __init__( self.start = start or 0 self.length = length or data_file.file_size_in_bytes +def size_in_bytes(self) -> int: +return self.length + sum(f.file_size_in_bytes for f in self.delete_files) + + +@dataclass(init=False) +class CombinedFileScanTask(ScanTask): +"""Task representing combined multiple file scan tasks.""" Review Comment: updated doc, pls take another look -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: support `arrow_struct_to_iceberg_struct` [iceberg-rust]
ZENOTME commented on PR #731: URL: https://github.com/apache/iceberg-rust/pull/731#issuecomment-2545020315 > Thanks @ZENOTME 's effort. I saw that both java/python have schema with partner visitor: > > 1. [SchemaWithPartnerVisitor](https://github.com/apache/iceberg/blob/c07f2aabc0a1d02f068ecf1514d2479c0fbdd3b0/core/src/main/java/org/apache/iceberg/schema/SchemaWithPartnerVisitor.java#L27) > 2. https://github.com/apache/iceberg-python/blob/bfc0d9a62176803094da0867ee793808f105d352/pyiceberg/io/pyarrow.py#L1726 > > I'm thinking that we should adopt similar design? I tried to adopt the SchemaWithPartner design, but I found the interface is not general enough. https://github.com/apache/iceberg-python/blob/bfc0d9a62176803094da0867ee793808f105d352/pyiceberg/io/pyarrow.py#L1726 is not intended for array conversion, I think. To make it suitable for Arrow array conversion, I made some changes to it: 1. [Introduced ListIterator and MapIterator](https://github.com/apache/iceberg-rust/pull/731/files#diff-919de395f497e9e93101d26e0618400083c15c4eb7ac3679e3662107f0c4b0eeR1225), because the control flow for lists and maps is not well suited to Arrow array conversion. 2. An array in arrow-rs can be a NullArray, so we need some pre-order access before diving into the sub-column, so I introduced `visit_type_before` to handle that. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Fix ParallelIterable deadlock [iceberg]
sopel39 commented on code in PR #11781: URL: https://github.com/apache/iceberg/pull/11781#discussion_r1887631464 ## core/src/main/java/org/apache/iceberg/util/ParallelIterable.java: ## @@ -257,17 +257,17 @@ private static class Task implements Supplier>>, Closeable { @Override public Optional> get() { try { +if (queue.size() >= approximateMaxQueueSize) { Review Comment: In order for a connection to be kept around, the task has to be started and then it needs to yield. If the task doesn't yield, the S3 connection is fully utilized and returned afterwards -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] fix: field id in name mapping should be optional [iceberg-python]
barronw commented on code in PR #1426: URL: https://github.com/apache/iceberg-python/pull/1426#discussion_r1887662401 ## pyiceberg/table/name_mapping.py: ## @@ -333,8 +334,8 @@ def struct(self, struct: StructType, struct_partner: Optional[MappedField], fiel return StructType(*field_results) def field(self, field: NestedField, field_partner: Optional[MappedField], field_result: IcebergType) -> IcebergType: -if field_partner is None: -raise ValueError(f"Field missing from NameMapping: {'.'.join(self.current_path)}") +if field_partner is None or field_partner.field_id is None: Review Comment: This should just be a type check since `NestedField` expects a field ID below. The field partner is looked up by field ID before being passed into this method. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Avro: Support default values for generic data [iceberg]
rdblue commented on code in PR #11786: URL: https://github.com/apache/iceberg/pull/11786#discussion_r1887662465 ## core/src/main/java/org/apache/iceberg/data/avro/PlannedDataReader.java: ## @@ -0,0 +1,187 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.data.avro; + +import java.io.IOException; +import java.util.List; +import java.util.Map; +import java.util.function.Supplier; +import org.apache.avro.LogicalType; +import org.apache.avro.LogicalTypes; +import org.apache.avro.Schema; +import org.apache.avro.io.DatumReader; +import org.apache.avro.io.Decoder; +import org.apache.iceberg.avro.AvroSchemaUtil; +import org.apache.iceberg.avro.AvroWithPartnerVisitor; +import org.apache.iceberg.avro.SupportsRowPosition; +import org.apache.iceberg.avro.ValueReader; +import org.apache.iceberg.avro.ValueReaders; +import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap; +import org.apache.iceberg.types.Type; +import org.apache.iceberg.types.Types; +import org.apache.iceberg.util.Pair; + +public class PlannedDataReader implements DatumReader, SupportsRowPosition { + + public static PlannedDataReader create(org.apache.iceberg.Schema expectedSchema) { +return create(expectedSchema, ImmutableMap.of()); + } + + public static PlannedDataReader create( + org.apache.iceberg.Schema expectedSchema, Map idToConstant) { +return new PlannedDataReader<>(expectedSchema, idToConstant); + } + + private final org.apache.iceberg.Schema expectedSchema; + private final Map idToConstant; + private ValueReader reader; + + protected PlannedDataReader( + org.apache.iceberg.Schema expectedSchema, Map idToConstant) { +this.expectedSchema = expectedSchema; +this.idToConstant = idToConstant; + } + + @Override + @SuppressWarnings("unchecked") + public void setSchema(Schema fileSchema) { +this.reader = +(ValueReader) +AvroWithPartnerVisitor.visit( +expectedSchema.asStruct(), +fileSchema, +new ReadBuilder(idToConstant), +AvroWithPartnerVisitor.FieldIDAccessors.get()); + } + + @Override + public T read(T reuse, Decoder decoder) throws IOException { +return reader.read(decoder, reuse); + } + + @Override + public void setRowPositionSupplier(Supplier posSupplier) { +if (reader instanceof SupportsRowPosition) { + ((SupportsRowPosition) reader).setRowPositionSupplier(posSupplier); +} + } + + private static class ReadBuilder extends AvroWithPartnerVisitor> { +private final Map idToConstant; + +private ReadBuilder(Map idToConstant) { + this.idToConstant = idToConstant; +} + +@Override +public ValueReader record(Type partner, Schema record, List> fieldReaders) { + if (partner == null) { +return ValueReaders.skipStruct(fieldReaders); + } + + Types.StructType expected = partner.asStructType(); + List>> readPlan = + 
ValueReaders.buildReadPlan(expected, record, fieldReaders, idToConstant); + + return GenericReaders.struct(readPlan, expected); +} + +@Override +public ValueReader union(Type partner, Schema union, List> options) { + return ValueReaders.union(options); +} + +@Override +public ValueReader array(Type ignored, Schema array, ValueReader elementReader) { + return ValueReaders.array(elementReader); +} + +@Override +public ValueReader arrayMap( +Type ignored, Schema map, ValueReader keyReader, ValueReader valueReader) { + return ValueReaders.arrayMap(keyReader, valueReader); +} + +@Override +public ValueReader map(Type ignored, Schema map, ValueReader valueReader) { + return ValueReaders.map(ValueReaders.strings(), valueReader); +} + +@Override +public ValueReader primitive(Type partner, Schema primitive) { + LogicalType logicalType = primitive.getLogicalType(); + if (logicalType != null) { +switch (logicalType.getName()) { + case "date": +return GenericReaders.dates(); + + case "time-micros": +return GenericReaders.times(); + + case "timestamp-micro
Re: [PR] Avro: Support default values for generic data [iceberg]
rdblue commented on PR #11786: URL: https://github.com/apache/iceberg/pull/11786#issuecomment-2546993293 Thanks for the review, @Fokko! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Avro: Support default values for generic data [iceberg]
rdblue merged PR #11786: URL: https://github.com/apache/iceberg/pull/11786 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Docs: Change to Flink directory for instructions [iceberg]
szehon-ho commented on PR #11031: URL: https://github.com/apache/iceberg/pull/11031#issuecomment-2546995771 Whoops sorry, I must have missed this. I think it makes sense to me. cc @pvary @stevenzwu -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[PR] chore(docs): Update Readme - Lakekeeper repository moved [iceberg-rust]
c-thiel opened a new pull request, #810: URL: https://github.com/apache/iceberg-rust/pull/810 (no comment) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[PR] Bump moto from 5.0.22 to 5.0.23 [iceberg-python]
dependabot[bot] opened a new pull request, #1435: URL: https://github.com/apache/iceberg-python/pull/1435 Bumps [moto](https://github.com/getmoto/moto) from 5.0.22 to 5.0.23. Changelog Sourced from https://github.com/getmoto/moto/blob/master/CHANGELOG.md";>moto's changelog. 5.0.23 Docker Digest for 5.0.23: sha256:d41e007bb1f7d41b530959ae9cbed1edf42737ee839faf8da7e925bf19f63105 New Services: * Kafka: * create_cluster() * create_cluster_v2() * describe_cluster() * describe_cluster_v2() * delete_cluster() * list_clusters() * list_clusters_v2() * list_tags_for_resource() * tag_resource() * untag_resource() New Methods: * DirectConnect: * associate_mac_sec_key() * create_lag() * describe_lags() * describe_settings() * disassociate_mac_sec_key() * update_settings() * EFS: * describe_file_system_policy() * put_file_system_policy() * ES: * describe_elasticsearch_domains() * OpenSearch: * describe_domains() Miscellaneous: * Athena: list_query_executions() now supports the WorkGroup-parameter * Athena: start_query_execution() now supports the WorkGroup-parameter * CloudFormation: AWS::IAM::Role now supports updates * CognitoIDP: list_users() now correctly filters before applying the Limit * DirectConnect: describe_trusts() no longer requires a DirectoryId-parameter * DynamoDB: The DeleteProtectionEnabled can now be disabled * DynamoDB: update_item() can now return list of binaries * EC2: SecurityGroups now contain a SecurityGroupArn * EC2: update_route() now correctly handles DestinationPrefixListId * KMS: get_public_key() now supports passing in aliases * Lambda: publish_function() now publishes a function even if the updated code hasn't changed * MemoryDB: tag_resource/list_tags_for_resource() now supports Snapshots and SubnetGroups * RDS: copy_db_snapshot() now supports the CopyTags-parameter ... 
(truncated) Commits https://github.com/getmoto/moto/commit/426df0b74d98bd7d1c7b42918e11c66c4adc4c96";>426df0b Pre-Release: Up Version Number https://github.com/getmoto/moto/commit/479c2119e030ad961f6b783ff461a20663f31a36";>479c211 Prep release 5.0.23 (https://redirect.github.com/getmoto/moto/issues/8404";>#8404) https://github.com/getmoto/moto/commit/82c0c63bc1a3a7b682274ce0e5d41ae161dcb91c";>82c0c63 EC2: update_route() now correctly handles DestinationPreflixListId (https://redirect.github.com/getmoto/moto/issues/8395";>#8395) https://github.com/getmoto/moto/commit/ea13ff2c4aea37623c5ca9c27a0bd1465afce5bd";>ea13ff2 DynamoDB: update_item() can now return list of binaries (https://redirect.github.com/getmoto/moto/issues/8401";>#8401) https://github.com/getmoto/moto/commit/a7c36135f0d2b1c8c1d3e873c5c3d51f7cce4164";>a7c3613 chore: update SSM default parameters (https://redirect.github.com/getmoto/moto/issues/8402";>#8402) https://github.com/getmoto/moto/commit/f46b9e5925bd1bbd1f541db34918ad7e5bb34bf1";>f46b9e5 chore: update EC2 Instance Types (https://redirect.github.com/getmoto/moto/issues/8399";>#8399) https://github.com/getmoto/moto/commit/d21bb8624d12c7054b51ef17b06fb6894e642b76";>d21bb86 chore: update EC2 Instance Offerings (https://redirect.github.com/getmoto/moto/issues/8398";>#8398) https://github.com/getmoto/moto/commit/b5c665edf41352cd1647d720eaf9e4199436f0e1";>b5c665e EC2: add SecurityGroupArn to security groups (https://redirect.github.com/getmoto/moto/issues/8393";>#8393) https://github.com/getmoto/moto/commit/44df2b02a65255659405f7fa31f1f24259d1b1bb";>44df2b0 Kafka V1 Support (https://redirect.github.com/getmoto/moto/issues/8381";>#8381) https://github.com/getmoto/moto/commit/8896093e650be8b75a3b3103bb3a0d82380fb393";>8896093 CognitoIDP: Fix Limit and Filter params for list_users() (https://redirect.github.com/getmoto/moto/issues/8392";>#8392) Additional commits viewable in https://github.com/getmoto/moto/compare/5.0.22...5.0.23";>compare view [](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- Dependabot commands and options You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that
[PR] Bump mkdocs-material from 9.5.48 to 9.5.49 [iceberg-python]
dependabot[bot] opened a new pull request, #1437: URL: https://github.com/apache/iceberg-python/pull/1437 Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.5.48 to 9.5.49. Release notes Sourced from https://github.com/squidfunk/mkdocs-material/releases";>mkdocs-material's releases. mkdocs-material-9.5.49 Adjusted title color in dark mode for all supported Mermaid.js diagrams Fixed https://redirect.github.com/squidfunk/mkdocs-material/issues/7803";>#7803: Privacy plugin crashes on generated files Fixed https://redirect.github.com/squidfunk/mkdocs-material/issues/7781";>#7781: Mermaid.js flow chart title not visible in dark mode Changelog Sourced from https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG";>mkdocs-material's changelog. mkdocs-material-9.5.49 (2024-12-16) Adjusted title color in dark mode for all supported Mermaid.js diagrams Fixed https://redirect.github.com/squidfunk/mkdocs-material/issues/7803";>#7803: Privacy plugin crashes on generated files Fixed https://redirect.github.com/squidfunk/mkdocs-material/issues/7781";>#7781: Mermaid.js flow chart title not visible in dark mode mkdocs-material-9.5.48 (2024-12-08) Fixed https://redirect.github.com/squidfunk/mkdocs-material/issues/7774";>#7774: Disabling social cards doesn't work mkdocs-material-9.5.47 (2024-12-01) Fixed https://redirect.github.com/squidfunk/mkdocs-material/issues/7750";>#7750: Numeric tags break search Fixed https://redirect.github.com/squidfunk/mkdocs-material/issues/7748";>#7748: Blog plugin breaks when using future drafts (9.5.45 regression) mkdocs-material-9.5.46 (2024-11-25) Added support for removing preload hints in privacy plugin Fixed https://redirect.github.com/squidfunk/mkdocs-material/issues/7734";>#7734: Code blocks in h5 headlines are uppercased Fixed https://redirect.github.com/squidfunk/mkdocs-material/issues/7725";>#7725: Blog plugin crashing on missing timezone (9.5.45 regression) mkdocs-material-9.5.45 (2024-11-20) Reduced size of Docker image through multi-stage build Fixed https://redirect.github.com/squidfunk/mkdocs-material/issues/7708";>#7708: Blog plugin crashing on YAML dates with timezones mkdocs-material-9.5.44 (2024-11-05) Fixed https://redirect.github.com/squidfunk/mkdocs-material/issues/7672";>#7672: Font CSS 404's when using privacy plugin (9.5.43 regression) mkdocs-material-9.5.43 (2024-10-31) Added support for external images in SVGs in privacy plugin Fixed https://redirect.github.com/squidfunk/mkdocs-material/issues/7651";>#7651: Privacy plugin doesn't handle quoted URLs in CSS mkdocs-material-9.5.42 (2024-10-20) Fixed https://redirect.github.com/squidfunk/mkdocs-material/issues/7625";>#7625: Invalid encoding of boolean attributes in privacy plugin Fixed https://redirect.github.com/squidfunk/mkdocs-material/issues/7624";>#7624: Crash when disabling privacy plugin (9.5.41 regression) mkdocs-material-9.5.41 (2024-10-15) Fixed https://redirect.github.com/squidfunk/mkdocs-material/issues/7619";>#7619: Improved tooltip on logo disappears after instant navigation Fixed https://redirect.github.com/squidfunk/mkdocs-material/issues/7616";>#7616: Race condition in built-in privacy plugin when inlining assets Fixed https://redirect.github.com/squidfunk/mkdocs-material/issues/7615";>#7615: Comments and "Was this page helpful?" visible when printing mkdocs-material-9.5.40 (2024-10-10) Updated Latvian translations Fixed https://redirect.github.com/squidfunk/mkdocs-material/issues/7597";>#7597: Social cards not using site name on home page ... 
(truncated) Commits https://github.com/squidfunk/mkdocs-material/commit/f947bbe64a6f0f4754c0a7a1afc3b67e8b735c8b";>f947bbe Prepare 9.5.49 release https://github.com/squidfunk/mkdocs-material/commit/968fbe1ae213cb350850cfddf2614a1c93e73adb";>968fbe1 Fixed privacy plugin interop with generated files https://github.com/squidfunk/mkdocs-material/commit/c97270c8f1d7a068eb20335ad961a356971c5f05";>c97270c Updated dependencies https://github.com/squidfunk/mkdocs-material/commit/7fb3a39bbc17865743b8cb8846aa4e845ac53eb4";>7fb3a39 Documentation (https://redirect.github.com/squidfunk/mkdocs-material/issues/7794";>#7794) https://github.com/squidfunk/mkdocs-material/commit/1ee11152260a8081d2e6f394162194d1e591aadb";>1ee1115 Updated dependencies https://github.com/squidfunk/mkdocs-material/commit/b368cee13f23b52335fd1ab39cacae9e26b08291";>b368cee Fixed diagram title not switching color in dark mode https://github.com/squidfunk/mkdocs-material/commit/0f97a8d5f029b1d38ec84fcdb9733a3682f84d50";>0f97a8d Documentation (https://redirect.github.com/squidfunk/mkdocs-material/issues/7784";>#7784) See full diff in https://github.com/squidfunk/mkd
[PR] Bump adlfs from 2024.7.0 to 2024.12.0 [iceberg-python]
dependabot[bot] opened a new pull request, #1436: URL: https://github.com/apache/iceberg-python/pull/1436 Bumps [adlfs](https://github.com/fsspec/adlfs) from 2024.7.0 to 2024.12.0. Changelog Sourced from https://github.com/fsspec/adlfs/blob/main/CHANGELOG.md";>adlfs's changelog. 2024.12.0 Add "exclusive" more to put, pipe and open, as in upstream fsspec Allow concurrency in fetch_ranges update versions signed URLs from connection string; and add content type CI cleanups better error messages honor anon parameter better deterministic blob IDs on upload AzureBlobFileSystem and AzureBlobFile support pickling. Handle mixed casing for hdi_isfolder metadata when determining whether a blob should be treated as a folder. _put_file: overwrite now defaults to True. 2023.10.0 Added support for timeout/connection_timeout/read_timeout 2023.9.0 Compatability with new glob behavior in fsspec 2023.9.0 Compatability with azure.storage.blob 12.18.1 2022.11.2 Reorder fs.info() to search the parent directory only after searching for the specified item directly Removed pin on upper bound for azure-storage-blob Moved AzureDatalakeFileSystem to a separate module, in acknowledgement of https://learn.microsoft.com/en-us/answers/questions/281107/azure-data-lake-storage-gen1-retirement-announceme.html%7D";>Microsoft End of Life notice Added DeprecationWarning to AzureDatalakeFilesystem 2022.10.1 Pin azure-storage-blob >=12.12.0,=1.23.1,<2.0.0 2022.9.1 Fixed missing dependency on aiohttp in package metadata. 2022.9.0 Add support to AzureBlobFileSystem for versioning Assure full uri's are left stripped to remove "" Set Python requires >=3.8 in setup.cfg _strip_protocol handle lists 2022.7.0 Fix overflow error when uploading files > 2GB ... (truncated) Commits See full diff in https://github.com/fsspec/adlfs/compare/2024.7.0...2024.12.0";>compare view [](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- Dependabot commands and options You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. 
You can achieve the same result by closing it manually - `@dependabot show ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Fix comment on `WRITE_OBJECT_STORE_PARTITIONED_PATHS` table property [iceberg]
ebyhr commented on PR #11798: URL: https://github.com/apache/iceberg/pull/11798#issuecomment-2547049649 I believe this change is correct. The usage is: https://github.com/apache/iceberg/blob/b9b61b1d72ebb192d5e90453ff7030ece73d2603/core/src/main/java/org/apache/iceberg/LocationProviders.java#L130-L134 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Core: Allow adding files to multiple partition specs in FastAppend [iceberg]
anuragmantri commented on code in PR #11771: URL: https://github.com/apache/iceberg/pull/11771#discussion_r1887709101 ## core/src/test/java/org/apache/iceberg/catalog/CatalogTests.java: ## @@ -1590,13 +1590,15 @@ public void testCompleteCreateTransactionMultipleSchemas() { updateSchema.commit(); UpdatePartitionSpec updateSpec = create.updateSpec().addField("new_col"); -PartitionSpec newSpec = updateSpec.apply(); updateSpec.commit(); ReplaceSortOrder replaceSortOrder = create.replaceSortOrder().asc("new_col"); SortOrder newSortOrder = replaceSortOrder.apply(); replaceSortOrder.commit(); +// Get new spec after commit to write new file with new spec +PartitionSpec newSpec = create.table().spec(); Review Comment: This test was not using the latest spec. It would fail with `newAppend()` too. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Core: Fix numeric overflow of timestamp nano literal [iceberg]
ebyhr commented on code in PR #11775: URL: https://github.com/apache/iceberg/pull/11775#discussion_r1887709351 ## api/src/test/java/org/apache/iceberg/types/TestConversions.java: ## @@ -111,9 +111,9 @@ public void testByteBufferConversions() { assertConversion( 4L, TimestampNanoType.withZone(), new byte[] {0, -124, -41, 23, 0, 0, 0, 0}); assertThat(Literal.of(40L).to(TimestampNanoType.withoutZone()).toByteBuffer().array()) -.isEqualTo(new byte[] {0, -124, -41, 23, 0, 0, 0, 0}); +.isEqualTo(new byte[] {-128, 26, 6, 0, 0, 0, 0, 0}); assertThat(Literal.of(40L).to(TimestampNanoType.withZone()).toByteBuffer().array()) Review Comment: I think the cause is that the original logic called the `DateTimeUtil.microsToNanos` method, which multiplies the value by 1000: https://github.com/apache/iceberg/blob/b9b61b1d72ebb192d5e90453ff7030ece73d2603/api/src/main/java/org/apache/iceberg/expressions/Literals.java#L444-L445 ## api/src/test/java/org/apache/iceberg/expressions/TestTimestampLiteralConversions.java: ## @@ -107,6 +108,28 @@ public void testTimestampMicrosToDateConversion() { assertThat(dateOrdinal).isEqualTo(-1); } + @Test + public void testTimestampNanoWithLongLiteral() { Review Comment: Sure, added the test case. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Fix ParallelIterable deadlock [iceberg]
sopel39 commented on code in PR #11781: URL: https://github.com/apache/iceberg/pull/11781#discussion_r1886884393 ## core/src/main/java/org/apache/iceberg/util/ParallelIterable.java: ## @@ -257,17 +257,17 @@ private static class Task implements Supplier>>, Closeable { @Override public Optional> get() { try { +if (queue.size() >= approximateMaxQueueSize) { Review Comment: > so, each table scan leads to 2*cores tasks/s3 connections? Potentially, yes, but there are extra conditions, such as the queue having to be full. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Core: Unimplement Map from CharSequenceMap to obey contract [iceberg]
findepi commented on PR #11704: URL: https://github.com/apache/iceberg/pull/11704#issuecomment-2545942666 @nastra @Fokko can you please take a look? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[PR] Implementing namespace_exists function on the REST Catalog [iceberg-python]
AhmedNader42 opened a new pull request, #1434: URL: https://github.com/apache/iceberg-python/pull/1434 I have added the namespace_exists functionality for the REST Catalog. Here's an example usage, similar to the table_exists function of the same class. Please review the changes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
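A rough sketch of how the new check might be called, assuming it mirrors the existing table_exists method on the same catalog class; the catalog configuration and the namespace name here are made up for illustration and are not taken from the PR.
```python
from pyiceberg.catalog import load_catalog

# Hypothetical REST catalog configuration, for illustration only.
catalog = load_catalog("rest", uri="http://localhost:8181")

# Analogous to the existing table_exists() helper.
if not catalog.namespace_exists("examples"):
    catalog.create_namespace("examples")
```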
Re: [I] i cant table create dayTransform() and monthTransform() togetger for single field in schema [iceberg]
Fokko commented on issue #11788: URL: https://github.com/apache/iceberg/issues/11788#issuecomment-2545929169 @dvnageshpatil If you have a timestamp value: ``` ts = '2024-10-22T19:25:00' ``` Then the transforms will produce: ``` day(ts) = '2024-10-22' month(ts) = '2024-10' ``` In this case, the `month` information is useless since it is also encoded in the `day` information. Therefore just adding a `day` partition transform will do it 👍 I hope this helps! I'll close this issue for now, feel free to re-open it if there are any further questions. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
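To make the redundancy concrete, here is a small Python illustration (not from the issue itself): the month value can always be derived from the day value, so a day partition already implies the month partition.
```python
from datetime import datetime

ts = datetime.fromisoformat("2024-10-22T19:25:00")

day_partition = ts.strftime("%Y-%m-%d")   # '2024-10-22'
month_partition = ts.strftime("%Y-%m")    # '2024-10'

# Every row in day partition '2024-10-22' necessarily belongs to month
# partition '2024-10', so partitioning by day already prunes by month.
assert day_partition.startswith(month_partition)
```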
Re: [I] i cant table create dayTransform() and monthTransform() togetger for single field in schema [iceberg]
Fokko closed issue #11788: i cant table create dayTransform() and monthTransform() togetger for single field in schema URL: https://github.com/apache/iceberg/issues/11788 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Add plan tasks for TableScan [iceberg-python]
kevinjqliu commented on code in PR #1427: URL: https://github.com/apache/iceberg-python/pull/1427#discussion_r1887034502 ## pyiceberg/table/__init__.py: ## @@ -1423,6 +1451,66 @@ def plan_files(self) -> Iterable[FileScanTask]: for data_entry in data_entries ] +def _target_split_size(self) -> int: +table_value = property_as_int( +self.table_metadata.properties, TableProperties.READ_SPLIT_SIZE, TableProperties.READ_SPLIT_SIZE_DEFAULT +) +return property_as_int(self.options, TableProperties.READ_SPLIT_SIZE, table_value) # type: ignore + +def _loop_back(self) -> int: +table_value = property_as_int( +self.table_metadata.properties, TableProperties.READ_SPLIT_LOOKBACK, TableProperties.READ_SPLIT_LOOKBACK_DEFAULT +) +return property_as_int(self.options, TableProperties.READ_SPLIT_LOOKBACK, table_value) # type: ignore + +def _split_open_file_cost(self) -> int: +table_value = property_as_int( +self.table_metadata.properties, +TableProperties.READ_SPLIT_OPEN_FILE_COST, +TableProperties.READ_SPLIT_OPEN_FILE_COST_DEFAULT, +) +return property_as_int(self.options, TableProperties.READ_SPLIT_OPEN_FILE_COST, table_value) # type: ignore + +def plan_task(self) -> Iterable[CombinedFileScanTask]: Review Comment: This is where daft does scan planning, using `plan_files`. https://github.com/Eventual-Inc/Daft/blob/95a61d26bfb198c590570229a81f7e3bec0049a0/daft/iceberg/iceberg_scan.py#L153-L206 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
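For context on what `plan_task` is doing, here is a much-simplified sketch of the split-planning idea: weight each file by at least the open-file cost, then greedily pack files into combined tasks of roughly the target split size. This only illustrates the general bin-packing approach; the actual PyIceberg implementation works on FileScanTask objects and also honors the lookback setting, and the default sizes below are assumptions rather than values quoted from the PR.
```python
from typing import List

def combine_files(file_sizes: List[int],
                  target_split_size: int = 128 * 1024 * 1024,
                  open_file_cost: int = 4 * 1024 * 1024) -> List[List[int]]:
    """Greedily group files into combined scan tasks (simplified sketch)."""
    bins: List[List[int]] = []
    current: List[int] = []
    current_weight = 0
    for size in file_sizes:
        weight = max(size, open_file_cost)  # tiny files still cost an "open"
        if current and current_weight + weight > target_split_size:
            bins.append(current)
            current, current_weight = [], 0
        current.append(size)
        current_weight += weight
    if current:
        bins.append(current)
    return bins

# Many small files are grouped together; a large file ends up mostly alone.
print(combine_files([1_000_000] * 10 + [200_000_000]))
```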
Re: [I] Implement `namespace_exists` function on the REST Catalog [iceberg-python]
AhmedNader42 commented on issue #1430: URL: https://github.com/apache/iceberg-python/issues/1430#issuecomment-2545954476 Submitted #1434 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] [DISCUSS] Exceptions vs status codes [iceberg-cpp]
wgtmac commented on issue #14: URL: https://github.com/apache/iceberg-cpp/issues/14#issuecomment-2545956079 As an Arrow developer who has experienced both statuses and exceptions in the same repo, I feel inclined to use exceptions to make the code shorter and easier to debug. Exceptions are the preferred (though perhaps I am biased) choice for modern C++: https://learn.microsoft.com/en-us/cpp/cpp/errors-and-exception-handling-modern-cpp?view=msvc-170 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Integrate Test Framework [iceberg-cpp]
pitrou commented on PR #13: URL: https://github.com/apache/iceberg-cpp/pull/13#issuecomment-2545484399 > I found this link: https://yurigeronimus.medium.com/guide-for-choosing-a-test-framework-for-your-c-project-2a7741b53317 Seems like a lot of fluff with no substance, unfortunately. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: Add RemovePartitionSpecs table update [iceberg-rust]
c-thiel commented on code in PR #804: URL: https://github.com/apache/iceberg-rust/pull/804#discussion_r1886757422 ## crates/iceberg/src/spec/table_metadata_builder.rs: ## @@ -740,6 +740,32 @@ impl TableMetadataBuilder { .set_default_partition_spec(Self::LAST_ADDED) } +/// Remove partition specs by their ids from the table metadata. +/// Does nothing if a spec id is not present. +/// +/// If the default partition spec is removed, it is re-added +/// upon build. +pub fn remove_partition_specs(mut self, spec_ids: &[i32]) -> Self { +let mut removed_specs = Vec::with_capacity(spec_ids.len()); + +self.metadata.partition_specs.retain(|k, _| { +if spec_ids.contains(k) { Review Comment: I switched the order - makes more sense to me this way. We are now looping over a probably very short array doing O(1) lookups. `partition_specs` is a HashMap -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
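To spell out the trade-off described above in a language-agnostic way (sketched in Python rather than Rust): checking membership of every retained key against the id list is O(n·m), whereas looping over the short id list with O(1) map lookups is O(m).
```python
# Illustrative Python analogue of the two approaches; not the actual Rust code.
partition_specs = {0: "spec-0", 1: "spec-1", 2: "spec-2"}
spec_ids = [1]

# Before: retain() scans the removal list once per map entry, O(n * m).
retained = {k: v for k, v in partition_specs.items() if k not in spec_ids}

# After: loop over the (short) removal list with O(1) map lookups, O(m).
removed = [partition_specs.pop(spec_id) for spec_id in spec_ids
           if spec_id in partition_specs]

assert retained == partition_specs and removed == ["spec-1"]
```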
Re: [PR] feat: TableMetadata Statistic Files [iceberg-rust]
Fokko commented on code in PR #799: URL: https://github.com/apache/iceberg-rust/pull/799#discussion_r1886819635 ## crates/iceberg/src/catalog/mod.rs: ## @@ -446,6 +446,30 @@ pub enum TableUpdate { /// Properties to remove removals: Vec, }, +/// Set statistics for a snapshot +#[serde(with = "_serde_set_statistics")] +SetStatistics { +/// File containing the statistics +statistics: StatisticsFile, +}, Review Comment: Thanks for raising this on the dev-list: https://lists.apache.org/thread/6fnrp73wokfqlt5vormpjyjmtvl29ll1 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] add Status data structure [iceberg-cpp]
gaborkaszab commented on PR #8: URL: https://github.com/apache/iceberg-cpp/pull/8#issuecomment-2545642065 > There is an old stackoverflow question which I think we can take a look
There is a Stack Overflow answer for everything, and one for the opposite too. I just googled "Status codes vs exceptions" and the first 5 results support exceptions over status codes, I guess because the algorithm is biased by the personal preference it learned from me.
> when you set a breakpoint on the method chain and exception happens, how do you know which call give out the exception?
This works well in Java, and I don't think it would be any different in the C++ world. If you get an exception, you can check the trace for where it was thrown and then set your debugger accordingly. Also, method chaining doesn't mean that you write everything in a single line; you can break the lines as you want.
I did a quick run-through of this PR, and my general opinion is this: the PR assumes that we simply want to pull in auxiliary stuff from the Arrow project, and without questioning whether we'd need it here or not, we just pull entire files into this project. This is pretty difficult to review TBH, and for someone who is not from the Arrow project it's even more difficult, because they have to figure out the initial motivation of each piece of functionality from another project. I think this should be the other way around. Let's identify some functionality that we'd like to develop in this project (an interface for catalogs, say), and if we find something in other projects that could help, then pull in only the required part. Or let's have a very simple implementation ourselves; we can still evolve it later on. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: TableMetadata Statistic Files [iceberg-rust]
Fokko commented on code in PR #799: URL: https://github.com/apache/iceberg-rust/pull/799#discussion_r1886823887 ## crates/iceberg/src/spec/table_metadata_builder.rs: ## @@ -524,6 +526,52 @@ impl TableMetadataBuilder { self } +/// Set statistics for a snapshot +pub fn set_statistics(mut self, statistics: StatisticsFile) -> Self { +self.metadata +.statistics +.insert(statistics.snapshot_id, statistics.clone()); Review Comment: Do we want to check if a `snapshot-id` exists? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[I] Exceptions vs status codes [iceberg-cpp]
gaborkaszab opened a new issue, #14: URL: https://github.com/apache/iceberg-cpp/issues/14 I'm pretty sure there are pros and cons for each side and people might get into religious fights on this topic. Let's try to come to a conclusion on which one we should use in this project. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] add Status data structure [iceberg-cpp]
wgtmac commented on PR #8: URL: https://github.com/apache/iceberg-cpp/pull/8#issuecomment-2545814446 I agree with @gaborkaszab that it would be better to discuss a concrete API design (e.g. Table, FileIO, etc.) before introducing a fully functional status implementation. If we decide to go with statuses over exceptions, we can add a basic status class at first and make it more powerful later on. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: Add RemovePartitionSpecs table update [iceberg-rust]
c-thiel commented on code in PR #804: URL: https://github.com/apache/iceberg-rust/pull/804#discussion_r1886939579 ## crates/iceberg/src/spec/table_metadata_builder.rs: ## @@ -740,6 +740,35 @@ impl TableMetadataBuilder { .set_default_partition_spec(Self::LAST_ADDED) } +/// Remove partition specs by their ids from the table metadata. +/// Does nothing if a spec id is not present. +/// Review Comment: Added a comment in my last commit. As I am now following the java behavior to fail if the default spec is removed, I also added a comment for that. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Build: Bump mkdocs-material from 9.5.47 to 9.5.48 [iceberg]
Fokko merged PR #11790: URL: https://github.com/apache/iceberg/pull/11790 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] [Discussion] googletest(gtest) or Catch2 [iceberg-cpp]
pitrou commented on issue #12: URL: https://github.com/apache/iceberg-cpp/issues/12#issuecomment-2545837400 Arrow C++ and Parquet C++ developers are certainly familiar with GTest too. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: TableMetadata Statistic Files [iceberg-rust]
c-thiel commented on code in PR #799: URL: https://github.com/apache/iceberg-rust/pull/799#discussion_r1886956770 ## crates/iceberg/src/spec/table_metadata_builder.rs: ## @@ -524,6 +526,52 @@ impl TableMetadataBuilder { self } +/// Set statistics for a snapshot +pub fn set_statistics(mut self, statistics: StatisticsFile) -> Self { +self.metadata +.statistics +.insert(statistics.snapshot_id, statistics.clone()); Review Comment: I thought about this as well, but then followed Java. Currently, statistics and snapshots are quite separate from each other. If we implement your check (which I like), I think we should eventually also implement:
* Upon deserialization, discard statistics that belong to nonexistent snapshots
* When a snapshot is removed, delete its statistics as well
This would ensure that statistics never reference missing snapshots. It is unclear, however, what should happen to the Puffin files in these cases. We would have coherent metadata, but probably also orphan files. Do we know why the check is not there in Java? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] [DISCUSS] Exceptions vs status codes [iceberg-cpp]
pitrou commented on issue #14: URL: https://github.com/apache/iceberg-cpp/issues/14#issuecomment-2545841290 The main thing I like about the "Status" style is that it makes error propagation and handling (or lack thereof) explicit. However, I have no religious preference :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] [Discussion] googletest(gtest) or Catch2 [iceberg-cpp]
gaborkaszab commented on issue #12: URL: https://github.com/apache/iceberg-cpp/issues/12#issuecomment-2545662593 No strong opinions from me. I myself gravitate towards gtest just because I have experience on that, but could live with any of them. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Spark3.4,3.5: In describe extended view command: fix wrong view catal… [iceberg]
nastra commented on code in PR #11751: URL: https://github.com/apache/iceberg/pull/11751#discussion_r1886968667 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestViews.java: ## @@ -1414,7 +1414,42 @@ public void describeExtendedView() { String.format( "['format-version' = '1', 'location' = '%s', 'provider' = 'iceberg']", location), -"")); +""), +row("View Text", sql, "")); + } + + @TestTemplate + public void describeExtendedViewWithoutCurrentNamespace() { Review Comment: Sorry, I actually confused myself and that's exactly the part I had to double-check. So you're right about how the current namespace is being used. Can you please update the tests to this? ``` + + @TestTemplate + public void createAndDescribeViewInDefaultNamespace() { +String viewName = viewName("createViewInDefaultNamespace"); +String sql = String.format("SELECT id, data FROM %s WHERE id <= 3", tableName); + +sql("CREATE VIEW %s (id, data) AS %s", viewName, sql); +TableIdentifier identifier = TableIdentifier.of(NAMESPACE, viewName); +View view = viewCatalog().loadView(identifier); +assertThat(view.currentVersion().defaultCatalog()).isNull(); +assertThat(view.name()).isEqualTo(ViewUtil.fullViewName(catalogName, identifier)); + assertThat(view.currentVersion().defaultNamespace()).isEqualTo(NAMESPACE); + +String location = viewCatalog().loadView(identifier).location(); +assertThat(sql("DESCRIBE EXTENDED %s.%s", NAMESPACE, viewName)) +.contains( +row("id", "int", ""), +row("data", "string", ""), +row("", "", ""), +row("# Detailed View Information", "", ""), +row("Comment", "", ""), +row("View Catalog and Namespace", String.format("%s.%s", catalogName, NAMESPACE), ""), +row("View Query Output Columns", "[id, data]", ""), +row( +"View Properties", +String.format( +"['format-version' = '1', 'location' = '%s', 'provider' = 'iceberg']", +location), +"")); + } + + @TestTemplate + public void createAndDescribeViewWithoutCurrentNamespace() { +String viewName = viewName("createViewWithoutCurrentNamespace"); +Namespace namespace = Namespace.of("test_namespace"); +String sql = String.format("SELECT id, data FROM %s WHERE id <= 3", tableName); + +sql("CREATE NAMESPACE IF NOT EXISTS %s", namespace); +sql("CREATE VIEW %s.%s (id, data) AS %s", namespace, viewName, sql); +TableIdentifier identifier = TableIdentifier.of(namespace, viewName); +View view = viewCatalog().loadView(identifier); +assertThat(view.currentVersion().defaultCatalog()).isNull(); +assertThat(view.name()).isEqualTo(ViewUtil.fullViewName(catalogName, identifier)); + assertThat(view.currentVersion().defaultNamespace()).isEqualTo(NAMESPACE); + +String location = viewCatalog().loadView(identifier).location(); +assertThat(sql("DESCRIBE EXTENDED %s.%s", namespace, viewName)) +.contains( +row("id", "int", ""), +row("data", "string", ""), +row("", "", ""), +row("# Detailed View Information", "", ""), +row("Comment", "", ""), +row("View Catalog and Namespace", String.format("%s.%s", catalogName, namespace), ""), +row("View Query Output Columns", "[id, data]", ""), +row( +"View Properties", +String.format( +"['format-version' = '1', 'location' = '%s', 'provider' = 'iceberg']", +location), +"")); + } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Deserialize NestedField initial-default and write-default Attributes [iceberg-python]
kevinjqliu commented on code in PR #1432: URL: https://github.com/apache/iceberg-python/pull/1432#discussion_r1886979577 ## tests/conftest.py: ## @@ -149,6 +149,35 @@ def table_schema_simple() -> Schema: ) +@pytest.fixture(scope="session") Review Comment: nit: wydt about adding a NestedField to [table_schema_simple](https://github.com/apache/iceberg-python/pull/1432/files#diff-e52e4ddd58b7ef887ab03c04116e676f6280b824ab7469d5d3080e5cba4f2128R141-R149) instead of creating a new Schema altogether? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Azure: Support vended credentials refresh in ADLSFileIO. [iceberg]
nastra commented on code in PR #11577: URL: https://github.com/apache/iceberg/pull/11577#discussion_r1886971501 ## azure/src/main/java/org/apache/iceberg/azure/adlsv2/VendedAdlsCredentialProvider.java: ## @@ -0,0 +1,197 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.azure.adlsv2; + +import com.azure.core.credential.AzureSasCredential; +import java.io.Serializable; +import java.util.List; +import java.util.Map; +import java.util.concurrent.Future; +import java.util.concurrent.ScheduledExecutorService; +import java.util.concurrent.TimeUnit; +import java.util.stream.Collectors; +import org.apache.commons.io.IOUtils; +import org.apache.iceberg.azure.AzureProperties; +import org.apache.iceberg.relocated.com.google.common.base.Preconditions; +import org.apache.iceberg.relocated.com.google.common.collect.Maps; +import org.apache.iceberg.rest.ErrorHandlers; +import org.apache.iceberg.rest.HTTPClient; +import org.apache.iceberg.rest.RESTClient; +import org.apache.iceberg.rest.RESTUtil; +import org.apache.iceberg.rest.auth.OAuth2Properties; +import org.apache.iceberg.rest.auth.OAuth2Util; +import org.apache.iceberg.rest.credentials.Credential; +import org.apache.iceberg.rest.responses.LoadCredentialsResponse; +import org.apache.iceberg.util.Pair; +import org.apache.iceberg.util.SerializableMap; +import org.apache.iceberg.util.ThreadPools; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public class VendedAdlsCredentialProvider implements Serializable, AutoCloseable { + private static final Logger LOG = LoggerFactory.getLogger(VendedAdlsCredentialProvider.class); + + private static final String THREAD_PREFIX = "adls-vended-credential-refresh"; + public static final String URI = "credentials.uri"; + + private final SerializableMap properties; + private transient volatile Map + azureSasCredentialRefresherMap; + private transient volatile RESTClient client; + private transient volatile ScheduledExecutorService refreshExecutor; + + public VendedAdlsCredentialProvider(Map properties) { +Preconditions.checkArgument(null != properties, "Invalid properties: null"); +Preconditions.checkArgument(null != properties.get(URI), "Invalid URI: null"); +this.properties = SerializableMap.copyOf(properties); +azureSasCredentialRefresherMap = Maps.newHashMap(); + } + + public AzureSasCredential credentialForAccount(String storageAccount) { +Map refresherForAccountMap = +azureSasCredentialRefresherMap(); +if (refresherForAccountMap.containsKey(storageAccount)) { + return refresherForAccountMap.get(storageAccount).azureSasCredential(); +} else { + AzureSasCredentialRefresher azureSasCredentialRefresher = + new AzureSasCredentialRefresher( + () -> this.sasTokenWithExpiration(storageAccount), credentialRefreshExecutor()); 
+ refresherForAccountMap.put(storageAccount, azureSasCredentialRefresher); + return azureSasCredentialRefresher.azureSasCredential(); +} + } + + private Pair sasTokenWithExpiration(String storageAccount) { +LoadCredentialsResponse response = fetchCredentials(); +List adlsCredentials = +response.credentials().stream() +.filter(c -> c.prefix().contains(storageAccount)) +.collect(Collectors.toList()); +Preconditions.checkState( +!adlsCredentials.isEmpty(), +String.format("Invalid ADLS Credentials for storage-account %s: empty", storageAccount)); +Preconditions.checkState( +adlsCredentials.size() == 1, +"Invalid ADLS Credentials: only one ADLS credential should exist per storage-account"); + +Credential adlsCredential = adlsCredentials.get(0); +checkCredential(adlsCredential, AzureProperties.ADLS_SAS_TOKEN_PREFIX + storageAccount); +checkCredential( +adlsCredential, AzureProperties.ADLS_SAS_TOKEN_EXPIRES_AT_MS_PREFIX + storageAccount); + +String updatedSasToken = +adlsCredential.config().get(AzureProperties.ADLS_SAS_TOKEN_PREFIX + storageAccount); +Long tokenExpiresAtMillis = +Long.parseLong( +adlsCredential +.config() +.get(Az
Re: [I] [Discussion] googletest(gtest) or Catch2 [iceberg-cpp]
wgtmac commented on issue #12: URL: https://github.com/apache/iceberg-cpp/issues/12#issuecomment-2545859560 I don't have experience with Catch2 either. But I don't object to adopt it if it brings good stuff. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Azure: Support vended credentials refresh in ADLSFileIO. [iceberg]
ChaladiMohanVamsi commented on code in PR #11577: URL: https://github.com/apache/iceberg/pull/11577#discussion_r1886977070 ## azure/src/main/java/org/apache/iceberg/azure/adlsv2/VendedAdlsCredentialProvider.java: ## @@ -0,0 +1,197 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.azure.adlsv2; + +import com.azure.core.credential.AzureSasCredential; +import java.io.Serializable; +import java.util.List; +import java.util.Map; +import java.util.concurrent.Future; +import java.util.concurrent.ScheduledExecutorService; +import java.util.concurrent.TimeUnit; +import java.util.stream.Collectors; +import org.apache.commons.io.IOUtils; +import org.apache.iceberg.azure.AzureProperties; +import org.apache.iceberg.relocated.com.google.common.base.Preconditions; +import org.apache.iceberg.relocated.com.google.common.collect.Maps; +import org.apache.iceberg.rest.ErrorHandlers; +import org.apache.iceberg.rest.HTTPClient; +import org.apache.iceberg.rest.RESTClient; +import org.apache.iceberg.rest.RESTUtil; +import org.apache.iceberg.rest.auth.OAuth2Properties; +import org.apache.iceberg.rest.auth.OAuth2Util; +import org.apache.iceberg.rest.credentials.Credential; +import org.apache.iceberg.rest.responses.LoadCredentialsResponse; +import org.apache.iceberg.util.Pair; +import org.apache.iceberg.util.SerializableMap; +import org.apache.iceberg.util.ThreadPools; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public class VendedAdlsCredentialProvider implements Serializable, AutoCloseable { + private static final Logger LOG = LoggerFactory.getLogger(VendedAdlsCredentialProvider.class); + + private static final String THREAD_PREFIX = "adls-vended-credential-refresh"; + public static final String URI = "credentials.uri"; + + private final SerializableMap properties; + private transient volatile Map + azureSasCredentialRefresherMap; + private transient volatile RESTClient client; + private transient volatile ScheduledExecutorService refreshExecutor; + + public VendedAdlsCredentialProvider(Map properties) { +Preconditions.checkArgument(null != properties, "Invalid properties: null"); +Preconditions.checkArgument(null != properties.get(URI), "Invalid URI: null"); +this.properties = SerializableMap.copyOf(properties); +azureSasCredentialRefresherMap = Maps.newHashMap(); + } + + public AzureSasCredential credentialForAccount(String storageAccount) { +Map refresherForAccountMap = +azureSasCredentialRefresherMap(); +if (refresherForAccountMap.containsKey(storageAccount)) { + return refresherForAccountMap.get(storageAccount).azureSasCredential(); +} else { + AzureSasCredentialRefresher azureSasCredentialRefresher = + new AzureSasCredentialRefresher( + () -> this.sasTokenWithExpiration(storageAccount), 
credentialRefreshExecutor()); + refresherForAccountMap.put(storageAccount, azureSasCredentialRefresher); + return azureSasCredentialRefresher.azureSasCredential(); +} + } + + private Pair sasTokenWithExpiration(String storageAccount) { +LoadCredentialsResponse response = fetchCredentials(); +List adlsCredentials = +response.credentials().stream() +.filter(c -> c.prefix().contains(storageAccount)) +.collect(Collectors.toList()); +Preconditions.checkState( +!adlsCredentials.isEmpty(), +String.format("Invalid ADLS Credentials for storage-account %s: empty", storageAccount)); +Preconditions.checkState( +adlsCredentials.size() == 1, +"Invalid ADLS Credentials: only one ADLS credential should exist per storage-account"); + +Credential adlsCredential = adlsCredentials.get(0); +checkCredential(adlsCredential, AzureProperties.ADLS_SAS_TOKEN_PREFIX + storageAccount); +checkCredential( +adlsCredential, AzureProperties.ADLS_SAS_TOKEN_EXPIRES_AT_MS_PREFIX + storageAccount); + +String updatedSasToken = +adlsCredential.config().get(AzureProperties.ADLS_SAS_TOKEN_PREFIX + storageAccount); +Long tokenExpiresAtMillis = +Long.parseLong( +adlsCredential +.config() +
Re: [PR] Deserialize NestedField initial-default and write-default Attributes [iceberg-python]
kevinjqliu commented on code in PR #1432: URL: https://github.com/apache/iceberg-python/pull/1432#discussion_r1886982570 ## pyiceberg/types.py: ## @@ -328,8 +328,8 @@ def __init__( data["type"] = data["type"] if "type" in data else field_type data["required"] = required data["doc"] = doc Review Comment: curious if we should do the same here too -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Deserialize NestedField initial-default and write-default Attributes [iceberg-python]
paulcichonski commented on code in PR #1432: URL: https://github.com/apache/iceberg-python/pull/1432#discussion_r1886986206 ## tests/conftest.py: ## @@ -149,6 +149,35 @@ def table_schema_simple() -> Schema: ) +@pytest.fixture(scope="session") Review Comment: Sure, I can make that change and see what happens. I was hesitant because `table_schema_simple` seems to be used by lots of tests so wasn't sure of the implications. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Deserialize NestedField initial-default and write-default Attributes [iceberg-python]
kevinjqliu commented on code in PR #1432: URL: https://github.com/apache/iceberg-python/pull/1432#discussion_r1886988485 ## tests/conftest.py: ## @@ -149,6 +149,35 @@ def table_schema_simple() -> Schema: ) +@pytest.fixture(scope="session") Review Comment: +1 if it becomes too tedious, the current test is fine -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Deserialize NestedField initial-default and write-default Attributes [iceberg-python]
paulcichonski commented on code in PR #1432: URL: https://github.com/apache/iceberg-python/pull/1432#discussion_r1886987316 ## pyiceberg/types.py: ## @@ -328,8 +328,8 @@ def __init__( data["type"] = data["type"] if "type" in data else field_type data["required"] = required data["doc"] = doc Review Comment: I originally had that, but the tests seem to pass without it so I just kept it as is. Happy to change if you'd like. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] PyIceberg appending data creates snapshots incompatible with Athena/Spark [iceberg-python]
kevinjqliu commented on issue #1424: URL: https://github.com/apache/iceberg-python/issues/1424#issuecomment-2545885155 The above replicates the logic of `_generate_snapshot_id` https://github.com/apache/iceberg-python/blob/b981780d313f7fa6fb911381962fe00017073cfe/pyiceberg/table/metadata.py#L322-L333 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Deserialize NestedField initial-default and write-default Attributes [iceberg-python]
kevinjqliu commented on code in PR #1432: URL: https://github.com/apache/iceberg-python/pull/1432#discussion_r1886989722 ## pyiceberg/types.py: ## @@ -328,8 +328,8 @@ def __init__( data["type"] = data["type"] if "type" in data else field_type data["required"] = required data["doc"] = doc Review Comment: Weird, I'm trying to figure out when this bug occurs and see if it's present in other places in the codebase. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Deserialize NestedField initial-default and write-default Attributes [iceberg-python]
paulcichonski commented on code in PR #1432: URL: https://github.com/apache/iceberg-python/pull/1432#discussion_r1886992009 ## pyiceberg/types.py: ## @@ -328,8 +328,8 @@ def __init__( data["type"] = data["type"] if "type" in data else field_type data["required"] = required data["doc"] = doc Review Comment: Let me know what you find; I tried digging into Pydantic to understand why it happens, but ran out of time. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org