Re: [PR] PyArrow: Don't enforce the schema [iceberg-python]

2024-07-11 Thread via GitHub
Fokko merged PR #902: URL: https://github.com/apache/iceberg-python/pull/902 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.

Re: [PR] PyArrow: Don't enforce the schema [iceberg-python]

2024-07-11 Thread via GitHub
Fokko commented on code in PR #902: URL: https://github.com/apache/iceberg-python/pull/902#discussion_r1673729169 ## tests/integration/test_deletes.py: ## @@ -291,7 +291,7 @@ def test_partitioned_table_positional_deletes_sequence_number(spark: SparkSessio assert snapshots[

Re: [PR] PyArrow: Don't enforce the schema [iceberg-python]

2024-07-10 Thread via GitHub
HonahX commented on code in PR #902: URL: https://github.com/apache/iceberg-python/pull/902#discussion_r1673491938 ## tests/integration/test_deletes.py: ## @@ -291,7 +291,7 @@ def test_partitioned_table_positional_deletes_sequence_number(spark: SparkSessio assert snapshots

Re: [PR] PyArrow: Don't enforce the schema [iceberg-python]

2024-07-10 Thread via GitHub
HonahX commented on code in PR #902: URL: https://github.com/apache/iceberg-python/pull/902#discussion_r1673398145 ## pyiceberg/table/__init__.py: ## @@ -1884,8 +1884,9 @@ def to_arrow_batch_reader(self) -> pa.RecordBatchReader: from pyiceberg.io.pyarrow import projec

Re: [PR] PyArrow: Don't enforce the schema [iceberg-python]

2024-07-10 Thread via GitHub
syun64 commented on code in PR #902: URL: https://github.com/apache/iceberg-python/pull/902#discussion_r1672794472 ## pyiceberg/io/pyarrow.py: ## @@ -1268,14 +1265,8 @@ def __init__(self, file_schema: Schema): def _cast_if_needed(self, field: NestedField, values: pa.Array

Re: [PR] PyArrow: Don't enforce the schema [iceberg-python]

2024-07-10 Thread via GitHub
syun64 commented on code in PR #902: URL: https://github.com/apache/iceberg-python/pull/902#discussion_r1672619363 ## pyiceberg/io/pyarrow.py: ## @@ -1271,54 +1274,62 @@ def project_batches( def to_requested_schema( -requested_schema: Schema, file_schema: Schema, batch:

Re: [PR] PyArrow: Don't enforce the schema [iceberg-python]

2024-07-10 Thread via GitHub
syun64 commented on code in PR #902: URL: https://github.com/apache/iceberg-python/pull/902#discussion_r1672618418 ## pyiceberg/io/pyarrow.py: ## @@ -1271,54 +1274,62 @@ def project_batches( def to_requested_schema( -requested_schema: Schema, file_schema: Schema, batch:

Re: [PR] PyArrow: Don't enforce the schema [iceberg-python]

2024-07-10 Thread via GitHub
syun64 commented on code in PR #902: URL: https://github.com/apache/iceberg-python/pull/902#discussion_r1672580145 ## pyiceberg/io/pyarrow.py: ## @@ -1268,14 +1265,8 @@ def __init__(self, file_schema: Schema): def _cast_if_needed(self, field: NestedField, values: pa.Array

Re: [PR] PyArrow: Don't enforce the schema [iceberg-python]

2024-07-10 Thread via GitHub
Fokko commented on code in PR #902: URL: https://github.com/apache/iceberg-python/pull/902#discussion_r1672440233 ## pyiceberg/io/pyarrow.py: ## @@ -1268,14 +1265,8 @@ def __init__(self, file_schema: Schema): def _cast_if_needed(self, field: NestedField, values: pa.Array)

Re: [PR] PyArrow: Don't enforce the schema [iceberg-python]

2024-07-10 Thread via GitHub
syun64 commented on code in PR #902: URL: https://github.com/apache/iceberg-python/pull/902#discussion_r1672429398 ## pyiceberg/io/pyarrow.py: ## @@ -1268,14 +1265,8 @@ def __init__(self, file_schema: Schema): def _cast_if_needed(self, field: NestedField, values: pa.Array

Re: [PR] PyArrow: Don't enforce the schema [iceberg-python]

2024-07-10 Thread via GitHub
Fokko commented on code in PR #902: URL: https://github.com/apache/iceberg-python/pull/902#discussion_r1672411410 ## pyiceberg/io/pyarrow.py: ## @@ -1268,14 +1265,8 @@ def __init__(self, file_schema: Schema): def _cast_if_needed(self, field: NestedField, values: pa.Array)

Re: [PR] PyArrow: Don't enforce the schema [iceberg-python]

2024-07-10 Thread via GitHub
syun64 commented on code in PR #902: URL: https://github.com/apache/iceberg-python/pull/902#discussion_r1672267800 ## pyiceberg/table/__init__.py: ## @@ -1884,8 +1884,9 @@ def to_arrow_batch_reader(self) -> pa.RecordBatchReader: from pyiceberg.io.pyarrow import projec

Re: [PR] PyArrow: Don't enforce the schema [iceberg-python]

2024-07-10 Thread via GitHub
Fokko commented on code in PR #902: URL: https://github.com/apache/iceberg-python/pull/902#discussion_r1671798339 ## pyiceberg/table/__init__.py: ## @@ -1884,8 +1884,9 @@ def to_arrow_batch_reader(self) -> pa.RecordBatchReader: from pyiceberg.io.pyarrow import project

Re: [PR] PyArrow: Don't enforce the schema [iceberg-python]

2024-07-10 Thread via GitHub
Fokko commented on code in PR #902: URL: https://github.com/apache/iceberg-python/pull/902#discussion_r1671735728 ## pyiceberg/table/__init__.py: ## @@ -1895,7 +1896,7 @@ def to_arrow_batch_reader(self) -> pa.RecordBatchReader: case_sensitive=self.case_sensitive

Re: [PR] PyArrow: Don't enforce the schema [iceberg-python]

2024-07-08 Thread via GitHub
syun64 commented on code in PR #902: URL: https://github.com/apache/iceberg-python/pull/902#discussion_r1669524329 ## pyiceberg/io/pyarrow.py: ## @@ -1268,14 +1265,8 @@ def __init__(self, file_schema: Schema): def _cast_if_needed(self, field: NestedField, values: pa.Array

Re: [PR] PyArrow: Don't enforce the schema [iceberg-python]

2024-07-08 Thread via GitHub
syun64 commented on code in PR #902: URL: https://github.com/apache/iceberg-python/pull/902#discussion_r1669486866 ## pyiceberg/table/__init__.py: ## @@ -1884,8 +1884,9 @@ def to_arrow_batch_reader(self) -> pa.RecordBatchReader: from pyiceberg.io.pyarrow import projec

Re: [PR] PyArrow: Don't enforce the schema [iceberg-python]

2024-07-08 Thread via GitHub
Fokko commented on PR #902: URL: https://github.com/apache/iceberg-python/pull/902#issuecomment-2214713703 I'm aware of the failing CI. Looking into this. It looks like we can automatically cast in the `RecordBatchReader`.

Re: [PR] PyArrow: Don't enforce the schema [iceberg-python]

2024-07-08 Thread via GitHub
kevinjqliu commented on code in PR #902: URL: https://github.com/apache/iceberg-python/pull/902#discussion_r1668984296 ## pyiceberg/io/pyarrow.py: ## @@ -1170,7 +1167,7 @@ def project_table( if len(tables) < 1: return pa.Table.from_batches([], schema=schema_to_pya

[PR] PyArrow: Don't enforce the schema [iceberg-python]

2024-07-08 Thread via GitHub
Fokko opened a new pull request, #902: URL: https://github.com/apache/iceberg-python/pull/902 PyIceberg struggled with different Arrow types, such as `string` and `large_string`. They represent the same data but differ under the hood. My take is that we should hide this kind