Fokko closed issue #791: Upcasting and Downcasting inconsistencies with PyArrow Schema
URL: https://github.com/apache/iceberg-python/issues/791
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
syun64 commented on issue #791:
URL: https://github.com/apache/iceberg-python/issues/791#issuecomment-2159307590
Gotcha - thank you for the explanation, @Fokko. I didn't think of how using a
`large_binary` could actually improve performance because the data is grouped
together into large buffers…
Fokko commented on issue #791:
URL: https://github.com/apache/iceberg-python/issues/791#issuecomment-2159294140
I agree that you cannot write a single field of 2GB+ to a Parquet file. In
that case, Parquet is probably not the best way of storing such a big blob.
The difference between how…
syun64 commented on issue #791:
URL: https://github.com/apache/iceberg-python/issues/791#issuecomment-2156782618
> For Arrow, the `binary` cannot store more than 2GB in a single buffer, not
a single field. See [Arrow
docs](https://arrow.apache.org/docs/format/Columnar.html#variable-size-bin
Fokko commented on issue #791:
URL: https://github.com/apache/iceberg-python/issues/791#issuecomment-2156736837
For Arrow, the `binary` cannot store more than 2GB in a single buffer, not a
single field. See [Arrow
docs](https://arrow.apache.org/docs/format/Columnar.html#variable-size-binary
Fokko commented on issue #791:
URL: https://github.com/apache/iceberg-python/issues/791#issuecomment-2156729320
This is interesting, why would Polars go with `large_binary` by default? See
https://github.com/apache/iceberg-python/pull/409
syun64 commented on issue #791:
URL: https://github.com/apache/iceberg-python/issues/791#issuecomment-2153441786
I'm seeing the same restriction when using Polars `write_parquet`, so it looks
like a Parquet limitation rather than an Arrow restriction:
```
ComputeError: parquet: File
```
syun64 opened a new issue, #791:
URL: https://github.com/apache/iceberg-python/issues/791
### Apache Iceberg version
0.6.0 (latest release)
### Please describe the bug 🐞
`schema_to_pyarrow` converts BinaryType to `pa.large_binary()` type. This
creates inconsistencies with…