github-actions[bot] commented on PR #8808:
URL: https://github.com/apache/iceberg/pull/8808#issuecomment-2367034156
This pull request has been marked as stale due to 30 days of inactivity. It
will be closed in 1 week if no further activity occurs. If you think that’s
incorrect or this pull
chenwyi2 commented on PR #8808:
URL: https://github.com/apache/iceberg/pull/8808#issuecomment-2097906401
> > > * It seems that ORC is not experiencing this issue because it creates
value reader based on the iceberg column types.
> > > * Avro reads the fields entirely based on the file typ
openinx commented on PR #8808:
URL: https://github.com/apache/iceberg/pull/8808#issuecomment-1880545421
@advancedxy Thanks for the context from spark. There was a mail discussing
this issue, and we agreed that providing an option to promote binary to string
is the correct approach. I think
advancedxy commented on PR #8808:
URL: https://github.com/apache/iceberg/pull/8808#issuecomment-1876803507
Spark adds a configuration to control whether to treat binary as string in
parquet reader, see:
https://github.com/apache/spark/blob/c8137960a0ba725d1633795a057c68f2bbef414b/sql/cataly
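The idea referenced above (Spark's `spark.sql.parquet.binaryAsString` option) can be sketched as a reader that decodes a Parquet BINARY value as UTF-8 only when the flag is enabled. This is a minimal, hypothetical illustration, not the actual Spark or Iceberg reader code; the class and method names are invented for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch of a binary-as-string option, mirroring the idea
 * behind Spark's spark.sql.parquet.binaryAsString configuration: decode
 * BINARY as a UTF-8 string only when the flag is on, otherwise return
 * the raw bytes unchanged.
 */
public class BinaryAsString {
  static Object read(ByteBuffer binary, boolean binaryAsString) {
    if (binaryAsString) {
      // duplicate() so decoding does not consume the caller's buffer
      return StandardCharsets.UTF_8.decode(binary.duplicate()).toString();
    }
    return binary; // default: leave the value as raw bytes
  }

  public static void main(String[] args) {
    ByteBuffer buf = ByteBuffer.wrap("hello".getBytes(StandardCharsets.UTF_8));
    System.out.println(read(buf, true)); // hello
  }
}
```

Making the promotion opt-in keeps the default behavior safe for columns whose bytes are not valid UTF-8, which is the core of the objection discussed in this thread.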
Apache9 commented on PR #8808:
URL: https://github.com/apache/iceberg/pull/8808#issuecomment-1829524111
> It still feels like we are adding a special workaround to Iceberg for
something which shouldn't be happening in the first place. Can you explain the
use case again? Why can't the upstre
RussellSpitzer commented on PR #8808:
URL: https://github.com/apache/iceberg/pull/8808#issuecomment-1829023255
It still feels like we are adding a special workaround to Iceberg for
something which shouldn't be happening in the first place. Can you explain the
use case again? Why can't the
Apache9 commented on PR #8808:
URL: https://github.com/apache/iceberg/pull/8808#issuecomment-1828987729
> > Anyway, the conversion is based on the fact that the user defines the
column as string and wants to use it as a string. If you think there is an
inappropriate scenario, could you give
RussellSpitzer commented on PR #8808:
URL: https://github.com/apache/iceberg/pull/8808#issuecomment-1828351888
>
> Anyway, the conversion is based on the fact that the user defines the
column as string and wants to use it as a string. If you think there is an
inappropriate scenario, cou
fengjiajie commented on PR #8808:
URL: https://github.com/apache/iceberg/pull/8808#issuecomment-1826731977
> You can only guarantee this is safe for your data, for any other user this
could be unsafe. That’s the underlying issue with this PR, we are essentially
allowing a cast binary as str
RussellSpitzer commented on PR #8808:
URL: https://github.com/apache/iceberg/pull/8808#issuecomment-1826303884
You can only guarantee this is safe for your data, for any other user this
could be unsafe. That’s the underlying issue with this PR, we are essentially
allowing a cast binary as s
fengjiajie commented on PR #8808:
URL: https://github.com/apache/iceberg/pull/8808#issuecomment-1825487523
> I'm also a little nervous about this change, how are we guaranteed that
the binary is parsable as UTF8 bytes? Seems like we should just be fixing the
type annotations rather than cha
fengjiajie commented on PR #8808:
URL: https://github.com/apache/iceberg/pull/8808#issuecomment-1772038159
> My thoughts were more like, if we have a column defined as "double" we may
allow "float" in the file definition but we wouldn't allow Binary. So how is
this different?
@Russel
RussellSpitzer commented on PR #8808:
URL: https://github.com/apache/iceberg/pull/8808#issuecomment-1771952316
My thoughts were more like, if we have a column defined as "double" we may
allow "float" in the file definition but we wouldn't allow Binary. So how is
this different?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
fengjiajie commented on PR #8808:
URL: https://github.com/apache/iceberg/pull/8808#issuecomment-1771924888
> how are we guaranteed that the binary is parsable as UTF8 bytes?
@RussellSpitzer Thank you for participating in the review.
If a column is not encoded in UTF-8, it should not
RussellSpitzer commented on PR #8808:
URL: https://github.com/apache/iceberg/pull/8808#issuecomment-1771569346
I'm also a little nervous about this change, how are we guaranteed that the
binary is parsable as UTF8 bytes? Seems like we should just be fixing the type
annotations rather than c
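The concern raised here (nothing guarantees an unannotated BINARY column holds valid UTF-8) can be made concrete with a strict decoder. This is an illustrative sketch, not code from the PR: `CodingErrorAction.REPORT` makes decoding fail loudly on malformed input instead of silently substituting replacement characters.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/**
 * Sketch of the safety concern: decoding arbitrary binary as UTF-8 can
 * fail. A strict decoder surfaces malformed bytes as an exception
 * rather than silently producing U+FFFD replacement characters.
 */
public class StrictUtf8 {
  static String decodeStrict(byte[] bytes) throws CharacterCodingException {
    return StandardCharsets.UTF_8.newDecoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT)
        .decode(ByteBuffer.wrap(bytes))
        .toString();
  }

  public static void main(String[] args) {
    try {
      // 0xFF is never a valid UTF-8 byte, so this throws
      decodeStrict(new byte[] {(byte) 0xFF, (byte) 0xFE});
      System.out.println("decoded");
    } catch (CharacterCodingException e) {
      System.out.println("malformed"); // this branch is taken
    }
  }
}
```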
pvary commented on PR #8808:
URL: https://github.com/apache/iceberg/pull/8808#issuecomment-1770228648
@RussellSpitzer: Could you please take a look at the Spark change?
fengjiajie commented on PR #8808:
URL: https://github.com/apache/iceberg/pull/8808#issuecomment-1769826927
> > * It seems that ORC is not experiencing this issue because it creates
value reader based on the iceberg column types.
> > * Avro reads the fields entirely based on the file type,
pvary commented on PR #8808:
URL: https://github.com/apache/iceberg/pull/8808#issuecomment-1768291556
> * It seems that ORC is not experiencing this issue because it creates
value reader based on the iceberg column types.
> * Avro reads the fields entirely based on the file type, which se
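The contrast described above can be sketched as two dispatch strategies: choosing the value reader from the table (Iceberg) type, as the ORC path reportedly does, versus choosing it from the file's physical type. All names here are invented for illustration; this is not the actual Iceberg reader code.

```java
/**
 * Hypothetical illustration of the two dispatch strategies discussed
 * above. Dispatching on the table type tolerates a missing UTF8
 * annotation; dispatching on the file type does not.
 */
public class ReaderDispatch {
  enum IcebergType { STRING, BINARY }
  enum FileType { BINARY_UTF8, BINARY_RAW }

  // ORC-style: dispatch on the Iceberg (table) type. An unannotated
  // binary column still gets a string reader when the table says STRING.
  static String byTableType(IcebergType expected) {
    return expected == IcebergType.STRING ? "StringReader" : "BytesReader";
  }

  // Parquet-style: dispatch on the file's physical type. Without the
  // UTF8 annotation the column falls through to a bytes reader, which
  // is the mismatch this PR works around.
  static String byFileType(FileType inFile) {
    return inFile == FileType.BINARY_UTF8 ? "StringReader" : "BytesReader";
  }

  public static void main(String[] args) {
    System.out.println(byTableType(IcebergType.STRING)); // StringReader
    System.out.println(byFileType(FileType.BINARY_RAW)); // BytesReader
  }
}
```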
nastra commented on PR #8808:
URL: https://github.com/apache/iceberg/pull/8808#issuecomment-1767781293
> Additionally, Iceberg has a UUID type, which seems to be supported in
Spark but not in Flink: https://github.com/apache/iceberg/pull/7496
So Spark itself doesn't support UUIDs as a
fengjiajie commented on PR #8808:
URL: https://github.com/apache/iceberg/pull/8808#issuecomment-1767764825
> @fengjiajie: Checked the codepath for the Spark readers and I have 2
questions:
>
> * What about ORC and Avro files? Don't we have the same issue there?
> * Would it worth t
pvary commented on PR #8808:
URL: https://github.com/apache/iceberg/pull/8808#issuecomment-1767664345
@fengjiajie: Checked the codepath for the Spark readers and I have 2
questions:
- What about ORC and Avro files? Don't we have the same issue there?
- Would it be worth adding the same fi
fengjiajie commented on code in PR #8808:
URL: https://github.com/apache/iceberg/pull/8808#discussion_r1361899155
##
flink/v1.15/flink/src/test/java/org/apache/iceberg/flink/data/TestFlinkParquetReader.java:
##
@@ -81,26 +75,87 @@ public void testTwoLevelList() throws IOExceptio
fengjiajie commented on PR #8808:
URL: https://github.com/apache/iceberg/pull/8808#issuecomment-1766144239
> This is a small change, so it might not be too hard to keep the different
Flink version changes in sync, but usually we introduce the changes on the
latest Flink, and then create a d
fengjiajie commented on code in PR #8808:
URL: https://github.com/apache/iceberg/pull/8808#discussion_r1361854341
##
flink/v1.15/flink/src/main/java/org/apache/iceberg/flink/data/FlinkParquetReaders.java:
##
@@ -262,7 +262,11 @@ public ParquetValueReader primitive(
switch
pvary commented on PR #8808:
URL: https://github.com/apache/iceberg/pull/8808#issuecomment-1765954472
This is a small change, so it might not be too hard to keep the different
Flink version changes in sync, but usually we introduce the changes on the
latest Flink, and then create a differen
pvary commented on code in PR #8808:
URL: https://github.com/apache/iceberg/pull/8808#discussion_r1361742942
##
flink/v1.15/flink/src/test/java/org/apache/iceberg/flink/data/TestFlinkParquetReader.java:
##
@@ -81,26 +75,87 @@ public void testTwoLevelList() throws IOException {
pvary commented on code in PR #8808:
URL: https://github.com/apache/iceberg/pull/8808#discussion_r1361742600
##
flink/v1.15/flink/src/main/java/org/apache/iceberg/flink/data/FlinkParquetReaders.java:
##
@@ -262,7 +262,11 @@ public ParquetValueReader primitive(
switch (pri
fengjiajie commented on PR #8808:
URL: https://github.com/apache/iceberg/pull/8808#issuecomment-1765499071
> > Some systems like older versions of Impala do not annotate String type
as UTF-8 columns in Parquet files. When importing these Parquet files into
Iceberg, reading these Binary colu
nastra commented on PR #8808:
URL: https://github.com/apache/iceberg/pull/8808#issuecomment-1764681891
> Some systems like older versions of Impala do not annotate String type as
UTF-8 columns in Parquet files. When importing these Parquet files into
Iceberg, reading these Binary columns wi
fengjiajie commented on PR #8808:
URL: https://github.com/apache/iceberg/pull/8808#issuecomment-1762465495
@nastra Thank you for taking the time to review my code
nastra commented on code in PR #8808:
URL: https://github.com/apache/iceberg/pull/8808#discussion_r1358331014
##
flink/v1.15/flink/src/test/java/org/apache/iceberg/flink/data/TestFlinkParquetReader.java:
##
@@ -90,17 +92,72 @@ public void testTwoLevelList() throws IOException {
nastra commented on code in PR #8808:
URL: https://github.com/apache/iceberg/pull/8808#discussion_r1358327260
##
flink/v1.15/flink/src/test/java/org/apache/iceberg/flink/data/TestFlinkParquetReader.java:
##
@@ -90,17 +92,72 @@ public void testTwoLevelList() throws IOException {
nastra commented on code in PR #8808:
URL: https://github.com/apache/iceberg/pull/8808#discussion_r1358326874
##
flink/v1.15/flink/src/test/java/org/apache/iceberg/flink/data/TestFlinkParquetReader.java:
##
@@ -90,17 +92,72 @@ public void testTwoLevelList() throws IOException {
nastra commented on code in PR #8808:
URL: https://github.com/apache/iceberg/pull/8808#discussion_r1358102450
##
flink/v1.15/flink/src/test/java/org/apache/iceberg/flink/data/TestFlinkParquetReader.java:
##
@@ -98,6 +100,62 @@ public void testTwoLevelList() throws IOException {
nastra commented on PR #8808:
URL: https://github.com/apache/iceberg/pull/8808#issuecomment-1761210176
@fengjiajie thanks for working on this. Could you please add a test that
reproduces the issue?