nastra commented on issue #6415:
URL: https://github.com/apache/iceberg/issues/6415#issuecomment-1350603108
See
https://github.com/apache/iceberg/blob/c07f2aabc0a1d02f068ecf1514d2479c0fbdd3b0/arrow/src/test/java/org/apache/iceberg/arrow/vectorized/ArrowReaderTest.java#L149-L210
for some bac
nastra commented on issue #6415:
URL: https://github.com/apache/iceberg/issues/6415#issuecomment-1350604548
@rdblue thoughts on getting the above issue fixed?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL abo
nastra opened a new issue, #6423:
URL: https://github.com/apache/iceberg/issues/6423
### Feature Request / Improvement
Currently we have a way to run JMH benchmarks on forks. The goal here is
that JMH benchmarks are executed on a weekly (or any other) cadence via a
GitHub action.
ahshahid opened a new issue, #6424:
URL: https://github.com/apache/iceberg/issues/6424
### Apache Iceberg version
main (development)
### Query engine
Spark
### Please describe the bug 🐞
The size estimation formula used for non partition cols as seen in
C
pvary commented on code in PR #3337:
URL: https://github.com/apache/iceberg/pull/3337#discussion_r1048287741
##
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java:
##
@@ -499,11 +499,21 @@ private void unlock(Optional lockId) {
}
@VisibleForTes
nazq commented on issue #6415:
URL: https://github.com/apache/iceberg/issues/6415#issuecomment-1351547529
Happy to create a PR @rdblue , just want to make sure we're on the right
track here
rubenvdg commented on issue #6361:
URL: https://github.com/apache/iceberg/issues/6361#issuecomment-1351585780
Happy to take this one on, if nobody is working on it atm.
RussellSpitzer commented on issue #6424:
URL: https://github.com/apache/iceberg/issues/6424#issuecomment-1351621767
The current code is slightly different than this,
https://github.com/apache/iceberg/blob/33217abf7f88c6c22a8c43b320f9de48de998b94/api/src/main/java/org/apache/iceberg/C
nastra commented on issue #6415:
URL: https://github.com/apache/iceberg/issues/6415#issuecomment-1351641875
@nazq just FYI, there's already #3024 that addresses this issue.
nazq commented on issue #6415:
URL: https://github.com/apache/iceberg/issues/6415#issuecomment-1351656081
Excellent. Thanks for the pointer @nastra
nastra commented on issue #6420:
URL: https://github.com/apache/iceberg/issues/6420#issuecomment-1351656857
@JanKaul I think it would be great to get this out to the DEV mailing list
to get more attention and input from people
ahshahid commented on issue #6424:
URL: https://github.com/apache/iceberg/issues/6424#issuecomment-1351671855
@RussellSpitzer Right, I missed the modification of " - splitOffset".
Though the bug, which I think is in formula, still remains.
My reasoning is as follows:
the fu
stevenzwu merged PR #6313:
URL: https://github.com/apache/iceberg/pull/6313
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@iceberg.a
stevenzwu commented on PR #6313:
URL: https://github.com/apache/iceberg/pull/6313#issuecomment-1351737859
thanks @chenjunjiedada for the contribution
RussellSpitzer commented on issue #6424:
URL: https://github.com/apache/iceberg/issues/6424#issuecomment-1351768402
> now if total row count of a split/file = (scannedFileFraction *
file().recordCount())
This is I think the confusion, we are attempting to determine how many rows
are
RussellSpitzer commented on code in PR #6378:
URL: https://github.com/apache/iceberg/pull/6378#discussion_r1048728860
##
core/src/main/java/org/apache/iceberg/actions/RewriteDataFilesCommitManager.java:
##
@@ -225,25 +225,40 @@ public void close() {
LOG.info("Closing comm
RussellSpitzer commented on code in PR #6350:
URL: https://github.com/apache/iceberg/pull/6350#discussion_r1048731625
##
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java:
##
@@ -308,6 +339,17 @@ public Scan buildChangelogScan() {
return n
RussellSpitzer commented on code in PR #6350:
URL: https://github.com/apache/iceberg/pull/6350#discussion_r1048750874
##
spark/v3.3/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestChangelogTable.java:
##
@@ -137,6 +138,64 @@ public void testOverwrites() {
RussellSpitzer commented on code in PR #6344:
URL: https://github.com/apache/iceberg/pull/6344#discussion_r104878
##
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/ChangelogIterator.java:
##
@@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF)
RussellSpitzer commented on code in PR #6344:
URL: https://github.com/apache/iceberg/pull/6344#discussion_r1048780674
##
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/ChangelogIterator.java:
##
@@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF)
RussellSpitzer commented on code in PR #6344:
URL: https://github.com/apache/iceberg/pull/6344#discussion_r1048789305
##
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/ChangelogIterator.java:
##
@@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF)
RussellSpitzer commented on code in PR #6344:
URL: https://github.com/apache/iceberg/pull/6344#discussion_r1048790825
##
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/ChangelogIterator.java:
##
@@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF)
RussellSpitzer commented on code in PR #6344:
URL: https://github.com/apache/iceberg/pull/6344#discussion_r1048792877
##
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/ChangelogIterator.java:
##
@@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF)
RussellSpitzer commented on code in PR #6344:
URL: https://github.com/apache/iceberg/pull/6344#discussion_r1048795379
##
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/ChangelogIterator.java:
##
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF)
ahshahid commented on issue #6424:
URL: https://github.com/apache/iceberg/issues/6424#issuecomment-1351885991
Right... That I agree. Maybe along with the split offset (which is the start
of the split), we need the end of the split.
But still, pls allow me to describe this simplified case , wh
rdblue commented on code in PR #6405:
URL: https://github.com/apache/iceberg/pull/6405#discussion_r1048806645
##
api/src/main/java/org/apache/iceberg/expressions/CountStar.java:
##
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more co
RussellSpitzer commented on issue #6424:
URL: https://github.com/apache/iceberg/issues/6424#issuecomment-1351901052
> here length() is the amount of bytes scanned ( only partially read)
https://github.com/apache/iceberg/blob/33217abf7f88c6c22a8c43b320f9de48de998b94/api/src/main/java/o
RussellSpitzer commented on code in PR #6344:
URL: https://github.com/apache/iceberg/pull/6344#discussion_r1048813554
##
spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/TestChangelogIterator.java:
##
@@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (AS
flyrain commented on PR #6350:
URL: https://github.com/apache/iceberg/pull/6350#issuecomment-1351909259
Thanks @RussellSpitzer. Hi @szehon-ho @hililiwei , please let me know if you
have any comments. Thanks!
dmgcodevil commented on code in PR #3337:
URL: https://github.com/apache/iceberg/pull/3337#discussion_r1048815939
##
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java:
##
@@ -499,11 +499,21 @@ private void unlock(Optional lockId) {
}
@VisibleF
ahshahid commented on issue #6424:
URL: https://github.com/apache/iceberg/issues/6424#issuecomment-1351924900
Right.. I was also thinking that this is where I have a misunderstanding or
bug...
The question is:
whether the recordCount represents the scanned fraction row count, or the
to
RussellSpitzer commented on issue #6424:
URL: https://github.com/apache/iceberg/issues/6424#issuecomment-1351938390
Record count does not represent the scanned fraction. I linked you to the
code, it's a representation of a row in a manifestFile which is the metadata
for the entire file.
ahshahid commented on issue #6424:
URL: https://github.com/apache/iceberg/issues/6424#issuecomment-1351939955
I see. let me see if I can explain what I mean by test...
asheeshgarg commented on issue #6415:
URL: https://github.com/apache/iceberg/issues/6415#issuecomment-1351998029
@nastra thanks for the references seems to be the case. @rdblue this seems
to be really the case with lot of datasets. Do we have any time line when
https://github.com/apache/ice
ahshahid commented on issue #6424:
URL: https://github.com/apache/iceberg/issues/6424#issuecomment-1352025232
@RussellSpitzer : I see what you are saying about record count corresponding
to total file size.
Let me look into what is causing something wrong in my test for join
ahshahid commented on issue #6424:
URL: https://github.com/apache/iceberg/issues/6424#issuecomment-1352040671
@RussellSpitzer :
apologies for bugging , I was hoping one more clarification on this aspect:
long splitOffset = (file().splitOffsets() != null) ?
file().splitOffsets().get(0)
RussellSpitzer commented on issue #6424:
URL: https://github.com/apache/iceberg/issues/6424#issuecomment-1352058762
Parquet files have non-data metadata which is not scanned when we read the
split. So if for example our first row-group starts at byte 1000, we don't want
to count 1000 bytes
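The offset adjustment described in this comment can be illustrated with a small, self-contained sketch. It is not the Iceberg source: the class `SplitRowEstimate`, the method `estimateRows`, and all of the numbers are hypothetical, and the division of the split length by the file size minus the first split offset is an assumption pieced together from the fragments quoted in this thread (`length()`, `splitOffsets().get(0)`, `scannedFileFraction * file().recordCount()`).

```java
// Hypothetical illustration only; names and numbers are not taken from Iceberg.
final class SplitRowEstimate {
    private SplitRowEstimate() {}

    /**
     * Estimates how many rows a split covers.
     *
     * @param splitLength      bytes covered by the split (what the thread calls length())
     * @param fileSizeInBytes  total size of the data file
     * @param firstSplitOffset byte offset of the first row group, or 0 if unknown
     * @param recordCount      record count for the entire file, as stored in the manifest
     */
    public static long estimateRows(
            long splitLength, long fileSizeInBytes, long firstSplitOffset, long recordCount) {
        // Assumption: bytes before the first row group hold no rows,
        // so they are excluded from the denominator.
        double scannedFileFraction =
                ((double) splitLength) / (fileSizeInBytes - firstSplitOffset);
        return (long) (scannedFileFraction * recordCount);
    }

    public static void main(String[] args) {
        // An 11,000-byte file whose first row group starts at byte 1,000
        // and which holds 10,000 records; the split covers 5,000 bytes.
        System.out.println(estimateRows(5_000L, 11_000L, 1_000L, 10_000L)); // 5000
        // Ignoring the offset undercounts: 5,000 / 11,000 of the records.
        System.out.println(estimateRows(5_000L, 11_000L, 0L, 10_000L)); // 4545
    }
}
```

With the offset subtracted, a split covering half of the row-group bytes is credited with half of the file's records; without it, the non-data bytes dilute the fraction.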
flyrain commented on PR #6350:
URL: https://github.com/apache/iceberg/pull/6350#issuecomment-1352094105
Thanks @szehon-ho for the review. I believe you are talking about the case 2
in https://github.com/apache/iceberg/pull/6350#discussion_r1044906141. I did
try to return an empty set, but i
ahshahid commented on issue #6424:
URL: https://github.com/apache/iceberg/issues/6424#issuecomment-1352159451
Thank you @RussellSpitzer. I will close this. Maybe the issue I'm seeing is
the conversion of double to long for a fractional value. Will update once I debug
more.
Sorry for false alarm
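The double-to-long conversion mentioned above can be seen in isolation with a minimal sketch. The class `TruncationDemo` and both method names are hypothetical, and the inputs are made up; the only facts relied on are that a Java cast from `double` to `long` truncates toward zero and that `Math.round` rounds to the nearest long. The rounding variant is shown purely for contrast and is not something the thread proposes.

```java
// Hypothetical illustration of double-to-long truncation; not Iceberg code.
final class TruncationDemo {
    private TruncationDemo() {}

    /** Casts the fractional estimate to long, which drops the fractional part. */
    public static long truncatedEstimate(double scannedFraction, long recordCount) {
        return (long) (scannedFraction * recordCount);
    }

    /** Rounds to the nearest long instead; included only for comparison. */
    public static long roundedEstimate(double scannedFraction, long recordCount) {
        return Math.round(scannedFraction * recordCount);
    }

    public static void main(String[] args) {
        // A fraction just under 1 applied to a file with a single record.
        System.out.println(truncatedEstimate(0.999, 1L)); // 0
        System.out.println(roundedEstimate(0.999, 1L));   // 1
    }
}
```

For files with many records the difference is at most one row per split, so it only matters for very small record counts.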
ahshahid closed issue #6424: The size estimation formula for spark task is
incorrect
URL: https://github.com/apache/iceberg/issues/6424
ddrinka commented on issue #2768:
URL: https://github.com/apache/iceberg/issues/2768#issuecomment-1352288435
> We could do this with Spark or Dask or Ray depending on what's installed
on the system.
Perhaps consider [Modin](https://github.com/modin-project/modin) as well?
stevenzwu commented on PR #6426:
URL: https://github.com/apache/iceberg/pull/6426#issuecomment-1352289861
@pvary regarding your other comments, I am not sure how to proceed yet.
> Null values
for null values, would optional primitive types at top level be enough?
> Edge
yegangy0718 commented on code in PR #6382:
URL: https://github.com/apache/iceberg/pull/6382#discussion_r1048051789
##
flink/v1.16/flink/src/test/java/org/apache/iceberg/flink/sink/shuffle/TestShuffleOperator.java:
##
@@ -0,0 +1,132 @@
+/*
+ * Licensed to the Apache Software Foun
flyrain merged PR #6350:
URL: https://github.com/apache/iceberg/pull/6350
github-actions[bot] commented on issue #5071:
URL: https://github.com/apache/iceberg/issues/5071#issuecomment-1352388065
This issue has been automatically marked as stale because it has been open
for 180 days with no activity. It will be closed in next 14 days if no further
activity occurs.
github-actions[bot] commented on issue #4948:
URL: https://github.com/apache/iceberg/issues/4948#issuecomment-1352388123
This issue has been closed because it has not received any activity in the
last 14 days since being marked as 'stale'
github-actions[bot] closed issue #4948: Create a Github Action to automatically
mark issues as stale and later close if inactive
URL: https://github.com/apache/iceberg/issues/4948
dennishuo opened a new pull request, #6428:
URL: https://github.com/apache/iceberg/pull/6428
This read-only implementation of the Catalog interface, initially built on
top of the [Snowflake JDBC
driver](https://docs.snowflake.com/en/user-guide/jdbc.html) for the connection
layer, enables e
ahshahid commented on issue #6424:
URL: https://github.com/apache/iceberg/issues/6424#issuecomment-1352519479
@RussellSpitzer . actually part of the issue what I was seeing was related
to scannedFraction approximately equal to 1, but record count of 1., which was
resulting in net rows seen
ahshahid commented on issue #6424:
URL: https://github.com/apache/iceberg/issues/6424#issuecomment-1352520299
The other issue which I was looking at was comparing perf of parquet with
iceberg and in that it seems, that iceberg because of better size estimation as
compared to parquet, resul
xwmr-max commented on PR #6412:
URL: https://github.com/apache/iceberg/pull/6412#issuecomment-1352605298
@openinx
pvary commented on code in PR #6382:
URL: https://github.com/apache/iceberg/pull/6382#discussion_r1049262265
##
flink/v1.16/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/ShuffleOperator.java:
##
@@ -0,0 +1,140 @@
+/*
+ * Licensed to the Apache Software Foundation (AS
jiamin13579 commented on PR #6419:
URL: https://github.com/apache/iceberg/pull/6419#issuecomment-1352635716
> Not sure its necessary, looks like for now width can be any of the
arguments:
https://github.com/apache/iceberg/blob/master/spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/S
Fokko commented on code in PR #6392:
URL: https://github.com/apache/iceberg/pull/6392#discussion_r1049285936
##
python/Makefile:
##
@@ -26,14 +26,21 @@ lint:
poetry run pre-commit run --all-files
test:
- poetry run coverage run --source=pyiceberg/ -m pytest test
nastra commented on code in PR #6428:
URL: https://github.com/apache/iceberg/pull/6428#discussion_r1049312756
##
snowflake/src/test/java/org/apache/iceberg/snowflake/SnowflakeCatalogTest.java:
##
@@ -0,0 +1,231 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under on