Re: [PR] Spark: Remove closing of IO in SerializableTable* [iceberg]

2025-02-14 Thread via GitHub
mgmarino commented on PR #12129: URL: https://github.com/apache/iceberg/pull/12129#issuecomment-2660113119 Ok, I actually found some things to help me out, documenting for later (can't look at this just right now). There's a test that explicitly removes data from the cache of the bloc

Re: [PR] Spark: Remove closing of IO in SerializableTable* [iceberg]

2025-02-14 Thread via GitHub
mgmarino commented on PR #12129: URL: https://github.com/apache/iceberg/pull/12129#issuecomment-2660002476 Hi @Fokko, yes, it could be that the underlying issue is the same (i.e. Spark moving the SerializedTable to disk and closing the IO that is still in use). I will see if I can f

Re: [PR] Spark: Remove closing of IO in SerializableTable* [iceberg]

2025-02-14 Thread via GitHub
Fokko commented on PR #12129: URL: https://github.com/apache/iceberg/pull/12129#issuecomment-2659989552 Thanks for raising this @mgmarino. I think this is related to another issue I fixed recently: https://github.com/apache/iceberg/pull/11858 Would it be possible to add a test to illu

Re: [PR] Spark: Remove closing of IO in SerializableTable* [iceberg]

2025-02-14 Thread via GitHub
mgmarino commented on PR #12129: URL: https://github.com/apache/iceberg/pull/12129#issuecomment-2659943896 @nastra should I ask in the dev mailing list to try for feedback? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and u

Re: [PR] Spark: Remove closing of IO in SerializableTable* [iceberg]

2025-02-02 Thread via GitHub
mgmarino commented on PR #12129: URL: https://github.com/apache/iceberg/pull/12129#issuecomment-2630199667 @aokolnychyi Any chance you'll be able to give some feedback on this? Thanks!

Re: [PR] Spark: Remove closing of IO in SerializableTable* [iceberg]

2025-01-29 Thread via GitHub
mgmarino commented on PR #12129: URL: https://github.com/apache/iceberg/pull/12129#issuecomment-2621728020 I am happy to get input here as to whether or not this is the correct way to solve this issue and am happy to adapt as necessary. Thanks! This effectively reverts: #8924

[PR] Spark: Remove closing of IO in SerializableTable* [iceberg]

2025-01-29 Thread via GitHub
mgmarino opened a new pull request, #12129: URL: https://github.com/apache/iceberg/pull/12129 This is to fix: #12046 To summarize, the issue is that Spark can remove broadcast variables from memory and persist them to disk when memory needs to be freed. In the case that
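
For readers skimming the thread, the hazard discussed above (an IO being closed on eviction while something else is still using it) can be sketched in a few lines of Python. This is only an illustration, not the Iceberg or Spark code: `FakeIO`, `FakeBroadcastTable`, `on_evicted_from_memory` and `read_after_eviction` are hypothetical names invented here, and the eviction hook is an assumption about the general shape of the problem rather than a description of the actual implementation.

```python
class FakeIO:
    """Hypothetical stand-in for a file IO object shared by several holders."""

    def __init__(self):
        self.closed = False

    def read(self, path):
        # Reading through a closed IO fails, mirroring the reported symptom.
        if self.closed:
            raise RuntimeError("IO already closed")
        return f"bytes of {path}"

    def close(self):
        self.closed = True


class FakeBroadcastTable:
    """Hypothetical stand-in for a broadcast table that wraps a shared IO."""

    def __init__(self, io, close_io_on_evict):
        self.io = io
        self.close_io_on_evict = close_io_on_evict

    def on_evicted_from_memory(self):
        # Assumed hook: called when the broadcast value is dropped from memory.
        if self.close_io_on_evict:
            self.io.close()


def read_after_eviction(close_io_on_evict):
    """Evict the table, then read through an IO reference taken earlier."""
    io = FakeIO()
    table = FakeBroadcastTable(io, close_io_on_evict)
    reader_io = table.io  # a running task still holds the same IO object
    table.on_evicted_from_memory()
    try:
        return reader_io.read("data.parquet")
    except RuntimeError as err:
        return f"failed: {err}"


if __name__ == "__main__":
    print(read_after_eviction(close_io_on_evict=True))
    print(read_after_eviction(close_io_on_evict=False))
```

With closing enabled, the later read fails because the shared IO was closed underneath the reader; with closing disabled, the read succeeds. The trade-off in this toy model is that nothing closes the IO at all, so the cleanup would have to happen somewhere else.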