Re: [PR] API: New API For sequential / streaming updates [iceberg]

2025-01-29 Thread via GitHub
github-actions[bot] commented on PR #9323: URL: https://github.com/apache/iceberg/pull/9323#issuecomment-2623222682 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If y

Re: [PR] API: New API For sequential / streaming updates [iceberg]

2025-01-29 Thread via GitHub
github-actions[bot] closed pull request #9323: API: New API For sequential / streaming updates URL: https://github.com/apache/iceberg/pull/9323 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

Re: [PR] API: New API For sequential / streaming updates [iceberg]

2025-01-15 Thread via GitHub
github-actions[bot] commented on PR #9323: URL: https://github.com/apache/iceberg/pull/9323#issuecomment-2594192610 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull

Re: [PR] API: New API For sequential / streaming updates [iceberg]

2024-07-25 Thread via GitHub
jasonf20 commented on PR #9323: URL: https://github.com/apache/iceberg/pull/9323#issuecomment-2250437040 I rebased this PR again so it's mergeable. Would appreciate reviving this PR. @rdblue Could you please have a look or let me know if there is someone else who could help?

Re: [PR] API: New API For sequential / streaming updates [iceberg]

2024-02-14 Thread via GitHub
jasonf20 commented on PR #9323: URL: https://github.com/apache/iceberg/pull/9323#issuecomment-1944914307 @rdblue Based on our discussions, could you have another look at this?

Re: [PR] API: New API For sequential / streaming updates [iceberg]

2024-02-03 Thread via GitHub
rdblue commented on code in PR #9323: URL: https://github.com/apache/iceberg/pull/9323#discussion_r1477125444 ## core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java: ## @@ -221,34 +223,52 @@ protected boolean addsDeleteFiles() { /** Add a data file to the new s

Re: [PR] API: New API For sequential / streaming updates [iceberg]

2024-01-23 Thread via GitHub
rdblue commented on PR #9323: URL: https://github.com/apache/iceberg/pull/9323#issuecomment-1906444781 @jasonf20, explicitly setting the sequence number isn't safe. Sequence numbers are assigned when the client attempts to commit and must be updated if the client has to retry. You could mak

Re: [PR] API: New API For sequential / streaming updates [iceberg]

2024-01-04 Thread via GitHub
jasonf20 commented on PR #9323: URL: https://github.com/apache/iceberg/pull/9323#issuecomment-1876954557 @rdblue Sure. I added support for setting the sequence number explicitly per file in `MergingSnapshotProducer`. This was almost supported already (it didn't support per file level for ad

Re: [PR] API: New API For sequential / streaming updates [iceberg]

2024-01-03 Thread via GitHub
rdblue commented on PR #9323: URL: https://github.com/apache/iceberg/pull/9323#issuecomment-1875787341 @jasonf20, to make that work, I think you'd need to keep track of a base sequence number and update the metadata for each new manifest with the correct sequence number when the manifest li

Re: [PR] API: New API For sequential / streaming updates [iceberg]

2023-12-19 Thread via GitHub
jasonf20 commented on PR #9323: URL: https://github.com/apache/iceberg/pull/9323#issuecomment-1863068142 @rdblue Correct, we need multiple (new) sequence numbers since each batch has deletes that need to apply to prior batches, but not newer batches. Committing more than once wo

Re: [PR] API: New API For sequential / streaming updates [iceberg]

2023-12-18 Thread via GitHub
rdblue commented on PR #9323: URL: https://github.com/apache/iceberg/pull/9323#issuecomment-1861669693 @jasonf20, I don't quite understand the use case. It looks like the purpose is to commit multiple batches of data at the same time. Why would you not just use a single operation? Do you ne

Re: [PR] API: New API For sequential / streaming updates [iceberg]

2023-12-17 Thread via GitHub
jasonf20 commented on PR #9323: URL: https://github.com/apache/iceberg/pull/9323#issuecomment-1859197491 **Benchmark** The following test was run locally just to demonstrate that the difference in IO performance is very significant. While the transaction approach IO grows linearly with t

[PR] API: New API For sequential / streaming updates [iceberg]

2023-12-17 Thread via GitHub
jasonf20 opened a new pull request, #9323: URL: https://github.com/apache/iceberg/pull/9323 **Explanation** Certain data production patterns can result in a bunch of micro-batch updates that need to be applied to the table sequentially. If these batches include updates they need to be c