Done https://youtu.be/IVPHvZcJ07Q
Amogh, I also added your gmail as an owner for the YouTube channel.

On Mon, Mar 30, 2026 at 8:32 AM Steven Wu <[email protected]> wrote:
> Amogh, can you upload the video to the YouTube channel?
> https://www.youtube.com/playlist?list=PLkifVhhWtccxt1TE7w_HbNGhY5gpDTaX7
>
> On Mon, Mar 30, 2026 at 8:28 AM Amogh Jahagirdar <[email protected]> wrote:
>
>> Hey a few folks reached out indicating that I didn't properly share the last v4 metadata tree meeting recording. So sorry about that! Here's the link <https://drive.google.com/file/d/1LhDL0Iy8YR4RN_W3D8APOUtkSBYk61fD/view?usp=drive_link>, do let me know if there are still issues.
>>
>> On Tue, Mar 3, 2026 at 9:17 AM Steven Wu <[email protected]> wrote:
>>
>>> My takeaway from the conversation is also that we don't need row-level column updates. Manifest DVs can be used for row-level updates instead. Basically, a file (manifest or data) can be updated via (1) a delete vector plus updated rows in a new file, or (2) a column file overlay. Depending on the percentage of modified rows, engines can choose which way to go.
>>>
>>> On Tue, Mar 3, 2026 at 6:24 AM Gábor Kaszab <[email protected]> wrote:
>>>
>>>> Thanks for the summary, Micah! I tried to watch the recording linked to the calendar event, but apparently I don't have permission to do so. Not sure about others.
>>>>
>>>> So if I'm not mistaken, one way to reduce the write cost of an UPDATE for colocated DVs is to use column updates. As I see it, there was some agreement that row-level partial column updates aren't desired, and we aim for at least file-level column updates. This is very useful information for the other conversation <https://lists.apache.org/thread/w90rqyhmh6pb0yxp0bqzgzk1y1rotyny> going on for the column update proposal.
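Steven's two update paths can be reduced to a simple threshold rule. A minimal sketch in Python; the function name and the 5% cutoff are illustrative assumptions, since the thread deliberately leaves the actual cutoff to each engine:

```python
def choose_update_strategy(modified_rows: int, total_rows: int,
                           overlay_threshold: float = 0.05) -> str:
    """Pick how to update a file (manifest or data), per the two paths
    discussed in the thread. The 5% threshold is an assumed placeholder."""
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    fraction = modified_rows / total_rows
    if fraction < overlay_threshold:
        # Path (1): mark old rows deleted via a DV and append the
        # updated rows to a new file.
        return "delete-vector"
    # Path (2): rewrite the affected column(s) as a column file overlay.
    return "column-overlay"

print(choose_update_strategy(10, 10_000))     # few rows changed
print(choose_update_strategy(9_000, 10_000))  # most rows changed
```

The point of keeping both paths is exactly this per-operation choice: small, scattered changes stay cheap via DVs, while bulk changes avoid DV bloat via an overlay.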
>>>> We can bring this up on the column update sync tomorrow, but I'm wondering if the consensus on avoiding row-level column updates is something we can incorporate into the column update proposal too, or if it's still up for debate.
>>>>
>>>> Best Regards,
>>>> Gabor
>>>>
>>>> On Wed, Feb 25, 2026 at 22:30 Micah Kornfield <[email protected]> wrote:
>>>>
>>>>> Just wanted to summarize my main takeaways from Monday's sync.
>>>>>
>>>>> The approach will always colocate DVs with the data files (i.e. every data file row in a manifest has an optional DV reference). This implies that there is not a separate "deletion manifest". Rather, in V4 all manifests are "combined", with data files and DVs colocated.
>>>>>
>>>>> Write amplification is avoided in two ways:
>>>>> 1. For small updates we will need to carry through metadata statistics (and other relevant data file fields) in memory (rescanning these is likely too expensive). Once updates are available they will be written out to a new manifest (either root or leaf), using metadata DVs to remove the old rows.
>>>>> 2. For larger updates we will only carry through the DV update parts in memory and use column-level updates to replace existing DVs (this would require rescanning the DV columns for any updated manifest to merge with the updated DVs in memory, and then writing out the column update). The consensus on the call was that we don't want to support partial column updates (a.k.a. merge-on-read column updates).
>>>>>
>>>>> The idea is that engines would decide which path to follow based on the number of affected files.
>>>>>
>>>>> To help understand the implications of the new proposal, I put together a quick spreadsheet [1] to analyze trade-offs between separate deletion manifests and the new approach under scenarios 1 and 2.
>>>>> This represents the worst-case scenario, where file updates are uniformly distributed across a single update operation. It does not account for repeated writes (e.g. ongoing compaction). My main takeaway is that keeping at most one affiliated DV separate might still help (akin to a merge-on-read column update), but maybe not enough, relative to other parts of the system (e.g. the churn on data files), to justify the complexity.
>>>>>
>>>>> Hope this is helpful.
>>>>>
>>>>> Micah
>>>>>
>>>>> [1] https://docs.google.com/spreadsheets/d/1klZQxV7ST2C-p9LTMmai_5rtFiyupj6jSLRPRkdI-u8/edit?gid=0#gid=0
>>>>>
>>>>> On Thu, Feb 19, 2026 at 3:52 PM Amogh Jahagirdar <[email protected]> wrote:
>>>>>
>>>>>> Hey folks, I've set up an additional initial discussion on DVs for Monday. This topic is fairly complex and there is also now a free calendar slot. I think it'd be helpful for us to first make sure we're all on the same page in terms of what the approach proposed by Anton earlier in the thread means, and its high-level mechanics. I should also have more to share on the doc about what the entry structure and change detection could look like in this approach. Then on Thursday we can get into more details and targeted points of discussion on this topic.
>>>>>>
>>>>>> Thanks,
>>>>>> Amogh Jahagirdar
>>>>>>
>>>>>> On Tue, Feb 17, 2026 at 9:27 PM Amogh Jahagirdar <[email protected]> wrote:
>>>>>>
>>>>>>> Thanks Steven! I've set up some time next Thursday for the community to discuss this. We're also looking at what the content entry would look like in a combined DV with potential column updates for DV changes, and what change detection could look like in this approach. I should have more to share on this by the time of the community discussion next week.
>>>>>>> We should also consider potential root churn and memory consumption stemming from expected root entry inflation due to a combined data file + DV entry with possible column updates for certain DV workloads; though at least for the memory consumption of stats held after planning, that arguably is an implementation problem for certain integrations.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Amogh Jahagirdar
>>>>>>>
>>>>>>> On Fri, Feb 13, 2026 at 10:58 AM Steven Wu <[email protected]> wrote:
>>>>>>>
>>>>>>>> I wrote up some analysis with back-of-the-envelope calculations about the column update approach for DV colocation. It mainly concerns the 2nd use case: deleting a large number of rows from a small number of files.
>>>>>>>>
>>>>>>>> https://docs.google.com/document/d/1k4x8utgh41Sn1tr98eynDKCWq035SV_f75rtNHcerVw/edit?tab=t.gvdulzy486n7
>>>>>>>>
>>>>>>>> On Wed, Feb 4, 2026 at 1:02 AM Péter Váry <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> I fully agree with Anton and Steven that we need benchmarks before choosing any direction.
>>>>>>>>>
>>>>>>>>> I ran some preliminary column-stitching benchmarks last summer:
>>>>>>>>>
>>>>>>>>> - Results are available in the doc: https://docs.google.com/document/d/1OHuZ6RyzZvCOQ6UQoV84GzwVp3UPiu_cfXClsOi03ww
>>>>>>>>> - Code is here: https://github.com/apache/iceberg/pull/13306
>>>>>>>>>
>>>>>>>>> I've summarized the most relevant results at the end of this email. They show roughly a 10% slowdown on the read path with column stitching in similar scenarios when using local SSDs. I expect that in real deployments the metadata read cost will mostly be driven by blob I/O (assuming no caching).
>>>>>>>>> If blob access becomes the dominant factor in read latency, multithreaded fetching should be able to absorb the overhead introduced by column stitching, resulting in latency similar to the single-file layout (unless IO is already the bottleneck).
>>>>>>>>>
>>>>>>>>> We should definitely rerun the benchmarks once we have a clearer understanding of the intended usage patterns.
>>>>>>>>> Thanks,
>>>>>>>>> Peter
>>>>>>>>>
>>>>>>>>> The relevant(ish) results are for 100 columns, with 2 families of 50-50 columns, and local reads:
>>>>>>>>>
>>>>>>>>> The base is:
>>>>>>>>> MultiThreadedParquetBenchmark.read  100  0  false  ss  20  3.739 ± 0.096  s/op
>>>>>>>>>
>>>>>>>>> The read for single threaded:
>>>>>>>>> MultiThreadedParquetBenchmark.read  100  2  false  ss  20  4.036 ± 0.082  s/op
>>>>>>>>>
>>>>>>>>> The read for multi threaded:
>>>>>>>>> MultiThreadedParquetBenchmark.read  100  2  true   ss  20  4.063 ± 0.080  s/op
>>>>>>>>>
>>>>>>>>> On Tue, Feb 3, 2026 at 23:27 Steven Wu <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> I agree with Anton in this <https://docs.google.com/document/d/1jZy4g6UDi3hdblpkSzDnqgzgATFKFoMaHmt4nNH8M7o/edit?disco=AAAByzDx21w> comment thread that we probably need to run benchmarks for a few common scenarios to guide this decision. We need to write down detailed plans for those scenarios and what we are measuring. Also, ideally we want to measure using the V4 metadata structure (like the Parquet manifest file, column stats structs, and the adaptive tree). There are PoC PRs available for column stats, the Parquet manifest, and the root manifest. It would probably be tricky to piece them together to run the benchmark, considering the PoC status.
>>>>>>>>>> We also need the column stitching capability on the read path to test the column file approach.
>>>>>>>>>>
>>>>>>>>>> On Tue, Feb 3, 2026 at 1:53 PM Anoop Johnson <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> I'm in favor of co-located DV metadata with column file override, and not doing affiliated/unaffiliated delete manifests. This is conceptually similar to strictly affiliated delete manifests with positional joins, and will halve the number of I/Os when there is no DV column override. It is simpler to implement and will speed up reads.
>>>>>>>>>>>
>>>>>>>>>>> Unaffiliated DV manifests are flexible for writers. They reduce the chance of physical conflicts when there are concurrent large/random deletes that change DVs on different files in the same manifest. But the flexibility comes at a read-time cost. If the number of unaffiliated DVs exceeds a threshold, it could cause driver OOMs or require a distributed join to pair up DVs with data files. With colocated metadata, manifest DVs can reduce the chance of conflicts up to a certain write size.
>>>>>>>>>>>
>>>>>>>>>>> I assume we will still need unaffiliated manifests for equality deletes, but perhaps we can restrict them to equality deletes only.
>>>>>>>>>>>
>>>>>>>>>>> -Anoop
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Feb 2, 2026 at 4:27 PM Anton Okolnychyi <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I added the approach with column files to the doc.
>>>>>>>>>>>>
>>>>>>>>>>>> To sum up, separate data and delete manifests with affinity would perform somewhat on par with co-located DV metadata (a.k.a.
>>>>>>>>>>>> direct assignment) if we add support for column files when we need to replace most or all DVs (use case 1). That said, support for direct assignment with in-line metadata DVs can help us avoid unaffiliated delete manifests when we need to replace a few DVs (use case 2).
>>>>>>>>>>>>
>>>>>>>>>>>> So the key question is whether we want to allow unaffiliated delete manifests with DVs... If we don't, then we would likely want to have co-located DV metadata, and we must support efficient column updates so as not to regress compared to V2 and V3 for large MERGE jobs that modify a small set of records in most files.
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Feb 2, 2026 at 13:20 Anton Okolnychyi <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Anoop, correct: if we keep data and delete manifests separate, there is a better way to combine the entries, and we should NOT rely on the referenced data file path. Reconciling by implicit position will reduce the size of the DV entry (no need to store the referenced data file path) and will improve planning performance (no equals/hashCode on the path).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Steven, I agree. Most notes in the doc pre-date the discussions we had on column updates. You are right: given that we are gravitating towards a native way to handle column updates, it seems logical to use the same approach for replacing DVs, since they're essentially column updates. Let me add one more approach to the doc based on what Anurag and Peter have so far.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sun, Feb 1, 2026 at 20:59 Steven Wu <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Anton, thanks for raising this. I agree this deserves another look. I added a comment in your doc that we can potentially apply the column update proposal for data file updates to manifest file updates as well, to colocate the data DVs and data manifest files. Data DVs can be a separate column in the data manifest file, updated separately in a column file. This is the same as the coalesced positional join that Anoop mentioned.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sun, Feb 1, 2026 at 4:14 PM Anoop Johnson <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thank you for raising this, Anton. I had a similar observation while prototyping <https://github.com/apache/iceberg/pull/14533> the adaptive metadata tree. The overhead of doing a path-based hash join of a data manifest with the affiliated delete manifest is high: my estimate was that the join adds about 5-10% overhead. The hash table build/probe alone takes about 5 ms for manifests with 25K entries. There are engines that can do vectorized hash joins that can lower this, but the overhead and complexity of a SIMD-friendly hash join is non-trivial.
>>>>>>>>>>>>>>> An alternative to relying on the external file feature in Parquet is to make affiliated manifests order-preserving: i.e. DVs in an affiliated delete manifest must appear in the same position as the corresponding data file in the data manifest the delete manifest is affiliated to. If a data file does not have a DV, the DV manifest must store a NULL. This would allow us to do positional joins, which are much faster. If we wanted, we could even have multiple affiliated DV manifests for a data manifest, and the reader would do a COALESCED positional join (i.e. pick the first non-null value as the DV). It puts the sorting responsibility on the writers, but it might be a reasonable tradeoff.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Also, the options don't necessarily have to be mutually exclusive. We could still allow affiliated DVs to be "folded" into the data manifest (e.g. by background optimization jobs or the writer itself). That might be the optimal choice for read-heavy tables because it will halve the number of I/Os readers have to make.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>> Anoop
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Jan 30, 2026 at 6:03 PM Anton Okolnychyi <[email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I had a chance to catch up on some of the V4 discussions.
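The difference between the path-based hash join Anoop measured and the order-preserving alternative can be sketched as follows. This is an illustrative model, not Iceberg code; the dictionary shapes and field names are assumptions:

```python
def hash_join_dvs(data_files, dv_entries):
    """Path-based hash join: pair each data file with its DV by file path.
    This is the build/probe step whose cost was estimated at ~5 ms for
    25K entries in the thread."""
    by_path = {dv["referenced_path"]: dv for dv in dv_entries}
    return [(f, by_path.get(f["path"])) for f in data_files]

def coalesced_positional_join(data_files, dv_manifests):
    """Order-preserving (positional) join: each affiliated DV manifest
    mirrors the data manifest's ordering, storing None where a data file
    has no DV. With several affiliated DV manifests, pick the first
    non-null DV per position (COALESCE semantics)."""
    paired = []
    for i, f in enumerate(data_files):
        dv = next((m[i] for m in dv_manifests if m[i] is not None), None)
        paired.append((f, dv))
    return paired

data = [{"path": "a.parquet"}, {"path": "b.parquet"}, {"path": "c.parquet"}]
dvs = [{"referenced_path": "b.parquet", "dv": "dv-b"}]
print(hash_join_dvs(data, dvs))
# Two DV manifests; the first (newer) one wins for position 0.
print(coalesced_positional_join(data, [["dv-a2", None, None],
                                       ["dv-a1", "dv-b", None]]))
```

The positional variant skips the hash-table build and the per-entry equals/hashCode on paths entirely, which is the performance argument made above; the cost is that writers must emit DV manifests in the data manifest's order.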
>>>>>>>>>>>>>>>> Given that we are getting rid of the manifest list and switching to Parquet, I wanted to re-evaluate the possibility of direct DV assignment that we discarded in V3 to avoid regressions. I have put together my thoughts in a doc [1].
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> TL;DR:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> - I think the current V4 proposal that keeps data and delete manifests separate but introduces affinity is a solid choice for cases when we need to replace DVs in many or most files. I outlined an approach with column-split Parquet files, but it doesn't improve the performance and takes a dependency on a portion of the Parquet spec that is not really implemented.
>>>>>>>>>>>>>>>> - Pushing unaffiliated DVs directly into the root to replace a small set of DVs is going to be fast on write, but does require resolving where those DVs apply at read time. Using inline metadata DVs with column-split Parquet files is a little more promising in this case, as it allows us to avoid unaffiliated DVs. That said, it again relies on something Parquet doesn't implement right now, requires changing maintenance operations, and yields minimal benefits.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> All in all, the V4 proposal seems like a strict improvement over V3, but I insist that we reconsider the use of the referenced data file path when resolving DVs to data files.
>>>>>>>>>>>>>>>> [1] - https://docs.google.com/document/d/1jZy4g6UDi3hdblpkSzDnqgzgATFKFoMaHmt4nNH8M7o
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> - Anton
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Sat, Nov 22, 2025 at 13:37 Amogh Jahagirdar <[email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hey all,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Here is the meeting recording <https://drive.google.com/file/d/1lG9sM-JTwqcIgk7JsAryXXCc1vMnstJs/view?usp=sharing> and generated meeting summary <https://docs.google.com/document/d/1e50p8TXL2e3CnUwKMOvm8F4s2PeVMiKWHPxhxOW1fIM/edit?usp=sharing>. Thanks all for attending yesterday!
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Nov 20, 2025 at 8:49 AM Amogh Jahagirdar <[email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hey folks,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I was out for some time, but set up a sync for tomorrow at 9am PST. For this discussion, I do think it would be great to focus on the manifest DV representation, factoring in analyses of bitmap representation storage footprints, and on the entry structure, considering how we want to approach change detection. If there are other topics that people want to highlight, please do bring those up as well!
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I also recognize that this is a bit short-notice scheduling, so please do reach out to me if this time is difficult to work with; next week is the Thanksgiving holiday here, and since people would be travelling/out I figured I'd try to schedule before then.
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Amogh Jahagirdar
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Fri, Oct 17, 2025 at 9:03 AM Amogh Jahagirdar <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hey folks,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Sorry for the delay, here's the recording link <https://drive.google.com/file/d/1YOmPROXjAKYAWAcYxqAFHdADbqELVVf2/view> from last week's discussion.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> Amogh Jahagirdar
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Fri, Oct 10, 2025 at 9:44 AM Péter Váry <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Same here.
>>>>>>>>>>>>>>>>>>>> Please record if you can.
>>>>>>>>>>>>>>>>>>>> Thanks, Peter
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Fri, Oct 10, 2025, 17:39 Fokko Driesprong <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hey Amogh,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks for the write-up. Unfortunately, I won't be able to attend. Will it be recorded? Thanks!
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>>>>>>>>>> Fokko
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Tue, Oct 7, 2025 at 20:36 Amogh Jahagirdar <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Hey all,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I've set up time this Friday at 9am PST for another sync on single file commits. In terms of what would be great to focus on for the discussion:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> 1.
>>>>>>>>>>>>>>>>>>>>>> Whether or not it makes sense to eliminate the tuple, and instead represent the tuple via lower/upper boundaries. As a reminder, one of the goals is to avoid tying a partition spec to a manifest; in the root we can have a mix of files spanning different partition specs, and even in leaf manifests avoiding this coupling can enable more desirable clustering of metadata.
>>>>>>>>>>>>>>>>>>>>>> In the vast majority of cases, we could leverage the property that a file is effectively partitioned on a given field if its lower and upper bounds are equal. The nuance here is the particular case of identity-partitioned string/binary columns, whose stats can be truncated. One approach is to require that writers must not produce truncated stats for identity-partitioned columns. It's also important to keep in mind that all of this is just for the purpose of reconstructing the partition tuple, which is only required during equality delete matching. Another area we need to cover as part of this is exact bounds on stats. There are other options here as well, such as making all new equality deletes in V4 global and matching based on bounds instead, or keeping the tuple but basing each tuple off a union schema of all partition specs. I am adding a separate appendix section outlining the span of options here and the different tradeoffs.
>>>>>>>>>>>>>>>>>>>>>> Once we get this to a more conclusive state, I'll move a summarized version to the main doc.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> 2. @[email protected] <[email protected]> has updated the doc with a section <https://docs.google.com/document/d/1k4x8utgh41Sn1tr98eynDKCWq035SV_f75rtNHcerVw/edit?tab=t.rrpksmp8zkb#heading=h.qau0y5xkh9mn> on how we can do change detection from the root in a variety of write scenarios. I've done a review of it, and it covers the cases I would expect. It'd be good for folks to take a look and give feedback before we discuss. Thank you Steven for adding that section and all the diagrams.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>> Amogh Jahagirdar
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 18, 2025 at 3:19 PM Amogh Jahagirdar <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Hey folks, just following up from the discussion last Friday with a summary and some next steps:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> 1.) For the various change detection cases, we concluded it's best just to go through those in an offline manner on the doc, since it's hard to verify all that correctness in a large meeting setting.
>>>>>>>>>>>>>>>>>>>>>>> 2.)
>>>>>>>>>>>>>>>>>>>>>>> We mostly discussed eliminating the partition tuple. In the original proposal, I was mostly aiming for the ability to reconstruct the tuple from the stats for the purpose of equality delete matching (a file is partitioned if the lower and upper bounds are equal); there's some nuance in how we need to handle identity partition values, since for string/binary they cannot be truncated. Another potential option is to treat all equality deletes as effectively global and narrow their application based on the stats values. This may require defining tight bounds. I'm still collecting my thoughts on this one.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thanks folks! Please also let me know if any of the following links are inaccessible for any reason.
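The reconstruction rule discussed above (a file behaves as partitioned on a field when that field's lower and upper bounds are equal) can be sketched like this; the stats layout and field names are illustrative assumptions:

```python
def reconstruct_partition_tuple(stats: dict) -> dict:
    """Rebuild an effective partition tuple from per-field column stats.

    stats maps field name -> (lower_bound, upper_bound). A field
    contributes a partition value only when lower == upper, i.e. the
    file holds a single value for it. Truncated string/binary bounds
    would break this equality, which is why the thread discusses
    requiring exact stats for identity-partitioned columns.
    """
    return {field: lo for field, (lo, hi) in stats.items() if lo == hi}

# 'event_date' holds a single value, so the file behaves as
# identity-partitioned on it; 'user_id' spans a range and contributes
# nothing to the tuple.
stats = {"event_date": ("2026-03-01", "2026-03-01"), "user_id": (17, 90341)}
print(reconstruct_partition_tuple(stats))
```

Note the failure mode the thread calls out: if a writer truncated `"2026-03-01..."` bounds differently for lower and upper, the equality test would miss a genuinely partitioned file.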
>>>>>>>>>>>>>>>>>>>>>>> Meeting recording link: https://drive.google.com/file/d/1gv8TrR5xzqqNxek7_sTZkpbwQx1M3dhK/view
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Meeting summary: https://docs.google.com/document/d/131N0CDpzZczURxitN0HGS7dTqRxQT_YS9jMECkGGvQU
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 8, 2025 at 3:40 PM Amogh Jahagirdar <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Update: I moved the discussion time to this Friday at 9 am PST since I found out that quite a few folks involved in the proposals will be out next week, and I know some folks will also be out the week after that.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>> Amogh J
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 8, 2025 at 8:57 AM Amogh Jahagirdar <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Hey folks, sorry for the late follow-up here,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Thanks @Kevin Liu <[email protected]> for sharing the recording link of the previous discussion! I've set up another sync for next Tuesday 09/16 at 9am PST. This time I've set it up from my corporate email so we can get recordings and transcriptions (and I've made sure to keep the meeting invite open so we don't have to manually let people in).
>>>>>>>>>>>>>>>>>>>>>>>>> In terms of next steps, the areas which I think would be good to focus on for establishing consensus:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> 1. How do we model the manifest entry structure so that changes to manifest DVs can be obtained easily from the root? There are a few options here; the most promising approach is to keep an additional DV which encodes the diff, i.e. the positions that have been newly removed from a leaf manifest.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> 2. Modeling partition transforms via expressions and establishing a unified table ID space, so that we can simplify how partition tuples may be represented via stats and also have a way in the future to store stats on any derived column. I have a short proposal <https://docs.google.com/document/d/1oV8dapKVzB4pZy5pKHUCj5j9i2_1p37BJSeT7hyKPpg/edit?tab=t.0> for this that probably still needs some tightening up on the expression modeling itself (and some prototyping), but the general idea for establishing a unified table ID space is covered. All feedback welcome!
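The diff-encoding DV in item 1 amounts to a set difference over deleted positions. A minimal sketch using Python sets as stand-ins for the actual bitmap representation (a real implementation would presumably use roaring bitmaps); all names here are illustrative:

```python
def dv_diff(old_dv: set, new_dv: set) -> set:
    """Positions newly marked deleted since the previous snapshot.

    A reader at the root can surface changes to a leaf manifest by
    reading only this diff DV, instead of re-deriving the change from
    both full DVs.
    """
    return new_dv - old_dv

def apply_diff(old_dv: set, diff: set) -> set:
    """Reconstruct the current DV from the previous DV plus the diff."""
    return old_dv | diff

old = {3, 17, 42}
new = {3, 17, 42, 99, 100}
diff = dv_diff(old, new)
print(sorted(diff))
assert apply_diff(old, diff) == new  # the diff round-trips
```

This sketch assumes positions are only ever added to a DV (rows deleted, never un-deleted), which matches how DVs accumulate between rewrites; a rewrite would reset the baseline.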
>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>> Amogh Jahagirdar
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Aug 25, 2025 at 1:34 PM Kevin Liu <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks Amogh. Looks like the recording for last week's sync is available on YouTube. Here's the link: https://www.youtube.com/watch?v=uWm-p--8oVQ
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>>>>>>>> Kevin Liu
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Aug 12, 2025 at 9:10 PM Amogh Jahagirdar <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Hey folks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Just following up on this to give the community an update on where we're at and my proposed next steps.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> I've been editing and merging the contents from our proposal into the proposal <https://docs.google.com/document/d/1k4x8utgh41Sn1tr98eynDKCWq035SV_f75rtNHcerVw/edit?tab=t.0#heading=h.unn922df0zzw> from Russell and others. For any future comments on docs, please comment on the linked proposal. I've also marked it on our doc in red text so it's clear to redirect to the other proposal as the source of truth for comments.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> In terms of next steps,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> 1.
An important design decision point is around inline manifest DVs, external manifest DVs, or enabling both. I'm working on measuring different approaches for the compressed DV representation since that will inform how many entries can reasonably fit in a small root manifest; from that we can derive implications for different write patterns and determine the right approach for storing these manifest DVs.

2. Another key point is determining if/how we can reasonably enable V4 to represent changes in the root manifest so that readers can effectively infer file-level changes from the root alone.

3. One of the aspects of the proposal is getting away from the partition tuple requirement in the root, which currently requires associativity between a partition spec and a manifest. These aspects can be modeled as essentially column stats, which gives a lot of flexibility in the organization of the manifest. There are important details around field ID spaces here which tie into how the stats are structured.
What we're proposing here is a unified expression ID space that could also benefit us for storing things like virtual columns down the line. I go into this in the proposal, but I'm working on separating out the appropriate parts so that the original proposal can mostly focus on the organization of the content metadata tree and not on how we want to solve this particular ID space problem.

4. I'm planning on scheduling a recurring community sync starting next Tuesday at 9am PST, every 2 weeks. If I get feedback from folks that this time will never work, I can certainly adjust. For some reason, I don't have the ability to add to the Iceberg Dev calendar, so I'll figure that out and update the thread when the event is scheduled.
Thanks,
Amogh Jahagirdar

On Tue, Jul 22, 2025 at 11:47 AM Russell Spitzer <[email protected]> wrote:

I think this is a great way forward; starting out with this much parallel development shows that we have a lot of consensus already :)

On Tue, Jul 22, 2025 at 12:42 PM Amogh Jahagirdar <[email protected]> wrote:

Hey folks, just following up on this. It looks like our proposal and the proposal that @Russell Spitzer <[email protected]> shared are pretty aligned. I was just chatting with Russell about this, and we think it'd be best to combine both proposals and have a singular large effort on this. I can also set up a focused community discussion (similar to what we're doing on the other V4 proposals) starting sometime next week just to get things moving, if that works for people.
Thanks,
Amogh Jahagirdar

On Mon, Jul 14, 2025 at 9:48 PM Amogh Jahagirdar <[email protected]> wrote:

Hey Russell,

Thanks for sharing the proposal! A few of us (Ryan, Dan, Anoop and I) have also been working on a proposal for an adaptive metadata tree structure as part of enabling more efficient one-file commits. From a read of the summary, it's great to see that we're thinking along the same lines about how to tackle this fundamental area!

Here is our proposal: https://docs.google.com/document/d/1q2asTpq471pltOTC6AsTLQIQcgEsh0AvEhRWnCcvZn0

Thanks,
Amogh Jahagirdar

On Mon, Jul 14, 2025 at 8:08 PM Russell Spitzer <[email protected]> wrote:

Hey y'all!

We (Yi Fang, Steven Wu, and myself) wanted to share some of the thoughts we had on how one-file commits could work in Iceberg.
This is pretty much just a high-level overview of the concepts we think we need and how Iceberg would behave. We haven't gone very far into the actual implementation and the changes that would need to occur in the SDK to make this happen.

The high-level summary is:

- Manifest lists are out
- Root manifests take their place
- A root manifest can have data manifests, delete manifests, manifest delete vectors, data delete vectors, and data files
- Manifest delete vectors allow for modifying a manifest without deleting it entirely
- Data files let you append without writing an intermediary manifest
- Having child data and delete manifests lets you still scale

Please take a look if you like:

https://docs.google.com/document/d/1k4x8utgh41Sn1tr98eynDKCWq035SV_f75rtNHcerVw/edit?tab=t.0

I'm excited to see what other proposals and ideas are floating around the community,
Russ

On Wed, Jul 2, 2025 at 6:29 PM John Zhuge <
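The summary above could be sketched as a single root structure. This is a hypothetical shape for illustration only — the field names (`RootManifest`, `manifest_dvs`, and so on) are invented here, and the linked proposal doc, not this sketch, is the source of truth:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass
class RootManifest:
    # Replaces the manifest list: one file that can reference every content kind.
    data_manifests: List[str] = field(default_factory=list)      # child manifests, for scale
    delete_manifests: List[str] = field(default_factory=list)
    data_files: List[str] = field(default_factory=list)          # inlined appends, no child manifest
    manifest_dvs: Dict[str, Set[int]] = field(default_factory=dict)  # soft-deleted positions per manifest
    data_dvs: Dict[str, Set[int]] = field(default_factory=dict)      # row-level deletes per data file

# A small append touches only the root: reference the new data file directly.
root = RootManifest(data_manifests=["m1.avro"])
root.data_files.append("data-00001.parquet")
# Removing an entry from m1 needs no manifest rewrite, just a manifest DV entry.
root.manifest_dvs.setdefault("m1.avro", set()).add(42)

assert root.data_files == ["data-00001.parquet"]
assert 42 in root.manifest_dvs["m1.avro"]
```

In this shape, both of the scenarios in the bullets — appending without an intermediary manifest, and modifying a manifest via a delete vector — complete with a single new root file.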
[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Very excited about the idea! >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jul 2, 2025 at 1:17 PM Anoop >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Johnson <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm very interested in this initiative. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Micah Kornfield and I presented >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <https://youtu.be/4d4nqKkANdM?si=9TXgaUIXbq-l8idi&t=1405> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> on high-throughput ingestion for Iceberg >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> tables at the 2024 Iceberg Summit, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> which leveraged Google infrastructure like >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Colossus for efficient appends. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This new proposal is particularly exciting >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> because it offers significant advancements in >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> commit latency and metadata >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> storage footprint. Furthermore, a consistent >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> manifest structure promises to >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> simplify the design and codebase, which is a >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> major benefit. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> A related idea I've been exploring is >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> having a loose affinity between data and >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> delete manifests. 
While the current separation of data and delete manifests in Iceberg is valuable for avoiding data file rewrites (and stats updates) when deletes change, it does necessitate a join operation during reads. I'd be keen to discuss approaches that could potentially reduce this read-side cost while retaining the benefits of separate manifests.

Best,
Anoop

On Fri, Jun 13, 2025 at 11:06 AM Jagdeep Sidhu <[email protected]> wrote:

Hi everyone,

I am new to the Iceberg community but would love to participate in these discussions to reduce the number of file writes, especially for small writes/commits.

Thank you!
-Jagdeep

On Thu, Jun 5, 2025 at 4:02 PM Anurag Mantripragada <[email protected]> wrote:

We have been hitting all the metadata problems you mentioned, Ryan.
I'm on board to help however I can to improve this area.

~ Anurag Mantripragada

On Jun 3, 2025, at 2:22 AM, Huang-Hsiang Cheng <[email protected]> wrote:

I am interested in this idea and looking forward to collaboration.

Thanks,
Huang-Hsiang

On Jun 2, 2025, at 10:14 AM, namratha mk <[email protected]> wrote:

Hello,

I am interested in contributing to this effort.

Thanks,
Namratha

On Thu, May 29, 2025 at 1:36 PM Amogh Jahagirdar <[email protected]> wrote:

Thanks for kicking this thread off, Ryan. I'm interested in helping out here!
I've been working on a proposal in this area and it would be great to collaborate with different folks and exchange ideas here, since I think a lot of people are interested in solving this problem.

Thanks,
Amogh Jahagirdar

On Thu, May 29, 2025 at 2:25 PM Ryan Blue <[email protected]> wrote:

Hi everyone,

Like Russell's recent note, I'm starting a thread to connect those of us that are interested in the idea of changing Iceberg's metadata in v4 so that in most cases committing a change only requires writing one additional metadata file.

*Idea: One-file commits*

The current Iceberg metadata structure requires writing at least one manifest and a new manifest list to produce a new snapshot.
The goal of this work is to allow more flexibility by allowing the manifest list layer to store data and delete files. As a result, only one file write would be needed before committing the new snapshot. In addition, this work will also try to explore:

- Avoiding small manifests that must be read in parallel and later compacted (metadata maintenance changes)
- Extending metadata skipping to use aggregated column ranges that are compatible with geospatial data (manifest metadata)
- Using soft deletes to avoid rewriting existing manifests (metadata DVs)

If you're interested in these problems, please reply!

Ryan
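Ryan's goal can be illustrated with a toy count of metadata writes per commit. This is purely illustrative arithmetic for the common small-append case, not a measurement, and the function names are invented for the sketch:

```python
def metadata_files_written_today(new_manifests=1):
    # Current structure: each snapshot writes its new manifest(s)
    # plus a fresh manifest list that references all manifests.
    return new_manifests + 1  # manifests + manifest list

def metadata_files_written_one_file_commit(new_child_manifests=0):
    # Sketch of the v4 idea: the manifest-list layer (root) can hold data
    # and delete files directly, so a small commit writes only the new root.
    return new_child_manifests + 1  # optional child manifests + root

# A small append today: one manifest + one manifest list = 2 metadata files.
assert metadata_files_written_today() == 2
# The same append under the one-file-commit idea: a single new root file.
assert metadata_files_written_one_file_commit() == 1
```

The savings compound for high-frequency small commits, which is exactly the streaming-ingest pattern discussed earlier in the thread.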
