amogh-jahagirdar commented on code in PR #245: URL: https://github.com/apache/iceberg-python/pull/245#discussion_r1502013528
########## tests/catalog/test_hive.py: ########## @@ -277,7 +277,7 @@ def test_create_table(table_schema_simple: Schema, hive_database: HiveDatabase, ) ], current_schema_id=0, - last_partition_id=1000, + last_partition_id=999, Review Comment: @Fokko I took a look and integrated the changes, and after going back/forth I think I'd like to keep the changes as is. My rationale is that even after those changes there's some more workarounds that need to be done to make sure new IDs start at 1000 (after taking the changes directly, field IDs for non-REST will start at 1001 without any more changes). Now, technically it does not seem to be a hard requirement that ids need to start at 1000 for v2 tables. Even for v1 starting at 1000 does not seem to be a requirement, rather just how the Java lib was implemented. ``` In v1, partition field IDs were not tracked, but were assigned sequentially starting at 1000 in the reference implementation. ``` I read this as "this was how we originally implemented in Java but not really required so long as it's unique and increasing for new fields" All that said, I'd advocate for just following the practice of starting at 1000 for both v1 and v2 because it's just the established model and avoids any confusion. On returning 999 for unpartitioned spec: As @HonahX alluded to I think that makes for a logical API, considering we want to start a field at 1000, the unpartitioned spec (no fields) last assigned field ID should be one less than that. I don't think we want to return 1000 in that case. This is also what Spark sets for the unpartitioned spec, and is spec compliant (since the spec doesn't mandate any particular IDs) I could see the argument where we just want to return None for a last_assigned_partition_id for this API but that just shifts the responsibility to other locations to make sure Ids are set correctly according to our scheme which ends up being more messy imo compared to just defining the API to return 999 if the only spec is the unpartitioned spec (which is the value which would be set anyways). I also think it makes sense to set last-assigned-partition-id for both V1 and V2. Even though it's optional for V1 we set the other optional fields for v1 metadata so it seems a bit odd to make this particular case an exception. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org