liurenjie1024 opened a new issue, #3: URL: https://github.com/apache/iceberg-rust/issues/3
When designing in memory model for iceberg spce(metadata, snapshots, manifests etc), there are some problems to consider: ## API safety. For a field in spec, there are several cases to consider: 1. Exists in both v1 and v2, and both are required 2. Exists in both v1 and v2, and both are optional 3. Exists in both v1 and v2, but optional in one and require in another. 4. Exists in v1, but deprecated v2. 5. Exists in v2, and have default value in v1. Let's use the `summary` field in [snapshot spec](https://iceberg.apache.org/spec/#snapshots) as an example. There are some potential solutions to this: 1. Just like in https://github.com/JanKaul/iceberg-rust, we have different models for v1 and v2 spec, e.g. `SnapshotV1` and `SnapshotV2`. This way we can guarantee the api safety of model, but it introduces extra maintaince burden when spec evolves, since most fields are common in both spec. 2. Use one struct for both version(https://github.com/icelake-io/icelake/), and let users to do the check at runtime. This way we don't have extra maintaince effort for different specs, but we can't guarantee the api safety. ## Field access Currently, in both two repos, all fields are accessed through the public field. Personally, I'm not in favor of this approach since it may cause some problems: 1. Misuse of fields. For example, for `TableMetadata`, when we want to append a new snapshot, we should both append snapshot and snapshot log, but exposing `Vec<Snapshot>` may lead to wrong usage. 2. In-memory data structure should be designed for high-performance access. For example, `TableMetadata` should contain a map of snapshot id to snapshot, rather than a vector of snapshots, so that we can access snapshot by id fast. 3. The in-memory data struct fields should not have a one-one mapping as spec, but should provide more friendly access methods to user. For example, there should no `ManifestList` data structure, and user should access manifest fiels from `Snapshot`'s method. So for in-memory data structure, I would propose hiding all public fields and provide public methods to access necessary fields. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
