liurenjie1024 opened a new issue, #3:
URL: https://github.com/apache/iceberg-rust/issues/3

   When designing in memory model for iceberg spce(metadata, snapshots, 
manifests etc), there are some problems to consider:
   
   ## API safety.
   
   For a field in spec, there are several cases to consider:
   
   1. Exists in both v1 and v2, and both are required
   2. Exists in both v1 and v2, and both are optional
   3. Exists in both v1 and v2, but optional in one and require in another.
   4. Exists in v1, but deprecated v2.
   5. Exists in v2, and have default value in v1.
   
   Let's use the `summary` field in [snapshot 
spec](https://iceberg.apache.org/spec/#snapshots) as an example. There are some 
potential solutions to this:
   1. Just like in https://github.com/JanKaul/iceberg-rust, we have different 
models for v1 and v2 spec, e.g. `SnapshotV1` and `SnapshotV2`. This way we can 
guarantee the api safety of model, but it introduces extra maintaince burden 
when spec evolves, since most fields are common in both spec.
   2. Use one struct for both version(https://github.com/icelake-io/icelake/), 
and let users to do the check at runtime. This way we don't have extra 
maintaince effort for different specs, but we can't guarantee the api safety.
   
   ##  Field access 
   Currently, in both two repos, all fields are accessed through the public 
field. Personally, I'm not in favor of this approach since it may cause some 
problems:
   1. Misuse of fields. For example, for `TableMetadata`, when we want to 
append a new snapshot, we should both append snapshot and snapshot log, but 
exposing `Vec<Snapshot>` may lead to wrong usage.
   2. In-memory data structure should be designed for high-performance access. 
For example, `TableMetadata` should contain a map of snapshot id to snapshot, 
rather than a vector of snapshots, so that we can access snapshot by id fast.
   3. The in-memory data struct fields should not have a one-one mapping as 
spec, but should provide more friendly access methods to user. For example, 
there should no `ManifestList` data structure, and user should access manifest 
fiels from `Snapshot`'s method.
   
   So for in-memory data structure, I would propose hiding all public fields 
and provide public methods to access necessary fields.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to