laskoviymishka commented on issue #1090:
URL: https://github.com/apache/iceberg-go/issues/1090#issuecomment-4482533843

   Good investigation!
   
   Here’s the plan I’d suggest:
   
   1. **Generate locally, then pin the bytes.**
      Add a small, reproducible generator under something like `table/testdata/geo/generate/`. `pyarrow` + `geoarrow-pyarrow` seems totally fine here; it’s probably the most ergonomic stack for writing WKB with GeoArrow extension metadata.
   
      I’d check in both the script and the generated `.parquet` files. The committed Parquet files are the actual contract. The script is only there so future readers can see how the files were produced and regenerate them if needed. Keep it lightweight: no package-manager setup, just a README with the `pip install` command and the exact package versions used.
   
   2. **Keep the fixture set small and intentional.**
      A few well-chosen cases are more useful than trying to cover everything. Exhaustive conformance testing is GeoArrow’s job, not Iceberg’s. I’d start with something like:
   
      * point column, geometry, WGS84
      * polygon column, geometry, WGS84
      * geography variant of one of those
      * mixed geometry types in one column
      * nulls plus one empty geometry
   
   3. **Document the upstream migration path.**
      In the README, call out that once Apache Iceberg or `apache/parquet-testing` publishes canonical geo fixtures, we should replace these locally generated files with upstream-pinned bytes. That makes the swap an explicit follow-up rather than quiet tech debt.
   
   Scope-wise, I’d limit the PR to the generator script, the small set of generated `.parquet` files, and a simple Go loader test that opens each file and checks that it parses. No geo-specific assertions yet, since the geo type plumbing hasn’t landed.
   
   That gets the fixtures into the tree, gives #984 and the downstream PRs something to reference, and lets us tighten the assertions as the feature lands.