laskoviymishka commented on code in PR #1138:
URL: https://github.com/apache/iceberg-go/pull/1138#discussion_r3382461780


##########
table/arrow_utils.go:
##########
@@ -375,6 +376,17 @@ func (c convertToIceberg) Primitive(dt arrow.DataType) 
(result iceberg.NestedFie
                        result.Type = iceberg.PrimitiveTypes.UUID
                case "parquet.variant":
                        result.Type = iceberg.VariantType{}
+               case "geoarrow.wkb":
+                       wkb, ok := dt.(*geoarrow.WKBType)
+                       if !ok {
+                               panic(fmt.Errorf("%w: unsupported arrow type 
for conversion - %s", iceberg.ErrInvalidSchema, dt))
+                       }
+
+                       iceType, err := 
geoArrowMetadataToIcebergType(wkb.Metadata())
+                       if err != nil {
+                               panic(fmt.Errorf("%w: %v", 
iceberg.ErrInvalidSchema, err))

Review Comment:
   The `%v` on `err` drops it out of the wrapped chain, so 
`errors.Is`/`errors.As` against the underlying error won't work downstream. I'd 
use `%w` for both:
   
   ```go
   panic(fmt.Errorf("%w: converting geoarrow metadata: %w", 
iceberg.ErrInvalidSchema, err))
   ```



##########
table/arrow_utils.go:
##########
@@ -1828,3 +1844,72 @@ func positionDeleteRecordsToDataFilesDV(ctx 
context.Context, rootLocation string
                }
        }
 }
+
+func geoArrowCRSToIcebergCRS(meta geoarrow.Metadata) (string, error) {
+       if len(meta.CRS) == 0 && meta.CRSType == "" {
+               return "srid:0", nil
+       }
+
+       var crs string
+       if len(meta.CRS) > 0 {
+               if err := json.Unmarshal(meta.CRS, &crs); err != nil {
+                       return "", fmt.Errorf("invalid geoarrow CRS metadata: 
%w", err)
+               }
+       }
+
+       if crs == "OGC:CRS84" || crs == "EPSG:4326" {
+               return "OGC:CRS84", nil
+       }
+
+       switch meta.CRSType {
+       case geoarrow.CRSTypeSRID:
+               return "srid:" + crs, nil
+       case geoarrow.CRSTypePROJJSON:
+               return "", errors.New("geoarrow CRS type projjson not supported 
yet")
+       default:
+               return crs, nil
+       }
+}
+
+func geoArrowMetadataToIcebergType(meta geoarrow.Metadata) (iceberg.Type, 
error) {
+       crs, err := geoArrowCRSToIcebergCRS(meta)
+       if err != nil {
+               return nil, err
+       }
+
+       switch meta.Edges {
+       case geoarrow.EdgePlanar:
+               return iceberg.GeometryTypeOf(crs)
+       case geoarrow.EdgeVincenty, geoarrow.EdgeKarney, geoarrow.EdgeThomas, 
geoarrow.EdgeAndoyer, geoarrow.EdgeSpherical:
+               return iceberg.GeographyTypeOf(crs, string(meta.Edges))
+       default:
+               return nil, fmt.Errorf("unsupported geoarrow edges %q", 
meta.Edges)
+       }
+}
+
+func icebergCRSToGeoArrowMetadata(crs string) geoarrow.Metadata {
+       if strings.HasPrefix(strings.ToLower(crs), "srid:") {
+               id := crs[len("srid:"):]
+
+               if id == "0" {
+                       return geoarrow.NewMetadata() // srid:0 maps to omitted 
GeoArrow CRS
+               }
+
+               raw, _ := json.Marshal(id)
+
+               return geoarrow.Metadata{
+                       CRS:     raw,
+                       CRSType: geoarrow.CRSTypeSRID,
+               }
+       }
+
+       if strings.HasPrefix(strings.ToLower(crs), "projjson:") {

Review Comment:
   This panics on `projjson`, but the symmetric read function 
`geoArrowCRSToIcebergCRS` returns a clean error for the same case. The 
asymmetry is the problem: `VisitGeometry`/`VisitGeography` have no error return 
and nothing recovers, so a `GeometryType` with a `projjson:` CRS detonates the 
whole conversion at `TypeToArrowType` rather than surfacing an error.
   
   I'd return an error here and thread it through the same way the read path 
does, so a projjson CRS fails gracefully instead of crashing the process.



##########
table/arrow_utils.go:
##########
@@ -1828,3 +1844,72 @@ func positionDeleteRecordsToDataFilesDV(ctx 
context.Context, rootLocation string
                }
        }
 }
+
+func geoArrowCRSToIcebergCRS(meta geoarrow.Metadata) (string, error) {
+       if len(meta.CRS) == 0 && meta.CRSType == "" {
+               return "srid:0", nil
+       }
+
+       var crs string
+       if len(meta.CRS) > 0 {
+               if err := json.Unmarshal(meta.CRS, &crs); err != nil {
+                       return "", fmt.Errorf("invalid geoarrow CRS metadata: 
%w", err)
+               }
+       }
+
+       if crs == "OGC:CRS84" || crs == "EPSG:4326" {

Review Comment:
   EPSG:4326 and OGC:CRS84 aren't the same CRS — they have opposite axis order 
(CRS84 is lon/lat, EPSG:4326 is lat/lon). Collapsing them here means a 
`geometry("EPSG:4326")` field — which is exactly what PyIceberg emits via 
`ga.wkb().with_crs("EPSG:4326")` — reads back as `geometry("OGC:CRS84")`, so 
the schema silently changes on the read side.
   
   The write path doesn't share the collapse either: 
`icebergCRSToGeoArrowMetadata` emits `EPSG:4326` verbatim, so 
`GeometryType{crs:"EPSG:4326"} → Arrow → GeometryType{crs:"OGC:CRS84"}` and the 
two aren't `Equals`.
   
   I'd drop the `EPSG:4326` case here and treat it as a distinct authority 
code. If we genuinely want them unified, it has to be symmetric — normalize on 
both the write and read sides so the round trip is stable — and it shouldn't 
live silently at the schema layer. Either way a round-trip regression test for 
`EPSG:4326` specifically would lock it down. wdyt?



##########
table/arrow_utils.go:
##########
@@ -1828,3 +1844,72 @@ func positionDeleteRecordsToDataFilesDV(ctx 
context.Context, rootLocation string
                }
        }
 }
+
+func geoArrowCRSToIcebergCRS(meta geoarrow.Metadata) (string, error) {
+       if len(meta.CRS) == 0 && meta.CRSType == "" {
+               return "srid:0", nil
+       }
+
+       var crs string
+       if len(meta.CRS) > 0 {
+               if err := json.Unmarshal(meta.CRS, &crs); err != nil {
+                       return "", fmt.Errorf("invalid geoarrow CRS metadata: 
%w", err)
+               }
+       }
+
+       if crs == "OGC:CRS84" || crs == "EPSG:4326" {
+               return "OGC:CRS84", nil
+       }
+
+       switch meta.CRSType {
+       case geoarrow.CRSTypeSRID:
+               return "srid:" + crs, nil
+       case geoarrow.CRSTypePROJJSON:
+               return "", errors.New("geoarrow CRS type projjson not supported 
yet")
+       default:

Review Comment:
   `CRSTypeAuthorityCode` and `CRSTypeWKT22019` both fall into `default` and 
get returned as-is. For authority_code that's almost right but loses the type 
tag; for WKT2:2019 it hands a multi-KB WKT2 blob straight back as the Iceberg 
CRS string, which isn't a valid CRS identifier.
   
   I'd add an explicit `CRSTypeAuthorityCode` case, and for `CRSTypeWKT22019` 
either error like projjson does or reduce it to the authority code — whichever 
we pick, a test for each so the behavior is pinned.



##########
table/arrow_utils.go:
##########
@@ -1828,3 +1844,72 @@ func positionDeleteRecordsToDataFilesDV(ctx 
context.Context, rootLocation string
                }
        }
 }
+
+func geoArrowCRSToIcebergCRS(meta geoarrow.Metadata) (string, error) {
+       if len(meta.CRS) == 0 && meta.CRSType == "" {
+               return "srid:0", nil
+       }
+
+       var crs string
+       if len(meta.CRS) > 0 {
+               if err := json.Unmarshal(meta.CRS, &crs); err != nil {
+                       return "", fmt.Errorf("invalid geoarrow CRS metadata: 
%w", err)
+               }
+       }
+
+       if crs == "OGC:CRS84" || crs == "EPSG:4326" {
+               return "OGC:CRS84", nil
+       }
+
+       switch meta.CRSType {
+       case geoarrow.CRSTypeSRID:
+               return "srid:" + crs, nil
+       case geoarrow.CRSTypePROJJSON:
+               return "", errors.New("geoarrow CRS type projjson not supported 
yet")
+       default:
+               return crs, nil
+       }
+}
+
+func geoArrowMetadataToIcebergType(meta geoarrow.Metadata) (iceberg.Type, 
error) {
+       crs, err := geoArrowCRSToIcebergCRS(meta)
+       if err != nil {
+               return nil, err
+       }
+
+       switch meta.Edges {
+       case geoarrow.EdgePlanar:
+               return iceberg.GeometryTypeOf(crs)
+       case geoarrow.EdgeVincenty, geoarrow.EdgeKarney, geoarrow.EdgeThomas, 
geoarrow.EdgeAndoyer, geoarrow.EdgeSpherical:
+               return iceberg.GeographyTypeOf(crs, string(meta.Edges))
+       default:
+               return nil, fmt.Errorf("unsupported geoarrow edges %q", 
meta.Edges)
+       }
+}
+
+func icebergCRSToGeoArrowMetadata(crs string) geoarrow.Metadata {
+       if strings.HasPrefix(strings.ToLower(crs), "srid:") {
+               id := crs[len("srid:"):]
+
+               if id == "0" {
+                       return geoarrow.NewMetadata() // srid:0 maps to omitted 
GeoArrow CRS
+               }
+
+               raw, _ := json.Marshal(id)
+
+               return geoarrow.Metadata{
+                       CRS:     raw,
+                       CRSType: geoarrow.CRSTypeSRID,
+               }
+       }
+
+       if strings.HasPrefix(strings.ToLower(crs), "projjson:") {
+               panic(fmt.Errorf("%w: projjson CRS not supported yet", 
iceberg.ErrInvalidSchema))
+       }
+
+       raw, _ := json.Marshal(crs)
+
+       return geoarrow.Metadata{

Review Comment:
   For an authority-code CRS like `EPSG:4326` this emits `{"crs":"EPSG:4326"}` 
with no `crs_type`, but the GeoArrow spec wants `crs_type: "authority_code"` 
for `AUTHORITY:CODE` strings. Without it the encoding is ambiguous, and 
combined with the collapse in `geoArrowCRSToIcebergCRS` it's the other half of 
the EPSG:4326 round-trip break.
   
   I'd set `CRSType: geoarrow.CRSTypeAuthorityCode` when the CRS matches the 
authority:code shape, so the write side is unambiguous and pairs cleanly with 
an explicit authority_code read case.



##########
table/arrow_utils.go:
##########
@@ -1828,3 +1844,72 @@ func positionDeleteRecordsToDataFilesDV(ctx 
context.Context, rootLocation string
                }
        }
 }
+
+func geoArrowCRSToIcebergCRS(meta geoarrow.Metadata) (string, error) {
+       if len(meta.CRS) == 0 && meta.CRSType == "" {
+               return "srid:0", nil

Review Comment:
   A bare `ga.wkb()` field with no CRS — the PyIceberg default and what older 
clients emit — reads back as `geometry("srid:0")` rather than `geometry()`, 
which isn't `Equals` to a default-CRS geometry.
   
   Separately, an empty `meta.CRS` with a non-empty `CRSType` skips this early 
return and falls into the switch, so `CRSTypeSRID` yields `"srid:"` with an 
empty id and `GeometryTypeOf` accepts it silently. I'd return the `OGC:CRS84` 
default for the bare case and guard the empty-CRS-with-CRSType combination, 
with a test for bare `geoarrow.wkb`.



##########
table/arrow_utils.go:
##########
@@ -1828,3 +1844,72 @@ func positionDeleteRecordsToDataFilesDV(ctx 
context.Context, rootLocation string
                }
        }
 }
+
+func geoArrowCRSToIcebergCRS(meta geoarrow.Metadata) (string, error) {
+       if len(meta.CRS) == 0 && meta.CRSType == "" {
+               return "srid:0", nil
+       }
+
+       var crs string
+       if len(meta.CRS) > 0 {
+               if err := json.Unmarshal(meta.CRS, &crs); err != nil {
+                       return "", fmt.Errorf("invalid geoarrow CRS metadata: 
%w", err)
+               }
+       }
+
+       if crs == "OGC:CRS84" || crs == "EPSG:4326" {
+               return "OGC:CRS84", nil
+       }
+
+       switch meta.CRSType {
+       case geoarrow.CRSTypeSRID:
+               return "srid:" + crs, nil
+       case geoarrow.CRSTypePROJJSON:
+               return "", errors.New("geoarrow CRS type projjson not supported 
yet")
+       default:
+               return crs, nil
+       }
+}
+
+func geoArrowMetadataToIcebergType(meta geoarrow.Metadata) (iceberg.Type, 
error) {
+       crs, err := geoArrowCRSToIcebergCRS(meta)
+       if err != nil {
+               return nil, err
+       }
+
+       switch meta.Edges {
+       case geoarrow.EdgePlanar:
+               return iceberg.GeometryTypeOf(crs)
+       case geoarrow.EdgeVincenty, geoarrow.EdgeKarney, geoarrow.EdgeThomas, 
geoarrow.EdgeAndoyer, geoarrow.EdgeSpherical:
+               return iceberg.GeographyTypeOf(crs, string(meta.Edges))
+       default:
+               return nil, fmt.Errorf("unsupported geoarrow edges %q", 
meta.Edges)
+       }
+}
+
+func icebergCRSToGeoArrowMetadata(crs string) geoarrow.Metadata {

Review Comment:
   Small thing while we're here: `strings.ToLower(crs)` runs twice (here and 
for the projjson check) and the slice `crs[len("srid:"):]` indexes the 
original-case string after a lowercased prefix check. I'd hoist `lower := 
strings.ToLower(crs)` once and reuse it. Also worth noting the `EPSG:4326` 
match in the read function is case-sensitive, so `epsg:4326` falls through — 
`strings.EqualFold` would close that.



##########
table/arrow_utils.go:
##########
@@ -658,20 +670,24 @@ func (c convertToArrow) VisitVariant() arrow.Field {
        return arrow.Field{Type: extensions.NewDefaultVariantType()}
 }
 
-func (c convertToArrow) VisitGeometry(iceberg.GeometryType) arrow.Field {
+func (c convertToArrow) VisitGeometry(g iceberg.GeometryType) arrow.Field {
+       meta := icebergCRSToGeoArrowMetadata(g.CRS())
        if c.useLargeTypes {
-               return arrow.Field{Type: 
geoarrow.NewWKBType(geoarrow.WKBWithLargeBinaryStorage())}
+               return arrow.Field{Type: 
geoarrow.NewWKBType(geoarrow.WKBWithLargeBinaryStorage(), 
geoarrow.WKBWithMetadata(meta))}
        }
 
-       return arrow.Field{Type: 
geoarrow.NewWKBType(geoarrow.WKBWithBinaryStorage())}
+       return arrow.Field{Type: 
geoarrow.NewWKBType(geoarrow.WKBWithBinaryStorage(), 
geoarrow.WKBWithMetadata(meta))}
 }
 

Review Comment:
   The "always add an edge to differentiate geography from geometry" convention 
is Go-only. PyIceberg emits edges only for spherical algorithms and nothing for 
planar, so a `geography(OGC:CRS84, planar)` field written by PyIceberg arrives 
as `{"crs":"OGC:CRS84"}` with no edges and `geoArrowMetadataToIcebergType` 
reads it back as `GeometryType` — a genuine round-trip break for that 
combination.
   
   This is inherent to the GeoArrow↔Iceberg mapping: the Arrow extension 
metadata can't always distinguish the two, so the canonical source of the geo 
type is the Iceberg schema JSON, not the Arrow metadata. I don't think it 
blocks the PR, but I'd document it right at this comment — that the edge 
convention is a best-effort hint and planar geography from other clients won't 
round-trip through Arrow alone. wdyt?



##########
table/arrow_utils.go:
##########
@@ -1828,3 +1844,72 @@ func positionDeleteRecordsToDataFilesDV(ctx 
context.Context, rootLocation string
                }
        }
 }
+
+func geoArrowCRSToIcebergCRS(meta geoarrow.Metadata) (string, error) {
+       if len(meta.CRS) == 0 && meta.CRSType == "" {
+               return "srid:0", nil
+       }
+
+       var crs string
+       if len(meta.CRS) > 0 {
+               if err := json.Unmarshal(meta.CRS, &crs); err != nil {
+                       return "", fmt.Errorf("invalid geoarrow CRS metadata: 
%w", err)
+               }
+       }
+
+       if crs == "OGC:CRS84" || crs == "EPSG:4326" {
+               return "OGC:CRS84", nil
+       }
+
+       switch meta.CRSType {
+       case geoarrow.CRSTypeSRID:
+               return "srid:" + crs, nil
+       case geoarrow.CRSTypePROJJSON:
+               return "", errors.New("geoarrow CRS type projjson not supported 
yet")
+       default:
+               return crs, nil
+       }
+}
+
+func geoArrowMetadataToIcebergType(meta geoarrow.Metadata) (iceberg.Type, 
error) {
+       crs, err := geoArrowCRSToIcebergCRS(meta)
+       if err != nil {
+               return nil, err
+       }
+
+       switch meta.Edges {
+       case geoarrow.EdgePlanar:
+               return iceberg.GeometryTypeOf(crs)
+       case geoarrow.EdgeVincenty, geoarrow.EdgeKarney, geoarrow.EdgeThomas, 
geoarrow.EdgeAndoyer, geoarrow.EdgeSpherical:
+               return iceberg.GeographyTypeOf(crs, string(meta.Edges))
+       default:
+               return nil, fmt.Errorf("unsupported geoarrow edges %q", 
meta.Edges)
+       }
+}
+
+func icebergCRSToGeoArrowMetadata(crs string) geoarrow.Metadata {
+       if strings.HasPrefix(strings.ToLower(crs), "srid:") {
+               id := crs[len("srid:"):]
+
+               if id == "0" {
+                       return geoarrow.NewMetadata() // srid:0 maps to omitted 
GeoArrow CRS
+               }
+
+               raw, _ := json.Marshal(id)

Review Comment:
   `errcheck` is enabled outside `_test.go`, so the discarded errors here and 
at the `json.Marshal(crs)` below will fail CI. Marshaling a string can't 
actually fail, but the linter doesn't know that — I'd either build the raw JSON 
without the error return (e.g. `json.RawMessage(strconv.AppendQuote(nil, id))`) 
or add a `//nolint` with a one-line why.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to