Thanks Jia and Dewey,

Dewey, you mentioned that current writers use inline strings. What are they
inlining: PROJJSON documents or `authority:identifier` strings?
Given that current implementations avoid `srid:<number>` and
`projjson:<field_ref>`, perhaps we should remove these examples from the spec,
since they seem to cause some confusion.

@Jia Yu <[email protected]>, you mentioned that OGC:CRS84 is understood to
map directly to its corresponding PROJJSON definition.
Aren't EPSG:<number> identifiers also understood to map directly to their
corresponding PROJJSON definitions?

Also, I'm fine with not being explicit about `authority:identifier` if that
was the prior consensus, but if in practice most implementations do write
`authority:identifier`, the spec should at least be worded so that this
doesn't look invalid.

What are your thoughts?

Milan

On Wed, 25 Mar 2026 at 15:53, Dewey Dunnington <[email protected]>
wrote:

> Hi Milan,
>
> A short answer is that the current language of the spec does not
> forbid writing "OGC:CRS84" to the CRS field (which is "just a string"
> as far as thrift is concerned). All existing readers that I know about
> (DuckDB, arrow-rs, Arrow C++, GDAL) will accept that string and
> interpret it unambiguously on read (for example,
> `geopandas.GeoDataFrame.from_arrow(pyarrow.parquet.read_table(...))` works). There
> is also an example file in parquet-testing that covers this case
> (arbitrary string that is neither of the recommended options) [1]. I
> put together a small example script to demonstrate the read path for
> the tools I mentioned [2].
>
> Jia is correct that the GeoParquet community will require writing an
> inline PROJJSON string in the forthcoming 2.0 version of the
> specification [3]. This was a pragmatic decision that reflects the
> needs of existing GeoParquet users because:
>
> - srid does not explicitly name the EPSG database, so any code written
> there does not have an unambiguous interpretation (even if it did it
> would place ambiguous licencing and/or dependency requirements on
> consumers)
> - projjson:some_field was not pragmatic to implement on the write side
> for either of the implementations I was involved in (C++ and Rust).
> Implementations just don't expose the global key/value metadata when
> converting types and doing so would have required breaking changes in
> the APIs. There are also ambiguities with respect to existing
> propagation of schema metadata (i.e., the projjson schema key is often
> propagated in unexpected ways into pyarrow and beyond, including being
> written into the key/value metadata of a resulting Parquet file).
>
> As a result, most of the tools that can write GEOMETRY and GEOGRAPHY
> (Arrow C++, GDAL, arrow-rs) are currently writing inline strings
> (because inline strings are what is available in the representation
> passed to Arrow-based writers and this was better than omitting CRS
> information). For all the implementations I was involved in, we also
> try to explicitly omit the CRS when we detect that the string we were
> passed is lon/lat (i.e., if they see "OGC:CRS84", they write an
> omitted CRS to minimize the need for consumers to be CRS aware).
>
> I'll echo Jia's comment that none of us are keen to reopen a CRS
> discussion but I also agree that the current language of the spec is
> vague and doesn't reflect the reality of the ecosystem as it has
> evolved. I'm happy to review any PRs to improve the language or
> implementations :)
>
> Cheers,
>
> -dewey
>
> [1]
> https://github.com/apache/parquet-testing/tree/master/data/geospatial#geospatial-test-files
> [2] https://gist.github.com/paleolimbot/7759e58bf1f98ecf8f2c459367bbdeda
> [3]
> https://github.com/opengeospatial/geoparquet/blob/main/format-specs/geoparquet.md#crs-parquet-property
>
> On Wed, Mar 25, 2026 at 12:49 AM Jia Yu <[email protected]> wrote:
> >
> > Hi Milan,
> >
> > The authority:identifier pattern was explicitly rejected in prior
> > community discussions. The core concern is that it forces query
> > engines to rely on external registries to resolve CRS definitions,
> > which breaks the goal of self-contained data. More importantly, the
> > most widely used authority, the EPSG database, comes with licensing
> > terms that are not particularly open-source friendly:
> > https://epsg.org/terms-of-use.html
> >
> > As a result, the community has leaned toward requiring data writers to
> > use a fully self-contained CRS representation such as PROJJSON. In
> > that model, a reference like OGC:CRS84 is understood to map directly
> > to its corresponding PROJJSON definition, as outlined in the
> > GeoParquet specification:
> > https://github.com/opengeospatial/geoparquet/blob/main/format-specs/geoparquet.md#ogccrs84-details
> >
> > That said, this expectation is not clearly spelled out in the Parquet
> > and Iceberg specifications, which leaves some ambiguity in practice.
> >
> > I don’t have a strong stance either way. In fact, I can see the case
> > for allowing authority:identifier. But it’s worth noting that
> > introducing it now would likely reopen a fairly contentious discussion
> > in the community.
> >
> > Jia
> >
> > On Tue, Mar 24, 2026 at 10:09 AM Milan Stefanovic
> > <[email protected]> wrote:
> > >
> > > Hi everyone,
> > >
> > > I’m looking for some clarification (and potentially a small spec update)
> > > regarding the Geospatial Physical Types documentation,
> > > https://parquet.apache.org/docs/file-format/types/geospatial/, specifically
> > > the CRS Customization section.
> > >
> > > 1) The Confusion
> > >
> > > Currently, the spec states that custom CRS values should follow the
> > > `type:identifier` format, where type is either `srid` or `projjson`
> > > (e.g., `srid:4326` or `projjson:property_name`). The spec also defines the
> > > default CRS as `OGC:CRS84`.
> > >
> > > Depending on how the specification is read, a reader may conclude that
> > > the only valid CRS definitions are strings of the form `srid:<some number>`
> > > or `projjson:<property name>`, which implies that `OGC:CRS84` does not
> > > adhere to the rules defined in the customization section. This creates
> > > confusion for implementers: should the CRS string always be parsed as a
> > > strict "custom" format that requires the `srid:` prefix?
> > >
> > > 2) The Suggestion
> > >
> > > I suggest we update the language to be explicit about the allowed CRS
> > > formats, broken down like this:
> > >    - Standard CRS: any string from a known authority in the format
> > >      `<authority>:<identifier>` (e.g., `EPSG:4326`, `OGC:CRS84`,
> > >      `ESRI:102100`) is accepted.
> > >    - Custom CRS: in the format `type:identifier`
> > >          - `srid:1234`: the definition resides in a local/database
> > >            spatial reference table.
> > >          - `projjson:key`: the definition is stored in Parquet file/table
> > >            metadata.
> > >
> > > This would validate `OGC:CRS84` as a first-class string while providing a
> > > clear "escape hatch" for custom definitions.
> > >
> > > What are your thoughts?
> > >
> > > Kind regards,
> > > Milan
>