Hi Milan,

The short answer is that the current language of the spec does not
forbid writing "OGC:CRS84" to the CRS field (which is "just a string"
as far as Thrift is concerned). All existing readers that I know about
(DuckDB, arrow-rs, Arrow C++, GDAL) will accept that string and
interpret it unambiguously on read (for example,
`geopandas.GeoDataFrame.from_arrow(pyarrow.parquet.read_table(...))`
works). There is also an example file in parquet-testing that covers
this case (an arbitrary string that is neither of the recommended
options) [1]. I put together a small example script to demonstrate the
read path for the tools I mentioned [2].
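
To make the "just a string" point concrete, here is a minimal sketch of
how a reader might sort a CRS value into the cases discussed in this
thread. It is not taken from any of the implementations above: the
function name `classify_crs` and the returned labels are hypothetical,
and real readers may handle these cases differently.

```python
import re


def classify_crs(crs):
    """Sort a CRS value into one of four cases (hypothetical helper).

    Returns a (kind, value) tuple. The labels are illustrative only.
    """
    # An omitted CRS means the default, OGC:CRS84 (lon/lat).
    if crs is None:
        return ("default", "OGC:CRS84")

    # srid:<number>, one of the two recommended forms in the spec.
    match = re.fullmatch(r"srid:(\d+)", crs)
    if match:
        return ("srid", int(match.group(1)))

    # projjson:<key>, pointing at a key in the file-level metadata.
    if crs.startswith("projjson:"):
        return ("projjson_key", crs[len("projjson:"):])

    # Anything else (e.g. "OGC:CRS84", "EPSG:32620", inline PROJJSON)
    # is passed through for a CRS library to interpret.
    return ("opaque", crs)
```

With this sketch, `classify_crs("OGC:CRS84")` falls into the last
branch, which is the "arbitrary string" case covered by the
parquet-testing file.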

Jia is correct that the GeoParquet community will require writing an
inline PROJJSON string in the forthcoming 2.0 version of the
specification [3]. This was a pragmatic decision that reflects the
needs of existing GeoParquet users because:

- srid does not explicitly name the EPSG database, so any code written
there does not have an unambiguous interpretation (and even if it did,
it would place ambiguous licensing and/or dependency requirements on
consumers)
- projjson:some_field was not pragmatic to implement on the write side
for either of the implementations I was involved in (C++ and Rust).
Implementations just don't expose the global key/value metadata when
converting types and doing so would have required breaking changes in
the APIs. There are also ambiguities with respect to existing
propagation of schema metadata (i.e., the projjson schema key is often
propagated in unexpected ways into pyarrow and beyond, including being
written into the key/value metadata of a resulting Parquet file).

As a result, most of the tools that can write GEOMETRY and GEOGRAPHY
(Arrow C++, GDAL, arrow-rs) are currently writing inline strings
(because inline strings are what is available in the representation
passed to Arrow-based writers and this was better than omitting CRS
information). For all the implementations I was involved in, we also
try to explicitly omit the CRS when we detect that the string we were
passed is lon/lat (i.e., if the writers see "OGC:CRS84", they omit the
CRS to minimize the need for consumers to be CRS aware).
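
A rough sketch of that write-side normalization, assuming the writer
is handed the CRS as a string: the helper name `crs_to_write` and the
exact detection rules are hypothetical and simplified, not the logic
of Arrow C++, arrow-rs, or GDAL. It only recognizes the literal
"OGC:CRS84" and a PROJJSON object whose `id` is OGC/CRS84.

```python
import json


def crs_to_write(crs):
    """Return the CRS string to write, or None to omit it (hypothetical)."""
    if crs is None:
        return None

    text = crs.strip()

    # The literal authority:code form for lon/lat.
    if text.upper() == "OGC:CRS84":
        return None

    # Inline PROJJSON: look for an id of {"authority": "OGC", "code": "CRS84"}.
    try:
        parsed = json.loads(text)
    except ValueError:
        # Not JSON, so pass the string through as an inline CRS.
        return text

    if isinstance(parsed, dict):
        ident = parsed.get("id")
        if (
            isinstance(ident, dict)
            and ident.get("authority") == "OGC"
            and str(ident.get("code")) == "CRS84"
        ):
            return None

    # Any other CRS is written inline, unchanged.
    return text
```

Note that this sketch deliberately does not treat "EPSG:4326" as
lon/lat, since its axis order is a separate question.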

I'll echo Jia's comment that none of us are keen to reopen a CRS
discussion but I also agree that the current language of the spec is
vague and doesn't reflect the reality of the ecosystem as it has
evolved. I'm happy to review any PRs to improve the language or
implementations :)

Cheers,

-dewey

[1] https://github.com/apache/parquet-testing/tree/master/data/geospatial#geospatial-test-files
[2] https://gist.github.com/paleolimbot/7759e58bf1f98ecf8f2c459367bbdeda
[3] https://github.com/opengeospatial/geoparquet/blob/main/format-specs/geoparquet.md#crs-parquet-property

On Wed, Mar 25, 2026 at 12:49 AM Jia Yu <[email protected]> wrote:
>
> Hi Milan,
>
> The authority:identifier pattern was explicitly rejected in prior
> community discussions. The core concern is that it forces query
> engines to rely on external registries to resolve CRS definitions,
> which breaks the goal of self-contained data. More importantly, the
> most widely used authority, the EPSG database, comes with licensing
> terms that are not particularly open-source friendly:
> https://epsg.org/terms-of-use.html
>
> As a result, the community has leaned toward requiring data writers to
> use a fully self-contained CRS representation such as PROJJSON. In
> that model, a reference like OGC:CRS84 is understood to map directly
> to its corresponding PROJJSON definition, as outlined in the
> GeoParquet specification:
> https://github.com/opengeospatial/geoparquet/blob/main/format-specs/geoparquet.md#ogccrs84-details
>
> That said, this expectation is not clearly spelled out in the Parquet
> and Iceberg specifications, which leaves some ambiguity in practice.
>
> I don’t have a strong stance either way. In fact, I can see the case
> for allowing authority:identifier. But it’s worth noting that
> introducing it now would likely reopen a fairly contentious discussion
> in the community.
>
> Jia
>
> On Tue, Mar 24, 2026 at 10:09 AM Milan Stefanovic
> <[email protected]> wrote:
> >
> > Hi everyone,
> >
> > I’m looking for some clarification (and potentially a small spec update)
> > regarding the Geospatial Physical Types documentation -
> > https://parquet.apache.org/docs/file-format/types/geospatial/, specifically
> > the CRS Customization section.
> >
> > 1) The Confusion
> >
> > Currently, the spec states that custom CRS values should follow the
> > `type:identifier` format, where type is either `srid` or `projjson` -
> > (e.g., `srid:4326` or `projjson:property_name`). The spec also defines the
> > default CRS as `OGC:CRS84`.
> >
> > Depending on how the specification is read, the reader may consider as
> > valid CRS definition to be only strings of the form `srid:<some number>` or
> > `projjson:<property name>`, which implies that `OGC:CRS84` does not adhere
> > to the rules defined in the customization section. This creates confusion
> > for implementers: should the type string always be parsed as a strict
> > "custom" format which necessitates the srid: prefix?
> >
> > 2) The Suggestion
> >
> > I suggest we update the language to be explicit about allowed formats for
> > CRS, and my suggestion is that we break it down like this:
> >    - Standard CRS: Any string from a known authority in a format of
> > `<authority>:<identifier>` (e.g., `EPSG:4326`, `OGC:CRS84`, `ESRI:102100`)
> > is accepted.
> >    - Custom CRS: in the format of `type:identifier`
> >          - `srid:1234`: The definition resides in a local/database spatial
> > reference table.
> >          - `projjson:key`: The definition is stored in Parquet file/table
> > metadata.
> >
> > This would validate `OGC:CRS84` as a first-class string while providing a
> > clear "escape hatch" for custom definitions.
> >
> > What are your thoughts ?
> >
> > Kind regards,
> > Milan
