jornfranke commented on issue #2586:
URL: https://github.com/apache/iceberg/issues/2586#issuecomment-1671940533

   I think it would be good to have geospatial support in Apache Iceberg, 
although it is certainly a more complex feature.
   While the spatialx-project seems to have done a lot of useful 
implementation work, it is a bit difficult to use because the changes made to 
Iceberg are unclear, and I could not find any documentation on how geometry is 
added to Iceberg.
   
   I propose that this documentation is started so this issue can move on. 
@badbye does it make sense to you that you or I create a Google Docs (or 
Cryptpad, https://cryptpad.fr/) document that is viewable by everybody 
(similar to how other specs are done in Iceberg)? I am also happy to help with 
the writing and structuring.
   
   The document could initially be structured as follows:
   
   What benefit does one gain from using Apache Iceberg with geospatial data 
instead of using, for instance, simply [geoparquet](https://geoparquet.org/)? I 
would think about:
   * Support for writing individual rows (this can be useful in streaming 
scenarios, e.g. Internet of Things devices communicating their position).
   * Querying the geospatial data natively, without manual/error-prone 
conversion (e.g. using the right CRS when loading) and with much higher 
performance.
   ... maybe you have some other motivations for why you started geolake?
   
   One can also think about other features (I would not add them in the first 
spec due to complexity):
   * Partitioning of data according to spatial (location) criteria (see also: 
https://github.com/opengeospatial/geoparquet/issues/13#issuecomment-1057437189),
 which seems to be supported by Geolake (I wonder whether we can 
instead/additionally use the z-ordering feature of Iceberg to reuse existing 
Iceberg functionality?)
   * Loading/storing rasters (at the moment all proposals, including 
geoparquet, cover only vector data). This is more complex: the raster should be 
split into equal small tiles, etc.
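
   To make the z-ordering question above more concrete, here is a minimal 
sketch of how a longitude/latitude pair can be mapped to a single sortable 
Z-order (Morton) value. It is only an illustration of the general idea: the 
function names, the 16-bit grid resolution and the WGS84 value ranges are my 
assumptions, and it does not show how Iceberg or Geolake implement this 
internally.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Morton (Z-order) code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)      # x fills the even bit positions
        code |= ((y >> i) & 1) << (2 * i + 1)  # y fills the odd bit positions
    return code


def z_value(lon: float, lat: float, bits: int = 16) -> int:
    """Quantize a lon/lat pair onto a grid and return its Z-order code.

    Assumes lon in [-180, 180] and lat in [-90, 90] (hypothetical choice for
    this sketch); values outside that range are clamped to the grid edge.
    """
    cells = (1 << bits) - 1
    gx = int(min(max((lon + 180.0) / 360.0, 0.0), 1.0) * cells)
    gy = int(min(max((lat + 90.0) / 180.0, 0.0), 1.0) * cells)
    return interleave_bits(gx, gy, bits)


# Usage: sorting by the Z-order value tends to keep nearby points close
# together, which is what makes min/max file statistics more selective.
points = [(13.40, 52.52), (2.35, 48.86), (-74.01, 40.71), (13.38, 52.51)]
points_sorted = sorted(points, key=lambda p: z_value(p[0], p[1]))
```

   If something like this value (or the existing Iceberg z-ordering on two 
coordinate columns) is sufficient, a dedicated spatial partition transform 
might not be needed in the first version of the spec.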
   
   
   I suggest that a public Google Doc is started in which one can add what it 
would mean for Iceberg to support geospatial data, e.g.:
   * Augmentation of the Iceberg Spec (https://iceberg.apache.org/spec/)
      * Update the data types to include Geometry 
(https://iceberg.apache.org/spec/#schemas-and-data-types); it should probably 
be based internally on geoarrow (https://github.com/geoarrow/geoarrow/) - or 
maybe you have some idea based on your Geolake implementation?
      * Requirements for storage formats (I suggest focusing in the initial 
release only on parquet, as it is the only one for which geoparquet is 
defined, but in future releases one could also include avro and orc using a 
specification similar to geoparquet)
      * ...
   * Interdependencies with tools 
     * Apache Sedona - how can we make sure that the Geometry column is 
compatible (should we reuse the Sedona Geometry class? Or should we provide a 
Spark function that does the conversion as a pull request to Sedona?). It seems 
you provide a solution here: 
https://github.com/spatialx-project/sedona-iceberg-extension
     * Geopandas - how can we integrate Geopandas 
(https://geopandas.org/en/stable/) with PyIceberg 
(https://py.iceberg.apache.org/)?
     * ... (e.g. QGIS support - this could be solved once the Geopandas support 
is solved)
   * Planning of a roadmap of features (as said before, I suggest leaving more 
complex things for later)
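
   As a starting point for the discussion on the Geometry data type, here is a 
small sketch of what a binary-encoded geometry value could look like. It uses 
WKB (well-known binary), which is the encoding geoparquet uses for its geometry 
columns. That Iceberg would store geometry this way is purely my assumption 
for illustration (geoarrow or the Geolake approach may lead to something 
different), and the helper names are hypothetical.

```python
import struct
from typing import Tuple

WKB_LITTLE_ENDIAN = 1  # byte-order flag at the start of every WKB value
WKB_POINT = 1          # WKB geometry type code for a 2D Point


def point_to_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB (21 bytes)."""
    # Layout: 1 byte order flag, uint32 geometry type, two float64 coordinates.
    return struct.pack("<BIdd", WKB_LITTLE_ENDIAN, WKB_POINT, x, y)


def wkb_to_point(data: bytes) -> Tuple[float, float]:
    """Decode a WKB 2D point; raises ValueError for any other geometry type."""
    endian = "<" if data[0] == WKB_LITTLE_ENDIAN else ">"
    geom_type, x, y = struct.unpack(endian + "Idd", data[1:21])
    if geom_type != WKB_POINT:
        raise ValueError(f"expected a WKB Point, got geometry type {geom_type}")
    return x, y


# Usage: round-trip a position as it might arrive from a device.
encoded = point_to_wkb(13.40, 52.52)
decoded = wkb_to_point(encoded)
```

   Note that plain WKB carries no CRS, so the spec document would still have 
to say where the CRS of a Geometry column is recorded.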
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

