Francis Markham wrote:
As discussed in
http://lists.osgeo.org/pipermail/gdal-dev/2010-May/024619.html and
http://lists.osgeo.org/pipermail/gdal-dev/2010-July/025192.html OGR's
shapefile driver does not allow the shapefile's codepage to be set or
retrieved using the DBF LDID byte or an *.cpg file.
This functionality is implemented in recent shapelib releases, when
creating a new shapefile.
Issue #882 http://trac.osgeo.org/gdal/ticket/882 addresses this issue,
but the discussion there largely predates RFCs 5 and 23 (
http://trac.osgeo.org/gdal/wiki/rfc5_unicode and
http://trac.osgeo.org/gdal/wiki/rfc23_ogr_unicode ).
I would be interested in exposing this shapelib feature in OGR.
However, there are a number of design decisions to make:
1) Should encoding retrieval and setting be an OGR-wide feature, or
one specific to the shapefile driver?
Francis,
Note that RFC 23 mandates that OGR layers return attributes in UTF-8,
so on "read" the expected action would be for the shapefile driver
to use the .cpg file or the DBF LDID byte to identify the incoming
encoding and then use CPLRecode to convert to UTF-8. So on read there
is no need for an OGR-wide change.
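
To make the read path concrete, here is a rough Python sketch of the idea
(not the driver code, which would be C++ and would call CPLRecode). The
LDID table is a small illustrative subset, the function names are made up
for this example, and the choice to consult the LDID byte before the .cpg
file is an assumption rather than something settled in this thread.

```python
import codecs

# Illustrative subset of dBase LDID values -> Python codec names.
# Assumption: verify against the dBase/shapelib tables before relying on it.
LDID_TO_CODEC = {
    1: "cp437",
    2: "cp850",
    3: "cp1252",
    87: "cp1252",
    200: "cp1250",
    201: "cp1251",
}


def source_codec(ldid, cpg_text, default="latin-1"):
    """Pick a codec for raw DBF bytes (hypothetical helper).

    Assumed order: a recognised LDID byte first, then the .cpg contents,
    then a default.
    """
    if ldid in LDID_TO_CODEC:
        return LDID_TO_CODEC[ldid]
    if cpg_text:
        name = cpg_text.strip()
        if name.isdigit():
            name = "cp" + name  # a bare code page number such as "1251"
        try:
            return codecs.lookup(name).name
        except LookupError:
            pass  # unknown .cpg label, fall back to the default
    return default


def to_utf8(raw, ldid=0, cpg_text=None):
    """Recode raw attribute bytes to UTF-8, as RFC 23 expects on read."""
    text = raw.decode(source_codec(ldid, cpg_text), errors="replace")
    return text.encode("utf-8")
```

Callers above the driver only ever see UTF-8, so nothing outside the
shapefile driver needs to know which code page the file used.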
On write I would anticipate the output encoding being set with
a layer creation option. Ideally the same option name could be used
by any other driver that needs to set the encoding on export, but
there is no need for any implementation beyond the shapefile driver
for now.
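
A minimal sketch of the write side, again in Python for brevity. Creation
options are passed as "NAME=VALUE" strings; the option name ENCODING, the
UTF-8 default and both helper names are assumptions made for this
illustration only.

```python
def fetch_option(options, name, default=None):
    """Look up NAME in a list of "NAME=VALUE" strings, case-insensitively."""
    for item in options:
        key, sep, value = item.partition("=")
        if sep and key.strip().upper() == name.upper():
            return value
    return default


def encode_for_output(text, options):
    """Encode an attribute value using a hypothetical ENCODING option.

    Characters the target encoding cannot represent are replaced
    rather than raising an error.
    """
    encoding = fetch_option(options, "ENCODING", "UTF-8")
    return text.encode(encoding, errors="replace")
```

Because the option is just a string, another driver could honour the same
name later without any change to the OGR API.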
2) Should encodings be specified as a string or an enumeration of
well-known encodings? If encoding retrieval and setting occurs only
at the shapefile driver level, then a string that mimics shapelib's
API might be sensible (if the codepage is set to "LDID/n" and -1 < n <
255 then the LDID byte of the DBF is set to n, otherwise the whole
codepage string is written to the .CPG file). Otherwise, common sense
would suggest a standardised enum of encodings might be the way to go.
They should be specified as strings, per RFC 23. If there is no
apparent mapping to some shapefile output encoding, we might also
want to provide an extra mechanism to specify the encodings directly
as the codes used in the .cpg or LDID field.
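
The "LDID/n" string convention described in the question can be sketched
as follows. This is only an illustration in Python: the function name is
hypothetical, and the accepted range is copied from the description above,
so it should be checked against shapelib itself.

```python
def split_codepage(codepage):
    """Classify a shapelib-style code page string (hypothetical helper).

    Returns ("ldid", n) when the string is "LDID/n" with n in range,
    meaning n goes into the DBF LDID byte; otherwise returns
    ("cpg", codepage), meaning the whole string goes into the .cpg file.
    """
    prefix = "LDID/"
    if codepage.startswith(prefix):
        try:
            n = int(codepage[len(prefix):])
        except ValueError:
            n = -1  # not a number, treat as a plain .cpg string
        if -1 < n < 255:  # range exactly as stated in the message
            return ("ldid", n)
    return ("cpg", codepage)
```

For example, split_codepage("LDID/87") selects the LDID byte, while
split_codepage("UTF-8") falls through to the .cpg file.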
3) What should the API be? A patch at issue #882 creates two new
OGRLayer member functions, GetEncoding() and SetEncoding(), and a
GetEncoding() implementation for shapefiles (although it fails to
allow the encoding to be set, as far as I can see).
In my opinion there is no need for GetEncoding() and SetEncoding()
methods in the OGR API.
Is this the appropriate place to have this discussion? I would be
happy to provide a patch implementing this feature however it is
deemed most appropriate.
This is a reasonable place to have the discussion. If you can
provide code implementing RFC 23 for the shapefile driver, along with
some test samples to demonstrate it, that would be much appreciated.
I'm happy to have Chaitanya provide support as well.
Best regards,
--
---------------------------------------+--------------------------------------
I set the clouds in motion - turn up | Frank Warmerdam, warmer...@pobox.com
light and sound - activate the windows | http://pobox.com/~warmerdam
and watch the world go round - Rush | Geospatial Programmer for Rent