Hi Julien,

These improvements look really great!
Unfortunately I am on a long-term vacation and semi-retired from my GIS work, so I am sorry but I cannot really help with testing and committing the work. That is why I did not respond sooner.

As the last "official" netcdf maintainer I should at least give some comments. Perhaps Even can test the patches and commit them, or someone could step in as a new netcdf maintainer.

If possible, it would be best to separate these improvements into a number of patches. In any case you should attach your patch(es) to new GDAL trac ticket(s) (or to an existing ticket if a patch fixes a bug). I made significant improvements some time ago and I understand that it is hard to separate such changes into several patches, so I don't think it is absolutely necessary to split them up, but it helps when debugging any regressions. In particular, the groups support, the I/O improvements and the NASA products support should have independent patches and tickets.

Regarding 10), it would be best to maintain backwards compatibility, but if the issue is only with 1D datasets I don't think it matters that much.

Cheers,
Etienne

On Thu, Feb 5, 2015 at 11:10 PM, Julien Demaria <julien.dema...@acri-st.fr> wrote:

> Hi GDAL team,
>
> I've implemented several improvements to the NetCDF driver and I would
> like to provide them to the community.
> Main goal of the changes is to add full support of NetCDF-4 including
> groups.
> NetCDF-4 is the future format of ESA Sentinel-3 products (no groups) and
> NASA Ocean Color team is switching their L2/L3 products to NetCDF-4 with
> groups (VIIRS has already switched to the new format in December).
> With the changes NASA L2 products geolocation is automatically handled as
> geolocation arrays and can be reprojected using gdalwarp.
>
> I validated with autotests that nothing is broken in tests netcdf.py
> (excepting test 13 but see my point 5), netcdf_cf.py and hdf5.py, using
> NetCDF-3 and 4 libraries.
> I've also tested the new functionalities on various NetCDF-4 files.
> I think the only possible regression could be for marginal cases where a
> file was seen directly as a dataset and is now seen as multiple subdatasets
> (for example if a file has only one var in the top group and has nested
> groups containing variables), but I think this is not very common.
>
> For the moment I have all these changes in local GIT separated commits on
> the latest gdal-1.11 branch, let me know what changes you want and how can
> I provide them.
>
> Changes :
>
> 1) Implement full support for NetCDF-4 groups on reading:
>    - explore recursively all nested groups to create the subdatasets list
>    - subdatasets in nested groups use the /group1/group2/.../groupn/var
>      standard NetCDF-4 convention, except for variables in the root group,
>      which do not have a leading slash for backward compatibility
>    - when accessing a subdataset using NETCDF:$file:$path, the leading
>      slash is optional
>    - global attributes of each nested group are also collected in the
>      GDAL dataset metadata, using the same convention
>      /group1/group2/.../groupn/NC_GLOBAL#attr_name, except for the root
>      group, which does not have a leading slash for backward compatibility
>    - when searching for a variable containing auxiliary information on
>      the selected subdataset, like coordinate variables or grid_mapping,
>      we now also search in parent groups (using NCDFResolveVar).
>      I know this is something not specified at this time in the CF
>      convention because CF does not know groups, but it seems logical to
>      me to support this: NetCDF-4 specifies that dimensions of a group are
>      shared with its nested groups, so associated coordinate variables
>      could be defined at the same level as their corresponding dimension.
>    - references to coordinate variables using the "coordinates" attribute
>      now also support absolute paths; this allows, for example, specifying
>      coordinate variables located outside the group of the selected
>      variable or its parents.
>      Relative paths could be implemented if needed.
>      This feature is used to add support for new NASA Ocean Color L2
>      products.
>
> 2) Implement full read/write support for new NetCDF4 types NC_UBYTE,
> NC_USHORT, NC_UINT and NC_STRING, only if NETCDF_HAS_NC4 is defined (and
> only if format=NC4 for writing).
>    Support implemented for variables and attributes.
>    NC_STRING type is supported for reading (scalar and arrays) attributes
> and is used for writing only for array attributes (scalars are still written
> as NC_CHAR).
>    If NETCDF_HAS_NC4 is not defined or format!=NC4, NC_STRING array
> attributes are written as a single NC_CHAR string using the GDAL
> {v1,v2,...} convention.
>    Add missing support for NC_BYTE in CreateBandMetadata() and
> NC_BYTE/SHORT in NCDFPut1DVar().
>
> 3) Add support for new NASA Ocean Color L2 products and ESA Sentinel-3 L1
> or L2 products which use the NetCDF-4 format (with groups for NASA, see
> http://oceancolor.gsfc.nasa.gov/DOCS/FormatChange.html):
>    - NASA products: simulate a "coordinates" variable attribute to detect
>      CF geolocation arrays, and set bBottomUp to FALSE
>    - ESA products: set bBottomUp to FALSE and disable warning on missing
>      Conventions attribute
>
> 4) Fix bug #4554 with a more generic solution by disabling the
> installation of the HDF5 atexit() cleanup routine using H5dont_atexit().
>    Previous fix was to call GDALExit() (for the moment only defined in
> gdalwarp.cpp) at the end of every program, which is more painful.
>
> 5) Fix implementation of GetScale/Offset to not always return
> pbSuccess=TRUE.
>    Fix CopyMetadata to handle bands with only scale or offset.
>    ==> WARNING this commit breaks the autotest netcdf_13 (check for
> scale/offset = 1.0/0.0 if no scale or offset is available), but for me it
> is not logical to always return pbSuccess=TRUE
>
> 6) Optimize IReadBlock() and CheckData() handling of partial blocks in the
> x axis by re-using the GDAL block buffer instead of allocating a new
> temporary buffer for each block.
>
> 7) Force block size to 1 scanline for bottom-up datasets if nBlockYSize !=
> 1 instead of raising a fatal error
>    ==> Solves a recent problem raised on the mailing list
>
> 8) Implement Get/SetUnitType using the standard "units" NetCDF attribute
>
> 9) Change default block size to 256x256 instead of scanline (only affects
> files without NetCDF chunking)
>    ==> because I think this is better for random access to the data,
> but I'm not sure if the community wants this change, which could impact
> performance
>
> 10) I've also implemented for my needs support for 1D variables by
> simulating 2D datasets with only one row (dimensionless variables are not
> supported for the moment),
>    but this breaks backward compatibility because files containing
> only one variable and associated 1D coordinate variables are now seen as
> multiple subdatasets...
>    and maybe this is not the goal of GDAL to give access to
> not-2D-raster variables (but sometimes it's useful ;-) )
>
> Thanks for GDAL!
>
> Julien
>
> _______________________________________________
> gdal-dev mailing list
> gdal-dev@lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/gdal-dev
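
For readers who want to see the subdataset naming rule from point 1 of the quoted message in concrete form, here is a minimal Python sketch. It is not the driver's code and does not call GDAL. The helper names subdataset_path and split_subdataset_name are hypothetical, and the handling of quoted file names is an assumption made for illustration.

```python
def subdataset_path(groups, var):
    """Build the path of `var` located in the nested groups `groups`.

    Variables in the root group (empty `groups`) get no leading slash,
    mirroring the backward-compatibility rule in the message.
    """
    if not groups:
        return var
    return "/" + "/".join(groups) + "/" + var


def split_subdataset_name(name):
    """Split 'NETCDF:$file:$path' into (file, path without leading slash).

    The leading slash of $path is optional, so both spellings map to the
    same result. Stripping double quotes around the file name is an
    assumption of this sketch.
    """
    prefix, sep, rest = name.partition(":")
    if prefix != "NETCDF" or not sep:
        raise ValueError("not a NETCDF subdataset name: %r" % name)
    # Split on the last colon so that file names containing ':' survive.
    filename, sep, path = rest.rpartition(":")
    if not sep:
        raise ValueError("missing variable path in %r" % name)
    if path.startswith("/"):
        path = path[1:]
    return filename.strip('"'), path
```

With these helpers, "NETCDF:file.nc:/group1/var" and "NETCDF:file.nc:group1/var" resolve to the same (file, path) pair, which is the behaviour the message describes for the optional leading slash.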
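
The parent-group lookup described in point 1 (searching for coordinate variables or grid_mapping in parent groups, plus absolute paths in the "coordinates" attribute) can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the Group class and the resolve_var function are hypothetical stand-ins, not the NCDFResolveVar implementation, and the group tree is modelled in memory instead of being read through the NetCDF library.

```python
class Group:
    """Hypothetical in-memory model of a NetCDF-4 group."""

    def __init__(self, name="", parent=None):
        self.name = name
        self.parent = parent
        self.children = {}
        self.variables = set()
        if parent is not None:
            parent.children[name] = self


def root_of(group):
    """Walk up the parent links until the root group is reached."""
    while group.parent is not None:
        group = group.parent
    return group


def resolve_var(group, ref):
    """Return the group that holds the variable `ref`, or None.

    An absolute reference ('/g1/g2/var') is followed from the root group.
    A bare name is searched in `group` first, then in each parent.
    """
    if ref.startswith("/"):
        *group_names, var = ref.strip("/").split("/")
        current = root_of(group)
        for name in group_names:
            current = current.children.get(name)
            if current is None:
                return None
        return current if var in current.variables else None
    current = group
    while current is not None:
        if ref in current.variables:
            return current
        current = current.parent
    return None
```

A bare name is never searched in sibling groups, so a coordinate variable stored outside the variable's own ancestry has to be referenced by its absolute path, as the message notes.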