Please see https://jira.hdfgroup.org/browse/HDFFV-10300

And also the original mailing list thread here: 
https://forum.hdfgroup.org/t/hdf-lib-incompatible-with-hdf-file-spec/4084 
(slightly mangled in the Discourse transition).

Frustratingly, it's still not possible for outsiders to comment on bugs in 
JIRA, so I'm posting here.

Can someone give an update on what happened with this bug? It was reported in 
2017 and marked as Priority: Blocker, but there has been no activity on it 
since then.

As far as I can see, @Markus found a serious issue in the HDF5 library: it 
corrupts files that were not written by the HDF5 library itself. This is 
possibly because the library assumes it wrote the file itself and makes overly 
liberal assumptions about the physical file layout.

I had a look at the file Markus provided (`sizeoptimized.h5`), and from what I 
can see this is a valid HDF5 file. It passes checks using `h5check`:

```
[estan@newton hdf5bug]$ h5check-inst/bin/h5check -v2 sizeoptimized.h5 
VERBOSE is true:verbose # = 2

VALIDATING sizeoptimized.h5 according to library version 1.8.0 

FOUND super block signature
VALIDATING the super block at physical address 0...
Validating version 0/1 superblock...
INITIALIZING filters ...
VALIDATING the object header at logical address 96...
VALIDATING version 1 object header...
Version 1 object header encountered
VALIDATING the local heap at logical address 184...
FOUND local heap signature.
VALIDATING version 1 btree at logical address 136...
FOUND version 1 btree signature.
VALIDATING the Symbol table node at logical address 304...
FOUND Symbol table node signature.
VALIDATING the object header at logical address 432...
VALIDATING version 1 object header...
Version 1 object header encountered
VALIDATING the object header at logical address 720...
VALIDATING version 1 object header...
Version 1 object header encountered
VALIDATING the object header at logical address 992...
VALIDATING version 1 object header...
Version 1 object header encountered
No non-compliance errors found
[estan@newton hdf5bug]$
```
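For readers unfamiliar with the format, the first steps h5check reports (finding the signature at physical address 0, validating a version 0/1 superblock, then jumping to the first object header) can be reproduced by hand. Below is a minimal sketch, based on my reading of the HDF5 File Format Specification and assuming a version 0 superblock at offset 0 (no user block); `sb0_info` and `parse_superblock_v0` are hypothetical names, not part of any HDF5 API. With 8-byte offsets, the root group's object header address sits at byte 64, and a version 0 superblock ends at byte 96, which would be consistent with h5check validating its first object header at logical address 96.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The 8-byte HDF5 format signature: \211 H D F \r \n \032 \n */
static const unsigned char HDF5_SIGNATURE[8] = {0x89, 'H', 'D', 'F', '\r', '\n', 0x1a, '\n'};

/* Fields of a version 0 superblock extracted by this sketch (hypothetical type). */
typedef struct {
    unsigned version;        /* superblock version number (byte 8) */
    unsigned sizeof_offsets; /* size of file addresses in bytes (byte 13) */
    unsigned sizeof_lengths; /* size of lengths in bytes (byte 14) */
    uint64_t root_ohdr_addr; /* root group object header address */
} sb0_info;

/* Decode an n-byte little-endian unsigned integer (n <= 8). */
static uint64_t read_le(const unsigned char *p, unsigned n) {
    uint64_t v = 0;
    for (unsigned i = 0; i < n; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Parse a version 0 superblock at the start of buf.
 * Returns 0 on success, -1 if the signature, version, or sizes don't fit. */
int parse_superblock_v0(const unsigned char *buf, size_t len, sb0_info *out) {
    if (len < 24 || memcmp(buf, HDF5_SIGNATURE, sizeof HDF5_SIGNATURE) != 0)
        return -1;
    out->version = buf[8];
    out->sizeof_offsets = buf[13];
    out->sizeof_lengths = buf[14];
    if (out->version != 0 || out->sizeof_offsets == 0 || out->sizeof_offsets > 8)
        return -1;

    /* After the 24 fixed bytes come four addresses (base address, free-space
     * info, end of file, driver info block), then the root group symbol table
     * entry: link name offset, followed by the object header address. */
    size_t so = out->sizeof_offsets;
    size_t off = 24 + 4 * so + so;
    if (len < off + so)
        return -1;
    out->root_ohdr_addr = read_le(buf + off, (unsigned)so);
    return 0;
}
```

To try it on a real file, `fread` the first 128 bytes into a buffer and pass them in; version 1 superblocks (and files with a user block) are deliberately not handled.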

It's possible to list the file:

```
[estan@newton hdf5bug]$ h5ls -r sizeoptimized.h5 
/                        Group
/test1                   Dataset {10/Inf}
/test2                   Dataset {1}
/test3                   Dataset {10/Inf}
[estan@newton hdf5bug]$
```

And dump out a dataset, say `/test1`:

```
[estan@newton hdf5bug]$ h5dump -d /test1 sizeoptimized.h5 
HDF5 "sizeoptimized.h5" {
DATASET "/test1" {
   DATATYPE  H5T_COMPOUND {
      H5T_IEEE_F32LE "valuef";
      H5T_IEEE_F64LE "valued";
   }
   DATASPACE  SIMPLE { ( 10 ) / ( H5S_UNLIMITED ) }
   DATA {
   (0): {
         0,
         0
      }, {
         0,
         0
      },
   (2): {
         0,
         0
      }, {
         0,
         0
      },
   (4): {
         0,
         0
      }, {
         0,
         0
      },
   (6): {
         0,
         0
      }, {
         0,
         0
      },
   (8): {
         0,
         0
      }, {
         0,
         0
      }
   }
}
}
[estan@newton hdf5bug]$
```

I've also browsed around the file using `h5debug`, and I can't see anything 
suspicious, though the tool is not very convenient and I didn't check every 
single number.

So this file, produced by @Markus's embedded code, looks like a valid HDF5 file.

However, if you run the following program on it, which simply uses the HDF5 
library to add another compound dataset `/test4` to the file, the file is 
silently corrupted:

```c
/*
 * Adds a /test4 compound dataset to the file given on the command line.
 */
#include <stdio.h>
#include <stdlib.h>

/* Every HDF5 call below is made inside assert(), so keep assertions enabled
 * even if the program is built with -DNDEBUG. */
#undef NDEBUG
#include <assert.h>

#include <hdf5.h>

/* In-memory record layout. */
typedef struct {
    short v1;
    float v2;
} sensor_t;

int main(int argc, char *argv[]) {
    if (argc != 2) {
        fprintf(stderr, "usage: %s FILE\n", argv[0]);
        return EXIT_FAILURE;
    }

    hid_t file = H5Fopen(argv[1], H5F_ACC_RDWR, H5P_DEFAULT);
    assert(file >= 0);

    /* Memory type: members at their native offsets within sensor_t. */
    hid_t memtype = H5Tcreate(H5T_COMPOUND, sizeof(sensor_t));
    assert(memtype >= 0);
    assert(H5Tinsert(memtype, "v1", HOFFSET(sensor_t, v1), H5T_NATIVE_SHORT) >= 0);
    assert(H5Tinsert(memtype, "v2", HOFFSET(sensor_t, v2), H5T_NATIVE_FLOAT) >= 0);

    /* File type: v1 and v2 packed at offsets 0 and 2, followed by unused
     * padding up to the total size given here. */
    hid_t filetype = H5Tcreate(H5T_COMPOUND, 8 + sizeof(hvl_t) + 8 + 8);
    assert(filetype >= 0);
    assert(H5Tinsert(filetype, "v1", 0, H5T_STD_I16LE) >= 0);
    assert(H5Tinsert(filetype, "v2", 2, H5T_IEEE_F32LE) >= 0);

    /* One element, extendible without limit, which requires a chunked layout. */
    hsize_t dims[1] = {1};
    hsize_t max_dims[1] = {H5S_UNLIMITED};
    hid_t space = H5Screate_simple(1, dims, max_dims);
    assert(space >= 0);

    hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
    assert(dcpl >= 0);
    hsize_t chunk[1] = {6};
    assert(H5Pset_chunk(dcpl, 1, chunk) >= 0);

    hid_t dset = H5Dcreate(file, "/test4", filetype, space, H5P_DEFAULT, dcpl, H5P_DEFAULT);
    assert(dset >= 0);

    sensor_t data[1];
    data[0].v1 = 1;
    data[0].v2 = 2.0f;
    assert(H5Dwrite(dset, memtype, H5S_ALL, H5S_ALL, H5P_DEFAULT, data) >= 0);

    /* Release all handles. */
    assert(H5Dclose(dset) >= 0);
    assert(H5Pclose(dcpl) >= 0);
    assert(H5Sclose(space) >= 0);
    assert(H5Tclose(filetype) >= 0);
    assert(H5Tclose(memtype) >= 0);
    assert(H5Fclose(file) >= 0);

    return EXIT_SUCCESS;
}
```
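In the program above, the memory type is built with `HOFFSET` (equivalent to `offsetof`), so on common ABIs `v2` sits at offset 4 behind two bytes of alignment padding, while the file type packs `v2` at offset 2. `H5Dwrite` converts each record from `memtype` to `filetype`. As an illustration only (this is not HDF5's conversion code), here is a minimal sketch of the bytes the two members occupy in one packed file record; `pack_sensor_le` is a hypothetical helper, and it assumes a 16-bit `short` and a 32-bit IEEE 754 `float`. The trailing padding bytes of the file record are not covered.

```c
#include <stdint.h>
#include <string.h>

/* This sketch assumes a 16-bit short and a 32-bit IEEE 754 float. */
_Static_assert(sizeof(short) == 2 && sizeof(float) == 4,
               "sketch assumes 16-bit short and 32-bit float");

/* Same in-memory struct as in the program above. */
typedef struct {
    short v1;
    float v2;
} sensor_t;

/* Hypothetical helper: encode the two members of one sensor_t the way the
 * file type lays them out, i.e. v1 as a 16-bit little-endian integer at
 * offset 0 and v2 as a 32-bit little-endian float at offset 2, no gap. */
void pack_sensor_le(const sensor_t *s, unsigned char out[6]) {
    uint16_t v1 = (uint16_t)s->v1;
    uint32_t v2;
    memcpy(&v2, &s->v2, sizeof v2); /* reinterpret the float's bits */

    out[0] = (unsigned char)(v1 & 0xff);
    out[1] = (unsigned char)(v1 >> 8);
    for (int i = 0; i < 4; i++)
        out[2 + i] = (unsigned char)((v2 >> (8 * i)) & 0xff);
}
```

For the record written by the program (`v1 = 1`, `v2 = 2.0f`), this gives the bytes `01 00 00 00 00 40`, since 2.0f has the bit pattern 0x40000000.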

```
[estan@newton hdf5bug]$ gcc -Lhdf5-inst/lib -o add_dataset -Ihdf5-inst/include add_dataset.c -lhdf5
[estan@newton hdf5bug]$ ./add_dataset sizeoptimized.h5 
[estan@newton hdf5bug]$ h5check-inst/bin/h5check -v2 sizeoptimized.h5   
VERBOSE is true:verbose # = 2

VALIDATING sizeoptimized.h5 according to library version 1.8.0 

FOUND super block signature
VALIDATING the super block at physical address 0...
Validating version 0/1 superblock...
INITIALIZING filters ...
VALIDATING the object header at logical address 96...
VALIDATING version 1 object header...
Version 1 object header encountered
VALIDATING the local heap at logical address 184...
FOUND local heap signature.
VALIDATING version 1 btree at logical address 136...
FOUND version 1 btree signature.
VALIDATING the Symbol table node at logical address 304...
FOUND Symbol table node signature.
VALIDATING the object header at logical address 432...
VALIDATING version 1 object header...
***Error***
Object Header:corrupt object header at addr 681
Object Header:corrupt object header at addr 674
Object Header:corrupt object header at addr 667
Object Header:corrupt object header at addr 660
Object Header:corrupt object header at addr 653
Object Header:corrupt object header at addr 646
Object Header:corrupt object header at addr 639
Object Header:corrupt object header at addr 632
Object Header:corrupt object header at addr 625
Object Header:corrupt object header at addr 618
Object Header:corrupt object header at addr 611
Object Header:corrupt object header at addr 604
Object Header:corrupt object header at addr 597
Object Header:corrupt object header at addr 590
Object Header:corrupt object header at addr 583
Object Header:corrupt object header at addr 576
Object Header:corrupt object header at addr 569
Object Header:corrupt object header at addr 562
Object Header:corrupt object header at addr 555
Object Header:corrupt object header at addr 548
Object Header:corrupt object header at addr 541
Object Header:corrupt object header at addr 534
Object Header:corrupt object header at addr 527
Object Header:corrupt object header at addr 520
Object Header:corrupt object header at addr 513
Object Header:corrupt object header at addr 506
Object Header:corrupt object header at addr 499
Object Header:corrupt object header at addr 492
Object Header:corrupt object header at addr 485
Object Header:corrupt object header at addr 478
Object Header:corrupt object header at addr 471
Version 1 Object Header:Bad version number at addr 432; Value decoded: 32
***End of Error messages***
***Error***
Errors found when decoding message at addr 1208
Dataspace Message v.1:Corrupt flags at addr 1210
***End of Error messages***
VALIDATING the object header at logical address 720...
VALIDATING version 1 object header...
Version 1 object header encountered
VALIDATING the object header at logical address 992...
VALIDATING version 1 object header...
Version 1 object header encountered
VALIDATING the object header at logical address 1280...
VALIDATING version 1 object header...
Version 1 object header encountered
VALIDATING version 1 btree at logical address 1552...
FOUND version 1 btree signature.
Non-compliance errors found
[estan@newton hdf5bug]$
```
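Two of the h5check errors can be decoded by hand. The first byte of a version 1 object header is its version number and must be 1, so "Value decoded: 32" (0x20) at address 432 suggests that the start of this pre-existing object header was overwritten. And "Corrupt flags at addr 1210", two bytes after the message at 1208, is consistent with the flags byte at offset 2 of a version 1 dataspace message. Below is a minimal sketch of both checks, based on my reading of the HDF5 File Format Specification; `ohdr_v1_prefix`, `decode_ohdr_v1` and `check_dataspace_v1` are hypothetical names, and this is not h5check's actual code.

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed prefix of a version 1 object header: 12 bytes of fields padded to 16
 * so that the header messages that follow are 8-byte aligned. */
typedef struct {
    unsigned version;     /* byte 0: must be 1 */
    unsigned nmesgs;      /* bytes 2-3: total number of header messages */
    uint32_t refcount;    /* bytes 4-7: object reference count */
    uint32_t header_size; /* bytes 8-11: size of the header message data */
} ohdr_v1_prefix;

static uint32_t le32(const unsigned char *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Decode the prefix at p. Returns 0 if the version byte is 1, -1 otherwise
 * (fields are still filled in so a bad value can be reported). */
int decode_ohdr_v1(const unsigned char *p, size_t len, ohdr_v1_prefix *out) {
    if (len < 16)
        return -1;
    out->version = p[0];
    out->nmesgs = (unsigned)p[2] | (unsigned)p[3] << 8;
    out->refcount = le32(p + 4);
    out->header_size = le32(p + 8);
    return out->version == 1 ? 0 : -1;
}

/* Check the start of a version 1 dataspace message body: byte 0 is the
 * version (1), byte 1 the dimensionality, byte 2 the flags. Only bit 0
 * (maximum dimensions present) and bit 1 (permutation index present) are
 * defined, so any other set bit is treated as corrupt in this sketch. */
int check_dataspace_v1(const unsigned char *msg, size_t len) {
    if (len < 8 || msg[0] != 1)
        return -1;
    return (msg[2] & ~0x03u) ? -1 : 0;
}
```

Pointing `decode_ohdr_v1` at offset 432 of the file before and after running `add_dataset` would show directly whether the version byte changed.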

The above corruption does not happen if this program is run against a file that 
was produced by the HDF5 library itself.

(In his own testing, @Markus used HDFView, but the program above is a minimal 
equivalent.)

Could someone from the HDF Group please look at this bug? The last comment in 
JIRA mentioned that this would be brought up in an SE meeting on 16 October 
2017, but there has been no activity since then.

In my tests above, I was using HDF5 1.10.5 and h5check 2.0.1, both compiled 
from Git.

I'm surprised that this issue has not been given more attention, since it's a 
data loss bug. I would say that it prevents people from implementing their own 
HDF5 writers, since they now have to fear that what they write will be 
destroyed if the file is later extended using the official HDF5 library.

@Barbara_Jones @epourmal





---
[Visit Topic](https://forum.hdfgroup.org/t/requesting-update-on-blocker-bug-hdffv-10300/5869/1)