On Fri, 7 Sep 2018 at 22:24, Ben Peart <peart...@gmail.com> wrote:
> > Ben Peart <benpe...@microsoft.com> writes:

> >> - 160-bit SHA-1 over the extension types and their sizes (but not
> >> their contents).  E.g. if we have "TREE" extension that is N-bytes
> >> long, "REUC" extension that is M-bytes long, followed by "EOIE",
> >> then the hash would be:

> The purpose of the SHA isn't to detect disk corruption (we already have
> a SHA for the entire index that can serve that purpose) but to help
> ensure that this was actually a valid EOIE extension and not a lucky
> random set of bytes. [...]

> >> +#define EOIE_SIZE 24 /* <4-byte offset> + <20-byte hash> */

> >> +    the_hash_algo->init_fn(&c);
> >> +    while (src_offset < mmap_size - the_hash_algo->rawsz - EOIE_SIZE_WITH_HEADER) {
[...]
> >> +    the_hash_algo->final_fn(hash, &c);
> >> +    if (hashcmp(hash, (unsigned char *)index))
> >> +            return 0;
> >> +
> >> +    /* Validate that the extension offsets returned us back to the eoie extension. */
> >> +    if (src_offset != mmap_size - the_hash_algo->rawsz - EOIE_SIZE_WITH_HEADER)
> >> +            return 0;

Besides the issue you and Junio discussed with "should we document this
as being SHA-1 or NewHash" (or "the hash algo"), it seems to me that
this implementation is living somewhere between using SHA-1 and "the
hash algo". The hashing uses `the_hash_algo`, but the hash size is
hardcoded at 20 bytes.

Maybe it all works out, e.g., so that when someone (brian) merges a
NewHash and runs the testsuite, this will fail consistently and in a
safe way. But I wonder if it would really be too hard to avoid
hardcoding the 24 already now.

Martin
