At 2022-12-12T09:06:22+0000, Ralph Corderoy wrote:
> > Eric, can reposurgeon retroactively add an earlier release to git
> > without changing all the existing git hashes (which are referenced
> > all over the place, in the bug tracker and elsewhere)? I know
> > nothing about how these hashes are generated, so this may be
> > utterly infeasible.
>
> A Git commit ID is effectively a hash of its ancestry so that history
> can't be changed in this case without the unwanted ripple.
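
To make that concrete: a commit ID is (roughly) the SHA-1 of the
commit object's text, and that text names the parent commit's ID.
A minimal Python sketch, with made-up tree IDs, author data, and
messages:

    import hashlib

    def commit_id(tree, parent, message):
        # A commit object records its tree, its parent (if any), the
        # author/committer lines, and the log message; its ID is the
        # SHA-1 of "commit <length>\0" followed by that text.
        body = "tree %s\n" % tree
        if parent:
            body += "parent %s\n" % parent
        body += ("author A U Thor <au@example.org> 1670835982 +0000\n"
                 "committer A U Thor <au@example.org> 1670835982 +0000\n"
                 "\n" + message)
        data = body.encode()
        header = b"commit %d\0" % len(data)
        return hashlib.sha1(header + data).hexdigest()

    root = commit_id("a" * 40, None, "groff 1.01 import\n")
    child = commit_id("b" * 40, root, "groff 1.02 import\n")
    # Give "root" a newly discovered ancestor and its own ID changes,
    # so "child" must be rewritten too, and so on through every
    # descendant.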
I concur with Ralph's analysis. Because each object's ID is hashed
over input that includes the ID of its predecessor, there is no way to
insert new items into an object's history at any point, including the
origin, without altering all subsequent IDs.[1]

We could indeed rebuild groff's Git repo as proposed, but doing so
would invalidate _every checkout of it in the world_. I assume Keith
Marshall would notice that.

Because the universe has an evil sense of humor, I suspect the
likeliest means of coaxing any version of groff earlier than 1.01 out
of hiding is to actually undertake this disruption, particularly if we
express confidence that it is the last, or only, time we will have to
do it.

> The alternative is to have a Git repo specifically for maintaining
> historical versions, not for development, and then the commit IDs
> can be completely regenerated as new discoveries are inserted. This
> is what Spinellis does for his
> https://github.com/dspinellis/unix-history-repo#readme

This is the more responsible approach, and it would be valuable to
have, but anyone in possession of groff tarballs earlier than 1.01 may
well prove unable to locate them as long as doing so wouldn't cause
painful disruption. Not on purpose, mind you, but because that's how
it goes. To live is to suffer.

Regards,
Branden

[1] ...assuming your hash algorithm is immune to collisions. As I
understand it, strictly speaking this is impossible--an essential
property of a hash on unbounded inputs is to lose information,[2]
otherwise what you have developed is a compression algorithm. The
property that "strong" or "one-way" hashes have is that the Hamming
distance between any two inputs that produce the same hash is
spectacularly large. Intuitively, this seems like a straightforward
property to verify for highly structured and human-readable inputs
like natural language text or source code. For arbitrary binary data,
perhaps less confidence is warranted. I assume that strong hashes tend
to widely separate inputs that are similar in _size_ in, uh, "Hamming
space" (you can tell I haven't really studied this), which, if true,
might help.

[2] For a bounded set of inputs, you can indeed produce a "perfect
hash function", and there exist tools to generate one for you.
https://www.gnu.org/software/gperf/
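
As a toy illustration of [2], the following Python sketch brute-forces
a collision-free table for a fixed, made-up key list (gperf is much
smarter about it, but the idea is the same): because the key set is
bounded and known in advance, you can simply search until no two keys
collide.

    def smallest_perfect_table(keys):
        # Grow the table until a trivial "sum of character codes" hash
        # maps every key to a distinct slot; with a bounded, known key
        # set such a table always exists, and lookups then need no
        # collision handling at all.
        sums = [sum(ord(c) for c in k) for k in keys]
        size = len(keys)
        while len({s % size for s in sums}) < len(keys):
            size += 1
        return size

    keys = ["troff", "nroff", "eqn", "tbl", "pic", "refer"]
    print(smallest_perfect_table(keys))  # 11 slots suffice for these 6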