At 2022-12-12T09:06:22+0000, Ralph Corderoy wrote:
> > Eric, can reposurgeon retroactively add an earlier release to git
> > without changing all the existing git hashes (which are referenced
> > all over the place, in the bug tracker and elsewhere)? I know
> > nothing about how these hashes are generated, so this may be
> > utterly infeasible.
>
> A Git commit ID is effectively a hash of its ancestry so that history
> can't be changed in this case without the unwanted ripple.
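
To make that concrete: a commit ID is (roughly) the SHA-1 of the
commit object's text, and that text names the parent commit's ID.
A minimal Python sketch, with made-up tree IDs, author data, and
messages:

    import hashlib

    def commit_id(tree, parent, message):
        # A commit object records its tree, its parent (if any), the
        # author/committer lines, and the log message; its ID is the
        # SHA-1 of "commit <length>\0" followed by that text.
        body = "tree %s\n" % tree
        if parent:
            body += "parent %s\n" % parent
        body += ("author A U Thor <au@example.org> 1670835982 +0000\n"
                 "committer A U Thor <au@example.org> 1670835982 +0000\n"
                 "\n" + message)
        data = body.encode()
        header = b"commit %d\0" % len(data)
        return hashlib.sha1(header + data).hexdigest()

    root = commit_id("a" * 40, None, "groff 1.01 import\n")
    child = commit_id("b" * 40, root, "groff 1.02 import\n")
    # Give "root" a newly discovered ancestor and its own ID changes,
    # so "child" must be rewritten too, and so on through every
    # descendant.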
I concur with Ralph's analysis. Because each object's ID is hashed
over input that includes the ID of its predecessor, there is no way to
insert new items into an object's history at any point, including the
origin, without altering all subsequent IDs.[1]

We could indeed rebuild groff's Git repo as proposed, but doing so
would invalidate _every checkout of it in the world_. I assume Keith
Marshall would notice that.

Because the universe has an evil sense of humor, I suspect the
likeliest means of coaxing any version of groff earlier than 1.01 out
of hiding is to actually undertake this disruption, particularly if we
express confidence that it is the last, or only, time we will have to
do it.

> The alternative is to have a Git repo specifically for maintaining
> historical versions, not for development, and then the commit IDs
> can be completely regenerated as new discoveries are inserted. This
> is what Spinellis does for his
> https://github.com/dspinellis/unix-history-repo#readme

This is the more responsible approach, and it would be valuable to
have, but anyone in possession of groff tarballs earlier than 1.01 may
well prove unable to locate them as long as doing so wouldn't cause
painful disruption. Not on purpose, mind you, but because that's how
it goes. To live is to suffer.

Regards,
Branden

[1] ...assuming your hash algorithm is immune to collisions. As I
understand it, strictly speaking this is impossible--an essential
property of a hash on unbounded inputs is to lose information,[2]
otherwise what you have developed is a compression algorithm. The
property that "strong" or "one-way" hashes have is that the Hamming
distance between any two inputs that produce the same hash is
spectacularly large. Intuitively, this seems like a straightforward
property to verify for highly structured and human-readable inputs
like natural language text or source code. For arbitrary binary data,
perhaps less confidence is warranted. I assume that strong hashes tend
to widely separate inputs that are similar in _size_ in, uh, "Hamming
space" (you can tell I haven't really studied this), which, if true,
might help.

[2] For a bounded set of inputs, you can indeed produce a "perfect
hash function", and there exist tools to generate one for you.
https://www.gnu.org/software/gperf/
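
As a toy illustration of [2], the following Python sketch brute-forces
a collision-free table for a fixed, made-up key list (gperf is much
smarter about it, but the idea is the same): because the key set is
bounded and known in advance, you can simply search until no two keys
collide.

    def smallest_perfect_table(keys):
        # Grow the table until a trivial "sum of character codes" hash
        # maps every key to a distinct slot; with a bounded, known key
        # set such a table always exists, and lookups then need no
        # collision handling at all.
        sums = [sum(ord(c) for c in k) for k in keys]
        size = len(keys)
        while len({s % size for s in sums}) < len(keys):
            size += 1
        return size

    keys = ["troff", "nroff", "eqn", "tbl", "pic", "refer"]
    print(smallest_perfect_table(keys))  # 11 slots suffice for these 6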