Benoit Chauvet <[email protected]> writes:

> We now would like to have a round of public review on the actual data
> output of the ingestion process, so we can either validate or ask for
> some more updates, and then eventually deploy it in production.

I read through some of the code, and got the impression that it no
longer stores ExtIDs for the original tarballs. For instance, the
current archive has:

  https://archive.softwareheritage.org/api/1/extid/subresource-integrity/raw:sha256-hyYHPoGGG8eyMh52Jyy9vTPH4aEhU1qYJ5dyZbkDPsA%3D/

which allows me to go from a SHA-256 of a tarball to its contents in
SWH: swh:1:dir:d532d6b54a7aae43f0e20f1da49b1bc6e662e3cc. However, this
does not exist:

  https://webapp.staging.swh.network/api/1/extid/subresource-integrity/raw:sha256-hyYHPoGGG8eyMh52Jyy9vTPH4aEhU1qYJ5dyZbkDPsA%3D/

even though

  https://webapp.staging.swh.network/browse/directory/d532d6b54a7aae43f0e20f1da49b1bc6e662e3cc/

does.

Skimming the code (esp. ‘TarballDirectoryLoader’) it looks like nothing
holds on to the original hash anymore. Is it still there? If it was
omitted on purpose, that’s fine. But if it was an accident I think it
should still be there.

(As I’ve stated elsewhere, the general thrust of the changes is awesome
and much appreciated!)

-- Tim
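
P.S. For anyone who wants to repeat the check with another tarball, here
is a minimal Python sketch of how I understand the lookup key to be
built. It assumes the ExtID is the usual SRI string ("sha256-" followed
by the base64 digest of the tarball bytes) and that the URL layout
matches the two API links above. The function names are my own and are
not part of any SWH tooling, and I have only seen the "=" padding
escaped in practice, so escaping "+" and "/" the same way is a guess.

```python
import base64
import hashlib
from urllib.parse import quote


def sri_sha256(data: bytes) -> str:
    """Return the SRI-style string for data: 'sha256-' + base64(digest)."""
    digest = hashlib.sha256(data).digest()
    return "sha256-" + base64.b64encode(digest).decode("ascii")


def extid_url(base: str, sri: str) -> str:
    """Build the ExtID lookup URL, following the layout of the links above.

    Assumption: every reserved character in the SRI string is
    percent-encoded, as '=' is ('%3D') in the URLs quoted in this message.
    """
    return f"{base}/api/1/extid/subresource-integrity/raw:{quote(sri, safe='')}/"


if __name__ == "__main__":
    # The SRI string taken from the URLs in this message.
    sri = "sha256-hyYHPoGGG8eyMh52Jyy9vTPH4aEhU1qYJ5dyZbkDPsA="
    for base in ("https://archive.softwareheritage.org",
                 "https://webapp.staging.swh.network"):
        print(extid_url(base, sri))
```

To start from a file instead, pass its contents to sri_sha256(), e.g.
sri_sha256(open(path, "rb").read()), and feed the result to extid_url().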
