Benoit Chauvet <[email protected]> writes:

> We now would like to have a round of public review on the actual data
> output of the ingestion process, so we can either validate or ask for
> some more updates, and then eventually deploy it in production.

I read through some of the code, and got the impression that it no
longer stores ExtIDs for the original tarballs.

For instance, the current archive has:

https://archive.softwareheritage.org/api/1/extid/subresource-integrity/raw:sha256-hyYHPoGGG8eyMh52Jyy9vTPH4aEhU1qYJ5dyZbkDPsA%3D/

which allows me to go from a SHA-256 of a tarball to its contents in
SWH: swh:1:dir:d532d6b54a7aae43f0e20f1da49b1bc6e662e3cc.
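
For anyone who wants to reproduce the lookup, here is a minimal sketch of
how such a URL can be built.  It assumes the ExtID is keyed on the SHA-256
of the raw tarball bytes, base64-encoded in the Subresource Integrity style
and percent-encoded, which is what the URL above suggests.  The function
name ‘sri_extid_url’ and the ‘api_base’ parameter are made up for
illustration; they are not part of any SWH client.

```python
import base64
import hashlib
import urllib.parse

# Base URL taken from the production lookup above.
ARCHIVE_API = "https://archive.softwareheritage.org/api/1"


def sri_extid_url(tarball_bytes: bytes, api_base: str = ARCHIVE_API) -> str:
    """Build an ExtID lookup URL from the raw bytes of a tarball.

    Assumption: the ExtID is the SHA-256 of the tarball as downloaded,
    written as "sha256-<base64 digest>" and prefixed with "raw:".
    """
    digest = hashlib.sha256(tarball_bytes).digest()
    sri = "sha256-" + base64.b64encode(digest).decode("ascii")
    # Keep ":" literal, as in the URL above; "=", "+" and "/" get escaped.
    target = urllib.parse.quote("raw:" + sri, safe=":")
    return f"{api_base}/extid/subresource-integrity/{target}/"
```

Passing api_base="https://webapp.staging.swh.network/api/1" gives the
corresponding staging URL for the same tarball.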

However, the same lookup on staging does not exist:

https://webapp.staging.swh.network/api/1/extid/subresource-integrity/raw:sha256-hyYHPoGGG8eyMh52Jyy9vTPH4aEhU1qYJ5dyZbkDPsA%3D/

even though

https://webapp.staging.swh.network/browse/directory/d532d6b54a7aae43f0e20f1da49b1bc6e662e3cc/

does.

Skimming the code (esp. ‘TarballDirectoryLoader’), it looks like nothing
holds on to the original tarball hash anymore.  Is it still stored
somewhere?  If it was omitted on purpose, that’s fine.  But if it was an
accident, I think it should be restored.

(As I’ve stated elsewhere, the general thrust of the changes is awesome
and much appreciated!)


-- Tim
