Re: sha256sum --text generating blank spaces and hyphens?

Max Nikulin Thu, 27 Apr 2023 19:21:01 -0700

On 26/04/2023 21:33, Albretch Mueller wrote:

  a) the crazy long name
  b) its base64 representation
  c) §b's sha256sum representation which is the one used for the file
name and the log of the download.

I see no point in the base64 step, since the SHA may be calculated for the original URI directly. However, an important step, URI normalization, is missing:

- often http: and https: are alternatives

- domain name may contain unicode characters or be represented as pure ASCII punycode

- #anchors (sometimes empty #) at the end of a URI usually do not change the served content. They may be abused, however, by some web applications to provide content dependent on the anchor. Or a web page may hide parts of its content using CSS depending on the anchor. So stripping them may cause trouble.

- Session or user activity tracking query ("search") parameters that must be stripped for archival purposes

- Some parts of a URI may be percent-encoded while remaining equivalent to the "canonical" URI

- A web page may suggest a "canonical" URL, but sometimes it is a misleading hint.


So URI comparison is not a trivial task.
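To make the points above concrete, here is a rough Python sketch that normalizes a URI and then hashes it directly, with no base64 step. The names normalize_uri, uri_key and TRACKING_PARAMS are made up for illustration, the list of tracking parameters is only an assumption, and treating http: and https: as the same resource is a policy choice that will be wrong for some sites.

```python
import hashlib
from urllib.parse import (parse_qsl, quote, unquote, urlencode,
                          urlsplit, urlunsplit)

# Hypothetical, incomplete list of tracking parameters (an assumption).
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid"}


def normalize_uri(uri, strip_fragment=True, merge_http_https=True):
    """Best-effort normalization; not a full RFC 3986 implementation."""
    parts = urlsplit(uri)

    # http: and https: are often, but not always, alternatives.
    scheme = parts.scheme.lower()
    if merge_http_https and scheme == "http":
        scheme = "https"

    # Unicode host names are converted to their ASCII punycode form.
    # Userinfo is dropped and IPv6 literals are not handled here.
    host = parts.hostname or ""
    try:
        host = host.encode("idna").decode("ascii")
    except UnicodeError:
        pass  # keep the host as is when it cannot be converted
    netloc = host if parts.port is None else f"{host}:{parts.port}"

    # Decode and re-encode the path so equivalent percent-encodings match.
    # Caveat: this also turns %2F into "/", which may change the meaning.
    path = quote(unquote(parts.path), safe="/") or "/"

    # Drop known tracking parameters, keep the order of the rest.
    query = urlencode([(k, v)
                       for k, v in parse_qsl(parts.query,
                                             keep_blank_values=True)
                       if k not in TRACKING_PARAMS])

    # Stripping the anchor is optional because some pages depend on it.
    fragment = "" if strip_fragment else parts.fragment
    return urlunsplit((scheme, netloc, path, query, fragment))


def uri_key(uri):
    """SHA-256 hex digest of the normalized URI."""
    return hashlib.sha256(normalize_uri(uri).encode("utf-8")).hexdigest()
```

With these assumptions, uri_key("HTTP://Example.COM/a?utm_source=x&id=1#top") and uri_key("https://example.com/a?id=1") give the same digest. A "canonical" URL suggested by the page is deliberately ignored here, since it may be a misleading hint.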

Another point is that the same page may be saved multiple times, so a URI hash alone is not enough as a unique key.
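One possible way around that, shown only as a sketch: combine the URI hash with the time of the download. The function name snapshot_key and the key layout are assumptions for illustration; a content hash could be used instead of, or in addition to, the timestamp.

```python
import hashlib
from datetime import datetime, timezone


def snapshot_key(uri, fetched_at=None):
    """Key for one saved copy: SHA-256 of the URI plus a UTC timestamp."""
    if fetched_at is None:
        fetched_at = datetime.now(timezone.utc)
    digest = hashlib.sha256(uri.encode("utf-8")).hexdigest()
    # Second resolution; two saves within the same second would collide.
    stamp = fetched_at.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{digest}-{stamp}"
```

Two downloads of the same URI at different times then get different keys, while the common hash prefix still groups them together.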


On 26/04/2023 21:48, Nicolas George wrote:

OTOH, HTTP does have a place to state the type of the file, and the
extension in URLs is not reliable: if you want to do it properly, you
must set your local file extension based on the Content-Type response
header.

And you will quickly face servers that send an incorrect Content-Type or intentionally set application/octet-stream with a "nosniff" header to force the browser to save the file instead of opening it, e.g. in the built-in PDF reader.
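Since neither the URL extension nor the Content-Type header can be trusted alone, a downloader may need some fallback chain. The sketch below is only one possible order (file signature first, then a non-generic Content-Type, then the URL path); pick_extension and the short MAGIC table are hypothetical names and data, not part of any existing tool.

```python
import mimetypes
import posixpath
from urllib.parse import urlsplit

# A few well-known file signatures; a real table would be much longer.
MAGIC = [
    (b"%PDF-", ".pdf"),
    (b"\x89PNG\r\n\x1a\n", ".png"),
    (b"PK\x03\x04", ".zip"),
]

# Content-Type values that say nothing useful about the file.
GENERIC = {"", "application/octet-stream"}


def pick_extension(content_type, url, first_bytes=b""):
    """Guess a local file extension; returns "" when nothing is known."""
    # 1. The first bytes of the body win, since headers may be wrong.
    for signature, ext in MAGIC:
        if first_bytes.startswith(signature):
            return ext

    # 2. Use the Content-Type header unless it is missing or generic.
    mime = (content_type or "").split(";")[0].strip().lower()
    if mime not in GENERIC:
        ext = mimetypes.guess_extension(mime)
        if ext:
            return ext

    # 3. Last resort: whatever extension the URL path carries.
    return posixpath.splitext(urlsplit(url).path)[1]
```

For a response labelled application/octet-stream whose body starts with "%PDF-", this returns ".pdf" regardless of what the URL looks like.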
