Perhaps use the digest package? Isn't "R the R packages?"

> On May 1, 2020, at 2:00 PM, Dénes Tóth <toth.de...@kogentum.hu> wrote:
>
> AFAIK there is no hashing utility in base R which can create hash digests of
> arbitrary R objects. However, as also described by Henrik Bengtsson in [1],
> we have tools::md5sum() which calculates MD5 hashes of files. Calculating
> hashes of in-memory objects is a very common task in several areas, as
> demonstrated by the popularity of the 'digest' package (~850.000
> downloads/month).
>
> Upon the inspection of the relevant files in the R-source (e.g., [2] and
> [3]), it seems all building blocks have already been implemented so that
> hashing should not be restricted to files. I would like to ask:
>
> 1) Why is md5_buffer unused?:
> In src/library/tools/src/md5.c [see 2], md5_buffer is implemented which seems
> to be the counterpart of md5_stream for non-file inputs:
>
> ---
> #ifdef UNUSED
> /* Compute MD5 message digest for LEN bytes beginning at BUFFER. The
>    result is always in little endian byte order, so that a byte-wise
>    output yields to the wanted ASCII representation of the message
>    digest. */
> static void *
> md5_buffer (const char *buffer, size_t len, void *resblock)
> {
>   struct md5_ctx ctx;
>
>   /* Initialize the computation context. */
>   md5_init_ctx (&ctx);
>
>   /* Process whole buffer but last len % 64 bytes. */
>   md5_process_bytes (buffer, len, &ctx);
>
>   /* Put result in desired memory area. */
>   return md5_finish_ctx (&ctx, resblock);
> }
> #endif
> ---
>
> 2) How can the R-community help so that this feature becomes available in
> package 'tools'?
>
> Suggestions:
> As a first step, it would be great if tools::md5sum would support connections
> (credit goes to Henrik for the idea). E.g., instead of the signature
> tools::md5sum(files), we could have tools::md5sum(files, conn = NULL), which
> would allow:
>
> x <- runif(10)
> tools::md5sum(conn = rawConnection(serialize(x, NULL)))
>
> To avoid the inconsistency between 'files' (which computes the hash digests
> in a vectorized manner, that is, one for each file) and 'conn' (which expects
> a single connection), and to make it easier to extend the hashing for other
> algorithms without changing the main R interface, a more involved solution
> would be to introduce tools::hash and tools::hashes, in a similar vein to
> digest::digest and digest::getVDigest.
>
> Regards,
> Denes
>
>
> [1]: https://github.com/HenrikBengtsson/Wishlist-for-R/issues/21
> [2]:
> https://github.com/wch/r-source/blob/5a156a0865362bb8381dcd69ac335f5174a4f60c/src/library/tools/src/md5.c#L172
> [3]:
> https://github.com/wch/r-source/blob/5a156a0865362bb8381dcd69ac335f5174a4f60c/src/library/tools/src/Rmd5.c#L27
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
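
For what it is worth, the file-only tools::md5sum() can already be pressed into service for in-memory objects by going through a temporary file. The following is only a rough sketch under that assumption: md5_object is a hypothetical helper name, not an existing function in base R or 'tools', and the serialization version is pinned purely for illustration. Because the hash covers the full serialize() output, header included, the digest may differ across R versions and will generally not match what the 'digest' package returns for the same object.

```r
# Hypothetical helper (NOT part of base R or 'tools'): hash an in-memory
# object by writing its serialized form to a temporary file and then
# calling the existing, file-based tools::md5sum().
md5_object <- function(x) {
  tmp <- tempfile(fileext = ".bin")
  on.exit(unlink(tmp), add = TRUE)   # remove the temporary file on exit
  # serialize(x, NULL) returns a raw vector; the version is pinned here
  # only as an illustrative choice.
  writeBin(serialize(x, NULL, version = 2), tmp)
  # md5sum() returns a character vector named by file path; drop the name.
  unname(tools::md5sum(tmp))
}

x <- runif(10)
h1 <- md5_object(x)
h2 <- md5_object(x)

identical(h1, h2)   # compares the digests of two calls on the same object
nchar(h1)           # length of the hex string returned by md5sum()
```

The round trip through the file system is exactly the overhead that a connection-aware md5sum() or an exposed md5_buffer would avoid.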

---------------
John Mount
http://www.win-vector.com/
Our book: Practical Data Science with R
http://practicaldatascience.com