>
> We have one in vctrs but it's not exported:
> https://github.com/r-lib/vctrs/blob/main/src/hash.c
>
> The main use is vectorised hashing:
>
Thanks for showing me this function. I have read the source code. That's a great idea. However, I think I might have missed something. When I tried `vctrs:::obj_hash`, I couldn't get identical outputs for repeated definitions of the same function.

``` r
options(keep.source = TRUE)
a <- function(){}
vctrs:::obj_hash(a)
#> [1] 68 e8 5a 0c
a <- function(){}
vctrs:::obj_hash(a)
#> [1] b2 6a 55 9c
a <- function(){}
vctrs:::obj_hash(a)
#> [1] 01 a9 bc 30
options(keep.source = FALSE)
a <- function(){}
vctrs:::obj_hash(a)
#> [1] 93 d7 f2 72
a <- function(){}
vctrs:::obj_hash(a)
#> [1] f3 1d d2 f4
```

Created on 2024-01-17 with [reprex v2.1.0](https://reprex.tidyverse.org)

>
> Best,
> Lionel
>
> On Wed, Jan 17, 2024 at 10:32 AM Tomas Kalibera
> <tomas.kalib...@gmail.com> wrote:
>>
>> I think one could implement hashing on the fly without any
>> serialization, similarly to how identical works, but I am not aware of
>> any existing implementation. Again, if that wasn't clear: I don't think
>> trying to compute a hash of an object from its serialized representation
>> is a good idea - it is of course convenient, but has problems like the
>> one you have ran into.
>>
>> In some applications it may still be good enough: if by various tweaks,
>> such as ensuring source references are off in your case, you achieve a
>> state when false alarms are rare (identical objects have different
>> hashes), and hence say unnecessary re-computation is rare, maybe it is
>> good enough.

I really appreciate you answering my questions and solving my puzzles. I went back and read the R internal code for `serialize`, and I totally agree that serialization is not a good basis for digesting R objects, especially environments, expressions, and functions.

What I want is a function that produces the same, stable hash for identical objects. However, to the best of our knowledge, no existing function does this. `digest::digest` and `rlang::hash` are the first functions that come to mind.
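To make the serialization problem concrete, here is a minimal base-R sketch. The helpers `make_fn` and `same_bytes` are names I made up for illustration; they are not part of any package. The sketch builds the same empty function twice from text and compares what `identical()` reports with a byte-for-byte comparison of the `serialize()` output, which is all a serialize-based hash gets to see.

```r
# Hypothetical helper: parse and evaluate the same source text, with or
# without source references. Evaluating in the global environment makes
# both closures share the same enclosing environment.
make_fn <- function(keep_source) {
  expr <- parse(text = "function() {}", keep.source = keep_source)
  eval(expr[[1]], envir = globalenv())
}

# Hypothetical helper: compare two objects by the raw bytes that
# serialize() produces for them.
same_bytes <- function(x, y) {
  identical(serialize(x, NULL), serialize(y, NULL))
}

# With source references: identical() ignores srcrefs by default
# (ignore.srcref = TRUE), but serialize() writes them out, including
# each function's own srcfile.
a <- make_fn(keep_source = TRUE)
b <- make_fn(keep_source = TRUE)
identical(a, b)
same_bytes(a, b)

# Without source references there is nothing extra to serialize.
c1 <- make_fn(keep_source = FALSE)
c2 <- make_fn(keep_source = FALSE)
identical(c1, c2)
same_bytes(c1, c2)
```

I would expect `identical()` to report `TRUE` for both pairs, while `same_bytes()` should agree only for the pair built without source references. This says nothing about how `vctrs:::obj_hash` treats functions; it only illustrates why hashing serialized bytes can disagree with `identical()`.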
Both `digest::digest` and `rlang::hash` are widely used, but both rely on `serialize`. The author of `digest` said:

> "As you know, digest takes and (ahem) "digests" what serialize gives it, so you would have to look into what serialize lets you do."

`vctrs:::obj_hash` is probably the closest to the implementation of `identical`, but in the reprex above it gives different results for identical objects.

The existence of `digest::digest` and `rlang::hash` shows that there is a huge demand for this "ideal" hash function. However, I bet most people are using digest/hash "incorrectly".

>>
>> Tomas
>>

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel