Dear all,

In a previous mail I noted that the current file hashing of images is a disaster when it comes to video support. Moreover, the complexity of the hashing code gave me quite a headache, and the current implementation is buggy (different hashes for original and local filename). Instead of coming up with a scheme for videos (e.g. hash 1 MB in the middle of the file if it exceeds a certain size) and fixing those bugs, I wonder whether the whole hashing thing is necessary at all.

AFAIK, the reason for the hashes is twofold:

1) Notice when images have changed, to recalculate their thumbnails.
2) Find moved images, i.e. if the log is transported to a different computer.

The first case never worked, since in routine operation the images are not rehashed. This functionality is now provided by PR#1336 instead. The second case is questionable, because users might have edited their files without changing the filename. To me this seems a more likely case than pictures getting renamed. The only remaining purpose of the hash therefore seems to be to protect against equally named images. That problem can be circumvented by checking not only the filename, but also the names of the parent directories.

I implemented a proof of concept in PR#1349. In principle, it does two things:

1) Replace all the hash-to-filename associations by a simple canonical_filename->local_filename associative array.
2) Find moved images based on file paths. The way this works is by scoring the match between the file paths: the higher the number of matching path items (starting from the filename up to the first miss), the higher the score.

Note that a significant part of the PR is actually the conversion of the old associations to the simplified ones. Apart from this, the final code is distinctly less complex than the original one. It can and should of course still be improved. For example, we could improve the heuristics by remembering image metadata. And certainly, the user should be presented with the list of new associations before they are actually applied.

But before I continue to work on this (like probably all of us, I have only limited time), a decision needs to be made on whether this is the correct path forward and what the must-have features are (at least some sort of user interaction, I think).

If we choose to go this way, there is at least one additional implementation detail to discuss: to emulate the old behavior, *all* pictures that we ever encountered are matched.
If a log is opened, all pictures in the log are remembered in the canonical_filename->local_filename associative array. I wonder if it would not be more sensible to match only the pictures of the currently opened log (or even only those of selected dives?). Thus we would only have to remember those images whose canonical and local filenames differ.

Thanks,
Berthold
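
P.S. To make the path scoring more concrete, here is a minimal Python sketch. It is not the code from PR#1349. The function names (path_score, best_match, new_associations) are made up for illustration, and the handling of ties and of "/"-separated paths only are assumptions of the sketch.

```python
def path_items(path):
    """Split a path into its items. Assumption: '/'-separated paths only."""
    return [item for item in path.split("/") if item]


def path_score(canonical, candidate):
    """Count matching path items, starting from the filename and
    walking up the parent directories until the first miss."""
    score = 0
    for a, b in zip(reversed(path_items(canonical)),
                    reversed(path_items(candidate))):
        if a != b:
            break
        score += 1
    return score


def best_match(canonical, candidates):
    """Return the highest-scoring candidate, or None if not even the
    filename matches. Assumption: on a tie, the first candidate wins."""
    best, best_score = None, 0
    for candidate in candidates:
        score = path_score(canonical, candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best


def new_associations(canonicals, candidates):
    """Build a canonical_filename -> local_filename dict that only
    contains entries where the two filenames differ."""
    result = {}
    for canonical in canonicals:
        local = best_match(canonical, candidates)
        if local is not None and local != canonical:
            result[canonical] = local
    return result
```

For example, path_score("/home/user/dives/2017/img_001.jpg", "/media/usb/dives/2017/img_001.jpg") counts img_001.jpg, 2017 and dives before the first miss. Since ties are resolved arbitrarily here, a real implementation would want to show the proposed associations to the user before applying them.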
