OK, cool, good info.
I was just digging into it again, and the date I switched to the
snapshot was recorded as Feb 1st. Here is a list of the mirror_metadata
files from around that date:
-rw------- 1 root root 2.7M Jan 21 05:25 mirror_metadata.2022-01-21T05:20:05-09:00.snapshot.gz
-rw------- 1 root root 632 Jan 23 05:25 mirror_metadata.2022-01-22T05:20:26-09:00.diff.gz
-rw------- 1 root root 790 Jan 24 05:26 mirror_metadata.2022-01-23T05:20:04-09:00.diff.gz
-rw------- 1 root root 783 Jan 25 05:24 mirror_metadata.2022-01-24T05:20:33-09:00.diff.gz
-rw------- 1 root root 778 Jan 26 05:29 mirror_metadata.2022-01-25T05:19:31-09:00.diff.gz
-rw------- 1 root root 731 Jan 27 05:25 mirror_metadata.2022-01-26T05:23:21-09:00.diff.gz
-rw------- 1 root root 723 Jan 28 05:27 mirror_metadata.2022-01-27T05:20:37-09:00.diff.gz
-rw------- 1 root root 786 Jan 29 05:29 mirror_metadata.2022-01-28T05:21:17-09:00.diff.gz
-rw------- 1 root root 772 Jan 30 05:26 mirror_metadata.2022-01-29T05:23:55-09:00.diff.gz
-rw------- 1 root root 2.7M Jan 30 05:26 mirror_metadata.2022-01-30T05:20:43-09:00.snapshot.gz
-rw------- 1 root root 725 Feb 1 05:26 mirror_metadata.2022-01-31T05:21:21-09:00.diff.gz
-rw------- 1 root root 2.6M Feb 3 15:33 mirror_metadata.2022-02-01T05:20:43-09:00.diff.gz
-rw------- 1 root root 613 Feb 4 05:16 mirror_metadata.2022-02-03T14:20:54-09:00.diff.gz
-rw------- 1 root root 1.7K Feb 5 05:17 mirror_metadata.2022-02-04T05:13:29-09:00.diff.gz
-rw------- 1 root root 852 Feb 6 05:55 mirror_metadata.2022-02-05T05:14:57-09:00.diff.gz
-rw------- 1 root root 1.7K Feb 7 06:36 mirror_metadata.2022-02-06T05:52:59-09:00.diff.gz
-rw------- 1 root root 73K Feb 8 05:39 mirror_metadata.2022-02-07T06:33:04-09:00.diff.gz
-rw------- 1 root root 2.7M Feb 8 05:39 mirror_metadata.2022-02-08T05:33:08-09:00.snapshot.gz
You will see that mirror_metadata.2022-02-01T05:20:43-09:00.diff.gz,
with a modification date of Feb 3rd, is 2.6M, about the same size as the
2.7M snapshot file from Jan 30th.
If you grep for the lines that match "^File ", I presume you get a
good count of the number of files that changed, or were at least recorded
for some reason. Here are those stats:
find increments -name "*2022-02-01*" -exec ls -lh {} \; | wc
85287 767583 11064660
gzip -dc mirror_metadata.2022-01-30T05:20:43-09:00.snapshot.gz | egrep "^File " | wc
89287 178574 4535737
gzip -dc mirror_metadata.2022-02-01T05:20:43-09:00.diff.gz | egrep "^File " | wc
85288 170576 4374253
Notice how the number of increment files with that date in the name (the
first wc output, 85,287) is almost the same as the number of files listed
in the diff.gz file (the last wc output, 85,288).
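
For anyone who wants to repeat the "^File " count without a shell pipeline, here is a minimal Python sketch of the same idea. The function name count_file_records is made up for illustration, and the sketch assumes (as the grep above does) that every record in a mirror_metadata file starts with a line beginning with "File ".

```python
import gzip
import sys


def count_file_records(path):
    """Count the lines that start with 'File ' in a gzip-ed metadata file."""
    count = 0
    # Read as bytes so file names in any encoding cannot break the scan.
    with gzip.open(path, "rb") as handle:
        for line in handle:
            if line.startswith(b"File "):
                count += 1
    return count


if __name__ == "__main__":
    # Hypothetical usage: python count_records.py mirror_metadata.*.gz
    for name in sys.argv[1:]:
        print(count_file_records(name), name)
```

Run against the snapshot.gz and diff.gz files, this should give the same numbers as the first column of the wc output for the two gzip pipelines.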
I also compared some of the entries in the snapshot file to the diff.gz
file, and never found any differences. Of course I only checked a dozen
or two. Here are a couple:
File bin/bash
Type reg
Size 1168776
SHA1Digest 0533efae0065e72c1d833b9f7a678a20995bd5a6
ModTime 1555560756
Uid 0
Uname root
Gid 0
Gname root
Permissions 493
File bin/bunzip2
Type reg
Size 38984
NumHardLinks 3
Inode 131113
DeviceLoc 64798
SHA1Digest 6e86f2cf232ab7becc73013d2bc743f8668e2536
ModTime 1562786272
Uid 0
Uname root
Gid 0
Gname root
Permissions 493
File bin/bzcat
Type reg
Size 38984
NumHardLinks 3
Inode 131113
DeviceLoc 64798
ModTime 1562786272
Uid 0
Uname root
Gid 0
Gname root
Permissions 493
[...]
File usr/share/mime/application/x-doom-wad.xml
Type reg
Size 1663
SHA1Digest 11509c6e2a188657e2e53edfd9796d1b1efdddf6
ModTime 1616737565
Uid 0
Uname root
Gid 0
Gname root
Permissions 420
File usr/share/mime/application/x-dvi.xml
Type reg
Size 3079
SHA1Digest abfc865aba5c7cec58075b577463c92ec96c9ad8
ModTime 1616737592
Uid 0
Uname root
Gid 0
Gname root
Permissions 420
File usr/share/mime/application/x-e-theme.xml
Type reg
Size 3230
SHA1Digest 6d990accf51c683bbe4eb2f131380a7084e7bb04
ModTime 1616737594
Uid 0
Uname root
Gid 0
Gname root
Permissions 420
File usr/share/mime/application/x-egon.xml
Type reg
Size 3414
SHA1Digest 323298d7d9d47cd4c544074107e6c5612915aa3f
ModTime 1616737567
Uid 0
Uname root
Gid 0
Gname root
Permissions 420
File usr/share/mime/application/x-executable.xml
Type reg
Size 2745
SHA1Digest 53135426082c05de56084b5923ce7da105cba309
ModTime 1616737570
Uid 0
Uname root
Gid 0
Gname root
Permissions 420
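
Checking a dozen or two entries by eye leaves most of the 85,288 unexamined, so the comparison could be automated. The following is only a sketch: parse_metadata and compare_metadata are hypothetical helper names, and it assumes the layout shown above, a "File <path>" line followed by "Key value" lines, with paths taken verbatim and no unquoting of special characters.

```python
import gzip


def parse_metadata(path):
    """Return {file path: {field: value}} for one mirror_metadata file."""
    records = {}
    current = None
    # surrogateescape keeps non-UTF-8 file names from raising on decode.
    with gzip.open(path, "rt", encoding="utf-8",
                   errors="surrogateescape") as handle:
        for raw in handle:
            line = raw.rstrip("\n").lstrip()
            if not line:
                continue
            key, _, value = line.partition(" ")
            if key == "File":
                # A "File" line opens a new record.
                current = records.setdefault(value, {})
            elif current is not None:
                current[key] = value
    return records


def compare_metadata(snapshot_path, diff_path):
    """Sort the diff's entries into identical, changed, and diff-only paths."""
    snapshot = parse_metadata(snapshot_path)
    diff = parse_metadata(diff_path)
    identical, changed, only_in_diff = [], [], []
    for path, fields in diff.items():
        if path not in snapshot:
            only_in_diff.append(path)
        elif fields == snapshot[path]:
            identical.append(path)
        else:
            changed.append(path)
    return identical, changed, only_in_diff
```

Calling compare_metadata on the Jan 30 snapshot.gz and the Feb 1 diff.gz and printing the lengths of the three lists would show whether the entries really are identical across the whole file, or whether some field differs in a way a spot check missed.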
Anyway, why do we have 85,288 files listed that apparently didn't
change? Is there another part of the puzzle that I haven't looked at yet?
Thanks for your help,
Clif
On 2/8/22 9:40 AM, Robert Nichols wrote:
On 2/8/22 1:05 AM, Mr. Clif wrote:
Hey folks,
thanks for the feedback. :-) More comments below...
On 2/7/22 8:25 PM, Robert Nichols wrote:
On 2/7/22 7:23 PM, Leland Best wrote:
Hi Cliff,
On Mon, 2022-02-07 at 11:45 -0800, Mr. Clif wrote:
Hey Eric,
any ideas on this? How do these diff files normally work?
[...]
I'm not an 'rdiff-backup' developer or anything so all you experts out there
correct me if I'm wrong but ...
IIRC 'rdiff-backup' keeps inode info as part of the metadata for each file.
When you mount a filesystem Linux assigns "fake" inode numbers to avoid
collisions between filesystems on different devices/partitions/etc.. So if you
change the mount point, every file could potentially get a new inode number and,
consequently, have changed metadata. That results in 'rdiff-backup' creating a
'*.diff*' file for every source file.
Device and inode metadata is kept only for files with multiple hard links. That's
to keep track of which links reference the same file. That information is not
needed for files with just a single hard link, and unless something has changed
in the latest release that metadata is not kept. You can look in the
mirror_metadata file (it's compressed ASCII) and see what fields are present
for each file.
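
The point about inode fields could be checked against a real metadata file with a short Python sketch like the one below. It is illustrative only: inode_field_usage is a made-up name, the field names NumHardLinks and Inode are the ones visible in the metadata entries in this thread, and a record with no NumHardLinks field is assumed to have a single link.

```python
import gzip
from collections import Counter


def inode_field_usage(path):
    """Tally records by (has multiple hard links, has an Inode field)."""
    records = []
    with gzip.open(path, "rt", encoding="utf-8",
                   errors="surrogateescape") as handle:
        for raw in handle:
            key, _, value = raw.strip().partition(" ")
            if key == "File":
                # Start a new record; the path itself is not needed here.
                records.append({})
            elif records and key:
                records[-1][key] = value
    tally = Counter()
    for fields in records:
        # Assumption: a missing NumHardLinks field means one link.
        multi = int(fields.get("NumHardLinks", "1")) > 1
        tally[(multi, "Inode" in fields)] += 1
    return tally
```

If inode data is kept only for multiply linked files, the count under the key (False, True) should be zero, and a mount-point change alone would then not explain metadata changes for single-link files.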
Cool, these are the diff.gz files? I tried ungzipping them but the
first "line" of data still seems to be binary. Is it encoded somehow?
No, I'm talking about the files named "mirror_metadata..." in the rdiff-backup-data
directory itself. Those are gzip-ed ASCII files that hold the principal metadata
for every file in the mirror. The most recent will have a name that ends in
".snapshot.gz". The one for the previous backup date will most likely have a
name ending in ".diff.gz", but is also a gzip-ed ASCII file that contains the
metadata for every file that was somehow different then than it is in the
latest backup. You can look at those files and see what was somehow
"different".