> Check that *both* sides use rsync v3; it uses an incremental list
> whereas v<=2 transfers a list of all files at once.

Yep; both sides are running recent -CURRENT snapshots (where recent
is less than or equal to a month old), and both are running rsync
v3.0.7.

> There are various memory limits on i386 that you won't be able to avoid.
> It might be informative to see how large the process grows before failing
> (e.g. watch with top -s1 -grsync).

I added a cron job to do just this on both endpoints last night, and
managed to catch the point where it failed.

This is from the remote host (the one being backed up); each line is
1 second:

1823 root       2    0   18M 2508K sleep/0   select    0:01  1.03% rsync
1823 root       2    0   17M 2248K sleep/1   select    0:01  0.98% rsync
1823 root       2    0 4100K 1456K sleep/1   select    0:01  0.98% rsync
1823 root       2    0 4100K 1456K sleep/1   select    0:01  0.93% rsync
1823 root       2    0 4100K 1456K sleep/1   select    0:01  0.88% rsync
1823 root       2    0 4100K 1456K sleep/1   select    0:01  0.83% rsync
1823 root       2    0 4100K 1456K sleep/1   select    0:01  0.78% rsync
1823 root       2    0 4100K 1456K sleep/1   select    0:01  0.73% rsync
1823 root       2    0 4100K 1456K sleep/1   select    0:01  0.68% rsync
1823 root       2    0 4100K 1456K sleep/1   select    0:01  0.63% rsync
1823 root       2    0 4100K 1456K sleep/1   select    0:01  0.59% rsync
1823 root       2    0 4100K 1456K sleep/1   select    0:01  0.54% rsync
1823 root       2    0 4100K 1456K sleep/1   select    0:01  0.49% rsync
1823 root       2    0 2052K 1364K sleep/1   select    0:01  0.44% rsync

.. and that's where it died.
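
For anyone wanting to pick through a longer capture of the same kind,
here is a rough Python sketch that reports each point where a process's
size changes.  It assumes the fifth and sixth columns are top's SIZE and
RES fields (the listing has no header line, so that is a guess from the
usual layout), and the function names are made up for illustration:

```python
import re

# Multipliers for the size suffixes that top prints (K, M, G).
_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def to_bytes(field):
    """Convert a top size field such as '18M' or '4100K' to bytes."""
    match = re.fullmatch(r"(\d+)([KMG])", field)
    if match is None:
        raise ValueError(f"unrecognised size field: {field!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def parse_line(line):
    """Return (pid, size_bytes, res_bytes) for one top process line."""
    fields = line.split()
    # Assumed column order: PID USER PRI NICE SIZE RES STATE ...
    return int(fields[0]), to_bytes(fields[4]), to_bytes(fields[5])


def size_changes(lines):
    """Yield (line_index, pid, old_size, new_size) when SIZE changes for a PID."""
    last = {}
    for index, line in enumerate(lines):
        if not line.strip():
            continue  # skip the blank separators between samples
        pid, size, _res = parse_line(line)
        if pid in last and last[pid] != size:
            yield index, pid, last[pid], size
        last[pid] = size


# A few lines copied from the capture above.
sample = [
    "1823 root       2    0   18M 2508K sleep/0   select    0:01  1.03% rsync",
    "1823 root       2    0   17M 2248K sleep/1   select    0:01  0.98% rsync",
    "1823 root       2    0 4100K 1456K sleep/1   select    0:01  0.98% rsync",
    "1823 root       2    0 2052K 1364K sleep/1   select    0:01  0.44% rsync",
]

for index, pid, old, new in size_changes(sample):
    print(f"line {index}: pid {pid} SIZE {old // 1024}K -> {new // 1024}K")
```

Tracking the last size per PID means the same function should also cope
with a capture where several rsync processes are interleaved.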


This is from the backup host (the one that initiates the rsync
connection and pulls the files back).  The top command showed
*two* rsync processes running, so each *pair* of lines is 1 second:


31881 root       2    0   18M 1892K sleep     select    0:00  0.20% rsync
24378 root       2    0 3992K 1224K idle      select    0:00  0.00% rsync

31881 root       2    0   18M 1892K sleep     select    0:00  0.15% rsync
24378 root       2    0 3992K 1224K idle      select    0:00  0.00% rsync

31881 root       2    0   18M 1892K sleep     select    0:00  0.10% rsync
24378 root       2    0 3992K 1224K idle      select    0:00  0.00% rsync

31881 root       2    0   18M 1892K sleep     select    0:00  0.05% rsync
24378 root       2    0 3992K 1224K idle      select    0:00  0.00% rsync

31881 root       2    0   18M 1892K sleep     select    0:00  0.00% rsync
24378 root       2    0 3992K 1224K idle      select    0:00  0.00% rsync

31881 root       2    0   18M 1892K sleep     select    0:01  0.00% rsync
24378 root       2    0 3992K 1224K idle      select    0:00  0.00% rsync

31881 root       2    0   18M 1892K sleep     select    0:01  0.00% rsync
24378 root       2    0 1944K 1120K sleep     select    0:00  0.00% rsync

30531 root       2    0  664K  876K sleep     select    0:00  0.00% rsync

.. and that's where one of them died.

The rsnapshot log shows that rsync dies with an error code of 22,
which according to the man page is:

22     Error allocating core memory buffers

I've bumped up the logging levels on rsnapshot for tonight's run,
but I'd really appreciate any help with this...  It appears to fail
on the same directory each night, which is puzzling, because that
directory isn't particularly large and doesn't contain an unusually
large number of files or anything like that...

Thanks folks!

Benny


-- 
"Show me on the doll where the marketing touched you."
                               -- "Mally" on Fazed.net

