Re: [Bacula-users] Block checksum mismatch but Termination: Backup OK?

Ralf Gross Wed, 08 Aug 2007 03:31:33 -0700

Arno Lehmann schrieb:
> > while looking for my verify problems, I noticed that my full backup had 
> > errors
> > at the end of the first tape but terminated with status OK. Therefor I 
> > noticed
> > this error very late.
> 
> But you actually kept and read the system log! Congratulations ;-)


Yeah, but only after opening a bug report regarding the verify errors.
 
> > /var/log/kern.log
> > Aug  5 06:00:29 VU0EM005 kernel: st0: Current: sense key: Aborted Command
> > Aug  5 06:00:29 VU0EM005 kernel: Additional sense: Data phase error
> > 
> > 
> > 05-Aug 02:13 VU0EM005: Start Backup JobId 416, 
> > Job=VU0EM003.2007-08-05_00.05.01
> > 05-Aug 02:13 VU0EM005: Spooling data ...
> > VU0EM003:      /home is a different filesystem. Will not descend from / 
> > into /home
> > VU0EM003:      /public is a different filesystem. Will not descend from / 
> > into /public
> > VU0EM003:      /lib/init/rw is a different filesystem. Will not descend 
> > from / into /lib/init/rw
> > VU0EM003:      /server is a different filesystem. Will not descend from / 
> > into /server
> > 05-Aug 04:44 VU0EM005: User specified spool size reached.
> > 05-Aug 04:44 VU0EM005: Writing spooled data to Volume. Despooling 
> > 450,971,582,471 bytes ...
> > 05-Aug 06:00 VU0EM005: VU0EM003.2007-08-05_00.05.01 Error: block.c:569 
> > Write error at 405:11595 on device "LTO3" (/dev/ULTRIUM-TD3). 
> > ERR=Eingabe-/Ausgabefehler.
> > 05-Aug 06:00 VU0EM005: VU0EM003.2007-08-05_00.05.01 Error: block.c:317 
> > Volume data error at 405:4294967295!
> > Block checksum mismatch in block=5235094 len=64512: calc=2a90e8e7 
> > blk=20b379e5
> > 05-Aug 06:00 VU0EM005: VU0EM003.2007-08-05_00.05.01 Error: Re-read last 
> > block at EOT failed. ERR=block.c:317 Volume data error at 405:4294967295!
> > Block checksum mismatch in block=5235094 len=64512: calc=2a90e8e7 
> > blk=20b379e5
> > 05-Aug 06:00 VU0EM005: End of medium on Volume "06D128L3" 
> > Bytes=404,723,386,368 Blocks=6,273,613 at 05-Aug-2007 06:00.
> 
> Ok, this is unverified by me, but I *believe* that in these cases 
> Bacul does the right thing, i.e. it writes the block in question to 
> the next tape again.
> 
> At least I hope this is the point of doing the re-read of the block 
> where an error occured...
> 
> And it would explain why this is not an error from the job's point of 
> view.

Hm, I still *hope* something was wrong with the backup, because if
not, I don't know why the verify and restore jobs both failed.
 

> > 05-Aug 06:02 VU0EM005: Using Volume "06D138L3" from 'Scratch' pool.
> > 05-Aug 06:02 VU0EM005: 3307 Issuing autochanger "unload slot 9, drive 0" 
> > command.
> > 05-Aug 06:03 VU0EM005: 3304 Issuing autochanger "load slot 19, drive 0" 
> > command.
> > 05-Aug 06:03 VU0EM005: 3305 Autochanger "load slot 19, drive 0", status is 
> > OK.
> > 05-Aug 06:03 VU0EM005: 3301 Issuing autochanger "loaded? drive 0" command.
> > 05-Aug 06:03 VU0EM005: 3302 Autochanger "loaded? drive 0", result is Slot 
> > 19.
> > 05-Aug 06:03 VU0EM005: Recycled volume "06D138L3" on device "LTO3" 
> > (/dev/ULTRIUM-TD3), all previous data lost.
> > 05-Aug 06:03 VU0EM005: New volume "06D138L3" mounted on device "LTO3" 
> > (/dev/ULTRIUM-TD3) at 05-Aug-2007 06:03.
> > 05-Aug 06:31 VU0EM005: Despooling elapsed time = 01:44:20, Transfer rate = 
> > 72.04 M bytes/second
> > 05-Aug 06:42 VU0EM005: Spooling data again ...
> > 05-Aug 08:12 VU0EM005: Job write elapsed time = 05:55:07, Transfer rate = 
> > 33.16 M bytes/second
> > 05-Aug 08:12 VU0EM005: Committing spooled data to Volume "06D138L3". 
> > Despooling 256,418,397,693 bytes ...
> > 05-Aug 09:13 VU0EM005: Despooling elapsed time = 01:01:30, Transfer rate = 
> > 69.49 M bytes/second
> > 05-Aug 09:20 VU0EM005: Alert: smartctl version 5.36 
> > [x86_64-unknown-linux-gnu] Copyright (C) 2002-6 Bruce Allen
> > 05-Aug 09:20 VU0EM005: Alert: Home page is 
> > http://smartmontools.sourceforge.net/
> > 05-Aug 09:20 VU0EM005: Alert:
> > 05-Aug 09:20 VU0EM005: Alert: TapeAlert: OK
> > 05-Aug 09:20 VU0EM005: Alert:
> > 05-Aug 09:20 VU0EM005: Alert: Error Counter logging not supported
> > 05-Aug 09:20 VU0EM005: Alert:
> > 05-Aug 09:20 VU0EM005: Alert: Non-medium error count:        0
> > 05-Aug 09:20 VU0EM005: Sending spooled attrs to the Director. Despooling 
> > 600,205,128 bytes ...
> > 05-Aug 09:37 VU0EM005: Bacula 2.0.3 (06Mar07): 05-Aug-2007 09:37:43
> > 05-Aug 09:37 VU0EM005: Bacula 2.0.3 (06Mar07): 05-Aug-2007 09:37:43
> >   JobId:                  416
> >   Job:                    VU0EM003.2007-08-05_00.05.01
> >   Backup Level:           Full
> >   Client:                 "VU0EM003" 2.0.2 (28Jan07) 
> > x86_64-unknown-linux-gnu,debian,4.0
> >   FileSet:                "VU0EM003" 2007-06-12 00:05:01
> >   Pool:                   "Full" (From Run pool override)
> >   Storage:                "NEC-T40A" (From Job resource)
> >   Scheduled time:         05-Aug-2007 00:05:00
> >   Start time:             05-Aug-2007 02:13:49
> >   End time:               05-Aug-2007 09:37:43
> >   Elapsed time:           7 hours 23 mins 54 secs
> >   Priority:               10
> >   FD Files Written:       1,631,258
> >   SD Files Written:       1,631,258
> >   FD Bytes Written:       706,308,678,026 (706.3 GB)
> >   SD Bytes Written:       706,666,406,084 (706.6 GB)
> >   Rate:                   26519.1 KB/s
> >   Software Compression:   None
> >   VSS:                    no
> >   Encryption:             no
> >   Volume name(s):         06D128L3|06D138L3
> >   Volume Session Id:      106
> >   Volume Session Time:    1184053226
> >   Last Volume Bytes:      369,532,025,856 (369.5 GB)
> >   Non-fatal FD errors:    0
> >   SD Errors:              0
> >   FD termination status:  OK
> >   SD termination status:  OK
> >   Termination:            Backup OK
> > 
> > 
> > 
> > I've rerun the btape test (test/fill). No problems. At the moment I'm 
> > running
> > the full backup to the same tapes again. The end of the first tape was 
> > detected
> > without Block checksum mismatch.
> 
> You could have tried a restore of the job with the (possible) error...

I did. First I restored only a few diretories, without errors (70 GB).
I was still thinking the verify job had problems, not the real backup.

There were many different errors with the verify jobs. But the last
verify job  Itried also had a block mismatch at the end of tape #1.

07-Aug 19:54 VU0EM005: End of file 405 on device "LTO3" (/dev/ULTRIUM-TD3), 
Volume "06D128L3"
07-Aug 19:54 VU0EM005: VerifyVU0EM003.2007-08-07_18.18.29 Error: block.c:317 
Volume data error at 405:11594!


Then I did a full restore. This time I got an completely different
error at a different volfile.

07-Aug 21:56 VU0EM005: End of file 203 on device "LTO3" (/dev/ULTRIUM-TD3), 
Volume "06D128L3"
07-Aug 22:12 VU0EM005: RestoreFiles.2007-08-07_20.56.07 Fatal error: read.c:139 
Error sending to File daemon. ERR=Die Wartezeit für die Verbindung ist 
abgelaufen
07-Aug 22:12 VU0EM005: RestoreFiles.2007-08-07_20.56.07 Error: bnet.c:439 Write 
error sending 65536 bytes to client:10.60.1.252:36643: ERR=Die Wartezeit für 
die Verbindung ist abgelaufen
07-Aug 22:35 VU0EM003: RestoreFiles.2007-08-07_20.56.07 Fatal error: 
restore.c:252 Data record error. ERR=Unterbrechung während des
Betriebssystemaufrufs


> Regarding the tape write/read error, I believe these happen from time 
> to time, especially when related to end of tape detection (400GB on 
> LTO3 seems like the tape was actually full, but for some reason the 
> drive didn't report the approaching EOT to the OS). I would worry if 
> this happened regularly, but if it's a one-time glitch, I think 
> there's no problem.

That was the first backup with two tapes since I tested the drive 6
months ago...

Ralf

-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >>  http://get.splunk.com/
_______________________________________________
Bacula-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-users

Re: [Bacula-users] Block checksum mismatch but Termination: Backup OK?

Reply via email to