Ralf Gross wrote:
> Kern Sibbald wrote:
> > > I wrote a mail to the -users list about problems with verify jobs
> > > that may or may not be hardware related.
> > >
> > > Now I have an additional question to the developers.
> > >
> > > My two original mails:
> > >
> > > >> Now the VolumeToCatalog verify job fails each time. I tried two
> > > >> different drives with the same result. The job does not always
> > > >> fail at the same position or on the same tape.
> > > >>
> > > >> 16-Apr 14:58 VU0EA003-sd JobId 11277: Forward spacing Volume "A00147L4" to file:block 0:1.
> > > >> 16-Apr 17:46 VU0EA003-sd JobId 11277: Error: block.c:318 Volume data error at 406:6128! Block checksum mismatch in block=26762981 len=64512: calc=dae26793 blk=b522f8d9
> > > >>
> > > >> 17-Apr 13:55 VU0EA003-sd JobId 11292: Forward spacing Volume "A00141L4" to file:block 0:1.
> > > >> 17-Apr 18:35 VU0EA003-sd JobId 11292: Error: block.c:318 Volume data error at 657:11272! Block checksum mismatch in block=56358402 len=64512: calc=5d582a92 blk=63befe58
> > > >>
> > > >> There are no SCSI errors in the Linux syslog and no errors in the
> > > >> changer's system log.
> > > >>
> > > >> Any idea what to do next? It looks like a hardware problem, but why
> > > >> does it then fail on different drives and different tapes and not at
> > > >> the same position?
> > > >
> > > >Hm, I used btape with the scanblocks option on the tape where the
> > > >verify job had the block checksum mismatch.
> > > >
> > > >[...]
> > > >15500 blocks of 64512 bytes in file 822
> > > >End of File mark.
> > > >8243 blocks of 64512 bytes in file 823
> > > >End of File mark.
> > > >Total files=823, blocks=12749243, bytes = 822,479,100,004
> > > >
> > > >But this does not seem to check the block checksum that is checked
> > > >during a verify.
> > > >
> > > >Is there another Bacula tool to check the block checksum?
> >
> > Since I don't know what version of Bacula you are using, nor exactly what
> > commands produced the errors above, I can only respond in general.
>
> Sorry, this was in my original mail to the -users list. It's Bacula
> 2.4.4 on Debian etch. The above errors occurred during 2 verify jobs of
> the same backup job. Not the same tape, not the same drive.
>
> > If you are getting block checksum errors, it means that the data that was
> > read is not the same as the data that was written. I'd be very surprised
> > if there are not SCSI errors noted in the log. I'd also be very surprised
> > if there are not errors reported by the drive itself (you should
> > definitely enable alert checking and run manual alert checks on your
> > drive).
>
> No SCSI errors in the Bacula log, the syslog or the changer log. Neither
> during backup nor during verify.
>
> > The first thing to do is to do a controlled backup (i.e. known files,
> > small number of files). Verify that there are checksum errors. Restore
> > the backup (check for checksum errors) and compare the files on disk
> > versus the files restored.
>
> The problem is that the backup job is ~10 TB large and the checksum
> errors didn't occur at the same position or tape. So where to start to
> be sure that there was/is no problem? I have to think about it...
>
> > > How can I check the block checksum of a tape (not the whole backup
> > > job) with one of the Bacula tools?
> >
> > Btape scanning reads blocks and does not look at the block data (e.g. the
> > block checksum is in the block header).
> >
> > Checksum verification is almost certainly enabled with bextract, bcopy and
> > bscan, bls, ... (in short any program that looks at the contents of the
> > blocks), but that approach seems to me not to be very useful. What counts
> > is whether you get the right data when you restore using Bacula.
>
> I did a complete bscan of the tape where the checksum error occurred the
> second time. No error this time.
>
> [...]
> bscan: bscan.c:410 Record: SessId=20 SessTim=1239118594 FileIndex=443 Stream=2 len=65536
> 18-Apr 11:56 bscan JobId 0: End of file 823 on device "ULTRIUM-TD4-D3" (/dev/ULTRIUM-TD4-D3), Volume "A00141L4"
> 18-Apr 11:56 bscan JobId 0: End of Volume at file 823 on device "ULTRIUM-TD4-D3" (/dev/ULTRIUM-TD4-D3), Volume "A00141L4"
> bscan: bscan.c:323-0 ========== JobId=0 ========
> 18-Apr 11:56 bscan JobId 0: End of all volumes.
> bscan: bscan.c:410 Record: SessId=0 SessTim=0 FileIndex=-6 Stream=0 len=0
> bscan: bscan.c:637 End of all Volumes. VolFiles=823 VolBlocks=0 VolBytes=822,020,123,328

A couple of weeks later...

I'm seeing these block checksum errors very often, mostly during very long verify jobs. I've already used bextract to dump the whole content of a tape to disk and compared the md5sums with the original md5sums of the files on the server. No difference, and no checksum error during the bextract.

26-May 04:51 VU0EA003-sd JobId 12376: Ready to read from volume "A00098L4" on device "ULTRIUM-TD4-D3" (/dev/ULTRIUM-TD4-D3).
26-May 04:51 VU0EA003-sd JobId 12376: Forward spacing Volume "A00098L4" to file:block 0:1.
26-May 08:06 VU0EA003-sd JobId 12376: Error: block.c:318 Volume data error at 721:4451! Block checksum mismatch in block=106863406 len=64512: calc=9a831670 blk=594c2ab

26-May 08:08 VU0EA003-sd JobId 12375: Ready to read from volume "A00101L4" on device "ULTRIUM-TD4-D1" (/dev/ULTRIUM-TD4-D1).
26-May 08:08 VU0EA003-sd JobId 12375: Forward spacing Volume "A00101L4" to file:block 0:1.
26-May 11:21 VU0EA003-sd JobId 12375: Error: block.c:318 Volume data error at 536:5128! Block checksum mismatch in block=98302463 len=64512: calc=4b5009bb blk=8f19f2ae

23-May 05:42 VU0EA003-sd JobId 12179: Ready to read from volume "A00103L4" on device "ULTRIUM-TD4-D1" (/dev/ULTRIUM-TD4-D1).
23-May 05:42 VU0EA003-sd JobId 12179: Forward spacing Volume "A00103L4" to file:block 0:1.
23-May 10:04 VU0EA003-sd JobId 12179: Error: block.c:318 Volume data error at 670:6402! Block checksum mismatch in block=72810058 len=64512: calc=8e4317ce blk=5c6466cc

16-May 01:52 VU0EA003-sd JobId 12108: Ready to read from volume "A00139L4" on device "ULTRIUM-TD4-D2" (/dev/ULTRIUM-TD4-D2).
16-May 01:52 VU0EA003-sd JobId 12108: Forward spacing Volume "A00139L4" to file:block 0:1.
16-May 03:45 VU0EA003-sd JobId 12108: Error: block.c:318 Volume data error at 265:6176! Block checksum mismatch in block=31284246 len=64512: calc=3e144e45 blk=bca5470

This is something I can't understand.
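
To see whether the failures cluster on one drive or one volume, the storage daemon messages quoted above can be tabulated. What follows is only a rough sketch in Python, not a Bacula tool: it assumes every message sits on a single line (mail or log output may wrap them), and the name `find_mismatches` and both regular expressions are illustrative, written against the message wording shown above.

```python
import re
from collections import Counter

# Messages copied from the mail above, one per line (an assumption).
LOG = '''\
26-May 04:51 VU0EA003-sd JobId 12376: Ready to read from volume "A00098L4" on device "ULTRIUM-TD4-D3" (/dev/ULTRIUM-TD4-D3).
26-May 08:06 VU0EA003-sd JobId 12376: Error: block.c:318 Volume data error at 721:4451! Block checksum mismatch in block=106863406 len=64512: calc=9a831670 blk=594c2ab
26-May 08:08 VU0EA003-sd JobId 12375: Ready to read from volume "A00101L4" on device "ULTRIUM-TD4-D1" (/dev/ULTRIUM-TD4-D1).
26-May 11:21 VU0EA003-sd JobId 12375: Error: block.c:318 Volume data error at 536:5128! Block checksum mismatch in block=98302463 len=64512: calc=4b5009bb blk=8f19f2ae
23-May 05:42 VU0EA003-sd JobId 12179: Ready to read from volume "A00103L4" on device "ULTRIUM-TD4-D1" (/dev/ULTRIUM-TD4-D1).
23-May 10:04 VU0EA003-sd JobId 12179: Error: block.c:318 Volume data error at 670:6402! Block checksum mismatch in block=72810058 len=64512: calc=8e4317ce blk=5c6466cc
16-May 01:52 VU0EA003-sd JobId 12108: Ready to read from volume "A00139L4" on device "ULTRIUM-TD4-D2" (/dev/ULTRIUM-TD4-D2).
16-May 03:45 VU0EA003-sd JobId 12108: Error: block.c:318 Volume data error at 265:6176! Block checksum mismatch in block=31284246 len=64512: calc=3e144e45 blk=bca5470
'''

# Illustrative patterns; they only match the wording seen in these messages.
READY = re.compile(
    r'JobId (\d+): Ready to read from volume "([^"]+)" on device "([^"]+)"'
)
MISMATCH = re.compile(
    r'JobId (\d+): Error: block\.c:\d+ Volume data error at (\d+):(\d+)! '
    r'Block checksum mismatch in block=(\d+) len=(\d+): '
    r'calc=([0-9a-f]+) blk=([0-9a-f]+)'
)


def find_mismatches(log_text):
    """Return one dict per checksum mismatch, joined with the volume and
    device that were announced earlier for the same JobId."""
    mounted = {}   # JobId -> (volume, device) from the "Ready to read" line
    failures = []
    for line in log_text.splitlines():
        m = READY.search(line)
        if m:
            mounted[m.group(1)] = (m.group(2), m.group(3))
            continue
        m = MISMATCH.search(line)
        if m:
            job, tape_file, tape_block, block, length, calc, blk = m.groups()
            volume, device = mounted.get(job, ("?", "?"))
            failures.append({
                "job": int(job),
                "volume": volume,
                "device": device,
                "position": (int(tape_file), int(tape_block)),  # file:block
                "block": int(block),
                "len": int(length),
                "calc": calc,   # checksum computed on read
                "blk": blk,     # checksum stored in the block header
            })
    return failures


failures = find_mismatches(LOG)
by_device = Counter(f["device"] for f in failures)
by_volume = Counter(f["volume"] for f in failures)

for f in failures:
    print(f["job"], f["device"], f["volume"], f["position"], f["block"])
```

For the four failures quoted here, this attributes two to ULTRIUM-TD4-D1 and one each to ULTRIUM-TD4-D2 and ULTRIUM-TD4-D3, each on a different volume.
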
It happens on 3 different drives (both SCSI and Fibre Channel; 2 have already been replaced by support) and on different tapes. It even fails on different tapes if I rerun the same verify jobs. I thought it might be a thermal issue, but I see no problems during backup and the drive temperature is OK. No SCSI errors, no library errors.

Question: is there a way to check only the affected blocks? For example, check only block=31284246. This won't solve the main issue, but at least I could check the affected block right after the verify job has failed and see what I get then.

Ralf

_______________________________________________
Bacula-devel mailing list
https://lists.sourceforge.net/lists/listinfo/bacula-devel
