Ralf Gross wrote:
> Kern Sibbald wrote:
> > > I wrote a mail to the -users list about problems with verify jobs
> > > that may or may not be hardware related.
> > >
> > > Now I have an additional question for the developers.
> > >
> > > My two original mails:
> > > > >> Now the VolumeToCatalog verify job fails each time. I tried two
> > > > >> different drives with the same result. The job does not always
> > > > >> fail at the same position or on the same tape.
> > > >>
> > > >>
> > > >> 16-Apr 14:58 VU0EA003-sd JobId 11277: Forward spacing Volume
> > > >> "A00147L4" to file:block 0:1.
> > > >> 16-Apr 17:46 VU0EA003-sd JobId 11277: Error: block.c:318 Volume data
> > > >> error at 406:6128!  Block checksum mismatch in block=26762981
> > > >> len=64512: calc=dae26793 blk=b522f8d9
> > > >>
> > > >> 17-Apr 13:55 VU0EA003-sd JobId 11292: Forward spacing Volume
> > > >> "A00141L4" to file:block 0:1.
> > > >> 17-Apr 18:35 VU0EA003-sd JobId 11292: Error: block.c:318 Volume data
> > > >> error at 657:11272!  Block checksum mismatch in block=56358402
> > > >> len=64512: calc=5d582a92 blk=63befe58
> > > >>
> > > >>
> > > >>
> > > > >> There are no SCSI errors in the Linux syslog and no errors in the
> > > > >> changer's system log.
> > > > >>
> > > > >> Any idea what to do next? It looks like a hardware problem, but why
> > > > >> does it then fail on different drives and different tapes, and not
> > > > >> at the same position?
> > > >
> > > > >Hm, I ran btape with the scanblocks option on the tape where the
> > > > >verify job reported the block checksum mismatch.
> > > >
> > > >[...]
> > > >15500 blocks of 64512 bytes in file 822
> > > >End of File mark.
> > > >8243 blocks of 64512 bytes in file 823
> > > >End of File mark.
> > > >Total files=823, blocks=12749243, bytes = 822,479,100,004
> > > >
> > > >
> > > > >But this does not seem to check the block checksum that is checked
> > > > >during a verify.
> > > > >
> > > > >Is there another Bacula tool to check the block checksum?
> > 
> > Since I don't know what version of Bacula you are using, nor exactly what 
> > commands produced the errors above, I can only respond in general.
> 
> Sorry, this was in my original mail to the -users list. It's Bacula
> 2.4.4 on Debian etch. The above errors occurred during 2 verify jobs of
> the same backup job. Not the same tape, not the same drive.
> 
>  
> > If you are getting block checksum errors, it means that the data that was
> > read is not the same as the data that was written.  I'd be very surprised
> > if there are not SCSI errors noted in the log.  I'd also be very surprised
> > if there are not errors reported by the drive itself (you should definitely
> > enable alert checking and run manual alert checks on your drive).
> 
> 
> No SCSI errors in the Bacula log, the syslog, or the changer log. Neither
> during backup nor during verify.
> 
>  
> > The first thing to do is a controlled backup (i.e. known files, small
> > number of files).  Verify that there are checksum errors.  Restore the
> > backup (check for checksum errors) and compare the files on disk with the
> > files restored.
> 
> The problem is that the backup job is ~10 TB in size and the checksum
> errors didn't occur at the same position or on the same tape. So where
> should I start to be sure that there was/is no problem? I have to think
> about it...
> 
> 
> > > How can I check the block checksum of a tape (not the whole backup
> > > job) with one of the bacula tools?
> > 
> > Btape scanning reads blocks and does not look at the block data (e.g. the 
> > block checksum is in the block header).
> > 
> > Checksum verification is almost certainly enabled with bextract, bcopy and
> > bscan, bls, ... (in short any program that looks at the contents of the
> > blocks), but that approach seems to me not to be very useful. What counts
> > is whether you get the right data when you restore using Bacula.
> 
> I did a complete bscan of the tape where the checksum error occurred the
> second time. No error this time.
> 
> [...]
> bscan: bscan.c:410 Record: SessId=20 SessTim=1239118594 FileIndex=443 
> Stream=2 len=65536
> 18-Apr 11:56 bscan JobId 0: End of file 823 on device "ULTRIUM-TD4-D3" 
> (/dev/ULTRIUM-TD4-D3), Volume "A00141L4"
> 18-Apr 11:56 bscan JobId 0: End of Volume at file 823 on device 
> "ULTRIUM-TD4-D3" (/dev/ULTRIUM-TD4-D3), Volume "A00141L4" bscan: 
> bscan.c:323-0 ========== JobId=0 ========
> 18-Apr 11:56 bscan JobId 0: End of all volumes.
> bscan: bscan.c:410 Record: SessId=0 SessTim=0 FileIndex=-6 Stream=0 len=0
> bscan: bscan.c:637 End of all Volumes. VolFiles=823 VolBlocks=0 
> VolBytes=822,020,123,328


A couple of weeks later...

I'm seeing these block checksum errors very often, mostly during very long
verify jobs. I've already used bextract to dump the whole content of a tape to
disk and compared the md5sums with the original md5sums of the files on the
server. No difference, and no checksum error during the bextract.
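
For reference, the comparison can be sketched roughly like this. It is only a minimal sketch: the two directory arguments are hypothetical placeholders for the original tree on the server and the tree dumped by bextract, and it assumes both trees use the same relative paths.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(original_root, restored_root):
    """Compare every regular file under original_root with its restored copy.

    Returns a list of (relative_path, problem) tuples; an empty list means
    every original file was found and had an identical MD5 digest.
    """
    original_root = Path(original_root)
    restored_root = Path(restored_root)
    problems = []
    for orig in sorted(original_root.rglob("*")):
        if not orig.is_file():
            continue
        rel = orig.relative_to(original_root)
        restored = restored_root / rel
        if not restored.is_file():
            problems.append((str(rel), "missing"))
        elif md5_of(orig) != md5_of(restored):
            problems.append((str(rel), "md5 mismatch"))
    return problems
```

An empty result from `compare_trees()` corresponds to the "no difference" outcome described above. Files that exist only in the restored tree are not reported by this sketch.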


26-May 04:51 VU0EA003-sd JobId 12376: Ready to read from volume "A00098L4" on 
device "ULTRIUM-TD4-D3" (/dev/ULTRIUM-TD4-D3).
26-May 04:51 VU0EA003-sd JobId 12376: Forward spacing Volume "A00098L4" to 
file:block 0:1.
26-May 08:06 VU0EA003-sd JobId 12376: Error: block.c:318 Volume data error at 
721:4451!
Block checksum mismatch in block=106863406 len=64512: calc=9a831670 blk=594c2ab

26-May 08:08 VU0EA003-sd JobId 12375: Ready to read from volume "A00101L4" on 
device "ULTRIUM-TD4-D1" (/dev/ULTRIUM-TD4-D1).
26-May 08:08 VU0EA003-sd JobId 12375: Forward spacing Volume "A00101L4" to 
file:block 0:1.
26-May 11:21 VU0EA003-sd JobId 12375: Error: block.c:318 Volume data error at 
536:5128!
Block checksum mismatch in block=98302463 len=64512: calc=4b5009bb blk=8f19f2ae

23-May 05:42 VU0EA003-sd JobId 12179: Ready to read from volume "A00103L4" on 
device "ULTRIUM-TD4-D1" (/dev/ULTRIUM-TD4-D1).
23-May 05:42 VU0EA003-sd JobId 12179: Forward spacing Volume "A00103L4" to 
file:block 0:1.
23-May 10:04 VU0EA003-sd JobId 12179: Error: block.c:318 Volume data error at 
670:6402!
Block checksum mismatch in block=72810058 len=64512: calc=8e4317ce blk=5c6466cc

16-May 01:52 VU0EA003-sd JobId 12108: Ready to read from volume "A00139L4" on 
device "ULTRIUM-TD4-D2" (/dev/ULTRIUM-TD4-D2).
16-May 01:52 VU0EA003-sd JobId 12108: Forward spacing Volume "A00139L4" to 
file:block 0:1.
16-May 03:45 VU0EA003-sd JobId 12108: Error: block.c:318 Volume data error at 
265:6176!
Block checksum mismatch in block=31284246 len=64512: calc=3e144e45 blk=bca5470
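
To keep track of where the failures occur, the error messages above can be collected into a small table. The following is a rough sketch that assumes the message format is exactly as shown in these logs (including the line wrapping introduced by the mail client); the function and field names are my own.

```python
import re

# Matches the storage daemon error as it appears above. \s+ is used between
# words so that the wrapped lines of a pasted log still match.
ERROR_RE = re.compile(
    r"JobId\s+(?P<jobid>\d+):\s+Error:\s+block\.c:\d+\s+"
    r"Volume\s+data\s+error\s+at\s+(?P<file>\d+):(?P<file_block>\d+)!\s+"
    r"Block\s+checksum\s+mismatch\s+in\s+block=(?P<block>\d+)\s+"
    r"len=(?P<length>\d+):\s+calc=(?P<calc>[0-9a-f]+)\s+blk=(?P<blk>[0-9a-f]+)"
)


def parse_errors(text):
    """Return one dict per checksum error found in the log text."""
    errors = []
    for m in ERROR_RE.finditer(text):
        errors.append({
            "jobid": int(m.group("jobid")),
            "file": int(m.group("file")),              # tape file number
            "file_block": int(m.group("file_block")),  # block within that file
            "block": int(m.group("block")),            # block number in the message
            "length": int(m.group("length")),
            "calc": int(m.group("calc"), 16),          # checksum computed on read
            "blk": int(m.group("blk"), 16),            # checksum stored in the block
        })
    return errors


SAMPLE = """\
26-May 08:06 VU0EA003-sd JobId 12376: Error: block.c:318 Volume data error at
721:4451!
Block checksum mismatch in block=106863406 len=64512: calc=9a831670 blk=594c2ab

16-May 03:45 VU0EA003-sd JobId 12108: Error: block.c:318 Volume data error at
265:6176!
Block checksum mismatch in block=31284246 len=64512: calc=3e144e45 blk=bca5470
"""

if __name__ == "__main__":
    for err in parse_errors(SAMPLE):
        print(err)
```

The volume and device names are on separate "Ready to read" lines and are not captured here.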


This is something I can't understand. It happens on 3 different drives (both
SCSI and Fibre Channel; 2 were already replaced by support) and on different
tapes. It even fails on different tapes if I rerun the same verify jobs. I
thought it might be a thermal issue, but I see no problems during backup and
the drive's temperature is OK. No SCSI errors, no library errors.


Question: is there a way to check only the affected blocks? Maybe check only
block=31284246. This won't solve the main issue, but at least I could check the
affected block right after the verify job has failed and see what I get then.
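
To illustrate the kind of single-block check I have in mind, here is a rough sketch, not a Bacula tool. It rests on several assumptions: that the raw 64512-byte block has already been read from tape into memory by some other means; that the block starts with a 24-byte "BB02" header (CheckSum, BlockSize, BlockNumber, ID, VolSessionId, VolSessionTime, all big-endian), as I understand the developer documentation; that the checksum covers everything after the CheckSum field; and that Bacula's CRC matches the standard CRC-32 in zlib. The last point in particular is unverified.

```python
import struct
import zlib

# Assumed block header layout: six big-endian fields, 24 bytes in total.
HEADER = struct.Struct(">III4sII")
CHECKSUM_LEN = 4  # the leading CheckSum field is excluded from the CRC


def check_block(buf):
    """Recompute the CRC of one raw block and compare it with the stored value."""
    stored, size, number, block_id, _sess_id, _sess_time = HEADER.unpack_from(buf, 0)
    # Assumption: standard CRC-32 over bytes [4, BlockSize) of the block.
    calc = zlib.crc32(buf[CHECKSUM_LEN:size]) & 0xFFFFFFFF
    return {
        "id": block_id,
        "number": number,
        "size": size,
        "stored": stored,
        "calc": calc,
        "ok": calc == stored,
    }


def make_block(number, payload):
    """Build a synthetic block with the assumed layout (for testing only)."""
    size = HEADER.size + len(payload)
    body = struct.pack(">II4sII", size, number, b"BB02", 1, 0) + payload
    return struct.pack(">I", zlib.crc32(body) & 0xFFFFFFFF) + body


if __name__ == "__main__":
    block = make_block(31284246, b"\x00" * 128)
    print(check_block(block)["ok"])
```

`make_block()` exists only to exercise the check on synthetic data. Whether the result agrees with what the storage daemon computes depends on the assumptions listed above.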

Ralf

_______________________________________________
Bacula-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-devel
