Kern Sibbald wrote:
> > I wrote a mail to the -users list about problems with verify jobs,
> > that may or may not be hardware related.
> >
> > Now I have an additional question to the developers.
> >
> > my 2 org. mails:
> > >> Now the VolumeToCatalog verify job fails each time. I tried two
> > >> different drives with the same result. The job fails not always at
> > >> the
> > >> same position or tape.
> > >>
> > >>
> > >> 16-Apr 14:58 VU0EA003-sd JobId 11277: Forward spacing Volume
> > >> "A00147L4" to file:block 0:1.
> > >> 16-Apr 17:46 VU0EA003-sd JobId 11277: Error: block.c:318 Volume data
> > >> error at 406:6128!  Block checksum mismatch in block=26762981
> > >> len=64512: calc=dae26793 blk=b522f8d9
> > >>
> > >> 17-Apr 13:55 VU0EA003-sd JobId 11292: Forward spacing Volume
> > >> "A00141L4" to file:block 0:1.
> > >> 17-Apr 18:35 VU0EA003-sd JobId 11292: Error: block.c:318 Volume data
> > >> error at 657:11272!  Block checksum mismatch in block=56358402
> > >> len=64512: calc=5d582a92 blk=63befe58
> > >>
> > >>
> > >>
> > >> There are no SCSI errors in the linux syslog or errors in the
> > >> changers
> > >> system log.
> > >>
> > >> Any idea what to do next? It looks like a hardware problem, but why
> > >> does it then fail on different drives and different tapes and not at
> > >> the same position?
> > >
> > >Hm, I used btape with the option scanblocks with the tape where the
> > >verify job had block checksum mismatch.
> > >
> > >[...]
> > >15500 blocks of 64512 bytes in file 822
> > >End of File mark.
> > >8243 blocks of 64512 bytes in file 823
> > >End of File mark.
> > >Total files=823, blocks=12749243, bytes = 822,479,100,004
> > >
> > >
> > >But this does not seem to check the block checksum that is checked
> > >during a verify.
> > >
> > >Is there another Bacula tool to check the block checksum?
> 
> Since I don't know what version of Bacula you are using, nor exactly what 
> commands produced the errors above, I can only respond in general.

Sorry, this was in my original mail to the -users list. It's Bacula
2.4.4 on Debian etch. The above errors occurred during 2 verify jobs of
the same backup job. Not the same tape, not the same drive.
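
To compare the two failures side by side, the storage daemon error lines
quoted above can be parsed into fields. The following is only a minimal
sketch: the regular expression is written against the message format as it
appears in this thread (block.c, Bacula 2.4.4), and the names MISMATCH_RE,
parse_mismatches and SAMPLE are made up for illustration. Other Bacula
versions may word the message differently.

```python
import re

# Pattern for the "Block checksum mismatch" message exactly as quoted in this
# thread. Assumption: other Bacula versions may format the message differently.
MISMATCH_RE = re.compile(
    r"JobId (?P<jobid>\d+): Error: block\.c:\d+ Volume data error at "
    r"(?P<file>\d+):(?P<block>\d+)!\s+Block checksum mismatch in "
    r"block=(?P<blocknum>\d+) len=(?P<length>\d+): "
    r"calc=(?P<calc>[0-9a-f]+) blk=(?P<blk>[0-9a-f]+)"
)


def parse_mismatches(log_text):
    """Return one dict per checksum mismatch found in log_text."""
    # The log lines are wrapped by the mail client, so collapse all
    # whitespace (including newlines) into single spaces before matching.
    joined = " ".join(log_text.split())
    return [
        {
            "jobid": int(m["jobid"]),
            "file": int(m["file"]),          # tape file number
            "block": int(m["block"]),        # block within that file
            "blocknum": int(m["blocknum"]),  # block number from the message
            "length": int(m["length"]),
            "calc": m["calc"],               # checksum computed on read
            "blk": m["blk"],                 # checksum stored in the block
        }
        for m in MISMATCH_RE.finditer(joined)
    ]


# The two error messages from this thread, wrapped as in the mail.
SAMPLE = """\
16-Apr 17:46 VU0EA003-sd JobId 11277: Error: block.c:318 Volume data
error at 406:6128!  Block checksum mismatch in block=26762981
len=64512: calc=dae26793 blk=b522f8d9
17-Apr 18:35 VU0EA003-sd JobId 11292: Error: block.c:318 Volume data
error at 657:11272!  Block checksum mismatch in block=56358402
len=64512: calc=5d582a92 blk=63befe58
"""

if __name__ == "__main__":
    for row in parse_mismatches(SAMPLE):
        print(row)
```

Collecting the file:block positions this way makes it easier to see whether
failures cluster at a position or on a tape once more verify runs are available.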

 
> If you are getting block checksum errors, it means that the data that was read 
> is not the same as the data that was written.  I'd be very surprised if there 
> are not SCSI errors noted in the log.  I'd also be very surprised if there 
> are not errors reported by the drive itself (you should definitely enable 
> alert checking and run manual alert checks on your drive).


No SCSI errors in the Bacula log, syslog or changer log. Neither during
backup nor during verify.

 
> The first thing to do is to do a controlled backup (i.e. known files, small 
> number of files).  Verify that there are check sum errors.  Restore the 
> backup (check for check sum errors) and compare the files on disk versus the 
> files restored.

The problem is that the backup job is ~10 TB large and the checksum
errors didn't occur at the same position or tape. So where do I start, to
be sure that there was/is no problem? I'll have to think about it...


> > How can I check the block checksum of a tape (not the whole backup
> > job) with one of the bacula tools?
> 
> Btape scanning reads blocks and does not look at the block data (e.g. the 
> block checksum is in the block header).
> 
> Checksum verification is almost certainly enabled with bextract, bcopy and 
> bscan, bls, ... (in short any program that looks at the contents of the 
> blocks), but that approach seems to me not to be very useful. What counts is 
> whether you get the right data when you restore using Bacula.

I did a complete bscan of the tape where the checksum error occurred the
second time. No error this time.

[...]
bscan: bscan.c:410 Record: SessId=20 SessTim=1239118594 FileIndex=443 Stream=2 len=65536
18-Apr 11:56 bscan JobId 0: End of file 823 on device "ULTRIUM-TD4-D3" (/dev/ULTRIUM-TD4-D3), Volume "A00141L4"
18-Apr 11:56 bscan JobId 0: End of Volume at file 823 on device "ULTRIUM-TD4-D3" (/dev/ULTRIUM-TD4-D3), Volume "A00141L4"
bscan: bscan.c:323-0 ========== JobId=0 ========
18-Apr 11:56 bscan JobId 0: End of all volumes.
bscan: bscan.c:410 Record: SessId=0 SessTim=0 FileIndex=-6 Stream=0 len=0
bscan: bscan.c:637 End of all Volumes. VolFiles=823 VolBlocks=0 VolBytes=822,020,123,328
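
As Kern describes it, a block checksum error means the checksum recomputed
from the data read back (calc=) differs from the one stored in the block
header (blk=). The sketch below only illustrates that idea. The layout used
here (a 4-byte big-endian CRC32 at the start of the block, computed over the
rest of the block) and the use of zlib.crc32 are assumptions made for the
example, not a statement of Bacula's actual on-tape format, and the function
names are hypothetical.

```python
import struct
import zlib

# Assumption for this sketch: the block starts with a 32-bit checksum,
# followed by the data the checksum covers.
CHECKSUM_LEN = 4


def make_block(payload: bytes) -> bytes:
    """Build a toy block: CRC32 of the payload, then the payload itself."""
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return struct.pack(">I", crc) + payload


def verify_block(block: bytes):
    """Recompute the checksum on read and compare it with the stored one.

    Returns (ok, calc, stored), mirroring the calc=/blk= pair in the
    error messages quoted in this thread.
    """
    (stored,) = struct.unpack(">I", block[:CHECKSUM_LEN])
    calc = zlib.crc32(block[CHECKSUM_LEN:]) & 0xFFFFFFFF
    return calc == stored, calc, stored


if __name__ == "__main__":
    good = make_block(b"backup data " * 100)
    ok, calc, stored = verify_block(good)
    print("intact block: ok=%s calc=%08x blk=%08x" % (ok, calc, stored))

    # Flip one bit in the payload to simulate data that was read back
    # differently from what was written.
    damaged = bytearray(good)
    damaged[CHECKSUM_LEN + 10] ^= 0x01
    ok, calc, stored = verify_block(bytes(damaged))
    print("damaged block: ok=%s calc=%08x blk=%08x" % (ok, calc, stored))
```

A tool that only counts blocks and their lengths, as btape scanblocks
does according to Kern, never performs this comparison, which would explain
why it reports nothing while a verify job does.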



> > Is there a way to verify an older backup? AFAIK a verify job only
> > verifies against the last jobid. Now I have the problem, that in the
> > meantime an incremental job finished, so I can't verify the full
> > backup that had the block checksum errors.
> 
> I believe that there is a way to enter the jobid for a "manual" verify as 
> opposed to automatic verify -- read the manual.


Hm, I still think this is a limitation of the verify code, and I can't find any
way to tell Bacula to verify a given jobid.

Selection aborted, nothing done.
Run Verify job
JobName:     VerifyVU0EM003-FBR
Level:       VolumeToCatalog
Client:      VU0EA003-fd
FileSet:     VU0EM003-FBR
Pool:        2-Month-Full (From Job resource)
Storage:     Neo4100-LTO4-D2 (From Pool resource)
Verify Job:  VU0EM003-FBR
Verify List: 
When:        2009-04-18 12:52:14
Priority:    10
OK to run? (yes/mod/no): 



Thanks for your reply Kern,
Ralf

_______________________________________________
Bacula-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-devel
