How do you do, bacula-users.
I've inherited a backup system based on Bacula 1.36.1. It runs on
Solaris 10 for x86 and stores data on a disk array. The previous
sysadmin installed it, but he is no longer reachable. The server
running Bacula is also used for other important things, so I have
to handle and reconfigure it with caution.
It worked fine for a while, and since I'm new to Bacula, I didn't touch
working software. Backups start early in the night and finish in the
morning: full backups every Sunday and incrementals daily. But after
some time, one of the Bacula processes started crashing every morning,
leaving one or more (sometimes all) jobs unfinished. This has been going
on for several weeks now, and it has become clear to me that I need help.
Here I'll try to describe what's happening.
--------------------------------------------------------------
Normally, the Bacula processes on the server look like this:
# ps -ef | grep bacula|grep -v grep
bacula 1362 1 0 10:23:03 ? 0:00 /usr/local/bacula/sbin/bacula-dir -u bacula -g bacula -v -c /usr/local/bacula/e
root 1350 1 0 10:22:40 ? 0:00 /usr/local/bacula/sbin/bacula-fd -u root -g root -v -c /usr/local/bacula/etc/ba
bacula 1348 1 0 10:22:40 ? 0:00 /usr/local/bacula/sbin/bacula-sd -u bacula -g bacula -v -c /usr/local/bacula/et
Every morning, however, the Director process (bacula-dir) is missing.
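To pin down when the Director actually dies, a small check like the one below could be run every few minutes from cron, with its output logged together with a timestamp. It is a minimal sketch, not a tested tool: the function name `missing_bacula_daemons` and the log file in the usage comment are hypothetical, and it simply looks for the three daemon paths shown in the `ps -ef` output above.

```shell
#!/bin/sh
# Sketch: print the name of every Bacula daemon missing from a "ps -ef"
# listing read on standard input. It matches "sbin/<daemon>", assuming the
# binaries live under /usr/local/bacula/sbin as in the listing above.
# Backquotes and plain grep are used because the classic Solaris /bin/sh
# and /usr/bin/grep may not support $(...) or grep -q.
missing_bacula_daemons() {
    listing=`cat`
    for daemon in bacula-dir bacula-sd bacula-fd; do
        printf '%s\n' "$listing" | grep "sbin/$daemon" >/dev/null ||
            echo "$daemon"
    done
}

# Example use in a script run from cron (the log file name is hypothetical):
#   missing=`ps -ef | missing_bacula_daemons`
#   [ -n "$missing" ] && echo "`date` missing: $missing" >> /var/tmp/bacula-watch.log
```

With the normal listing it prints nothing; on a morning like the ones described here it would print `bacula-dir`, and the timestamps in the watch log would bracket the moment of the crash to within the cron interval.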
Here is the end of the log from the latest morning:
# less /var/db/bacula/log
... ... ...
02-Nov 01:50 nfs4p-dir: Start Backup JobId 3959, Job=BackupCatalog.2005-11-02_01.10.00
02-Nov 01:50 s10-sd: Volume "Vol0086" previously written, moving to end of data.
02-Nov 01:51 nfs4p-dir: Bacula 1.36.1 (26Nov04): 02-Nov-2005 01:51:28
JobId: 3959
Job: BackupCatalog.2005-11-02_01.10.00
Backup Level: Full
Client: s10-fd
FileSet: "Catalog" 2005-02-10 02:39:03
Pool: "Files"
Storage: "File"
Start time: 02-Nov-2005 01:37:20
End time: 02-Nov-2005 01:51:28
FD Files Written: 1
SD Files Written: 1
FD Bytes Written: 186,384,816
SD Bytes Written: 186,384,929
Rate: 219.8 KB/s
Software Compression: 76.7 %
Volume name(s): Vol0086
Volume Session Id: 14
Volume Session Time: 1130830073
Last Volume Bytes: 503,379,443
Non-fatal FD errors: 0
SD Errors: 0
FD termination status: OK
SD termination status: OK
Termination: Backup OK
02-Nov 01:51 nfs4p-dir: Begin pruning Jobs.
02-Nov 01:51 nfs4p-dir: No Jobs found to prune.
02-Nov 01:51 nfs4p-dir: Begin pruning Files.
02-Nov 01:51 nfs4p-dir: No Files found to prune.
02-Nov 01:51 nfs4p-dir: End auto prune.
02-Nov 03:15 nfs4p-dir: Start Backup JobId 3960, Job=sinux-oracle.2005-11-02_03.15.00
02-Nov 03:15 sinux-fd: ClientRunBeforeJob: -su: line 8: ulimit: max user processes: cannot modify limit: Operation not permitted
02-Nov 03:19 sinux-fd: ClientRunBeforeJob:
02-Nov 03:19 sinux-fd: ClientRunBeforeJob: SQL*Plus: Release 10.1.0.3.0 - Production on Wed Nov 2 03:19:27 2005
02-Nov 03:19 sinux-fd: ClientRunBeforeJob:
02-Nov 03:19 sinux-fd: ClientRunBeforeJob: Copyright (c) 1982, 2004, Oracle. All rights reserved.
02-Nov 03:19 sinux-fd: ClientRunBeforeJob:
02-Nov 03:19 sinux-fd: ClientRunBeforeJob:
02-Nov 03:19 sinux-fd: ClientRunBeforeJob: Connected to:
02-Nov 03:19 sinux-fd: ClientRunBeforeJob: Oracle Database 10g Enterprise Edition Release 10.1.0.3.0 - Production
02-Nov 03:19 sinux-fd: ClientRunBeforeJob: With the Partitioning, OLAP and Data Mining options
02-Nov 03:19 sinux-fd: ClientRunBeforeJob:
02-Nov 03:19 sinux-fd: ClientRunBeforeJob:
02-Nov 03:19 sinux-fd: ClientRunBeforeJob: TO_CHAR(SYSDATE,'YY
02-Nov 03:19 sinux-fd: ClientRunBeforeJob: -------------------
02-Nov 03:19 sinux-fd: ClientRunBeforeJob: 2005-11-02 03:19:28
02-Nov 03:19 sinux-fd: ClientRunBeforeJob:
02-Nov 03:19 sinux-fd: ClientRunBeforeJob: Disconnected from Oracle Database 10g Enterprise Edition Release 10.1.0.3.0 - Production
02-Nov 03:19 sinux-fd: ClientRunBeforeJob: With the Partitioning, OLAP and Data Mining options
02-Nov 03:19 s10-sd: Volume "Vol0086" previously written, moving to end of data.
02-Nov 03:21 sinux-fd: ClientRunAfterJob: -su: line 8: ulimit: max user processes: cannot modify limit: Operation not permitted
02-Nov 03:21 nfs4p-dir: Bacula 1.36.1 (26Nov04): 02-Nov-2005 03:21:02
JobId: 3960
Job: sinux-oracle.2005-11-02_03.15.00
Backup Level: Full
Client: sinux-fd
FileSet: "sinux-oracle" 2005-02-10 03:15:02
Pool: "Files"
Storage: "File"
Start time: 02-Nov-2005 03:15:02
End time: 02-Nov-2005 03:21:02
FD Files Written: 5
SD Files Written: 5
FD Bytes Written: 181,931,482
SD Bytes Written: 181,932,043
Rate: 505.4 KB/s
Software Compression: 75.6 %
Volume name(s): Vol0086
Volume Session Id: 15
Volume Session Time: 1130830073
Last Volume Bytes: 685,686,572
Non-fatal FD errors: 0
SD Errors: 0
FD termination status: OK
SD termination status: OK
Termination: Backup OK
02-Nov 03:21 nfs4p-dir: Begin pruning Jobs.
02-Nov 03:21 nfs4p-dir: No Jobs found to prune.
02-Nov 03:21 nfs4p-dir: Begin pruning Files.
02-Nov 03:21 nfs4p-dir: No Files found to prune.
02-Nov 03:21 nfs4p-dir: End auto prune.
02-Nov 07:05 nfs4p-dir: Start Backup JobId 3961, Job=cgatex-full.2005-11-02_07.05.00
02-Nov 07:05 cgatex-fd-fd: Since time adjusted by 0 seconds.
02-Nov 07:05 s10-sd: Volume "Vol0086" previously written, moving to end of data.
02-Nov 07:06 s10-sd: User defined maximum volume capacity 734,003,200 exceeded on device /d/0/bacula.
02-Nov 07:06 s10-sd: End of medium on Volume "Vol0086" Bytes=733,941,548 Blocks=11,378 at 02-Nov-2005 07:06.
02-Nov 07:06 nfs4p-dir: Recycled volume "Vol0087"
02-Nov 07:06 s10-sd: Recycled volume "Vol0087" on device "/d/0/bacula", all previous data lost.
02-Nov 07:06 s10-sd: New volume "Vol0087" mounted on device /d/0/bacula at 02-Nov-2005 07:06.
02-Nov 07:19 s10-sd: User defined maximum volume capacity 734,003,200 exceeded on device /d/0/bacula.
02-Nov 07:19 s10-sd: End of medium on Volume "Vol0087" Bytes=733,952,971 Blocks=11,377 at 02-Nov-2005 07:19.
02-Nov 07:19 nfs4p-dir: Recycled volume "Vol0090"
02-Nov 07:19 s10-sd: Recycled volume "Vol0090" on device "/d/0/bacula", all previous data lost.
02-Nov 07:19 s10-sd: New volume "Vol0090" mounted on device /d/0/bacula at 02-Nov-2005 07:19.
02-Nov 07:35 s10-sd: User defined maximum volume capacity 734,003,200 exceeded on device /d/0/bacula.
02-Nov 07:35 s10-sd: End of medium on Volume "Vol0090" Bytes=733,952,984 Blocks=11,377 at 02-Nov-2005 07:35.
02-Nov 07:35 nfs4p-dir: Recycled volume "Vol0091"
02-Nov 07:35 s10-sd: Recycled volume "Vol0091" on device "/d/0/bacula", all previous data lost.
02-Nov 07:35 s10-sd: New volume "Vol0091" mounted on device /d/0/bacula at 02-Nov-2005 07:35.
02-Nov 08:04 s10-sd: User defined maximum volume capacity 734,003,200 exceeded on device /d/0/bacula.
02-Nov 08:04 s10-sd: End of medium on Volume "Vol0091" Bytes=733,952,897 Blocks=11,377 at 02-Nov-2005 08:04.
"That's all, folks!" (c) :-(
When I run
# /etc/bacula/bconsole
I see:
Connecting to Director 127.0.0.1:9101
1000 OK: nfs4p-dir Version: 1.36.1 (26 November 2004)
Enter a period to cancel a command.
*status 1
Using default Catalog name=MyCatalog DB=bacula
Automatically selected Storage: File
Connecting to Storage daemon File at 10.253.4.15:9103
s10-sd Version: 1.36.1 (26 November 2004) i386-pc-solaris2.10 solaris 5.10
Daemon started 02-Nov-05 20:10, 0 Jobs run since started.
Running Jobs:
No Jobs running.
====
Terminated Jobs:
JobId  Level  Files          Bytes  Status  Finished         Name
======================================================================
3952   Incr   2,462      1,889,217  OK      02-Nov-05 01:21  ns02
3953   Incr       1     33,512,324  OK      02-Nov-05 01:22  sinux
3954   Incr      83     28,008,073  OK      02-Nov-05 01:23  dbh1-matroska
3955   Incr       0              0  OK      02-Nov-05 01:23  dbh1-configs
3956   Incr       0              0  OK      02-Nov-05 01:23  dbh1-home
3957   Incr   1,418     84,707,857  OK      02-Nov-05 01:28  hpov-full
3958   Incr      67    615,990,101  OK      02-Nov-05 01:37  dbh2-full
3959   Full       1    186,384,929  OK      02-Nov-05 01:51  BackupCatalog
3960   Full       5    181,932,043  OK      02-Nov-05 03:21  sinux-oracle
3961   Incr   9,889  2,246,582,808  Cancel  02-Nov-05 10:22  cgatex-full
====
Device status:
Device "/d/0/bacula" is not open.
====
The last job is the most important one: it backs up the mail server. :-(
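For what it's worth, the byte counts in the log can be cross-checked with plain shell arithmetic. The sketch below only re-adds figures copied from the log above; it should be run with bash, whose arithmetic is 64-bit, because the total exceeds 2^31 and a 32-bit `expr` could overflow. It shows that the 734,003,200-byte limit is exactly 700 MiB, and how much JobId 3961 had written when the log stops: the tail of Vol0086 plus three full volumes.

```shell
#!/usr/bin/env bash
# Cross-check of figures copied from the log; nothing here talks to Bacula.

# Volume cap reported by s10-sd: exactly 700 MiB.
cap=$((700 * 1024 * 1024))

# Bytes JobId 3961 appended to Vol0086: final size of Vol0086 minus its
# "Last Volume Bytes" after JobId 3960.
tail_0086=$((733941548 - 685686572))

# Plus the three volumes the job then filled: Vol0087, Vol0090, Vol0091.
total=$((tail_0086 + 733952971 + 733952984 + 733952897))

echo "volume cap:       $cap bytes"
echo "JobId 3961 wrote: $total bytes before the log stops"
```

The total, 2,250,113,828 bytes, is within about 3.5 MB of the 2,246,582,808 bytes that `status` reports for the cancelled job (the volume sizes presumably include some block and label overhead). So the log appears to stop right after Vol0091 filled up, at the moment the Director would have had to select (and, judging by the earlier pattern, recycle) the next volume.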
If I leave this console open until the next morning and try to enter any
command after bacula-dir has crashed, the console dies as well, because
it can no longer connect to the Director.
I searched for quotes from the logs and for the messages I was getting,
but found nothing that matches my problem. My colleagues couldn't help
either; they haven't seen anything like this before.
Of course I can restart Bacula (for example, with the startup script
/etc/rc3.d/S50bacula and the restart target), but the problem persists:
every morning I see exactly what I have described here.
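Since every crash leaves one or more jobs without a final report, a quick way to list them is to look for `Start Backup JobId N,` messages that have no matching `JobId: N` line from a job report. The function below (`unfinished_jobs` is a hypothetical name) is a minimal sketch written against the message formats in the log excerpt above. On Solaris the old /usr/bin/awk may not understand it, so /usr/bin/nawk or /usr/xpg4/bin/awk might be needed instead.

```shell
#!/bin/sh
# Sketch: list JobIds that have a "Start Backup JobId N," message but no
# "JobId: N" line from a job report, in the order they were started.
# Reads the log files given as arguments, or standard input if none.
unfinished_jobs() {
    awk '
        /Start Backup JobId [0-9]+,/ {
            for (i = 1; i < NF; i++)
                if ($i == "JobId") {
                    id = $(i + 1)
                    sub(/,$/, "", id)      # drop the trailing comma
                    started[++n] = id
                }
        }
        $1 == "JobId:" { reported[$2] = 1 }   # the JobId line of a report
        END {
            for (k = 1; k <= n; k++)
                if (!(started[k] in reported))
                    print started[k]
        }
    ' ${1+"$@"}
}
```

For example, `unfinished_jobs /var/db/bacula/log`. On the excerpt above it would print only 3961, the cgatex-full job.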
--------------------------------------------------------------
Smart people who know what to do, please help!
If I should post additional logs, configuration files or anything else,
please tell me. I really want to ask a good question so that a good
answer can be given.
Thanks for your attention. I really need help to pin down this problem
and solve it, and any good advice will help turn things from bad to good.
Good luck to everybody!
--
SY Vadim A. Umanski
_______________________________________________
Bacula-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-users