On Thursday 06 April 2006 23:28, Kern Sibbald wrote:
> There is something wrong with what seems to be a RunAfterJob
> "/usr/local/bacula/scripts/delete_catalog_backup" script.  Perhaps there is
> a problem with command line editing or some other race condition because it
> is passing an invalid address off to the fork().

Hmm...OK.  That script is the standard bacula script straight from source:

#!/bin/sh
#
# This script deletes a catalog dump
#
rm -f /var/bacula/working/bacula.sql

And it has never been an issue before.  That is really weird.  I don't even
know why it would fire that off yet; that job has lower priority than all
the others.

OK, I did a run without the job that calls delete_catalog_backup, that is,
without the Bacula BackupCatalog job.  I used this little bit of bash script:

for ii in warbucks fmpserver distance locus elive admin1 communications1 \
    communications2 communications3 grades1 libro otter records1 registrar1 ruth \
    sheri1 textbook1 textbook3 textbook4 curt fiscalpro bob5 idbigblue odonnell \
    webmaster webdev1
do
    # Feed bconsole the run command plus the answers to its prompts
    echo -e "run $ii\nmod\n1\n2\nyes\nq\n" | ./bconsole
done
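
A minimal sketch for dry-running the loop above before pointing it at a live
director.  The helper name bconsole_input is hypothetical; it only prints the
text that the loop pipes into ./bconsole, so no jobs are started:

```shell
#!/bin/sh
# Hypothetical helper: print the keystrokes sent to bconsole for one job.
bconsole_input() {
    printf 'run %s\nmod\n1\n2\nyes\nq\n' "$1"
}

# Dry run: show what would be sent for two of the job names.
# To run for real, pipe each call into ./bconsole instead.
for job in fmpserver distance
do
    bconsole_input "$job"
done
```

Replacing the body of the loop with `bconsole_input "$job" | ./bconsole`
should behave like the original one-liner.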

All the jobs started just fine, ran for a bit, and then the director crashed
again.  Thread dump below.  This time it was the open_bpipe() call for bsmtp
that was passed the invalid mode.  But these are standard scripts from the
Bacula source.  Any ideas?  Disable SMP?  Upgrade to 1.38?  Go back to my tape
array, with which I never had any problems?

[Thread 213002 (LWP 32696) exited]
Cannot find thread 213002: invalid thread handle
(gdb) thread apply all bt

Thread 23 (Thread 344079 (LWP 32731)):
#0  0xb7e875bb in read () from /lib/i686/libpthread.so.0
#1  0x080f1d7c in ?? ()
#2  0x00000000 in ?? ()
#3  0x0807fac7 in bnet_recv (bsock=0x80ef540) at bnet.c:72
#4  0x08054479 in bget_dirmsg (bs=0x80ef540) at getmsg.c:79
#5  0x0805b670 in msg_thread (arg=0x80ec4d8) at msgchan.c:248
#6  0xb7e81421 in pthread_start_thread () from /lib/i686/libpthread.so.0
#7  0xb7e81591 in pthread_start_thread_event () from /lib/i686/libpthread.so.0
#8  0xb7d1e36a in clone () from /lib/i686/libc.so.6

Thread 21 (Thread 311310 (LWP 32719)):
#0  0xb7e875bb in read () from /lib/i686/libpthread.so.0
#1  0x080ef81c in ?? ()
#2  0x00000000 in ?? ()
#3  0x0807fac7 in bnet_recv (bsock=0x80df178) at bnet.c:72
#4  0x08054479 in bget_dirmsg (bs=0x80df178) at getmsg.c:79
#5  0x0805b670 in msg_thread (arg=0x80e3a98) at msgchan.c:248
#6  0xb7e81421 in pthread_start_thread () from /lib/i686/libpthread.so.0
#7  0xb7e81591 in pthread_start_thread_event () from /lib/i686/libpthread.so.0
#8  0xb7d1e36a in clone () from /lib/i686/libc.so.6

Thread 20 (Thread 294925 (LWP 32716)):
#0  0xb7e875bb in read () from /lib/i686/libpthread.so.0
#1  0x080ee61c in ?? ()
#2  0x00000000 in ?? ()
#3  0x0807fac7 in bnet_recv (bsock=0x80e86b8) at bnet.c:72
#4  0x08054479 in bget_dirmsg (bs=0x80e86b8) at getmsg.c:79
#5  0x0805b670 in msg_thread (arg=0x80e87b0) at msgchan.c:248
#6  0xb7e81421 in pthread_start_thread () from /lib/i686/libpthread.so.0
#7  0xb7e81591 in pthread_start_thread_event () from /lib/i686/libpthread.so.0
#8  0xb7d1e36a in clone () from /lib/i686/libc.so.6

Thread 18 (Thread 262153 (LWP 32706)):
#0  0xb7e875bb in read () from /lib/i686/libpthread.so.0
#1  0x080ed1bc in ?? ()
#2  0x00000000 in ?? ()
#3  0x0807fac7 in bnet_recv (bsock=0x80e1530) at bnet.c:72
#4  0x08054479 in bget_dirmsg (bs=0x80e1530) at getmsg.c:79
#5  0x0804cea9 in wait_for_job_termination (jcr=0x80e87b0) at backup.c:243
#6  0x0804dcc0 in do_backup (jcr=0x80e87b0) at backup.c:207
#7  0x08056fc9 in job_thread (arg=0x80e87b0) at job.c:215
#8  0x080592bb in jobq_server (arg=0x80bdc20) at jobq.c:444
#9  0xb7e81421 in pthread_start_thread () from /lib/i686/libpthread.so.0
#10 0xb7e81591 in pthread_start_thread_event () from /lib/i686/libpthread.so.0
#11 0xb7d1e36a in clone () from /lib/i686/libc.so.6

Thread 16 (Thread 229383 (LWP 32698)):
#0  0xb7e875bb in read () from /lib/i686/libpthread.so.0
#1  0x080eb6dc in ?? ()
#2  0x00000000 in ?? ()
#3  0x0807fac7 in bnet_recv (bsock=0x80efb08) at bnet.c:72
#4  0x08054479 in bget_dirmsg (bs=0x80efb08) at getmsg.c:79
#5  0x0804cea9 in wait_for_job_termination (jcr=0x80e3a98) at backup.c:243
#6  0x0804dcc0 in do_backup (jcr=0x80e3a98) at backup.c:207
#7  0x08056fc9 in job_thread (arg=0x80e3a98) at job.c:215
#8  0x080592bb in jobq_server (arg=0x80bdc20) at jobq.c:444
#9  0xb7e81421 in pthread_start_thread () from /lib/i686/libpthread.so.0
#10 0xb7e81591 in pthread_start_thread_event () from /lib/i686/libpthread.so.0
#11 0xb7d1e36a in clone () from /lib/i686/libc.so.6

Thread 13 (Thread 180230 (LWP 32690)):
#0  0xb7e875bb in read () from /lib/i686/libpthread.so.0
#1  0x080e769c in ?? ()
#2  0x00000000 in ?? ()
#3  0x0807fac7 in bnet_recv (bsock=0x80f2108) at bnet.c:72
#4  0x08054479 in bget_dirmsg (bs=0x80f2108) at getmsg.c:79
#5  0x0804cea9 in wait_for_job_termination (jcr=0x80ec4d8) at backup.c:243
#6  0x0804dcc0 in do_backup (jcr=0x80ec4d8) at backup.c:207
#7  0x08056fc9 in job_thread (arg=0x80ec4d8) at job.c:215
#8  0x080592bb in jobq_server (arg=0x80bdc20) at jobq.c:444
#9  0xb7e81421 in pthread_start_thread () from /lib/i686/libpthread.so.0
#10 0xb7e81591 in pthread_start_thread_event () from /lib/i686/libpthread.so.0
#11 0xb7d1e36a in clone () from /lib/i686/libc.so.6

Thread 12 (Thread 163848 (LWP 32687)):
#0  0xb7e875bb in read () from /lib/i686/libpthread.so.0
#1  0x080e1c5c in ?? ()
#2  0x00000000 in ?? ()
#3  0x0807fac7 in bnet_recv (bsock=0x80e0988) at bnet.c:72
#4  0x08054479 in bget_dirmsg (bs=0x80e0988) at getmsg.c:79
#5  0x0805b670 in msg_thread (arg=0x80de298) at msgchan.c:248
#6  0xb7e81421 in pthread_start_thread () from /lib/i686/libpthread.so.0
#7  0xb7e81591 in pthread_start_thread_event () from /lib/i686/libpthread.so.0
#8  0xb7d1e36a in clone () from /lib/i686/libc.so.6

Thread 10 (Thread 131076 (LWP 32681)):
#0  0xb7e8289b in __pthread_fork () from /lib/i686/libpthread.so.0
#1  0xb7ce81a8 in fork () from /lib/i686/libc.so.6
#2  0xb7e82954 in fork () from /lib/i686/libpthread.so.0
#3  0x080818b1 in open_bpipe (
    prog=0x80d9ba8 "/usr/local/bacula/sbin/bsmtp -h mail.uaf.edu -f \"(Bacula) [EMAIL PROTECTED]" -s \"Bacula: Backup OK of fmpserver-fd Full\" [EMAIL PROTECTED]",
    wait=120, mode=0x1 <Address 0x1 out of bounds>) at bpipe.c:90
#4  0x0808adc9 in open_mail_pipe (jcr=0x80df468, [EMAIL PROTECTED], d=0x80dff00)
    at message.c:378
#5  0x0808b1f8 in close_msg (jcr=0x80df468) at message.c:438
#6  0x08084e48 in free_common_jcr (jcr=0x80df468) at jcr.c:300
#7  0x08085bce in b_free_jcr (file=0x809def3 "jobq.c", line=524, jcr=0x80df468) at jcr.c:378
#8  0x080593fc in jobq_server (arg=0x80bdc20) at jobq.c:524
#9  0xb7e81421 in pthread_start_thread () from /lib/i686/libpthread.so.0
#10 0xb7e81591 in pthread_start_thread_event () from /lib/i686/libpthread.so.0
#11 0xb7d1e36a in clone () from /lib/i686/libc.so.6

Thread 8 (Thread 98309 (LWP 32674)):
#0  0xb7e875bb in read () from /lib/i686/libpthread.so.0
#1  0x080deb7c in ?? ()
#2  0x00000000 in ?? ()
#3  0x0807fac7 in bnet_recv (bsock=0x80e2d08) at bnet.c:72
#4  0x08054479 in bget_dirmsg (bs=0x80e2d08) at getmsg.c:79
#5  0x0804cea9 in wait_for_job_termination (jcr=0x80de298) at backup.c:243
#6  0x0804dcc0 in do_backup (jcr=0x80de298) at backup.c:207
#7  0x08056fc9 in job_thread (arg=0x80de298) at job.c:215
#8  0x080592bb in jobq_server (arg=0x80bdc20) at jobq.c:444
#9  0xb7e81421 in pthread_start_thread () from /lib/i686/libpthread.so.0
#10 0xb7e81591 in pthread_start_thread_event () from /lib/i686/libpthread.so.0
#11 0xb7d1e36a in clone () from /lib/i686/libc.so.6

Thread 4 (Thread 32771 (LWP 32639)):
#0  0xb7e87db6 in nanosleep () from /lib/i686/libpthread.so.0
#1  0x00000001 in ?? ()
#2  0xb7e84188 in __pthread_timedsuspend_new () from /lib/i686/libpthread.so.0
#3  0xb7e803e9 in pthread_cond_timedwait_relative () from /lib/i686/libpthread.so.0
#4  0x0809777a in watchdog_thread (arg=0x0) at watchdog.c:289
#5  0xb7e81421 in pthread_start_thread () from /lib/i686/libpthread.so.0
#6  0xb7e81591 in pthread_start_thread_event () from /lib/i686/libpthread.so.0
#7  0xb7d1e36a in clone () from /lib/i686/libc.so.6

Thread 3 (Thread 16386 (LWP 32638)):
#0  0xb7d174a1 in select () from /lib/i686/libc.so.6
#1  0x0000000b in ?? ()
#2  0x080d66bc in ?? ()
#3  0xb7adf234 in ?? ()
#4  0x00000000 in ?? ()
#5  0x08081167 in bnet_thread_server (addrs=0x0, max_clients=10, client_wq=0x80bdda0,
    handle_client_request=0x8070c70 <handle_UA_client_request>) at bnet_server.c:154
#6  0x08070a58 in connect_thread (arg=0x80bff38) at ua_server.c:79
#7  0xb7e81421 in pthread_start_thread () from /lib/i686/libpthread.so.0
#8  0xb7e81591 in pthread_start_thread_event () from /lib/i686/libpthread.so.0
#9  0xb7d1e36a in clone () from /lib/i686/libc.so.6

Thread 2 (Thread 32769 (LWP 32637)):
#0  0xb7d1529a in poll () from /lib/i686/libc.so.6
#1  0xb7e80f00 in __pthread_manager () from /lib/i686/libpthread.so.0
#2  0xb7e811d5 in __pthread_manager_event () from /lib/i686/libpthread.so.0
#3  0xb7d1e36a in clone () from /lib/i686/libc.so.6

Thread 1 (Thread 16384 (LWP 32633)):
#0  0xb7e87db6 in nanosleep () from /lib/i686/libpthread.so.0
#1  0x00000000 in ?? ()
#2  0x0807de3c in bmicrosleep (sec=60, usec=-1073744136) at bsys.c:59
#3  0x0805eac0 in wait_for_next_job (one_shot_job_to_run=0x80cbac0 "") at scheduler.c:101
#4  0x0804c057 in main (argc=135003960, argv=0x80c0010) at dird.c:244
Segmentation fault
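
A sketch of capturing the same dump non-interactively, which avoids the
"Type <return> to continue" pager prompts.  The command file name
bacula-dir.gdb and the log name bt.log are hypothetical; the director path and
arguments are the ones used in this thread:

```shell
#!/bin/sh
# Hypothetical file name for the gdb command script.
GDB_CMDS=${GDB_CMDS:-bacula-dir.gdb}

# Write the gdb commands: turn off the pager, start the director in the
# foreground, and dump every thread's backtrace once it stops on the crash.
cat > "$GDB_CMDS" <<'EOF'
set pagination off
run -s -f -c ../etc/bacula-dir.conf
thread apply all bt
EOF

# Then, from /usr/local/bacula/sbin, something like:
#   gdb -batch -x bacula-dir.gdb ./bacula-dir > bt.log 2>&1
```

With -batch, gdb exits on its own after the last command, so the whole
backtrace ends up in one log file.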



>
> On Friday 07 April 2006 03:57, Joshua Kugler wrote:
> > (gdb) run  -s -f -c ../etc/bacula-dir.conf
> > Starting program: /usr/local/bacula/sbin/bacula-dir -s -f
> > -c ../etc/bacula-dir.conf
> > [Please pardon the top post]
> >
> > OK, I compiled 1.36.3, and ran the director under gdb.  After some normal
> > execution, including some volume purging, I tried to start a bunch of
> > jobs like so:
> >
> > for ii in BackupCatalog fmpserver distance locus community elive admin1
> > communications1 communications2 communications3 grades1 libro otter
> > records1 registrar1 ruth sheri1 textbook1 textbook3 textbook4 curt
> > fiscalpro bob5 idbigblue odonnell webmaster webdev1
> > do echo -e "run $ii\nmod\n1\n2\nyes\nq\n"| ./bconsole
> > done
> >
> > It went on for a few OK, and then it died with the message shown below.
> > BTW, I was able to kill it like this twice.
> >
> > When bacula-dir crashed, it also left a few rows in the db with a client
> > ID of 0.
> >
> > I got this from running bacula-dir inside gdb.  (output from the thread
> > dump below).
> >
> > [Thread debugging using libthread_db enabled]
> > [New Thread 16384 (LWP 19065)]
> > [New Thread 32769 (LWP 19067)]
> > [New Thread 16386 (LWP 19068)]
> > [New Thread 32771 (LWP 19069)]
> > [New Thread 49156 (LWP 19072)]
> > [Thread 49156 (LWP 19072) exited]
> > [New Thread 65540 (LWP 19216)]
> > [New Thread 81925 (LWP 19219)]
> > [New Thread 98310 (LWP 19221)]
> > [Thread 65540 (LWP 19216) exited]
> > [New Thread 114692 (LWP 19234)]
> > [Thread 114692 (LWP 19234) exited]
> > [New Thread 131076 (LWP 19295)]
> > herodotus-dir: dird.c:438 Director's configuration file reread.
> > [Thread 131076 (LWP 19295) exited]
> > [New Thread 147460 (LWP 19300)]
> > [New Thread 163847 (LWP 19302)]
> > [New Thread 180232 (LWP 19305)]
> > [Thread 147460 (LWP 19300) exited]
> > [New Thread 196612 (LWP 19308)]
> > [New Thread 213001 (LWP 19312)]
> > [Thread 180232 (LWP 19305) exited]
> > [New Thread 229384 (LWP 19314)]
> > [New Thread 245770 (LWP 19319)]
> > Detaching after fork from child process 19321.
> > [New Thread 262155 (LWP 19323)]
> > [Thread 213001 (LWP 19312) exited]
> > [New Thread 278537 (LWP 19325)]
> > [New Thread 294924 (LWP 19330)]
> > [New Thread 311309 (LWP 19333)]
> > [Thread 245770 (LWP 19319) exited]
> > Cannot find thread 245770: invalid thread handle
> > (gdb)
> >
> > Here is the output from "thread apply all bt"
> >
> > (gdb) thread apply all bt
> >
> > Thread 30 (Thread 458763 (LWP 19883)):
> > #0  0xb7e87db6 in nanosleep () from /lib/i686/libpthread.so.0
> > #1  0x00000000 in ?? ()
> > #2  0x0807de3c in bmicrosleep (sec=0, usec=-1251095380) at bsys.c:59
> > #3  0x08057125 in create_unique_job_name (jcr=0x80f5d20,
> > base_name=0x80a59b5 "*Console*") at job.c:658
> > #4  0x08070ae0 in new_control_jcr (base_name=0x80a59b5 "*Console*",
> > job_type=-516) at ua_server.c:101
> > #5  0x08070c97 in handle_UA_client_request (arg=0x80d7518) at ua_server.c:122
> > #6  0x08098426 in workq_server (arg=0x80bdda0) at workq.c:347
> > #7  0xb7e81421 in pthread_start_thread () from /lib/i686/libpthread.so.0
> > #8  0xb7e81591 in pthread_start_thread_event () from /lib/i686/libpthread.so.0
> > #9  0xb7d1e36a in clone () from /lib/i686/libc.so.6
> >
> > Thread 29 (Thread 442376 (LWP 19878)):
> > #0  0xb7e87db6 in nanosleep () from /lib/i686/libpthread.so.0
> > #1  0x00000000 in ?? ()
> > #2  0xb7e84188 in __pthread_timedsuspend_new () from /lib/i686/libpthread.so.0
> > #3  0xb7e803e9 in pthread_cond_timedwait_relative () from /lib/i686/libpthread.so.0
> > #4  0x080982e7 in workq_server (arg=0x80bdda0) at workq.c:322
> > #5  0xb7e81421 in pthread_start_thread () from /lib/i686/libpthread.so.0
> > #6  0xb7e81591 in pthread_start_thread_event () from /lib/i686/libpthread.so.0
> > #7  0xb7d1e36a in clone () from /lib/i686/libc.so.6
> >
> > Thread 16 (Thread 229383 (LWP 19815)):
> > #0  0xb7e87db6 in nanosleep () from /lib/i686/libpthread.so.0
> > #1  0x00000000 in ?? ()
> > #2  0x0807de3c in bmicrosleep (sec=2, usec=-1236410932) at bsys.c:59
> > #3  0x0805955a in jobq_server (arg=0x80bdc20) at jobq.c:674
> > #4  0xb7e81421 in pthread_start_thread () from /lib/i686/libpthread.so.0
> > #5  0xb7e81591 in pthread_start_thread_event () from /lib/i686/libpthread.so.0
> > #6  0xb7d1e36a in clone () from /lib/i686/libc.so.6
> >
> > Thread 14 (Thread 196618 (LWP 19807)):
> > #0  0xb7e87db6 in nanosleep () from /lib/i686/libpthread.so.0
> > #1  0x00000000 in ?? ()
> > #2  0x0807de3c in bmicrosleep (sec=2, usec=-1248997940) at bsys.c:59
> > #3  0x0805955a in jobq_server (arg=0x80bdc20) at jobq.c:674
> > #4  0xb7e81421 in pthread_start_thread () from /lib/i686/libpthread.so.0
> > #5  0xb7e81591 in pthread_start_thread_event () from /lib/i686/libpthread.so.0
> > #6  0xb7d1e36a in clone () from /lib/i686/libc.so.6
> >
> > Thread 11 (Thread 147462 (LWP 19798)):
> > #0  0xb7e87db6 in nanosleep () from /lib/i686/libpthread.so.0
> > #1  0x00000000 in ?? ()
> > #2  0x0807de3c in bmicrosleep (sec=2, usec=-1234309684) at bsys.c:59
> > #3  0x0805955a in jobq_server (arg=0x80bdc20) at jobq.c:674
> > #4  0xb7e81421 in pthread_start_thread () from /lib/i686/libpthread.so.0
> > #5  0xb7e81591 in pthread_start_thread_event () from /lib/i686/libpthread.so.0
> > #6  0xb7d1e36a in clone () from /lib/i686/libc.so.6
> >
> > Thread 9 (Thread 114692 (LWP 19778)):
> > #0  0xb7e8289b in __pthread_fork () from /lib/i686/libpthread.so.0
> > #1  0xb7ce81a8 in fork () from /lib/i686/libc.so.6
> > #2  0xb7e82954 in fork () from /lib/i686/libpthread.so.0
> > #3  0x080818b1 in open_bpipe (prog=0x80ccdf8
> > "/usr/local/bacula/scripts/delete_catalog_backup", wait=0,
> >     mode=0x1 <Address 0x1 out of bounds>) at bpipe.c:90
> > #4  0x08056c99 in job_thread (arg=0x80de298) at job.c:262
> > #5  0x080592bb in jobq_server (arg=0x80bdc20) at jobq.c:444
> > #6  0xb7e81421 in pthread_start_thread () from /lib/i686/libpthread.so.0
> > #7  0xb7e81591 in pthread_start_thread_event () from /lib/i686/libpthread.so.0
> > #8  0xb7d1e36a in clone () from /lib/i686/libc.so.6
> >
> > Thread 7 (Thread 81925 (LWP 19772)):
> > #0  0xb7e87db6 in nanosleep () from /lib/i686/libpthread.so.0
> > #1  0x00000000 in ?? ()
> > #2  0x0807de3c in bmicrosleep (sec=2, usec=-1232212532) at bsys.c:59
> > #3  0x0805955a in jobq_server (arg=0x80bdc20) at jobq.c:674
> > #4  0xb7e81421 in pthread_start_thread () from /lib/i686/libpthread.so.0
> > #5  0xb7e81591 in pthread_start_thread_event () from /lib/i686/libpthread.so.0
> > #6  0xb7d1e36a in clone () from /lib/i686/libc.so.6
> >
> > Thread 4 (Thread 32771 (LWP 19615)):
> > #0  0xb7e87db6 in nanosleep () from /lib/i686/libpthread.so.0
> > #1  0x00000001 in ?? ()
> > #2  0xb7e84188 in __pthread_timedsuspend_new () from /lib/i686/libpthread.so.0
> > #3  0xb7e803e9 in pthread_cond_timedwait_relative () from /lib/i686/libpthread.so.0
> > #4  0x0809777a in watchdog_thread (arg=0x0) at watchdog.c:289
> > #5  0xb7e81421 in pthread_start_thread () from /lib/i686/libpthread.so.0
> > #6  0xb7e81591 in pthread_start_thread_event () from /lib/i686/libpthread.so.0
> > #7  0xb7d1e36a in clone () from /lib/i686/libc.so.6
> >
> > Thread 3 (Thread 16386 (LWP 19614)):
> > #0  0xb7d174a1 in select () from /lib/i686/libc.so.6
> > #1  0x0000000b in ?? ()
> > #2  0x080d66bc in ?? ()
> > #3  0xb7adf234 in ?? ()
> > #4  0x00000000 in ?? ()
> > #5  0x08081167 in bnet_thread_server (addrs=0x0, max_clients=10,
> > client_wq=0x80bdda0,
> >     handle_client_request=0x8070c70 <handle_UA_client_request>) at
> > bnet_server.c:154
> > #6  0x08070a58 in connect_thread (arg=0x80bff38) at ua_server.c:79
> > #7  0xb7e81421 in pthread_start_thread () from /lib/i686/libpthread.so.0
> > #8  0xb7e81591 in pthread_start_thread_event () from /lib/i686/libpthread.so.0
> > #9  0xb7d1e36a in clone () from /lib/i686/libc.so.6
> >
> > Thread 2 (Thread 32769 (LWP 19613)):
> > #0  0xb7d1529a in poll () from /lib/i686/libc.so.6
> > #1  0xb7e80f00 in __pthread_manager () from /lib/i686/libpthread.so.0
> > #2  0xb7e811d5 in __pthread_manager_event () from /lib/i686/libpthread.so.0
> > #3  0xb7d1e36a in clone () from /lib/i686/libc.so.6
> >
> > Thread 1 (Thread 16384 (LWP 19609)):
> > #0  0xb7e87db6 in nanosleep () from /lib/i686/libpthread.so.0
> > #1  0x00000000 in ?? ()
> > #2  0x0807de3c in bmicrosleep (sec=60, usec=-1073744136) at bsys.c:59
> > #3  0x0805eac0 in wait_for_next_job (one_shot_job_to_run=0x80cbac0 "") at
> > scheduler.c:101
> > #4  0x0804c057 in main (argc=135003960, argv=0x80c0010) at dird.c:244
> > Segmentation fault
> >
> > On Thursday 06 April 2006 17:10, Joshua Kugler wrote:
> > > [Disclaimer: I've searched the archives best I know how.  If you can
> > > point me to docs and/or messages I missed, that'd be great!]
> > >
> > > We've been using Bacula for over a year, and it has run great. 
> > > > Recently, we got a nice disk-based 5.1TB array (Coraid AoE if you care)
> > > > and are working on implementing it with Bacula.  All the configuration
> > > > has gone great, and we're doing test runs.
> > >
> > > This is where we run into problems.
> > >
> > > If I fire off Full backups of all the clients, it will run OK for a
> > > while. Then at one point, I tried a command on bconsole, and it said
> > >
> > > 06-Apr 15:25 bconsole:  Error: bnet.c:403 Write error sending to
> > > Director daemon:herodotus.cde.uaf.edu:9101: ERR=Broken pipe
> > > [EMAIL PROTECTED] /usr/local/bacula/sbin]# ./bconsole
> > > Connecting to Director herodotus.cde.uaf.edu:9101
> > > 06-Apr 15:25 bconsole:  Fatal error: bnet.c:773 Unable to connect to
> > > Director daemon on herodotus.cde.uaf.edu:9101.
> > >
> > > A ps -Af shows *no* bacula-dir processes left running.  Top shows
> > > bacula-sd still grinding away, as well as some of the SSH tunnels.  I
> > > can still get to the network drive and do things like ls and du, so
> > > it's not lost communication.  Restarting bacula and doing status from
> > > bconsole shows no jobs running, but the database shows a bunch of jobs
> > > in JobStatus "R".
> > >
> > > The bacula (/var/bacula/working/log) shows nothing out of the ordinary.
> > >
> > > This is on Linux, with kernel 2.6.11-12mdksmp, Bacula 1.36.1, 1GB of
> > > memory. There is no dump, stack trace, or e-mail about the crash.
> > >
> > > I know there are more recent versions.  I don't have time right now to
> > > upgrade all my clients.  Should I try 1.36.3 before I throw in the
> > > towel? Any other ideas?  Am I hitting the race condition noted here:
> > > > http://article.gmane.org/gmane.comp.sysutils.backup.bacula.general/16842
> > >
> > > j----- k-----

-- 
Joshua Kugler                 PGP Key: http://pgp.mit.edu/
CDE System Administrator             ID 0xDB26D7CE
http://distance.uaf.edu/


_______________________________________________
Bacula-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-users