Hello,
As you know, current job scheduling has a few deficiencies, particularly if for
some reason your backups get blocked (a bad tape driver or operator
intervention required), which can lead to a big pile of duplicate jobs being
scheduled.
We have previously discussed ways of fixing this, with some really good ideas.
I am now ready to take a stab at implementing it, and would like to present
the current design and let some of you help in the design process. I am
currently pretty busy with my own project and helping with two major projects
that are making very nice progress, so I would appreciate some input.
My current idea is to create a new "DuplicateJobs" resource and a new
Duplicate Jobs directive in the Job resource which would point to the
DuplicateJobs resource. The reason for the resource is that there are so many
different variations that it would require a lot of new directives, and it
seems a shame to add them to every Job.
My current design calls for a Duplicate Jobs resource that looks something
like the following:
DuplicateJobs {
  Name = "xxx"
  Allow = yes|no              (no = default)
  AllowHigherLevel = yes|no   (no)
  AllowLowerLevel = yes|no    (no)
  AllowSameLevel = yes|no
  Cancel = Running | New      (no)
  CancelledStatus = Fail | Skip   (fail)
  Job Proximity = <time-interval> (0)
}
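
To make the defaults easier to discuss, here is a minimal Python sketch of the resource as an in-memory record. It is only an illustration, not Bacula code: the class and field names are hypothetical, AllowSameLevel has no default in the proposal so "no" is assumed, and the "(no)" default shown for Cancel is modelled as None.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass
class DuplicateJobs:
    """Hypothetical model of the proposed DuplicateJobs resource."""
    name: str
    allow: bool = False                # Allow = yes|no (no = default)
    allow_higher_level: bool = False   # AllowHigherLevel = yes|no (no)
    allow_lower_level: bool = False    # AllowLowerLevel = yes|no (no)
    allow_same_level: bool = False     # no default stated; "no" is an assumption
    cancel: Optional[str] = None       # "running" or "new"; "(no)" modelled as None (assumption)
    cancelled_status: str = "fail"     # CancelledStatus = Fail | Skip (fail)
    job_proximity: timedelta = timedelta(0)  # Job Proximity = <time-interval> (0)


# A resource that only overrides the name keeps every proposed default.
defaults = DuplicateJobs(name="xxx")
```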
The first "Allow" directive is probably not needed, but it does make it more
complete. If this directive is set to yes, all the other directives would be
ignored, which would give the same behavior as today, and the same as having
no Duplicate Jobs directive in the Job resource.
The AllowXXX directives are an attempt to define which job will be allowed to
continue when there is one job running or waiting and a new one arrives.
For example, AllowHigherLevel = yes would mean to allow the higher level job
to continue.
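
One possible reading of the AllowXXX directives, sketched in Python. The assumptions here are mine: levels are ranked Incremental < Differential < Full, the comparison is always "new job versus the job already there", and the function and parameter names are hypothetical.

```python
# Assumed ordering of backup levels, lowest to highest.
LEVEL_RANK = {"Incremental": 0, "Differential": 1, "Full": 2}


def new_job_allowed(existing_level, new_level, *,
                    allow_higher=False, allow_lower=False, allow_same=False):
    """Return True if the newly arrived job may continue alongside the existing one."""
    existing = LEVEL_RANK[existing_level]
    new = LEVEL_RANK[new_level]
    if new > existing:
        return allow_higher   # AllowHigherLevel
    if new < existing:
        return allow_lower    # AllowLowerLevel
    return allow_same         # AllowSameLevel


# A Full arriving while an Incremental is running, with AllowHigherLevel = yes.
full_over_incremental = new_job_allowed("Incremental", "Full", allow_higher=True)
```

With every directive left at "no", any new duplicate is refused regardless of level, which matches the defaults listed in the resource above.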
The Cancel directive specifies which job to cancel (the new job or the job
already there). I think there is probably a logic conflict between this
directive and the AllowXXX directives, but I have not thought this through
carefully enough.
The CancelledStatus is an attempt to tell Bacula to either fail one of the two
jobs or to Skip it, which means to kill it but without a lot of noise. Some
options I could think of here that are not yet clearly specified are:
   Do not kill a running job in favor of a newly scheduled job.
   Do not print any messages about cancelling a job (I don't particularly
     like this idea).
   Do not record any cancelled job in the catalog.
   ...
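
As a rough sketch of how Cancel and CancelledStatus might combine, assuming Cancel picks the victim and CancelledStatus picks how it is reported. Everything here is hypothetical (the function, the Cancellation record, the "quiet" flag standing in for "without a lot of noise"), and none of the unspecified options listed above are modelled.

```python
from typing import NamedTuple


class Cancellation(NamedTuple):
    job_id: int    # the job that gets cancelled
    status: str    # "failed" or "skipped"
    quiet: bool    # True when the cancel should make little noise (Skip)


def cancel_duplicate(cancel, cancelled_status, running_job_id, new_job_id):
    """Pick the job to cancel and the status to give it."""
    cancel = cancel.lower()
    cancelled_status = cancelled_status.lower()
    if cancel not in ("running", "new"):
        raise ValueError("Cancel must be Running or New")
    if cancelled_status not in ("fail", "skip"):
        raise ValueError("CancelledStatus must be Fail or Skip")
    victim = running_job_id if cancel == "running" else new_job_id
    skipped = cancelled_status == "skip"
    return Cancellation(victim, "skipped" if skipped else "failed", skipped)


# Cancel = New, CancelledStatus = Skip: the new job is dropped quietly.
result = cancel_duplicate("New", "Skip", running_job_id=10, new_job_id=11)
```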
Finally, Job Proximity is to allow a bit of overlap. For example, if a job has
been running for 20 minutes or ran 20 minutes ago, you might not want to apply
the rules.
As you can see, there is a lot of room for clarification of what should be
done, and also a need for a bit more functionality. In other words, a bit
more design is needed before beginning the implementation.
Comments?
Best regards,
Kern
_______________________________________________
Bacula-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-users