On Monday 25 February 2008 18:47:50 [EMAIL PROTECTED] wrote:
> In the message dated: Sat, 23 Feb 2008 12:40:43 +0100,
> Kern Sibbald used the subject line
> <[Bacula-users] Improving job scheduling flexibility>
> and wrote:
>
> => Hello,
> =>
> => As you know, current job scheduling has a few deficiencies, particularly
> => if for some reason your backups get blocked (a bad tape driver or
> => operator intervention required), which can lead to a big pile of
> => duplicate jobs being scheduled.
>
> Or if a job takes so long that it is still running when the next instance
> of the same job is launched (i.e., a backup that takes more than 24 hours).
>
> [SNIP!]
> =>
> => My current idea is to create a new "DuplicateJobs" resource and a new
> => Duplicate Jobs directive which would point to the duplicate jobs
> => resource.
>
> Sounds great!
>
> => The reason for the resource is that there are just so many different
> => variations that it would require a lot of new directives, and it seems a
> => shame to add them to every Job.
> =>
> => My current design calls for a Duplicate Jobs resource that looks
> => something like the following:
> =>
> => DuplicateJobs {
>
> [SNIP!]
>
> =>
> => Job Proximity = <time-interval> (0)
> =>
> => }
> =>
>
> [SNIP!]
>
> =>
> => Finally, Job Proximity is to allow a bit of overlap. For example, if a
> => job has been running 20 minutes or ran 20 minutes ago, you might want to
> => not apply the rules.
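One possible reading of the Job Proximity rule can be sketched in Python (Kern says below that he may have stated it backwards, so treat this as an interpretation, not the design). `JobRun` and `within_proximity` are hypothetical names, not Bacula code; a proximity of zero, matching the `(0)` default shown above, disables the check.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class JobRun:
    """Hypothetical record of another instance of the same job."""
    start: datetime
    end: Optional[datetime] = None  # None while the job is still running


def within_proximity(other: JobRun, now: datetime, proximity: timedelta) -> bool:
    """Return True if `other` has been running for less than `proximity`,
    or finished less than `proximity` ago. Under this reading, the
    duplicate-job rules would then NOT be applied."""
    if proximity <= timedelta(0):
        return False  # zero (the proposed default) disables the check
    if other.end is None:
        # Still running: has it been running for less than `proximity`?
        return now - other.start < proximity
    # Finished: did it finish less than `proximity` ago?
    return now - other.end < proximity
```

With `proximity=timedelta(minutes=20)`, a duplicate launched 10 minutes after the original started would be let through, while one launched two hours in would still be subject to the rules.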
>
> Could you elaborate on what this means to you a bit more?
I think I was confused and stated it backwards. Anyway, the Job Proximity
directive was proposed by David Boyes, so perhaps he could give us a
definitive definition :-)
>
> I see the distinction here being mainly in terms of jobs that take a "long"
> time vs. a "short" time. If the entire job normally takes 30 minutes, I
> don't really care whether there's a duplicate, and it doesn't matter to me
> if the duplicate starts 1 minute after the original or 29 minutes after.
>
> However, if the job normally takes 18 hours, then the conditions are very
> different. In this case, I really, really, really don't want a duplicate
> running if there's a lot of overlap--this would have a major effect on disk
> loads on the client, on network traffic, and on disk/CPU/media resources on
> the Bacula server. However, if the original job is nearly complete
> when the duplicate is launched, then I don't want to cancel the duplicate.
> In this case, the reasoning is that canceling the duplicate would result in
> a long window with no backups, in an effort to close a small window of
> duplicate (simultaneous) backups running.
I can see the usefulness of the above, and don't want to rule it out, but for
this cut, it probably requires more time to implement than I have for the
current enhancement. This go-around, I am really targeting the problem of
multiple jobs being scheduled and piling up awaiting execution due to
something "blocking" or taking too long.
>
> Here's a very complicated proposal, which will almost certainly be
> rejected, that really leverages Bacula's database backend and gives a
> really powerful feature:
>
> if the job historically takes over $DURATION [minutes|hours|days]
> and the current job is at least $PERCENTAGE complete, then allow the
> duplicate to run, otherwise kill the duplicate
>
> in this case, $DURATION would be determined from database stats,
> as an average of previous runs of the same job at the same
> level.
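Read literally, the proposed rule can be sketched in Python as below. `allow_duplicate` and its parameters are hypothetical stand-ins for $DURATION and $PERCENTAGE; the durations would come from the catalog's records of previous runs of the same job at the same level, and "percent complete" is assumed here to be elapsed time divided by the historical average, since nothing in the thread says how it would be measured.

```python
from statistics import mean
from typing import Sequence


def allow_duplicate(past_durations_min: Sequence[float],
                    elapsed_min: float,
                    duration_threshold_min: float,
                    percentage_threshold: float) -> bool:
    """If the job historically takes longer than `duration_threshold_min`
    ($DURATION) and the running instance is at least `percentage_threshold`
    ($PERCENTAGE) percent complete, allow the duplicate; otherwise kill it.

    `past_durations_min` are durations (minutes) of previous runs of the
    same job at the same level."""
    if not past_durations_min:
        return False  # assumption: no history means the rule cannot pass
    expected = mean(past_durations_min)
    if expected <= max(duration_threshold_min, 0.0):
        return False  # not a "long" job (also guards against dividing by 0)
    percent_done = 100.0 * elapsed_min / expected
    return percent_done >= percentage_threshold
```

For an 18-hour (1080-minute) job with `duration_threshold_min=60` and `percentage_threshold=90`, a duplicate arriving 1000 minutes in (about 93% done) would be allowed, while one arriving after 120 minutes would be killed.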
>
> I could also see an algorithm that gives more weight to the
> duration of the most recent backups if the standard deviation of
> the average vs. the most recent backups is greater than a
> specified value. This is because a given backup is more likely to
> take "almost as much" time as the most recent backup of the same
> level than as much time as a much earlier backup.
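The recency-weighting idea can be sketched as follows, under one possible interpretation of "standard deviation of the average vs. the most recent backups": compare the mean of the last few runs to the overall mean, measured in population standard deviations. The exponentially weighted moving average used when they diverge is a swapped-in choice of weighting, and `estimate_duration`, `recent`, `sigma_threshold` and `alpha` are hypothetical names.

```python
from statistics import mean, pstdev
from typing import Sequence


def estimate_duration(durations: Sequence[float],
                      recent: int = 3,
                      sigma_threshold: float = 1.0,
                      alpha: float = 0.5) -> float:
    """Estimate the next run's duration from `durations` (oldest first).

    If the mean of the last `recent` runs differs from the overall mean by
    more than `sigma_threshold` standard deviations, use an exponentially
    weighted moving average (smoothing factor `alpha`) that favours recent
    runs; otherwise use the plain average."""
    if not durations:
        raise ValueError("need at least one previous run")
    overall = mean(durations)
    spread = pstdev(durations)
    recent_mean = mean(durations[-recent:])
    if spread == 0 or abs(recent_mean - overall) <= sigma_threshold * spread:
        return overall
    ewma = durations[0]
    for d in durations[1:]:
        ewma = alpha * d + (1 - alpha) * ewma
    return ewma
```

For a job whose Fulls went from 60 to 120 minutes, `[60, 60, 60, 60, 120, 120, 120]`, the plain average is about 86 minutes, while this sketch returns 112.5, closer to what the next run is likely to take.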
>
> similarly, the $PERCENTAGE value could be expressed as a range,
> incorporating the standard deviation in the backup duration
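Expressing $PERCENTAGE as a range might look like the sketch below, which assumes the expected duration is taken as the mean plus or minus `k` standard deviations. `completion_range` and `k` are hypothetical; a cautious policy could allow the duplicate only when even the pessimistic figure clears the threshold.

```python
from statistics import mean, pstdev
from typing import Sequence, Tuple


def completion_range(durations: Sequence[float], elapsed: float,
                     k: float = 1.0) -> Tuple[float, float]:
    """Return (pessimistic, optimistic) percent-complete estimates for a
    running job, using an expected duration of mean +/- k std. deviations.
    Raises statistics.StatisticsError if `durations` is empty."""
    avg = mean(durations)
    sd = pstdev(durations)
    longest = max(avg + k * sd, 1e-9)   # guard against dividing by zero
    shortest = max(avg - k * sd, 1e-9)  # ... or by a negative duration
    pessimistic = min(100.0, 100.0 * elapsed / longest)
    optimistic = min(100.0, 100.0 * elapsed / shortest)
    return pessimistic, optimistic
```

For past runs of 90 and 110 minutes and 99 minutes elapsed, this gives a band of 90% to 100% complete.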
>
I think you have something there, so you might want to put the above into a
Feature Request. I don't think it will get implemented in the near future due
to the long list of big, important projects that we have, but it would be a
good way to ensure that the idea is not lost.
>
>
> [As an aside, I'd like to see this kind of predictive/AI capability put
> into more of bacula, particularly in the scheduling. It would be wonderful
> to use the historic records to allow bacula to schedule jobs most
> efficiently, in a way similar to Amanda, rather than hard-coding specific
> times in each job resource.]
Virtually everyone I have talked to, especially in companies, says that
they do not like Amanda's way of scheduling jobs. That said, I don't rule
out doing something like they do, and certainly the new "Max Full Age"
directive goes in that direction.
However, at the current time, if you would like AI features, I would suggest
that you by all means turn off Bacula scheduling and implement a Perl script that does
the scheduling. After you have a bit of experience with your system, I would
be really interested in hearing about it. I suspect that you will find that
it takes a lot of work and many iterations to get AI type features working
correctly -- at least that would be the case for me.
Best regards,
Kern
>
> =>
> => As you can see, there is a lot of room for clarification of what should
> => be done, and also a need for a bit more functionality ... -- in other
> => words, a bit more design is needed before beginning the implementation.
> =>
> => Comments?
> =>
> => Best regards,
> =>
> => Kern
> =>
>
> ----
> Mark Bergman [EMAIL PROTECTED] 215-662-7310
> System Administrator Section of Biomedical Image Analysis
> Department of Radiology University of Pennsylvania
> PGP Key at: https://www.rad.upenn.edu/sbia/bergman
_______________________________________________
Bacula-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-users