Hi,
You are forgetting that in real Grid deployments, most of the wait time
is queue wait time in the batch scheduler. For example, in some 2005
logs from SDSC that I looked at, I recall seeing queue wait times
averaging 6 hours over a one-year period. So some extra latency on the
order of 1 to 60 seconds is not a big deal when your average job runs
for hours or more.

What you are asking for is interactive response times (ideally <1 sec).
The only way you will achieve that kind of response time is through
multi-level scheduling or dedicated resources (where the resources are
always on and ready to serve your requests). Multi-level scheduling
means acquiring resources in bulk, where latency is not so critical,
and then managing those resources with more latency-sensitive
techniques. In our work with Falkon, we get sub-second latencies for
fine-grained applications via this multi-level scheduling approach.
Others probably have similar techniques to enable this.

Given the high cost of submitting a GRAM job, from the GT security
overheads, to the polling intervals of GRAM, to the batch scheduler
overheads, to the polling intervals of the LRM, to the queue times due
to contention, I don't believe you will be able to use GRAM naively for
interactive applications that need sub-second response. If you want
more info on Falkon, see
http://dev.globus.org/wiki/Incubator/Falkon.
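
To make the multi-level scheduling idea concrete, here is a minimal Python sketch. It is not Falkon's implementation or API. It only illustrates the pattern: pay the slow acquisition cost once, then dispatch small tasks to workers that are already running. The names acquire_workers, run_tasks, and square are hypothetical, ACQUIRE_DELAY_S is an arbitrary stand-in for GRAM submission plus queue wait, and a local thread pool stands in for remote resources.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the slow first-level step (GRAM submission + batch queue wait).
ACQUIRE_DELAY_S = 0.2


def acquire_workers(n_workers):
    """First level: pay the slow acquisition cost once, in bulk."""
    time.sleep(ACQUIRE_DELAY_S)
    return ThreadPoolExecutor(max_workers=n_workers)


def run_tasks(pool, tasks):
    """Second level: dispatch small tasks to workers that are already running."""
    results = []
    latencies = []
    for func, arg in tasks:
        start = time.perf_counter()
        results.append(pool.submit(func, arg).result())
        latencies.append(time.perf_counter() - start)
    return results, latencies


def square(x):
    """Trivial placeholder for a fine-grained task."""
    return x * x


if __name__ == "__main__":
    t0 = time.perf_counter()
    pool = acquire_workers(4)
    acquire_time = time.perf_counter() - t0

    results, latencies = run_tasks(pool, [(square, i) for i in range(100)])
    pool.shutdown()

    print(f"one-time acquisition: {acquire_time:.3f} s")
    print(f"mean per-task dispatch latency: {sum(latencies) / len(latencies):.6f} s")
```

The acquisition delay is paid once no matter how many tasks follow, so the per-task latency depends only on the second-level dispatcher.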
Ioan
Arthur Carlson wrote:
In the thread "Globus not for real-time application?", a number of
users discuss whether it is realistic or not to get latencies below 1
second. Sounds like paradise. I am seeing latencies of up to a minute!
My workstation, gavosrv1.mpe.mpg.de, not the newest anymore, has GT
4.0.5 installed. When I use globusrun-ws to go from this machine back
to itself, ... but just look:
[EMAIL PROTECTED] ~]$ time globusrun-ws -submit -s -F gavosrv1 -c /bin/true
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:52f0f962-54e1-11dd-a56f-0007e914d571
Termination time: 07/19/2008 15:51 GMT
Current job state: Active
Current job state: CleanUp-Hold
Current job state: CleanUp
Current job state: Done
Destroying job...Done.
Cleaning up any delegated credentials...Done.
real 0m24.327s
user 0m1.242s
sys 0m0.113s
Note that "user" and "sys" times are reasonable. Almost all of this
time passes between "CleanUp" and "Done". It can't just be checking
credentials because gsissh is done in a jiffy:
[EMAIL PROTECTED] ~]$ time gsissh -p 2222 gavosrv1 /bin/true
real 0m0.649s
user 0m0.134s
sys 0m0.020s
Maybe that is already enough for someone to see where the problem
lies. I can also point out that all (or at least many) of the machines
in our grid (AstroGrid-D) seem to be affected, but to varying degrees.
Here is a little matrix of tests:
from gavosrv1.mpe.mpg.de to gavosrv1.mpe.mpg.de: 0m27.235s
from gavosrv1.mpe.mpg.de to titan.ari.uni-heidelberg.de: 0m14.324s
from gavosrv1.mpe.mpg.de to udo-gt03.grid.tu-dortmund.de: 0m8.823s
from titan to gavosrv1.mpe.mpg.de: 0m57.208s
from titan to titan.ari.uni-heidelberg.de: 0m16.875s
from titan to udo-gt03.grid.tu-dortmund.de: 0m27.225s
from udo-gt03 to gavosrv1.mpe.mpg.de: 1m5.221s
from udo-gt03 to titan.ari.uni-heidelberg.de: 0m12.905s
from udo-gt03 to udo-gt03.grid.tu-dortmund.de: 0m6.952s
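
The matrix above is easier to read when averaged per host. The following Python sketch converts the shell "real" times to seconds and computes the mean by source and by destination. The short host labels and the helper names to_seconds and mean_by are chosen here for illustration. The timings are copied from the matrix.

```python
import re
from collections import defaultdict

# (source, destination, wall-clock time) copied from the test matrix above.
MEASUREMENTS = [
    ("gavosrv1", "gavosrv1", "0m27.235s"),
    ("gavosrv1", "titan", "0m14.324s"),
    ("gavosrv1", "udo-gt03", "0m8.823s"),
    ("titan", "gavosrv1", "0m57.208s"),
    ("titan", "titan", "0m16.875s"),
    ("titan", "udo-gt03", "0m27.225s"),
    ("udo-gt03", "gavosrv1", "1m5.221s"),
    ("udo-gt03", "titan", "0m12.905s"),
    ("udo-gt03", "udo-gt03", "0m6.952s"),
]


def to_seconds(stamp):
    """Convert the shell's 'XmY.YYYs' time format to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time format: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def mean_by(index):
    """Average the times grouped by source (index 0) or destination (index 1)."""
    groups = defaultdict(list)
    for row in MEASUREMENTS:
        groups[row[index]].append(to_seconds(row[2]))
    return {host: sum(times) / len(times) for host, times in groups.items()}


if __name__ == "__main__":
    for label, index in (("source", 0), ("destination", 1)):
        print(f"mean latency by {label}:")
        for host, mean in sorted(mean_by(index).items()):
            print(f"  {host:10s} {mean:7.3f} s")
```

Grouped by destination, gavosrv1 averages about 50 seconds, while titan and udo-gt03 each average about 14 to 15 seconds, which suggests the delay depends mostly on the host receiving the job.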
Please tell me I am doing something really stupid. For production of
my application even a minute of latency is not a big deal, but it's a
pain during development and debugging. Right now I am using gsissh
instead of globusrun-ws just to work around this.
Thanks for the lift,
Art Carlson
AstroGrid-D Project
Max-Planck-Institute für extraterrestrische Physik, Garching, Germany
--
===================================================
Ioan Raicu
Ph.D. Candidate
===================================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
===================================================
Email: [EMAIL PROTECTED]
Web: http://www.cs.uchicago.edu/~iraicu
http://dev.globus.org/wiki/Incubator/Falkon
http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page
===================================================