You type fast indeed :p Thank you for the detailed explanation.
I'll have to look more into the processing we're doing and its requirements
before proceeding :) Your information has been extremely helpful :)

On Sun, Nov 4, 2012 at 7:42 AM, Mark Hahn <h...@mcmaster.ca> wrote:

>> Thanks, informative :p
>> I'll consider your advice.
>>
>> If I read correctly, it seems the answer to the question about programming
>> was: yes, a program must be written to accommodate a cluster. Did I get
>> you right?
>
> it depends what you mean. if you have a program which is written
> so that it can be run from a script, then a cluster can immediately
> let you run lots of them. if you're expecting a cluster to speed
> up a single instance, then you'll probably be disappointed.
>
> in short, clustering doesn't speed up any of the computers in the cluster.
> it just makes it more convenient to get multiple computers working.
> if you want multiple computers to work on the same program, then someone
> has to make it happen: divide up the work so each computer gets a share,
> and put together the results.
>
> suppose you're trying to detect a particular face in all your images.
> you could have one machine searching an image, then going on to the next.
> basically, that one node is running a simple scheduler that runs jobs:
>
>   lookforface face.png image0.png
>   lookforface face.png image1.png
>   lookforface face.png image2.png
>   ...
>
> if you want, you can divide up the work - send every other image to a
> second machine. in general, this would mean that a scheduler reads from
> that same list and dispatches one line (job) at a time to any
> node that isn't already busy. when a job completes, that node gets
> another job, and eventually all the work is done.
>
> "embarrassingly parallel" just means you have enough images to keep all
> your machines busy this way.
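>
> (to make that dispatch-from-a-list idea concrete, a rough python sketch -
> the node names and jobs.txt are invented, passwordless ssh is assumed, and
> a real scheduler like slurm or torque does far more than this:)
>
>   import subprocess, time
>
>   nodes = ["node01", "node02", "node03", "node04"]
>   with open("jobs.txt") as f:               # one command per line, as above
>       jobs = [line.strip() for line in f if line.strip()]
>
>   busy = {}                                 # node -> its running job (Popen)
>   while jobs or busy:
>       for node in nodes:                    # hand a job to every idle node
>           if jobs and node not in busy:
>               busy[node] = subprocess.Popen(["ssh", node, jobs.pop(0)])
>       for node, job in list(busy.items()):  # reap whatever has finished
>           if job.poll() is not None:
>               del busy[node]
>       time.sleep(1)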
>
> if you don't have that many images, you might want to try to get more than
> one machine working on the same image. a simple way to do that would be
> to (imaginarily) divide each image into, say, quadrants, so 4 machines can
> work on the same image (each getting a quarter of the image - with some
> overlap so targets along the border don't get missed.) to be specific,
> your list of jobs could be like this:
>
>   lookforface face.png image0.png 0
>   lookforface face.png image0.png 1
>   lookforface face.png image0.png 2
>   lookforface face.png image0.png 3
>   lookforface face.png image1.png 0
>   lookforface face.png image1.png 1
>   ...
>
> where 'lookforface' only looks for the face in the specified quadrant of
> the input image. the most obvious problem with this approach is that a
> 1-quadrant search may take too little time relative to the overhead of
> setting up each job, which includes accessing face.png and image0.png,
> even if only a quadrant of the latter is used. in general, this kind of
> issue is called "load balance", and is really the single most fundamental
> issue in HPC.
>
> if you wanted to pursue this direction, you could optimize by reducing
> the cost of distributing the images. if image0.png is quite large,
> then access through a shared filesystem might be efficient (if the FS
> block size is comparable to 1/2 the width of one image row.) if image0.png
> is smaller, then you could distribute that information "manually" by
> running a job which reads the image on one node and distributes quadrants
> to other nodes. the obvious way to do this would be via MPI, which is
> pretty friendly to matrices like decompressed images. this could even
> operate on pieces smaller than a quadrant - in fact, you could divide the
> work however finely you like. though as before, divide it too fine, and
> the per-chunk overhead dominates your cost, destroying efficiency.
>
> note that this refinement has merely changed who/how the work is being
> divided and how the data is being communicated. in the simple case, work
> was divided at the command/job/scheduler level and data transmitted by
> file. the more fine-grained approach has subsumed some scheduling into
> your program, and is communicating the data explicitly over MPI.
>
> basically, someone has to divide up work, and data has to flow to where
> it's used. you could take this further: a single MPI program that runs on
> all nodes of the cluster at once and distributes work among MPI ranks.
> this would be the most programming effort, but would quite possibly be the
> most efficient. often, the amount of time needed to perform one unit of
> work is not constant - this can cause problems if your division of labor
> is too rigid. (consider the MPI-searches-4-quadrants approach: if one
> quadrant takes very little time, then the CPU associated with that
> quadrant will be twiddling its thumbs while the other quadrants get done.)
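>
> (a rough sketch of the shape of such a program, using the mpi4py binding -
> the image list is invented and 'lookforface' is the same made-up command as
> in the job lists above; rank 0 hands out one image at a time, so a rank
> that finishes early doesn't sit idle:)
>
>   import subprocess
>   from mpi4py import MPI
>
>   comm = MPI.COMM_WORLD
>   rank = comm.Get_rank()
>   images = ["image%d.png" % i for i in range(1000)]   # invented input list
>
>   if rank == 0:                              # master: deal out the work
>       for img in images:
>           idle = comm.recv(source=MPI.ANY_SOURCE)     # "I'm free"
>           comm.send(img, dest=idle)
>       for _ in range(1, comm.Get_size()):    # then tell everyone to stop
>           idle = comm.recv(source=MPI.ANY_SOURCE)
>           comm.send(None, dest=idle)
>   else:                                      # worker: ask, search, repeat
>       while True:
>           comm.send(rank, dest=0)
>           img = comm.recv(source=0)
>           if img is None:
>               break
>           subprocess.call(["lookforface", "face.png", img])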
>
> I have, of course, completely fabricated this whole workflow. it becomes
> more interesting when the work has other dimensions - for instance, if you
> are searching 1M images for any of 1k faces. or if you are really hot to
> use a convolution approach, so will be fourier-transforming all the images
> before performing any matching. or if you want to use GPUs, etc.
>
> TL;DR it's a good thing I type fast ;)
>
> in any case, your first step should be to look at the time taken to get
> inputs to a node, and then how long it takes to do the computation.
> life is easy if setup is fast and compute is long. that stuff is far more
> important than choosing a particular scheduler or cluster package.
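>
> (a back-of-envelope version of that first step - the 14MB image size and
> gigabit network come from the original question quoted below, and the
> per-image compute time is a placeholder you would have to measure:)
>
>   image_mb  = 14.0      # per-image size, from the question below
>   link_mbps = 1000.0    # gigabit ethernet, ignoring protocol overhead
>   compute_s = 0.5       # placeholder: measured time to search one image
>
>   transfer_s = image_mb * 8 / link_mbps     # ~0.11s on the wire per image
>   print("transfer %.2fs, compute %.2fs per image" % (transfer_s, compute_s))
>
>   # a single gigabit feed moves 1/transfer_s images per second, and each
>   # node consumes 1/compute_s images per second, so beyond this many nodes
>   # the fileserver's link is the thing that saturates:
>   print("one gigabit feed keeps ~%d such nodes busy" % (compute_s / transfer_s))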
>
> regards, mark hahn.
>
>> On 2012-11-4 at 6:11 PM, "Mark Hahn" <h...@mcmaster.ca> wrote:
>>
>>>> I am currently researching the feasibility and process of establishing a
>>>> relatively small HPC cluster to speed up the processing of large amounts
>>>> of digital images.
>>>
>>> do you mean that smallness is a goal? or that you don't have a large
>>> budget?
>>>
>>>> After looking at a few HPC computing software solutions listed on the
>>>> Wikipedia comparison of cluster software page
>>>> (http://en.wikipedia.org/wiki/Comparison_of_cluster_software) I still
>>>> have only a rough understanding of how the whole system works.
>>>
>>> there are several discrete functionalities:
>>> - shared filesystem (if any)
>>> - scheduling
>>> - intra-job communication (if any; eg MPI)
>>> - management/provisioning/monitoring of nodes
>>>
>>> IMO, anyone who claims to have "best practices" in this field is lying.
>>> there are particular components that have certain strengths, but none of
>>> them are great, and none universally appropriate. (it's also common
>>> to conflate or "integrate" the second and fourth items - for that matter,
>>> monitoring is often separated from provisioning.)
>>>
>>>> 1. Do programs you wish to use via HPC platforms need to be written to
>>>> support HPC, and further, to support specific middleware using parallel
>>>> programming or something like that?
>>>
>>> "middleware" is generally a term from the enterprise computing
>>> environment. it basically means "get someone else to take responsibility
>>> for hard bits", and is a form of the classic commercial best practice of
>>> CYA. from an HPC perspective, there's the application and everything
>>> else. if you really want, you can call the latter "middleware", but doing
>>> so is uninformative.
>>>
>>> HPC covers a lot of ground. usually, people mean jobs will execute in a
>>> batch environment (started from a commandline/script). OTOH HPC sometimes
>>> means what you might call "personal supercomputing", where an interactive
>>> application runs in a usually-dedicated cluster (shared clusters tend to
>>> have scheduling response times that make interactive use problematic.)
>>> (shared clusters also give rise to the single most important value of
>>> clusters: that they can interleave bursty demand. if everyone in your
>>> department shares a cluster, it can be larger than any one group can
>>> afford, and therefore all groups will be able to burst to higher
>>> capacity. this is why large, shared clusters are so successful. and, for
>>> that matter, why cloud services are successful.)
>>>
>>> you can do HPC with very little overhead. you will generally want a
>>> shared filesystem - potentially just a NAS box or existing server. you
>>> may not bother with scheduling at all - let users pick which machine to
>>> run on, for instance. that sounds crazy, but if you're the only one using
>>> it, why bother with a scheduler? HPC can also be done without inter-job
>>> communication - if your jobs are single-node serial or threaded, for
>>> instance. and you may not need any sort of management/provisioning,
>>> depending on the stability of your nodes, environment, expected lifetime,
>>> etc.
>>>
>>> in short: slap linux onto a few boxes, set up ssh keys or hostbased
>>> trust, have one or more of them NFS out some space, and you're cooking.
>>>
>>>> OR
>>>> Can you run any program on top of the HPC cluster and have its workload
>>>> effectively distributed? --> How can this be done?
>>>
>>> this is a common newbie question. a naive program (probably serial or
>>> perhaps multithreaded) will see no benefit from a cluster. clusters are
>>> just plain old machines. the benefit comes if you want throughput (jobs
>>> per time) or specifically program for distributed computation
>>> (classically with MPI). it's common to use infiniband to accelerate this
>>> kind of job (as well as provide the fastest possible IO.)
>>>
>>>> 2. For something like digital image processing, where a huge amount of
>>>> relatively large images (14MB each) are being processed, will network
>>>
>>> the main question is how much work a node will be doing per image.
>>>
>>> suppose you had an infinitely fast fileserver and gigabit connected
>>> nodes: transferring a 14MB image would take roughly 110-120ms, so you
>>> would ideally spend about the same amount of time processing an image.
>>> but in this case, you should probably ask whether you can simply store
>>> images on the nodes in the first place. if you haven't thought about
>>> where the inputs are and how fast they can be gotten, then that will
>>> probably be your bottleneck.
>>>
>>>> speed, or processing power be more of a limiting factor? Or would a
>>>> gigabit network suffice?
>>>
>>> how long does a prospective node take to complete one work unit,
>>> and how long does it take to transfer the files for one?
>>> your speedup will be limited by whatever resource saturates first
>>> (possibly your fileserver.)
>>>
>>>> 3. For a relatively easy HPC platform what would you recommend?
>>>
>>> they are all crap. you should try not to spend on crap you don't need,
>>> but ultimately it depends on how much expertise you have and/or how much
>>> you value your time. any idiot can build a cluster from scratch using
>>> fundamental open-source components, eventually. but if said idiot has to
>>> learn filesystems, scheduling, provisioning, etc from scratch, it could
>>> take quite a while. when you buy, you are buying crap, but it's crap
>>> that may save you some time.
>>>
>>> don't count on commercial support being more than crappy.
>>>
>>> you should probably consider using a cloud service - this is just
>>> commercial outsourcing - more crap, but perhaps of value if, for
>>> instance, you don't want to get your hands dirty hosting machines
>>> (amazon), etc.
>>>
>>> anything commercial in this space tends to be expensive. the license to
>>> cover a crappy scheduler for a few hundred nodes, for instance, will be
>>> pretty close to an FTE-year. renting a node from a cloud provider for a
>>> year costs about as much as buying a new node each year, etc.
>>>
>>>> Again, I hope this is an ok place to ask such a question, if not please
>>>
>>> this is the place. though there are some fringe sects of HPC who tend to
>>> subsist on more and/or different crap (such as clusters running windows.)
>>> beowulf tends towards the low-crap end of things (linux, open packages.)
>>>
>>> regards, mark hahn.
>
> --
> operator may differ from spokesperson. h...@mcmaster.ca