You type fast indeed :p Thank you for the detailed explanation.
I'll have to look more into the processing we're doing and its requirements
before proceeding :) Your information has been extremely helpful :)

On Sun, Nov 4, 2012 at 7:42 AM, Mark Hahn <h...@mcmaster.ca> wrote:

>> Thanks, informative :p
>> I'll consider your advice.
>>
>> If I read correctly, it seems the answer to the question about programming
>> was: yes, a program must be written to accommodate a cluster. Did I get
>> you right?
>
> it depends what you mean. if you have a program which is written
> so that it can be run from a script, then a cluster can immediately
> let you run lots of them. if you're expecting a cluster to speed
> up a single instance, then you'll probably be disappointed.
>
> in short, clustering doesn't speed up any of the computers in the cluster.
> it just makes it more convenient to get multiple computers working.
> if you want multiple computers to work on the same program, then someone
> has to make it happen: divide up the work so each computer gets a share,
> and put together the results.
>
> suppose you're trying to detect a particular face in all your images.
> you could have one machine searching an image, then going on to the next.
> basically, that one node is running a simple scheduler that runs jobs:
>
>   lookforface face.png image0.png
>   lookforface face.png image1.png
>   lookforface face.png image2.png
>   ...
>
> if you want, you can divide up the work - send every other image to a
> second machine. in general, this would mean that a scheduler reads from
> that same list and dispatches one line (job) at a time to any
> node that isn't already busy. when a job completes, that node gets
> another job, and eventually all the work is done.
>
> "embarrassingly parallel" just means you have enough images to keep all
> your machines busy this way.
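>
> (to make that dispatch-from-a-list idea concrete, a rough python sketch -
> the node names and jobs.txt are invented, passwordless ssh is assumed, and
> a real scheduler like slurm or torque does far more than this:)
>
>   import subprocess, time
>
>   nodes = ["node01", "node02", "node03", "node04"]
>   with open("jobs.txt") as f:               # one command per line, as above
>       jobs = [line.strip() for line in f if line.strip()]
>
>   busy = {}                                 # node -> its running job (Popen)
>   while jobs or busy:
>       for node in nodes:                    # hand a job to every idle node
>           if jobs and node not in busy:
>               busy[node] = subprocess.Popen(["ssh", node, jobs.pop(0)])
>       for node, job in list(busy.items()):  # reap whatever has finished
>           if job.poll() is not None:
>               del busy[node]
>       time.sleep(1)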
>
> if you don't have that many images, you might want to try to get more than
> one machine working on the same image. a simple way to do that would be
> to (imaginarily) divide each image into, say, quadrants, so 4 machines can
> work on the same image (each getting a quarter of the image - with some
> overlap so targets along the border don't get missed.) to be specific,
> your list of jobs could be like this:
>
>   lookforface face.png image0.png 0
>   lookforface face.png image0.png 1
>   lookforface face.png image0.png 2
>   lookforface face.png image0.png 3
>   lookforface face.png image1.png 0
>   lookforface face.png image1.png 1
>   ...
>
> where 'lookforface' only looks for the face in the specified quadrant of
> the input image. the most obvious problem with this approach is that a
> 1-quadrant search may take too little time relative to the overhead of
> setting up each job, which includes accessing face.png and image0.png,
> even if only a quadrant of the latter is used. in general, this kind of
> issue is called "load balance", and is really the single most fundamental
> issue in HPC.
>
> if you wanted to pursue this direction, you could optimize by reducing
> the cost of distributing the images. if image0.png is quite large,
> then access through a shared filesystem might be efficient (if the FS
> block size is comparable to 1/2 the width of one image row.) if image0.png
> is smaller, then you could distribute that information "manually" by
> running a job which reads the image on one node and distributes quadrants
> to other nodes. the obvious way to do this would be via MPI, which is
> pretty friendly to matrices like decompressed images. this could even
> operate on pieces smaller than a quadrant - in fact, you could divide the
> work however finely you like. though as before, divide it too fine, and
> the per-chunk overhead dominates your cost, destroying efficiency.
>
> note that this refinement has merely changed who/how the work is being
> divided and how the data is being communicated. in the simple case, work
> was divided at the command/job/scheduler level and data transmitted by
> file. the more fine-grained approach has subsumed some scheduling into
> your program, and is communicating the data explicitly over MPI.
>
> basically, someone has to divide up work, and data has to flow to where
> it's used. you could take this further: a single MPI program that runs on
> all nodes of the cluster at once and distributes work among MPI ranks.
> this would be the most programming effort, but would quite possibly be the
> most efficient. often, the amount of time needed to perform one unit of
> work is not constant - this can cause problems if your division of labor
> is too rigid. (consider the MPI-searches-4-quadrants approach: if one
> quadrant takes very little time, then the CPU associated with that
> quadrant will be twiddling its thumbs while the other quadrants get done.)
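>
> (a rough sketch of the shape of such a program, using the mpi4py binding -
> the image list is invented and 'lookforface' is the same made-up command as
> in the job lists above; rank 0 hands out one image at a time, so a rank
> that finishes early doesn't sit idle:)
>
>   import subprocess
>   from mpi4py import MPI
>
>   comm = MPI.COMM_WORLD
>   rank = comm.Get_rank()
>   images = ["image%d.png" % i for i in range(1000)]   # invented input list
>
>   if rank == 0:                              # master: deal out the work
>       for img in images:
>           idle = comm.recv(source=MPI.ANY_SOURCE)     # "I'm free"
>           comm.send(img, dest=idle)
>       for _ in range(1, comm.Get_size()):    # then tell everyone to stop
>           idle = comm.recv(source=MPI.ANY_SOURCE)
>           comm.send(None, dest=idle)
>   else:                                      # worker: ask, search, repeat
>       while True:
>           comm.send(rank, dest=0)
>           img = comm.recv(source=0)
>           if img is None:
>               break
>           subprocess.call(["lookforface", "face.png", img])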
>
> I have, of course, completely fabricated this whole workflow. it becomes
> more interesting when the work has other dimensions - for instance, if you
> are searching 1M images for any of 1k faces. or if you are really hot to
> use a convolution approach, so will be fourier-transforming all the images
> before performing any matching. or if you want to use GPUs, etc.
>
> TL;DR it's a good thing I type fast ;)
>
> in any case, your first step should be to look at the time taken to get
> inputs to a node, and then how long it takes to do the computation.
> life is easy if setup is fast and compute is long. that stuff is far more
> important than choosing a particular scheduler or cluster package.
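>
> (a back-of-envelope version of that first step - the 14MB image size and
> gigabit network come from the original question quoted below, and the
> per-image compute time is a placeholder you would have to measure:)
>
>   image_mb  = 14.0      # per-image size, from the question below
>   link_mbps = 1000.0    # gigabit ethernet, ignoring protocol overhead
>   compute_s = 0.5       # placeholder: measured time to search one image
>
>   transfer_s = image_mb * 8 / link_mbps     # ~0.11s on the wire per image
>   print("transfer %.2fs, compute %.2fs per image" % (transfer_s, compute_s))
>
>   # a single gigabit feed moves 1/transfer_s images per second, and each
>   # node consumes 1/compute_s images per second, so beyond this many nodes
>   # the fileserver's link is the thing that saturates:
>   print("one gigabit feed keeps ~%d such nodes busy" % (compute_s / transfer_s))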
>
> regards, mark hahn.
>
>> On 2012-11-4 at 6:11 PM, "Mark Hahn" <h...@mcmaster.ca> wrote:
>>
>>>> I am currently researching the feasibility and process of establishing a
>>>> relatively small HPC cluster to speed up the processing of large amounts
>>>> of digital images.
>>>
>>> do you mean that smallness is a goal? or that you don't have a large
>>> budget?
>>>
>>>> After looking at a few HPC computing software solutions listed on the
>>>> Wikipedia comparison of cluster software page
>>>> (http://en.wikipedia.org/wiki/Comparison_of_cluster_software) I still
>>>> have only a rough understanding of how the whole system works.
>>>
>>> there are several discrete functionalities:
>>> - shared filesystem (if any)
>>> - scheduling
>>> - intra-job communication (if any; eg MPI)
>>> - management/provisioning/monitoring of nodes
>>>
>>> IMO, anyone who claims to have "best practices" in this field is lying.
>>> there are particular components that have certain strengths, but none of
>>> them are great, and none universally appropriate. (it's also common
>>> to conflate or "integrate" the second and fourth items - for that matter,
>>> monitoring is often separated from provisioning.)
>>>
>>>> 1. Do programs you wish to use via HPC platforms need to be written to
>>>> support HPC, and further, to support specific middleware using parallel
>>>> programming or something like that?
>>>
>>> "middleware" is generally a term from the enterprise computing
>>> environment. it basically means "get someone else to take responsibility
>>> for hard bits", and is a form of the classic commercial best practice of
>>> CYA. from an HPC perspective, there's the application and everything
>>> else. if you really want, you can call the latter "middleware", but doing
>>> so is uninformative.
>>>
>>> HPC covers a lot of ground. usually, people mean jobs will execute in a
>>> batch environment (started from a commandline/script). OTOH HPC sometimes
>>> means what you might call "personal supercomputing", where an interactive
>>> application runs in a usually-dedicated cluster (shared clusters tend to
>>> have scheduling response times that make interactive use problematic.)
>>> (shared clusters also give rise to the single most important value of
>>> clusters: that they can interleave bursty demand. if everyone in your
>>> department shares a cluster, it can be larger than any one group can
>>> afford, and therefore all groups will be able to burst to higher
>>> capacity. this is why large, shared clusters are so successful. and, for
>>> that matter, why cloud services are successful.)
>>>
>>> you can do HPC with very little overhead. you will generally want a
>>> shared filesystem - potentially just a NAS box or existing server. you
>>> may not bother with scheduling at all - let users pick which machine to
>>> run on, for instance. that sounds crazy, but if you're the only one using
>>> it, why bother with a scheduler? HPC can also be done without inter-job
>>> communication - if your jobs are single-node serial or threaded, for
>>> instance. and you may not need any sort of management/provisioning,
>>> depending on the stability of your nodes, environment, expected lifetime,
>>> etc.
>>>
>>> in short: slap linux onto a few boxes, set up ssh keys or hostbased
>>> trust, have one or more of them NFS out some space, and you're cooking.
>>>
>>>> OR
>>>> Can you run any program on top of the HPC cluster and have its workload
>>>> effectively distributed? --> How can this be done?
>>>
>>> this is a common newbie question. a naive program (probably serial or
>>> perhaps multithreaded) will see no benefit from a cluster. clusters are
>>> just plain old machines. the benefit comes if you want throughput (jobs
>>> per time) or specifically program for distributed computation
>>> (classically with MPI). it's common to use infiniband to accelerate this
>>> kind of job (as well as provide the fastest possible IO.)
>>>
>>>> 2. For something like digital image processing, where a huge amount of
>>>> relatively large images (14MB each) are being processed, will network
>>>
>>> the main question is how much work a node will be doing per image.
>>>
>>> suppose you had an infinitely fast fileserver and gigabit connected
>>> nodes: transferring a 14MB image would take roughly 110-120ms, so you
>>> would ideally spend about the same amount of time processing an image.
>>> but in this case, you should probably ask whether you can simply store
>>> images on the nodes in the first place. if you haven't thought about
>>> where the inputs are and how fast they can be gotten, then that will
>>> probably be your bottleneck.
>>>
>>>> speed, or processing power be more of a limiting factor? Or would a
>>>> gigabit network suffice?
>>>
>>> how long does a prospective node take to complete one work unit,
>>> and how long does it take to transfer the files for one?
>>> your speedup will be limited by whatever resource saturates first
>>> (possibly your fileserver.)
>>>
>>>> 3. For a relatively easy HPC platform what would you recommend?
>>>
>>> they are all crap. you should try not to spend on crap you don't need,
>>> but ultimately it depends on how much expertise you have and/or how much
>>> you value your time. any idiot can build a cluster from scratch using
>>> fundamental open-source components, eventually. but if said idiot has to
>>> learn filesystems, scheduling, provisioning, etc from scratch, it could
>>> take quite a while. when you buy, you are buying crap, but it's crap
>>> that may save you some time.
>>>
>>> don't count on commercial support being more than crappy.
>>>
>>> you should probably consider using a cloud service - this is just
>>> commercial outsourcing - more crap, but perhaps of value if, for
>>> instance, you don't want to get your hands dirty hosting machines
>>> (amazon), etc.
>>>
>>> anything commercial in this space tends to be expensive. the license to
>>> cover a crappy scheduler for a few hundred nodes, for instance, will be
>>> pretty close to an FTE-year. renting a node from a cloud provider for a
>>> year costs about as much as buying a new node each year, etc.
>>>
>>>> Again, I hope this is an ok place to ask such a question, if not please
>>>
>>> this is the place. though there are some fringe sects of HPC who tend to
>>> subsist on more and/or different crap (such as clusters running windows.)
>>> beowulf tends towards the low-crap end of things (linux, open packages.)
>>>
>>> regards, mark hahn.
>
> --
> operator may differ from spokesperson. h...@mcmaster.ca