I forgot to add that power capacity per rack might have something to do with it, too. I don't remember the number of PDUs in those racks, or the power input to each one (single-phase 60 A, etc.).
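
A rough sketch of what a single-phase 60 A feed buys you; the 208 V line voltage and the 80% continuous-load derating are assumptions for illustration, not a statement of what's actually wired into those racks:

    # Back-of-the-envelope PDU capacity, purely illustrative.
    # Assumed values -- not the actual electrical spec for these racks.
    volts = 208      # assumed single-phase line-to-line voltage
    amps = 60        # breaker rating mentioned above
    derate = 0.80    # typical continuous-load derating

    usable_kw = volts * amps * derate / 1000
    print(f"Usable power per 60 A single-phase feed: ~{usable_kw:.1f} kW")
    # ~10 kW per feed -> roughly three such feeds to supply ~30 kW of IT load per rack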

On 10/10/19 1:09 PM, Prentice Bisbal wrote:

It's four racks of 10 and one rack of 6, for a total of five racks, not counting the storage system.

I believe this is because of power/cooling limitations of the air-cooled systems. We have water-cooled rear-door heat exchangers, but they're only good up to about 35 kW/rack. Since we have 4 GPUs per server, these things consume more power and put out more heat than your average 1U pizza box or blade server. Bill can answer more authoritatively, since he was involved in those discussions.
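
As a rough sanity check: the per-node draw below is an assumed ballpark for a 4-GPU AC922, not a measured number.

    # Rough rack-power estimate -- the per-node draw is assumed, not measured.
    kw_per_node_low, kw_per_node_high = 2.5, 3.0   # assumed AC922 + 4x V100 draw
    nodes_per_rack = 10                            # from the thread: 4 racks of 10

    low = nodes_per_rack * kw_per_node_low
    high = nodes_per_rack * kw_per_node_high
    print(f"Estimated rack load: {low:.0f}-{high:.0f} kW vs. ~35 kW rear-door limit")
    # 25-30 kW per rack sits under the ~35 kW rear-door heat exchanger ceiling,
    # which is consistent with capping the racks at 10 nodes.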

--
Prentice

On 10/10/19 12:57 PM, Scott Atchley wrote:
That is better than 80% of peak, nice.

Is it three racks of 15 nodes? Or two racks of 18 and 9 in the third rack?

You went with a single-port HCA per socket rather than the shared dual-port HCA in the shared PCIe slot?

On Thu, Oct 10, 2019 at 8:48 AM Bill Wichser <b...@princeton.edu> wrote:

    Thanks for the kind words.  Yes, we installed something more like a
    mini-Sierra machine, which is air-cooled.  There are 46 IBM AC922
    nodes, each two-socket with 4 V100s, and each socket runs SMT4.  So
    that's two 16-core chips, 32 cores per node, and 128 threads per
    node.  The GPUs all use NVLink.

    There are two EDR connections per host, each tied to a CPU: 1:1
    within a rack of 12 and 2:1 between racks.  We have a 2 PB scratch
    filesystem running GPFS.  Each node also has a 3 TB NVMe card for
    local scratch.
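
    A quick port-count sketch of what that 1:1/2:1 layout implies; the
    36-port EDR leaf switch per rack is an assumption, the node and HCA
    counts are from the description above.

        # Leaf-switch port budget for the 1:1-in-rack / 2:1-between-racks layout.
        # The 36-port EDR leaf switch is assumed; node/HCA counts are from the thread.
        nodes_per_rack = 12        # "1:1 within a rack of 12"
        hcas_per_node = 2          # two EDR connections per host, one per CPU

        downlinks = nodes_per_rack * hcas_per_node   # 24 host ports per leaf
        uplinks = downlinks // 2                     # 2:1 oversubscription between racks
        print(f"Downlinks: {downlinks}, uplinks: {uplinks}, total: {downlinks + uplinks}")
        # 24 + 12 = 36 ports, which would exactly fill a 36-port EDR leaf switch.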

    And we're running Slurm as our scheduler.

    We'll see if it makes the Top500 in November.  It fits there today,
    but who knows what else has gotten on there since June.  With the
    help of NVIDIA we managed to get 1.09 PF across 45 nodes.
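
    Dividing through the numbers above (and attributing all the flops to
    the GPUs for simplicity, which is an approximation):

        # Delivered HPL throughput per node and per GPU, from the quoted numbers.
        rmax_pf = 1.09
        nodes = 45
        gpus_per_node = 4

        per_node_tf = rmax_pf * 1000 / nodes
        per_gpu_tf = per_node_tf / gpus_per_node
        print(f"~{per_node_tf:.1f} TF/node, ~{per_gpu_tf:.1f} TF per V100 in HPL")
        # ~24.2 TF/node, ~6.1 TF per V100 delivered.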

    Bill

    On 10/10/19 7:45 AM, Michael Di Domenico wrote:
    > for those that may not have seen
    >
    > https://insidehpc.com/2019/10/traverse-supercomputer-to-accelerate-fusion-research-at-princeton/
    >
    > Bill Wichser and Prentice Bisbal are frequent contributors to the
    > list.  Congrats on the acquisition.  It's nice to see more HPC
    > expansion in our otherwise barren hometown... :)
    >
    > Maybe one of them will pass along some detail on the machine...


_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
