A friend asked me to pass this along. Figured some folks on this list
might be interested.
https://broadinstitute.avature.net/en_US/careers/JobDetail/HPC-Principal-System-Engineer/17773
-Paul Edmon-
Does anyone have a handy script, or an epilog script you run, to clean
up FUSE mounts that users may have made during a job?
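Something like this rough Python sketch is the sort of thing I have in
mind (it assumes SLURM_JOB_USER is set in the epilog environment and
that fusermount is on the node, so treat it as a starting point rather
than a finished epilog):

#!/usr/bin/env python3
# Rough sketch of a Slurm epilog that unmounts leftover FUSE mounts owned
# by the job user. Assumes SLURM_JOB_USER is in the epilog environment and
# fusermount is available; adjust to taste.
import os
import pwd
import subprocess

user = os.environ.get("SLURM_JOB_USER")
if user:
    uid = pwd.getpwnam(user).pw_uid
    with open("/proc/mounts") as mounts:
        for line in mounts:
            dev, mountpoint, fstype, opts = line.split()[:4]
            # FUSE filesystems show up as fuse.<something> with user_id= in the options
            if fstype.startswith("fuse") and ("user_id=%d" % uid) in opts.split(","):
                # Lazy unmount so a hung backend doesn't wedge the epilog
                subprocess.call(["fusermount", "-uz", mountpoint])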
-Paul Edmon-
with the current version of Slurm it should be much faster, as things
have really come a long way over the past decade.
-Paul Edmon-
On 3/10/2022 11:39 AM, Lohit Valleru via Beowulf wrote:
Hello Everyone,
I wanted to ask if there is anyone who could explain to me the benefits
of movi
/#Slurm_partitions
-Paul Edmon-
On 1/24/2022 2:59 PM, Tom Harvill via Beowulf wrote:
Thank you Mr. Edmon,
The link you provided is comprehensive and well-written. However, I
don't see the scheduler's configured half-life length. Do you know
what it is? And what is your clusters' maximum j
Here is our fairshare policy doc:
https://docs.rc.fas.harvard.edu/kb/fairshare/
We use the classic fairshare here.
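To make the "classic" part concrete, here is a minimal sketch of the
calculation as I understand it from the Slurm multifactor docs (the
numbers are made up; only the half-life decay and the 2^(-usage/shares)
factor are the point):

# Minimal sketch of Slurm's classic fairshare factor with half-life decay.
# Made-up numbers; see the Slurm priority/multifactor docs for details.

def decayed_usage(raw_usage, days_since_use, half_life_days):
    """Accumulated usage decays by half every half-life period."""
    return raw_usage * 0.5 ** (days_since_use / half_life_days)

def fairshare_factor(norm_usage, norm_shares):
    """Classic fairshare: F = 2^(-usage/shares); 1.0 when idle, near 0 when over-using."""
    return 2.0 ** (-norm_usage / norm_shares)

# Example: a group with 10% of the shares that recently used 20% of the cluster
usage = decayed_usage(0.20, days_since_use=7, half_life_days=7)  # -> 0.10
print(fairshare_factor(usage, norm_shares=0.10))                 # -> 0.5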
-Paul Edmon-
On 1/24/2022 2:17 PM, Tom Harvill wrote:
Hello,
We use a 'fair share' feature of our scheduler (SLURM) and have our
decay half-life (the time
Oh, you will also need an IB subnet manager (opensm) running since you
have an unmanaged switch. You can start this on one of the compute
nodes. I would probably start up two so you have redundancy.
-Paul Edmon-
On 10/20/2021 6:08 AM, leo camilo wrote:
I have recently acquired a few ConnectX
u can provide commandline options to ensure that.
5. Test and verify it is working.
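For that last step, a quick scripted sanity check is handy; here is a
rough Python sketch (it assumes the infiniband-diags tools ibstat and
sminfo are installed, and the string matching is only a guess at their
usual output):

# Rough sanity check for a new IB fabric: port state via ibstat and the
# presence of a subnet manager via sminfo. Assumes infiniband-diags is
# installed; the string matching is a guess at the usual output format.
import subprocess

def ports_active():
    out = subprocess.run(["ibstat"], capture_output=True, text=True).stdout
    return "State: Active" in out

def sm_running():
    # sminfo should exit non-zero if it cannot reach a subnet manager
    return subprocess.run(["sminfo"], capture_output=True).returncode == 0

if ports_active() and sm_running():
    print("IB link is up and a subnet manager (e.g. opensm) is answering")
else:
    print("Check cabling, opensm, and the ibstat output")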
-Paul Edmon-
On 10/20/2021 6:08 AM, leo camilo wrote:
I have recently acquired a few ConnectX-3 cards and an unmanaged IB
switch (IS5022) to upgrade my department's beowulf cluster.
Thus far, I have be
I guess the question is, for a parallel filesystem, how do you make sure
you have zeroed out the file without borking the whole filesystem, since
you are spread over a RAID set and could be spread over multiple hosts.
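For what it's worth, the naive approach we have been picturing is just an
in-place overwrite before the unlink, something like the sketch below.
Whether the zeros actually land on the same OST objects/RAID blocks is
exactly the open question, so it is illustration only:

# Naive "selective destruction": overwrite a file in place with zeros,
# flush, then unlink. On a parallel filesystem (Lustre, Ceph, etc.) there
# is no guarantee the overwrite hits the same physical blocks, which is
# the open question here; illustration only.
import os

def zero_and_unlink(path, chunk=1 << 20):
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        written = 0
        while written < size:
            n = min(chunk, size - written)
            f.write(b"\0" * n)
            written += n
        f.flush()
        os.fsync(f.fileno())
    os.unlink(path)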
-Paul Edmon-
On 9/29/2021 10:32 AM, Scott Atchley wrote:
For our users that
Yeah, that's what we were surmising. But paranoia and compliance being
what they are, we were curious what others were doing.
-Paul Edmon-
On 9/29/2021 10:32 AM, Renfro, Michael wrote:
I have to wonder if the intent of the DUA is to keep physical media
from winding up in the wrong hands.
The former. We are curious how to selectively delete data from a
parallel filesystem. For example, we commonly use Lustre, Ceph, and
Isilon in our environment. That said, if other types allow for easier
destruction of selective data, we would be interested in hearing about it.
-Paul Edmon
es of filesystems do people generally use for this and how do
people ensure destruction? Do these types of DUAs preclude certain
storage technologies from consideration, or are there creative ways to
comply using more common scalable filesystems?
Thanks in advance for the info.
-
https://www.rc.fas.harvard.edu/about/employment/
-Paul Edmon-
We haven't had any problems with plugging FDR stuff into an EDR
switch. It does downgrade the connections, but it still works.
-Paul Edmon-
On 3/2/2021 6:44 AM, Darren Wise wrote:
Heya,
I do very much the same QSFP28-EDR 100G adapter cards into QSFP+-FDR
56G just with the use of a cab
Slurm is definitely still under active development with a vibrant
community. SchedMD is the one mainly driving its development. In fact
the 20.11 version just came out with some nice new features like
scrontab, which I am super excited for.
-Paul Edmon-
On 11/29/2020 11:26 PM, Lux, Jim (US
fpsync
for all our large scale data movement here and Globus for external
transfers.
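fpsync basically fans a directory tree out across many parallel rsync
workers; a very stripped-down illustration of the same idea (not fpsync
itself, and the paths and worker count are made up) would be something
like:

# Stripped-down illustration of the "many parallel rsyncs" idea that
# fpsync automates: one rsync per top-level subdirectory. Not fpsync
# itself; paths and worker count are made up.
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor

SRC = "/scratch/project"      # hypothetical source tree
DST = "/new-storage/project"  # hypothetical destination

def sync_subdir(name):
    subprocess.run(["rsync", "-a", os.path.join(SRC, name) + "/",
                    os.path.join(DST, name) + "/"], check=True)

subdirs = [d for d in os.listdir(SRC) if os.path.isdir(os.path.join(SRC, d))]
os.makedirs(DST, exist_ok=True)
with ThreadPoolExecutor(max_workers=8) as pool:
    list(pool.map(sync_subdir, subdirs))  # surfaces any rsync failure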
-Paul Edmon-
On 1/2/20 10:45 AM, Joe Landman wrote:
On 1/2/20 10:26 AM, Michael Di Domenico wrote:
does anyone know or has anyone gotten rsync to push wire speed
transfers of big files over 10G links? i
card running simultaneously?
-Paul Edmon-
with
Fortran.
-Paul Edmon-
On 11/29/18 10:09 AM, Nathan Moore wrote:
I've probably mentioned this before. If a student only has one
programming course, teaching Fortran feels like malpractice; however,
this book is awesome!
Classical Fortran, Kupferschmid
https://www.crcpress.com
tool that's best for the job. That's the moral of
the story.
-Paul Edmon-
On 11/28/2018 12:17 PM, Robert G. Brown wrote:
On Wed, 28 Nov 2018, Paul Edmon wrote:
Once C has native arrays and orders them properly, then we can talk :).
Yeah, like this. That's really the big diffe
does well it does very well, and it still does very
well.
Once C has native arrays and orders them properly, then we can talk :).
-Paul Edmon-
On 11/28/18 11:36 AM, Peter St. John wrote:
Maybe I'm being too serious but in the old days, Fortran was the most
mature, maintained compil
Fortran is and remains an awesome language. More people should use it:
https://wordsandbuttons.online/fortran_is_still_a_thing.html
-Paul Edmon-
cific architectures we tell them to start up an
interactive session on the hardware they want to run on to build.
-Paul Edmon-
On 10/23/18 12:15 PM, Ryan Novosielski wrote:
Hi there,
I realize this may not apply to all cluster setups, but I’m curious what other
sites do with regard to sof
an problem anymore. However, it has
made us very gun-shy about trying Gluster again. Instead we've decided
to use Ceph, as we've gained a bunch of experience with it in our
OpenNebula installation.
-Paul Edmon-
On 07/24/2018 11:02 AM, John Hearns via Beowulf wrote:
Paul, thanks fo
nd in one's own
environment.
-Paul Edmon-
On 07/24/2018 10:31 AM, John Hearns via Beowulf wrote:
Forgive me for saying this, but the philosophy of software-defined
storage such as Ceph and Gluster is that forklift-style upgrades
should not be necessary.
When a storage server is to be retire
you also have to have the budget to buy the new hardware.
Right now we are just exploring our options.
-Paul Edmon-
On 07/24/2018 04:52 AM, Jörg Saßmannshausen wrote:
Hi Paul,
with a file system being 93% full, in my humble opinion it would make sense to
increase the underlying hardware c
el for the IEEL appliance we have been running.
Odds are your systems are fine, as they aren't taking quite the pounding
ours is. The problem doesn't happen that frequently.
-Paul Edmon-
On 07/23/2018 02:03 PM, Michael Di Domenico wrote:
On Mon, Jul 23, 2018 at 1:34 PM, Paul Edmon wro
scale linearly with size. They all have the
same hardware.
The head nodes are Dell R620s, while the shelves are M3420 (MDS) and
M3260 (OSS). The MDT is 2.2T with 466G used and 268M inodes used. Each
OST is 30T, with each OSS hosting 6. The filesystem itself is 93% full.
-Paul Edmon-
On
est the upgrade path
before committing to upgrading our larger systems. One of the questions
we had, though, was whether we needed to run e2fsck before/after the
upgrade, as that could add significant time to the outage.
-Paul Edmon-
On 07/23/2018 01:18 PM, Jeff Johnson wrote:
My apologies, I meant 2.5.34, not 2.6.34. We'd like to get up to 2.10.4,
which is what our clients are running. Recently we upgraded our cluster
to CentOS 7, which necessitated the client upgrade. Our storage servers,
though, stayed behind on 2.5.34.
-Paul Edmon-
On 07/23/2018 01:00 PM,
your wisdom.
-Paul Edmon-
d
on your single node/core users' tolerances for being requeued.
-Paul Edmon-
On 06/08/2018 03:55 AM, John Hearns via Beowulf wrote:
Chris, good question. I can't give a direct answer there, but let me
share my experiences.
In the past I managed SGI ICE clusters and a large memory UV sy
So for general monitoring of cluster usage we use:
https://github.com/fasrc/slurm-diamond-collector
and pipe to Grafana. We also use XDMoD:
http://open.xdmod.org/7.0/index.html
As for specific node alerting, we use the old standby of Nagios.
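The collector side is really just a pile of small scripts that scrape
Slurm commands and push numbers at Graphite for Grafana to plot. A toy
version (made-up metric prefix and Graphite host, and it assumes sinfo's
%T/%D format specifiers) looks roughly like:

# Toy version of that kind of collector: scrape node-state counts from
# sinfo and push them to Graphite's plaintext port. Metric prefix, host,
# and port are made up; assumes sinfo's "%T %D" output format.
import socket
import subprocess
import time

GRAPHITE = ("graphite.example.org", 2003)   # hypothetical Graphite host
PREFIX = "cluster.slurm.nodes"

out = subprocess.run(["sinfo", "-h", "-o", "%T %D"],
                     capture_output=True, text=True).stdout
now = int(time.time())
lines = []
for row in out.splitlines():
    state, count = row.split()
    # States can come back as "idle", "mixed", "down*", etc.; strip the flags
    lines.append("%s.%s %s %d\n" % (PREFIX, state.strip("*~#"), count, now))

with socket.create_connection(GRAPHITE) as sock:
    sock.sendall("".join(lines).encode())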
-Paul Edmon-
On 10/7/2017 8:21 AM, Josh
We run both CentOS 6 and 7 here for our install of Slurm. There have
been no problems with using Slurm on both simultaneously.
-Paul Edmon-
On 09/18/2017 09:11 AM, Mikhail Kuzminsky wrote:
In message from Christopher Samuel (Mon, 18
Sep 2017 16:03:47 +1000):
...
The best info is in the
We have a number of openings here at Harvard FAS RC. If you are
interested, please check out our employment page for details:
https://www.rc.fas.harvard.edu/about/employment/
-Paul Edmon-