this might sound like a bit of an oddity, but does anyone know if
there's a library out there that will let me override malloc calls to
memory and direct them to a filesystem instead? ie using the
filesystem as memory instead of ram for a program. ideally something
i can LD_PRELOAD on top of a st
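to make it concrete, here's the rough shape of the thing i'm after. this
is just an untested sketch: the env var names (FILE_MALLOC_PATH /
FILE_MALLOC_SIZE) are made up, it's a dumb bump allocator, and there's
no realloc and no real free, so treat it as an illustration of the idea
rather than a working library.

/* file_malloc.c -- untested sketch of a file-backed malloc shim.
 * build: gcc -shared -fPIC -o file_malloc.so file_malloc.c
 * run:   FILE_MALLOC_PATH=/scratch/heap.bin LD_PRELOAD=./file_malloc.so ./prog
 * caveats: no realloc override, free() is a no-op, no overflow checks,
 * and anything malloc'd before the constructor runs will get NULL back.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char  *pool;          /* file-backed arena */
static size_t pool_size;
static size_t offset;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

__attribute__((constructor))
static void pool_init(void)
{
    /* FILE_MALLOC_PATH / FILE_MALLOC_SIZE are hypothetical names */
    const char *path = getenv("FILE_MALLOC_PATH");
    const char *size = getenv("FILE_MALLOC_SIZE");
    pool_size = size ? strtoull(size, NULL, 0) : (size_t)1 << 30;

    int fd = open(path ? path : "/tmp/heap.bin", O_RDWR | O_CREAT, 0600);
    if (fd < 0) _exit(111);
    if (ftruncate(fd, (off_t)pool_size) != 0) _exit(112);

    /* MAP_SHARED pages are backed by the file, so "ram" use spills to disk */
    pool = mmap(NULL, pool_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (pool == MAP_FAILED) _exit(113);
    close(fd);
}

/* dead-simple bump allocator out of the mapped file */
void *malloc(size_t n)
{
    size_t need = (n + 15) & ~(size_t)15;   /* 16-byte align */
    void *p = NULL;
    pthread_mutex_lock(&lock);
    if (pool && offset + need <= pool_size) {
        p = pool + offset;
        offset += need;
    }
    pthread_mutex_unlock(&lock);
    return p;
}

void free(void *p) { (void)p; }             /* leak everything in this sketch */

void *calloc(size_t nmemb, size_t n)
{
    void *p = malloc(nmemb * n);            /* real code should check overflow */
    if (p) memset(p, 0, nmemb * n);
    return p;
}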
sadly most people still use printfs to debug C code. there are
some parallel debuggers on the market like Totalview, but it's pricey
depending on how many ranks you want to spin up under the debugger
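that said, the zero-cost trick i've seen people fall back on is to park
each rank in a spin loop, print the host/pid, and attach gdb by hand to
just the ranks you care about. rough sketch below; the WAIT_FOR_GDB env
var is made up and nothing here is specific to any one MPI.

/* wait-for-gdb hook -- a sketch, not production code.
 * attach with: gdb -p <pid>, then (gdb) set var holding = 0, continue
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static void wait_for_debugger(int rank)
{
    if (getenv("WAIT_FOR_GDB") == NULL)   /* hypothetical knob */
        return;

    volatile int holding = 1;
    char host[256];
    gethostname(host, sizeof(host));
    fprintf(stderr, "rank %d waiting for gdb: host=%s pid=%d\n",
            rank, host, (int)getpid());
    while (holding)                       /* gdb clears this by hand */
        sleep(1);
}

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    wait_for_debugger(rank);
    /* ... rest of the job ... */
    MPI_Finalize();
    return 0;
}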
On Thu, Jan 16, 2025 at 7:48 AM Alexandre Ferreira Ramos via Beowulf
wrote:
>
> Hi all, I ho
ere doing around the time when the hang occurred. This is
> expensive and you'll need to make sure you disable your changelogs after
> the fact or you'll drive your MDS out of space in the long term.
>
> Best,
>
> ellis
>
> On 7/15/24 11:01, Michael DiDomenico wrot
e()? When the processes
>> hang, have you tried using something like py-spy and/or gdb to get a stack
>> trace of where in the software stack it’s hung?
>>
>> > Date: Thu, 11 Jul 2024 12:25:05 -0400
>> > From: Michael DiDomenico
>> > To: Beowulf
t torch.save()? When the processes
> hang, have you tried using something like py-spy and/or gdb to get a stack
> trace of where in the software stack it’s hung?
>
> > Date: Thu, 11 Jul 2024 12:25:05 -0400
> > From: Michael DiDomenico
> > To: Beowulf Mailing List
>
this might be a little out of left field, but for those supporting
large, multi-generational machines (ie 3-5k+ nodes), how much
storage space do you allocate for on-site spares?
my org is in the process of designing a new data center space and i
find myself having to fight for every sq ft of s
i have a strange problem, but honestly i'm not sure where the issue
is. we have users running LLM models through pytorch. part of the
process saves off checkpoints at periodic intervals. when the
checkpoint files are being written, we can see in the logs pytorch
writing out the save files fro
maybe it's not even a power limit per se, but DLC
is pretty complicated with all the piping/manifolds/connectors/CDUs. does
there come a point where it's just not worth it unless it's a big custom
solution like the HPE stuff
On Sun, Mar 24, 2024 at 1:46 PM Scott Atchley
wrote
to answer some of my own questions :) and anyone else interested
https://dug.com/dug-cool/
https://dug.com/wp-content/uploads/2024/03/DUG-Cool-spec-sheet_240319.pdf
On Sat, Mar 23, 2024 at 10:17 AM Michael DiDomenico
wrote:
> i caught this on linkedin the other day. i'm not su
i caught this on linkedin the other day. i'm not sure if Dr Midgely is
still on the list or not. If he is, i was wondering if he could shed some
light on the technical details of the installation and, since it's been a few
years since DUG first started with immersion, what his thoughts are now versus then
http
Maybe we should come up with some kind of standard/wording/what-have-you to
post such. I have some open positions as well. might liven the list up a
little too... :)
On Thu, Feb 22, 2024 at 7:45 PM Douglas Eadline
wrote:
>
> > I've always thought employment opps were fine, but e-mails trying to
>
On Mon, 13 Nov 2023 at 15:35, Michael DiDomenico
> wrote:
>
>> unfortunately, it looks like registration is full... :(
>>
>>
>> On Mon, Nov 13, 2023 at 4:34 AM Jörg Saßmannshausen <
>> sassy-w...@sassy.formativ.net> wrote:
>>
>>> Dear all,
unfortunately, it looks like registration is full... :(
On Mon, Nov 13, 2023 at 4:34 AM Jörg Saßmannshausen <
sassy-w...@sassy.formativ.net> wrote:
> Dear all,
>
> just in case you are interested, there is an EESSI online tutorial coming
> up. EESSI is a way to share microarchitecture-specific an
does anyone have teaming w/lacp between cisco switches (ios) and
redhat9 working? i config'd the switch and set up teaming through
network manager. i can see the LACP pkts flowing between the switch
and server after the link goes up, but then 45 secs or so later
something decides the LACP link coul
Ryan Novosielski - novos...@rutgers.edu
> || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
> || \\of NJ | Office of Advanced Research Computing - MSB A555B, Newark
> `'
>
> On Sep 20, 2023, at 14:49, Michael DiDomenico wrote:
>
> mean
; can get neighbours.
>> A few years since I used it.
>>
>> On Tue, 19 Sep 2023, 19:03 Michael DiDomenico,
>> wrote:
>>>
>>> does anyone know if there's a simple command to pull the neighbor of
>>> an ib port? for instance, this horri
does anyone know if there's a simple command to pull the neighbor of
an ib port? for instance, this horrible shell command line
# for x in `ibstat | awk -F \' '/^CA/{print $2}'`; do iblinkinfo -C ${x} -n 1 -l | grep `hostname -s`; done
0x08006900fbcc "SwitchX - Mellanox Technologies" 41
i would definitely look more at tuning nfs/backend disks rather than
going down the rabbit hole of gluster/lustre/beegfs. you only have
five nodes. nfs is a hog, but you're not likely to bottleneck the nfs
protocol with only five nodes
but for anyone here to give you better advice you'd have to
>
>
>
> On Fri, Jul 21, 2023 at 9:12 AM Michael DiDomenico
> wrote:
>>
>> ugh, as someone who worked the front lines in the 00's i got a front row
>> seat to the interconnect mud slinging... but frankly if they're going
>> to come out of the gate with
ugh, as someone who worked the front lines in the 00's i got a front row
seat to the interconnect mud slinging... but frankly if they're going
to come out of the gate with a product named "Ultra Ethernet", i smell
a loser... :) (sarcasm...)
https://www.nextplatform.com/2023/07/20/ethernet-consortium
not sure i understand suse's move there. they can't run two competing
linux ventures. people are going to be pretty apprehensive about
investing time in a forked rhel clone, i would think, even more so for one
run by a competing distro.
i've been watching this play out in the media and how redhat kee
ics
> https://wisecorp.co.uk, .us & .ru
>
> On 23/03/2023 16:51, Michael DiDomenico wrote:
> > does anyone happen to have an old sgi / supermicro bios for an
> > X9DRG-QF+ motherboard squirreled away somewhere? sgi is long gone,
> > hpe might have something s
does anyone happen to have an old sgi / supermicro bios for an
X9DRG-QF+ motherboard squirreled away somewhere? sgi is long gone;
hpe might still have something, but who knows where. i reached out to
supermicro, but i suspect they'll say no.
no doubts from me. thanks for the info Kilian. unfortunately
sometimes purchasing outpaces infrastructure. fortunately nothing's
set in stone so we'll see what can be changed
On Wed, Jun 29, 2022 at 10:02 AM Joe Landman wrote:
>
> Egads ... if you are still running a 3 series kernel in product
milan cpus aren't officially supported on anything less than rhel8.3, but
there's anecdotal evidence that rhel7 will run on milan cpus. if the
evidence is true, is anyone on the list doing so and can confirm?
it might be worthwhile to start with a note to the award committee and
see if his name was left off intentionally because of some criteria, or
if it was just an oversight
On Mon, Feb 28, 2022 at 11:34 AM Prentice Bisbal via Beowulf
wrote:
>
> Is this where we start a change.org petition to get
in case you missed it, apparently beowulf computing is being inducted
into the space technologies hall of fame. in other news apparently
there's a space technologies hall of fame...
https://www.hpcwire.com/off-the-wire/beowulf-computing-cluster-will-be-inducted-into-the-space-technologies-hall-of