-- Original Message --
Received: 09:07 AM CST, 11/11/2023
From: "Douglas Eadline"
To: "Joshua Mora" Cc: beowulf@beowulf.org
Subject: Re: [Beowulf] And wearing another hat ...
>
> I was talking to a wr
It would be good to track how governments try to "regulate"
technologies/materials/processes that have an impact on HPC (AI at scale fits
into HPC) for good and for bad.
It could for instance be something as convoluted as a DC emissions cap aligned
with a climate policy.
Joshua
-- Original Message --
Reach out to AMD;
they have specific instructions (including BIOS/OS settings) and even binaries
on how to get the best performance.
Don't do trial and error, as it is very time consuming.
BLIS also has multiple parameters since it has nested loops, so you could also
have to try multiple configurations to get t
IMHO, it is about the competitive scale-out solutions they can put together: HW
-> interconnect, fault tolerance and QoS. SW -> communication
libraries (accelerated collectives) and application optimization with scalable
frameworks.
And to protect their current worldwide portfolio of customers (HPE,
Buy server grade, not consumer grade.
Also use TRIM.
Joshua
-- Original Message --
Received: 11:30 PM PDT, 07/26/2018
From: Jonathan Engwall
To: Beowulf Mailing List
Subject: [Beowulf] Fwd: SSD performance
> While tarring files after cloning the drive became hopeless, I realized all
>
need to do the review on AMZN web site.
Joshua Mora
-- Original Message --
Received: 08:19 AM CST, 11/23/2015
From: "Douglas Eadline"
To: "John Hearns" Cc: "beowulf@beowulf.org"
Subject: [Beowulf] Question for the community
>
>
> >
>
Without giving specific names of technologies,
hypervisors can reduce performance by around 5-10%.
I tested Linpack and STREAM and I got a small reduction, ~5%.
For networking: about 5% reduction in bandwidth, about 10% in latency.
On hyperconverged, since data is replicated, the write performance is h
Economic solutions will compromise those network pipes, so for I/O intensive
solutions you have to be careful with the setups. Therefore understanding the
I/O requirements of the applications is fundamental to knowing whether the
hyperconverged solution of choice is going to choke.
Best regards,
Joshu
Hello Jonathan.
Here is a good document to get you thinking.
http://www.cs.berkeley.edu/~rxin/db-papers/WarehouseScaleComputing.pdf
Although Doug said "Oh, and Hadoop clusters are not going to supplant your HPC
cluster"
I believe that there is an ongoing effort to converge Cloud computing (eg.
> This is something China readily admits, and is
> working to address.* At the pace they're moving, though, I imagine it
> won't be long before this is fixed, but a cultural change like that
> would still probably take some time, I'd say 10 or more years.
I read in an interview with a scientist on
The codesign effort pushed by the new requirements/constraints (power and
performance) is shaking the design decisions of existing SW frameworks, hence
forcing them to be rewritten over time to add new fundamental functionality (eg.
progress threads for asynchronous communication and fault tolerance).
The
s.
Joshua
-- Original Message --
Received: 10:53 AM PDT, 08/12/2014
From: "C. Bergström"
To: Joshua Mora Cc: doug.latt...@l-3com.com,
beowulf@beowulf.org
Subject: Re: [Beowulf] 8p 16 core x86_64 systems
> On 08/12/14 11:57 PM, Joshua Mora wrote:
> > Hello Doug.
> > AMD
://www.numascale.com
Best regards,
Joshua Mora.
-- Original Message --
Received: 09:02 AM PDT, 08/12/2014
From:
To:
Subject: [Beowulf] 8p 16 core x86_64 systems
> Does anyone know of any manufacturers who build an 8 processor (8-way)
motherboard which can utilize 16 core Opteron ch
My 2 cents.
For CFD, from a math point of view: hybrid Eulerian-Lagrangian formulation,
hybrid numerical + analytic models, automatic differentiation, interval
arithmetic for sensitivity analysis, machine learning (artificial
intelligence).
For HPC (which includes CFD), from a sw implementation point of
Hi Joe.
I don't think this is such an innovative thing.
Isn't the government already applying these concepts?
I mean spending the money they do not have beforehand?
Joshua
-- Original Message --
Received: 08:43 AM PDT, 04/01/2014
From: Joe Landman
To: beowulf@beowulf.org
Subject: [Beo
Sandy Bridge and Ivy Bridge do not have AVX2 extensions.
Haswell does.
Therefore SB and IB do 8 DP FLOPs/clk/core.
Haswell does 16 DP FLOPs/clk/core.
AMD processors Interlagos and Abu Dhabi support FMA4 and FMA3/4 respectively. They
are capable as well of 8 DP FLOPs/clk/core.
floating point operations in single
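As a sanity check, peak DP FLOP/s is just cores x clock x FLOPs/clk/core. Here is a
minimal sketch in C of that arithmetic; the core counts and clock speeds are made-up
examples, not any particular SKU:

/* peak_flops.c - back-of-envelope peak DP GFLOP/s from the per-clock figures above.
 * Core counts and clocks are hypothetical examples. Compile: gcc peak_flops.c */
#include <stdio.h>

static double peak_gflops(int cores, double ghz, int flops_per_clk)
{
    return cores * ghz * flops_per_clk;   /* GFLOP/s = cores * GHz * FLOPs/clk/core */
}

int main(void)
{
    /* Sandy Bridge / Ivy Bridge: 8 DP FLOPs/clk/core (AVX, no FMA) */
    printf("SB/IB      8c @ 2.6 GHz: %6.1f DP GFLOP/s\n", peak_gflops(8, 2.6, 8));
    /* Haswell: 16 DP FLOPs/clk/core (AVX2 + FMA) */
    printf("Haswell   12c @ 2.6 GHz: %6.1f DP GFLOP/s\n", peak_gflops(12, 2.6, 16));
    /* Interlagos / Abu Dhabi: 8 DP FLOPs/clk/core (FMA4 / FMA3+4) */
    printf("Abu Dhabi 16c @ 2.3 GHz: %6.1f DP GFLOP/s\n", peak_gflops(16, 2.3, 8));
    return 0;
}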
DR on gen2" about 3 years
ago.
Eugen, a link to the post would have been sufficient.
Joshua Mora.
-- Original Message --
Received: 02:26 AM PDT, 06/19/2013
From: Eugen Leitl
To: Beowulf@beowulf.org, i...@postbiota.org
Subject: [Beowulf] breaking Amdahl's law
>
> http
Some apps scale up to several hundreds of thousands of cores. These are Gordon
Bell award apps, with sustained levels around the PF.
See this link:
http://www.ncsa.illinois.edu/News/Stories/BW1year/apps.pdf
Another one not in this list is DCA++, which has also been ported to GPUs.
Jaguar had five
Search on the web for instance "PGAS over Ethernet" to get an idea of where
_some_ of those things are headed.
Joshua
-- Original Message --
Received: 05:50 PM CEST, 04/18/2013
From: "Douglas Eadline"
To: "Hearns, John" Cc: "beowulf@beowulf.org"
Subject: Re: [Beowulf] Register article o
Thanks for the pointer.
It seems rather complete but it is missing an important, or even fundamental, topic
for high performance computing: profiling, at least at an introductory level.
Joshua
-- Original Message --
Received: 04:45 PM CEST, 04/10/2013
From: Eugen Leitl
To: Beowulf@beowulf.org, i..
Sorry, I did not read the posts in order. Brian Dobbins made the same point a few
hours earlier in the very same way.
Joshua
-- Original Message --
Received: 05:52 AM CEST, 04/06/2013
From: "Joshua Mora"
To: Beowulf Mailing List
Subject: Re: [Beowulf] Often favorable t
Similar rational arguments apply to why you want to invest in a good
compiler.
I find it easier, though, to justify these things in terms of the money metric
rather than performance metrics. It is just that HPC people are not that used
to using the money metric.
In other words it becomes a business decision.
Exam
It would be good to know what the levels of efficiency of the applications were
wrt FLOP/s and GB/s, and the typical node count for the runs.
Then compare that against the current PF/s systems.
Joshua
-- Original Message --
Received: 05:49 PM CEST, 04/05/2013
From: Eugen Leitl
To: Beowulf
I can't wait to read the postmortem report if that becomes publicly available
(ie. lessons learned).
Joshua Mora.
-- Original Message --
Received: 12:46 PM CEST, 04/01/2013
From: John Hearns
To: Beowulf Mailing List
Subject: [Beowulf] Roadrunner shutdown
> I know we have all s
Good comments.
My comments inline.
Joshua
-- Original Message --
Received: 11:02 PM CDT, 03/11/2013
From: Brendan Moloney
To: Joshua mora acosta Cc: Vincent Diepeveen
, Mark Hahn , Beowulf List
Subject: Re: [Beowulf] difference between accelerators and co-processors
> I think t
See this paper
http://synergy.cs.vt.edu/pubs/papers/daga-saahpc11-apu-efficacy.pdf
While discrete GPUs underperform wrt the APU on host to/from device transfers by a
ratio of ~2X, they compensate by far with their computing power and local bandwidth,
~8-10X.
You can, though, cook a test where you do little comp
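If it helps, here is a tiny C model of that tradeoff, using only the ratios quoted
above (~2X slower transfers for the discrete GPU, ~8-10X more compute); the normalized
times are placeholders, not measurements:

/* apu_vs_dgpu.c - back-of-envelope model of the APU vs discrete GPU tradeoff above.
 * All ratios and normalized times are illustrative assumptions, not measurements. */
#include <stdio.h>

int main(void)
{
    double transfer_apu = 1.0;   /* normalized host<->device transfer time on the APU */
    double compute_apu  = 8.0;   /* normalized compute time on the APU                */

    double transfer_dgpu = 2.0 * transfer_apu;  /* ~2X slower transfers              */
    double compute_dgpu  = compute_apu / 9.0;   /* ~8-10X faster compute, take 9X    */

    printf("APU : total = %.2f\n", transfer_apu  + compute_apu);
    printf("dGPU: total = %.2f\n", transfer_dgpu + compute_dgpu);
    /* The discrete GPU only loses when compute shrinks enough that the extra
     * transfer time dominates, i.e. the "little compute" test hinted at above. */
    return 0;
}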
How about: "Methodologies for a sanity check in 30 minutes of your HPC
solution ? "
Joshua
-- Original Message --
Received: 11:36 AM CST, 03/03/2013
From: Andrew Holway
To: Bewoulf
Subject: [Beowulf] Themes for a talk on beowulf clustering
> Hello all,
>
> I am giving a talk on beowul
Sorry, I forgot to attach the file related to the comparison of
performance/dollar of 6200 vs E2600.
Here it is.
Thanks,
Joshua
f Intel went to
match the lines of AMD, ie. to become Perf/USD competitive or on par without
having to discount on AMD.
Best regards, Joshua Mora
It is at the planetarium, walking distance from Convention Center.
Joshua
-- Original Message --
Received: 09:42 PM MST, 11/07/2012
From: "Douglas Eadline"
To: "Ellis H. Wilson III" Cc: beowulf@beowulf.org
Subject: Re: [Beowulf] Any beowulfers attending SC12?
>
>
> >
> >> Has anyone f
,
Joshua Mora.
The most exceptional people I have met in any field did not have a formal
educational process. Formal education, though, sets in most cases a foundation
to start building professional skills.
These professionals are good because of learning as they needed, with high
motivation, or better said with p
If a program scales in parallel you should be able to see a reduction in the
elapsed time or number of clocks of the entire code. Look at both the total of
your code and the key functions that are scaling. By scaling
I mean that you would take the analysis at 1 core, 2 cores, 4 cores, ...
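Here is a minimal C sketch of that bookkeeping; the elapsed times are placeholders,
so substitute your own measurements for the whole code and for each key function:

/* scaling.c - per-core-count speedup and efficiency, as described above.
 * Elapsed times are hypothetical placeholders. Compile: gcc scaling.c */
#include <stdio.h>

int main(void)
{
    int    cores[]   = {1, 2, 4, 8};
    double elapsed[] = {100.0, 52.0, 27.0, 15.0};   /* seconds, hypothetical */
    int n = sizeof(cores) / sizeof(cores[0]);

    for (int i = 0; i < n; i++) {
        double speedup    = elapsed[0] / elapsed[i];
        double efficiency = speedup / cores[i];
        printf("%2d cores: %7.1f s  speedup %5.2f  efficiency %4.0f%%\n",
               cores[i], elapsed[i], speedup, 100.0 * efficiency);
    }
    return 0;
}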
I agree with Joe.
Plus I know that most of us, if not all, truly want to share knowledge and,
why not, opinions as well, based on personal experiences, as long as "we all make
the effort to be respectful with both the individual and the technology and
are open/receptive to being criticized as well".
T
Do you mean IB over QPI?
Either way, high node count coherence will be an issue.
In any case, acquiring their IP is a step forward towards SoC (System on
Chip): a preliminary step (building block) for the Exascale strategy and for
low cost enterprise/cloud solutions.
Joshua
-- Original
_3D_ FFT scaling will allow you to see how well balanced the system is.
Joshua
-- Original Message --
Received: 07:40 PM CDT, 04/06/2011
From: Mark Hahn
To: Beowulf Mailing List
Subject: Re: [Beowulf] Westmere EX
> > http://www.theregister.co.uk/2011/04/05/intel_xeon_e7_launch/
> >
> >
-to-high-performance-scientific-computing/14408128
This is really a second chance to learn HPC.
BTW, I learned from the author about 12 years ago on solvers.
I am sure he will value direct feedback from the folks on this list.
Best regards,
Joshua Mora
Hi Gabriel.
If your app is single threaded (ie. runs on a single core), works
on a per frame basis and is fairly cache friendly, then the more cores the
better from an economical point of view, without necessarily hurting
performance. A fat node would do as well as a bunch of tiny no
chnologies that will allow you to get to the next computational/science
challenge.
Best regards,
Joshua Mora.
-- Original Message --
Received: 08:53 PM CDT, 10/04/2010
From: Mark Hahn
To: gabriel lorenzo Cc: beowulf@beowulf.org
Subject: Re: [Beowulf] Begginers question # 1
> > IN
MC 12-core at 2.2GHz: 91% on die, 86.7% on a 2-socket node, above 82% on
cluster.
Joshua
-- Original Message --
Received: 10:34 AM CDT, 07/06/2010
From: Mark Hahn
To: Beowulf Mailing List
Subject: [Beowulf] HPL efficiency on Magny-Cours and Westmere?
> Hi all,
> can anyone tell me what k
the caches.
Joshua
-- Original Message --
Received: 10:12 AM CDT, 06/30/2010
From: Bill Rankin
To: Joshua mora acosta , Rahul Nabar ,
Beowulf Mailing List
Subject: RE: [Beowulf] dollars-per-teraflop : any lists like the Top500?
> > I think the money part will be difficult to get (it
I think the money part will be difficult to get (it is like a politically
incorrect question).
Nevertheless, you can split the money in two parts: purchase (which I am sure
you will never get) and the electric bill for keeping the system up and running
while you run HPL and when you run STREAM.
Then yo
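A rough C sketch of that split; every number here (price, power draw, electricity
rate, lifetime, HPL result) is a made-up placeholder:

/* dollars_per_tflop.c - sketch of the purchase + electricity split suggested above.
 * All inputs are hypothetical placeholders. Compile: gcc dollars_per_tflop.c */
#include <stdio.h>

int main(void)
{
    double purchase_usd   = 250000.0;        /* hypothetical cluster purchase price    */
    double power_kw       = 40.0;            /* average draw while running HPL/STREAM  */
    double usd_per_kwh    = 0.10;            /* hypothetical electricity rate          */
    double lifetime_hours = 3.0 * 365 * 24;  /* e.g. 3 years of operation              */
    double hpl_tflops     = 20.0;            /* sustained HPL result                   */

    double energy_usd = power_kw * usd_per_kwh * lifetime_hours;
    double total_usd  = purchase_usd + energy_usd;

    printf("purchase: %.0f USD, electricity: %.0f USD\n", purchase_usd, energy_usd);
    printf("dollars per sustained TFLOP/s: %.0f USD\n", total_usd / hpl_tflops);
    return 0;
}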
onfigured as low power
consumption where you would downclock the cores as much as you can without
affecting the others.
And before I forget, every device hanging off the chipset (Eth, IB NICs, GPUs) can
also be virtualized thanks to IOMMU features.
Best regards,
Joshua Mora.
-- Original Message ---
It does not make sense to come up with a general/wide statement that product A
is better than product B and/or product C.
Each architecture/solution has its strong points and its weak points wrt
others _for_a_given_feature_.
There is also a certain level of overlap of features between those
solutions,
Hi,
I think you misinterpreted the title. It is what it is, "HPC for dummies".
It is enough to explain in a plain way to anyone what HPC is, and it may not be that
easy to make a good summary of such a broad topic in 46 pages.
It would be great, though, to see a title like "HPC for the next decade" or
"bey
Just try it and you'll understand what communication overhead means:
most of these apps are network latency dominated, with small messages but lots
of them, because of i) many neighbor processors involved and ii) the iterative process.
Packing all the faces that need to be exchanged is the right way to go.
You can
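A minimal MPI sketch in C of that packing idea, one message per neighbor instead of
one per face; the mesh arrays and sizes are hypothetical placeholders:

/* pack_faces.c - sketch of the exchange routine: pack all faces shared with one
 * neighbor into a single message. Mesh data structures are placeholders. */
#include <mpi.h>
#include <stdlib.h>

void exchange_with_neighbor(double *face_vals, int *face_ids, int nfaces,
                            double *recv_buf, int neighbor, MPI_Comm comm)
{
    double *send_buf = malloc(nfaces * sizeof(double));
    MPI_Request req[2];

    /* Pack every face shared with this neighbor into one contiguous buffer ... */
    for (int i = 0; i < nfaces; i++)
        send_buf[i] = face_vals[face_ids[i]];

    /* ... so there is one message per neighbor, not one per face. */
    MPI_Irecv(recv_buf, nfaces, MPI_DOUBLE, neighbor, 0, comm, &req[0]);
    MPI_Isend(send_buf, nfaces, MPI_DOUBLE, neighbor, 0, comm, &req[1]);
    MPI_Waitall(2, req, MPI_STATUSES_IGNORE);

    free(send_buf);
}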
n
stresses that specific component of the cluster (eg. processor, networking,
storage, OS, settings), or sw tools for management/debugging of the HW+SW
clustered solution, among many other things...
Best regards,
Joshua Mora.
-- Original Message --
Received: 11:42 PM CEST, 08/27/2009
From:
You can run HPL bound to a specific socket, also maximizing the memory
associated with that socket, in order to try to shut it down by reaching
the "hardware thermal control" due to lack of cooling.
In the BIOS you can also have HW monitoring to tell you the speed of the fans and perhaps
detect the diff of
Hello
I am trying to get NWChem 5.1.1 + Infiniband OFED 1.3 + GA 4.1.1 working using
HP-MPI (or any other MPI) and PGI (or any other compiler).
I get the well known problem of it not working over the network.
Here is the configuration I am using, just in case someone spots the
error.
It runs fine in node
From: Håkon Bugge
To: Craig Tierney Cc: Joshua mora acosta
, dphu...@uncg.edu,beowulf@beowulf.org
Subject: Re: [Beowulf] Lowered latency with multi-rail IB?
> On Mar 27, 2009, at 18:20 , Craig Tierney wrote:
>
> > What about using multi-rail to increase message rate? That isn
The only way I got under 1usec in the PingPong test or with
ib_[write/send/read]_lat is with QDR and back to back (ie. no switch).
With a switch I get 1.1[3-7]usec [HP-MPI, OpenMPI, MVAPICH].
The MPI does not matter, although I have to agree with Greg that multirail
also increases latency.
Multirail is
Hi Joe.
Could you please get some dd runs to either read or write through NFS with lots of
small chunks (ie. high request rate rather than high throughput) in order
to find out how it correlates with the higher latency wrt Infiniband?
Thanks,
Joshua
-- Original Message --
Received: 11:02
Answers inline.
Joshua
-- Original Message --
Received: 12:49 AM CST, 01/23/2009
From: amjad ali
To: Beowulf Mailing List
Subject: [Beowulf] programming guidence request
> Hello All,
> I am developing my parallel CFD code on a small cluster. My system has
> openmpi installed based on g
Hi Joe.
I guess it would be straightforward to get an OpenMP version running.
Can you please share your results at 1, 2, 4, 8 threads?
Use HT off on Nehalem.
Use thread affinity through environment variables or explicitly in the code.
Power management enabled or disabled, but disclosed.
Use SSE3 (Shangh
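A small C/OpenMP sketch to confirm the placement before timing anything; sched_getcpu()
shows where each thread actually landed. The affinity environment variable names depend
on your runtime (e.g. GOMP_CPU_AFFINITY for gcc, KMP_AFFINITY for Intel):

/* affinity_check.c - print where each OpenMP thread lands. Compile: gcc -fopenmp */
#define _GNU_SOURCE        /* for sched_getcpu() on glibc */
#include <omp.h>
#include <sched.h>
#include <stdio.h>

int main(void)
{
    #pragma omp parallel
    {
        /* Set OMP_NUM_THREADS and your runtime's affinity variable before running. */
        printf("thread %d of %d on cpu %d\n",
               omp_get_thread_num(), omp_get_num_threads(), sched_getcpu());
    }
    return 0;
}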
Comments inline.
Joshua
-- Original Message --
Received: Mon, 18 Aug 2008 11:29:22 PM PDT
From: "amjad ali" <[EMAIL PROTECTED]>
To: "Beowulf Mailing List"
Subject: [Beowulf] MPI build with different compilers
> Hi,
> Please reply me about followings:
>
>
> 1) Is there any significant
.
Joshua
-- Original Message --
Received: Tue, 19 Aug 2008 12:37:17 AM PDT
From: "Joshua mora acosta" <[EMAIL PROTECTED]>
To: "amjad ali" <[EMAIL PROTECTED]>, "Beowulf Mailing List"
Subject: Re: [Beowulf] MPI build with different compilers
> Comm
Hello Javier.
On each node, in /etc/ssh/sshd_config set:
AllowUsers root sgeadmin
then restart sshd:
/etc/init.d/sshd restart
On SGE, disable interactive access to the queues.
Regards,
Joshua.
-- Original Message --
Received: Thu, 15 May 2008 09:32:42 AM PDT
From: "Javier Lazaro" <[EMAIL PROTECTED]>
To: beowulf@beow
It means NorthBridge
-- Original Message --
Received: Fri, 09 May 2008 01:09:37 PM PDT
From: Jan Heichler <[EMAIL PROTECTED]>
To: "Joshua mora acosta" <[EMAIL PROTECTED]>Cc: Mark Hahn
<[EMAIL PROTECTED]>, Tom Elken <[EMAIL PROTECTED]>, Beowulf Mail
If you had a 2.3GHz at 2.0GHz NB you would get 17.5GB/sec.
Joshua
-- Original Message --
Received: Thu, 08 May 2008 02:18:30 PM PDT
From: Mark Hahn <[EMAIL PROTECTED]>
To: Tom Elken <[EMAIL PROTECTED]>Cc: Beowulf Mailing List
Subject: RE: Re[2]: [Beowulf] Recent comparisons of 1600 MHz e
800MHz isn't there but I
can say that despite its improvements in multiple directions it does not close the
huge gap on those memory intensive applications.
Joshua Mora.
-- Original Message --
Received: Wed, 07 May 2008 01:47:40 PM PDT
From: Bill Johnstone <[EMAIL PROTECTED]>
Does anyone know what the detailed plan is for building that thing with 200
people in just 1 day?
I am very curious to understand what things can be done in parallel and what
things are serialized from the point of view of installation, testing and
evaluation/assessment.
Even monitoring the progress, id
For AMD based systems get ACML and gcc, pgi or pathscale.
For Intel based systems get MKL and the Intel compiler.
Run an N problem size around 90% workload, ie. a 1.8GB per core memory footprint.
Run NB 192 on AMD; I don't know the best blocking factor for MKL. I've tried
the same 192 and it does fairly well.
Set
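A tiny C sketch of that sizing arithmetic; the core count and memory per core are
placeholder assumptions, not a recommendation for any particular machine:

/* hpl_n.c - pick an HPL N that uses ~1.8 GB/core (the ~90% target above),
 * rounded down to a multiple of the blocking factor NB. Compile: gcc hpl_n.c -lm */
#include <math.h>
#include <stdio.h>

int main(void)
{
    int    cores        = 32;        /* hypothetical total core count             */
    double mem_per_core = 1.8e9;     /* ~1.8 GB per core, i.e. the ~90% target    */
    int    nb           = 192;       /* blocking factor suggested above for ACML  */

    double bytes = cores * mem_per_core;            /* memory budget for the matrix */
    long   n     = (long)floor(sqrt(bytes / 8.0));  /* N*N doubles, 8 bytes each    */
    n -= n % nb;                                    /* round down to a multiple of NB */

    printf("N = %ld  (NB = %d)\n", n, nb);
    return 0;
}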
faced this type of problem and know a
solution/workaround to it.
Best regards,
Joshua Mora.