Please tell us the history of the overall system. Was it bought as hardware only from a supplier? Or was it delivered as an already set-up system with operating system, applications, Infiniband drivers, etc.?
I would also look at Qlustar https://www.qlustar.com/book/qlustar/summary
and Bright https://www.brightcomputing.com/
Bright will certainly give you excellent support.

On Thu, 2 May 2019 at 17:02, John Hearns <hear...@googlemail.com> wrote:

> You ask some damned good questions there.
> I will try to answer them from the point of view of someone who has worked
> as an HPC systems integrator and supported HPC systems,
> both for systems integrators and within companies.
>
> We will start with HP. Did you buy those systems direct from HP as
> servers, or did you buy a configured HPC system,
> complete with Infiniband networking and with a software stack?
> If you bought bare metal servers then you are out of luck regarding
> support, other than hardware failures.
> HP now incorporate SGI, and their support is fantastic. Great people work
> for HP and SGI. But they aren't responsible for your install.
>
> If however you bought an integrated HPC system this will normally be
> integrated by a smaller company, usually in your country.
> Is this the case here? Then yes the integrator should be providing
> support.
> HOWEVER you have elected to remove their installed OS and upgrade by
> yourself. If I was the integrator I would give advice,
> but refuse to support the upgrade unless it was recommended by us, and you
> have a continuing support contract.
>
> You are using CentOS. The CentOS team are great guys - I know the founder
> quite well, and know people who work for RedHat.
> You have chosen CentOS - Community Supported Operating System. Join the
> CentOS HPC SIG perhaps and ask for help.
> But you don't get support from RedHat - as you are not using Redhat
> Enterprise Linux.
>
> Now we come to Mellanox. Mellanox support is fantastic. Formally, to open
> a support ticket with them you will need a support agreement
> on your switch. You HAVE got a support agreement - right?
> If not I have found that informal requests for support are often answered
> by Mellanox support.
>
> Failing all of those you could hire me!
> (I am being semi-serious here - I am a permanent employee at the moment,
> but I have worked as an HPC contractor in the past,
> and if I could justify it I would prefer to do HPC support on a contract
> basis).
>
> On Thu, 2 May 2019 at 16:45, Faraz Hussain <i...@feacluster.com> wrote:
>
>> Thanks. Before I go down the path of installing things willy-nilly, is
>> there some guide I should be following instead? I obviously have a
>> problem with my mellanox drivers combined with "user error"..
>>
>> So should I be paying Mellanox to help? Or is it a RedHat issue? Or is
>> it our hardware vendor, HP who should be involved??
>>
>> Looks like I need support on how to get support :-)
>>
>>
>> Quoting Christopher Samuel <ch...@csamuel.org>:
>>
>> >> root@lustwzb34:/root # systemctl status rdma
>> >> Unit rdma.service could not be found.
>> >
>> > You're missing this RPM then, which might explain a lot:
>> >
>> > $ rpm -qi rdma-core
>> > Name        : rdma-core
>> > Version     : 17.2
>> > Release     : 3.el7
>> > Architecture: x86_64
>> > Install Date: Tue 04 Dec 2018 03:58:16 PM AEDT
>> > Group       : Unspecified
>> > Size        : 107924
>> > License     : GPLv2 or BSD
>> > Signature   : RSA/SHA256, Tue 13 Nov 2018 01:45:22 AM AEDT, Key ID 24c6a8a7f4a80eb5
>> > Source RPM  : rdma-core-17.2-3.el7.src.rpm
>> > Build Date  : Wed 31 Oct 2018 07:10:24 AM AEDT
>> > Build Host  : x86-01.bsys.centos.org
>> > Relocations : (not relocatable)
>> > Packager    : CentOS BuildSystem <http://bugs.centos.org>
>> > Vendor      : CentOS
>> > URL         : https://github.com/linux-rdma/rdma-core
>> > Summary     : RDMA core userspace libraries and daemons
>> > Description :
>> > RDMA core userspace infrastructure and documentation, including initscripts,
>> > kernel driver-specific modprobe override configs, IPoIB network scripts,
>> > dracut rules, and the rdma-ndd utility.
>> >
>> > --
>> > Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
>> > _______________________________________________
>> > Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
>> > To change your subscription (digest mode or unsubscribe) visit
>> > https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
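
The check Chris describes in the quoted text above (is the rdma-core package installed, and does the rdma unit exist?) could be scripted roughly as follows. This is only a sketch, assuming an RPM-based system with systemd; the function names pkg_status and unit_status are hypothetical helpers made up for illustration, and the script reports "unknown" when rpm or systemctl is not available.

```shell
#!/bin/sh
# Hypothetical helper: report whether an RPM package is installed.
# Prints "installed", "missing", or "unknown" (no rpm command found).
pkg_status() {
    pkg="$1"
    if ! command -v rpm >/dev/null 2>&1; then
        echo "unknown"
    elif rpm -q "$pkg" >/dev/null 2>&1; then
        echo "installed"
    else
        echo "missing"
    fi
}

# Hypothetical helper: report whether systemd can find a unit file.
# Prints "present", "missing", or "unknown" (no systemctl command found).
unit_status() {
    unit="$1"
    if ! command -v systemctl >/dev/null 2>&1; then
        echo "unknown"
    elif systemctl cat "$unit" >/dev/null 2>&1; then
        echo "present"
    else
        echo "missing"
    fi
}

echo "rdma-core package: $(pkg_status rdma-core)"
echo "rdma.service unit: $(unit_status rdma.service)"
```

If the package shows as missing, that would match the "Unit rdma.service could not be found" message quoted above, since rdma-core is described there as shipping the initscripts.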