Hi,

I don't know exactly what you would like to do with this data, but if I were you, I would take a close look at Elasticsearch, Spark or even HDF5, depending on what your analysis software looks like (what it is written in...), to see whether these technologies could save you some time. Moreover, I wouldn't dismiss NFS too easily, especially if your infrastructure already uses it.
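
For the HDF5 route, something like this (a rough sketch with h5py and NumPy; the file name, group layout and dataset names are just made up for illustration) keeps per-flight numeric series compact and sliceable:

# Store one hypothetical flight's sensor series, chunked and compressed.
import h5py
import numpy as np

timestamps = np.arange(0, 3600, dtype=np.int64)        # one reading per second
readings = np.random.default_rng(0).normal(size=3600)  # placeholder values

with h5py.File("flights.h5", "w") as f:
    grp = f.create_group("aircraft_001/flight_0001")
    grp.create_dataset("timestamp", data=timestamps, chunks=True, compression="gzip")
    grp.create_dataset("sensor_temp", data=readings, chunks=True, compression="gzip")

# Later, read back only a slice instead of loading the whole file:
with h5py.File("flights.h5", "r") as f:
    first_minute = f["aircraft_001/flight_0001/sensor_temp"][:60]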

Elasticsearch can easily be paired with Kibana for visualization.
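
For example, if each numeric reading is indexed as its own document (a rough sketch with the official Python client; the index and field names are made up, and newer client versions use document= instead of body=), Kibana can then aggregate and plot the values over time:

# Index one hypothetical sensor reading so Kibana can chart it.
from datetime import datetime, timezone
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

doc = {
    "aircraft_id": "AC-001",
    "sensor": "engine_1_egt",
    "value": 612.4,
    "@timestamp": datetime.now(timezone.utc).isoformat(),
}

es.index(index="sensor-readings", body=doc)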

Best regards,


On 04/03/2019 08:04, Jonathan Aquilina wrote:
These would be numerical data such as integers or floating point numbers.

-----Original Message-----
From: Tony Brian Albers <t...@kb.dk>
Sent: 04 March 2019 08:04
To: beowulf@beowulf.org; Jonathan Aquilina <jaquil...@eagleeyet.net>
Subject: Re: [Beowulf] Large amounts of data to store and process

Hi Jonathan,

From my limited knowledge of the technologies, I would say that HBase with 
file pointers to the files placed on HDFS would suit you well.
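
As a rough illustration of what I mean (assuming the happybase client talking to an HBase Thrift gateway; the table name, column family and paths are made up), each row would hold metadata plus a pointer to the raw file on HDFS:

# Store per-flight metadata in HBase with a pointer to the raw file on HDFS.
import happybase

conn = happybase.Connection("hbase-thrift-host")  # HBase Thrift server
table = conn.table("flights")                     # pre-created with column family 'f'

row_key = b"AC-001#2019-03-01#0001"               # aircraft#date#flight
table.put(row_key, {
    b"f:hdfs_path": b"hdfs:///data/flights/AC-001/2019-03-01/0001.dat",
    b"f:aircraft": b"AC-001",
})

# Lookup is a single row get; the analytics job then reads the file from HDFS.
pointer = table.row(row_key)[b"f:hdfs_path"]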

But if the files are log files, consider tools that are suited to analyzing 
those, such as Kibana.

/tony


On Mon, 2019-03-04 at 06:55 +0000, Jonathan Aquilina wrote:
Hi Tony,

Sadly I can't go into much detail because I'm under an NDA. At this point,
with the prototype, we have around 250 GB of sample data, but that amount
depends on the type of aircraft. Larger aircraft and longer flights will
generate a lot more data, as they have more sensors and will log more than
the sample data that I have. The sample data is 250 GB for 35 aircraft of
the same type.

Regards,
Jonathan

-----Original Message-----
From: Tony Brian Albers <t...@kb.dk>
Sent: 04 March 2019 07:48
To: beowulf@beowulf.org; Jonathan Aquilina <jaquil...@eagleeyet.net>
Subject: Re: [Beowulf] Large amounts of data to store and process

On Mon, 2019-03-04 at 06:38 +0000, Jonathan Aquilina wrote:
Good Morning all,

I am working on a project that I sadly can't go into much detail about, but
quite large amounts of data will be ingested by this system and would need
to be efficiently returned as output to the end user in around 10 minutes
or so. I am in discussions with another partner involved in this project
about the best way forward.

For me, given the amount of data (and it is a huge amount of data), an
RDBMS such as PostgreSQL would be a major bottleneck. Another thing that
was considered was flat files, and I think the best option for that would
be a Hadoop cluster with HDFS. But in the case of HPC, how can such an
environment help with ingesting and analyzing large amounts of data? Would
the flat files be put on a SAN/NAS and accessed through an NFS share for
computational purposes?
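
To illustrate what I have in mind (a rough PySpark sketch; the paths and column names are made up), the same job could read the flat files either from HDFS or from an NFS mount visible on all nodes:

# Aggregate numeric flight data stored as flat CSV files.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("flight-analytics").getOrCreate()

# "hdfs:///data/flights/*.csv" for HDFS, or "file:///mnt/nfs/flights/*.csv"
# for a SAN/NAS exported over NFS and mounted on every node.
df = spark.read.csv("hdfs:///data/flights/*.csv", header=True, inferSchema=True)

summary = (
    df.groupBy("aircraft_id", "sensor")
      .agg(F.avg("value").alias("mean"), F.max("value").alias("max"))
)
summary.write.parquet("hdfs:///data/flight-summaries", mode="overwrite")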

Regards,
Jonathan
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin
Computing To change your subscription (digest mode or unsubscribe)
visit http:/ /www.beowulf.org/mailman/listinfo/beowulf
Good morning,

Around here, we're using HBase for similar purposes. We have a bunch of
smaller nodes storing the data, and all the management nodes (Ambari,
HDFS namenodes, etc.) are VMs.

Our nodes are configured so that we have a maximum of 2 cores per disk
spindle and 4 GB of memory per core. This seems to do the trick and is
pretty responsive.
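
For a hypothetical 12-spindle data node, that rule of thumb works out like this:

# Sizing a data node at 2 cores per spindle and 4 GB of RAM per core.
spindles_per_node = 12
cores_per_node = 2 * spindles_per_node   # 24 cores
ram_gb_per_node = 4 * cores_per_node     # 96 GB of RAM
print(cores_per_node, "cores,", ram_gb_per_node, "GB RAM per node")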

But to be able to provide better advice, you will probably need to go
into a bit more detail about what types of data you will be storing
and what kinds of calculations you want to perform.

/tony


--
Tony Albers - Systems Architect - IT Development Royal Danish Library,
Victor Albecks Vej 1, 8000 Aarhus C, Denmark
Tel: +45 2566 2383 - CVR/SE: 2898 8842 - EAN: 5798000792142

--
Rémy Dernat
Systems / Scientific Computing Engineer
MBB Platform
Institut des Sciences de l'Evolution - Montpellier


_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
