Jonathan, I am going to stick my neck out here. I feel that HDFS was a 'thing of its time', yet people are still slavishly building clusters with local SATA drives to follow that recipe. Current parallel filesystems have adapters which make them behave like HDFS:

http://docs.ceph.com/docs/master/cephfs/hadoop/
https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_hadoopconnector.htm
http://wiki.lustre.org/index.php/Running_Hadoop_with_Lustre
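To give a flavour of what these adapters involve, here is a minimal core-site.xml sketch for pointing Hadoop at CephFS, written from memory of the Ceph page linked above. The monitor address is made up, and the exact property names and the plugin jar for your Hadoop version should be checked against that page:

    <configuration>
      <!-- Default filesystem is a Ceph monitor rather than an HDFS namenode -->
      <property>
        <name>fs.default.name</name>
        <value>ceph://192.168.0.10:6789/</value>
      </property>
      <!-- Class implementing the ceph:// scheme (from the cephfs-hadoop plugin) -->
      <property>
        <name>fs.ceph.impl</name>
        <value>org.apache.hadoop.fs.ceph.CephFileSystem</value>
      </property>
      <!-- Client-side Ceph configuration, assumed to be in the standard location -->
      <property>
        <name>ceph.conf.file</name>
        <value>/etc/ceph/ceph.conf</value>
      </property>
    </configuration>

After that, MapReduce jobs read and write ceph:// paths just as they would hdfs:// ones, with no local-SATA recipe required.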
Also, you all know what is coming next. Julia. (Sorry all!)

https://wilmott.com/big-time-series-analysis-with-juliadb/ (I guess this one is specific to finance)
https://juliacomputing.github.io/JuliaDB.jl/latest/out_of_core/
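As a minimal sketch of the out-of-core workflow that second link describes (the directory, store name, and column names below are invented for illustration; check the JuliaDB docs for the exact keyword arguments):

    using Distributed
    addprocs(4)                   # workers share both the ingest and the queries
    @everywhere using JuliaDB
    using Statistics: mean

    # Ingest a directory of per-flight CSVs that is too big for RAM;
    # output= writes an on-disk binary store, chunks= sets the parallel split.
    t = loadtable("sensor_csvs"; output = "sensor_store", chunks = 4)

    # Later sessions reopen the store without re-parsing the CSVs.
    t = load("sensor_store")

    # Aggregations run chunk by chunk, so the table never has to fit in memory.
    per_aircraft = groupby(mean, t, :aircraft; select = :engine_temp)

The point being that you get dataframe-style analytics over more data than RAM without standing up a Hadoop cluster at all.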
On Mon, 4 Mar 2019 at 11:16, Jonathan Aquilina <jaquil...@eagleeyet.net> wrote:

> I read, though, that postgres can handle time series data no problem. I am just concerned that the clients will want to do complex big data analytics on the data. At this stage we are just prototyping and things are very much up in the air, but I am wondering whether sticking with HDFS and Hadoop is the best way to go for this in terms of performance and overall analytical capabilities.
>
> What I am trying to understand is how Hadoop, being written in Java, is so performant.
>
> Regards,
> Jonathan
>
> On 04/03/2019, 12:11, "Beowulf on behalf of Fred Youhanaie" <beowulf-boun...@beowulf.org on behalf of f...@anydata.co.uk> wrote:
>
> Hi Jonathan,
>
> I have used PostgreSQL for collecting data, but there's nothing there that would be of use to you!
>
> A few years ago I set up a similar system (in a hurry) in a small company. The bulk data was compressed and made available to the applications via NFS (IPoIB). The applications were responsible for decompressing and pre/post-processing the data. Later, one of the developers created a PostgreSQL-based system to hold all the data, using C++ for all the data handling. That system was never used, even though all the historical data was loaded into the database!
>
> Your choice of components is going to depend on how your analytics software is going to access the data. If the data are only read and processed once, then loading them into a database and querying it once may not pay off.
>
> Cheers,
> Fred
>
> On 04/03/2019 09:24, Jonathan Aquilina wrote:
> > Hi Fred,
> >
> > My colleague and I did some research and found an extension for PostgreSQL called TimescaleDB, but upon further research Postgres on its own is good for such data as well. The thing is, the data are not going to be given to us as they come in, but in bulk at the end from the parent company.
> >
> > Have you used PostgreSQL for these types of data, and how has it performed?
> >
> > Regards,
> > Jonathan
> >
> > On 04/03/2019, 10:19, "Beowulf on behalf of Fred Youhanaie" <beowulf-boun...@beowulf.org on behalf of f...@anydata.co.uk> wrote:
> >
> > Hi Jonathan,
> >
> > It seems you're collecting metrics and time series data. Perhaps a time series database (TSDB) is an option for you. There are a few of these out there, but I don't have any personal recommendation.
> >
> > Cheers,
> > Fred
> >
> > On 04/03/2019 07:04, Jonathan Aquilina wrote:
> > > These would be numerical data such as integers or floating point numbers.
> > >
> > > -----Original Message-----
> > > From: Tony Brian Albers <t...@kb.dk>
> > > Sent: 04 March 2019 08:04
> > > To: beowulf@beowulf.org; Jonathan Aquilina <jaquil...@eagleeyet.net>
> > > Subject: Re: [Beowulf] Large amounts of data to store and process
> > >
> > > Hi Jonathan,
> > >
> > > From my limited knowledge of the technologies, I would say that HBase with file pointers to the files placed on HDFS would suit you well.
> > >
> > > But if the files are log files, consider tools that are suited to analyzing those, like Kibana.
> > >
> > > /tony
> > >
> > > On Mon, 2019-03-04 at 06:55 +0000, Jonathan Aquilina wrote:
> > >> Hi Tony,
> > >>
> > >> Sadly I can't go into much detail as I am under an NDA. At this point, with the prototype, we have around 250 GB of sample data, but this depends on the type of aircraft. Larger aircraft and longer flights will generate a lot more data, as they have more sensors and will log more than the sample data that I have. The sample data is 250 GB for 35 aircraft of the same type.
> > >>
> > >> Regards,
> > >> Jonathan
> > >>
> > >> -----Original Message-----
> > >> From: Tony Brian Albers <t...@kb.dk>
> > >> Sent: 04 March 2019 07:48
> > >> To: beowulf@beowulf.org; Jonathan Aquilina <jaquil...@eagleeyet.net>
> > >> Subject: Re: [Beowulf] Large amounts of data to store and process
> > >>
> > >> On Mon, 2019-03-04 at 06:38 +0000, Jonathan Aquilina wrote:
> > >>> Good morning all,
> > >>>
> > >>> I am working on a project that I sadly can't go into much detail about, but there will be quite large amounts of data ingested by this system, and output would need to be efficiently returned to the end user in around 10 minutes or so. I am in discussions with another partner involved in this project about the best way forward on this.
> > >>>
> > >>> For me, given the amount of data (and it is a huge amount of data), an RDBMS such as PostgreSQL would be a major bottleneck. Another option considered was flat files, and I think the best fit for those would be a Hadoop cluster with HDFS. But in the case of HPC, how can such an environment help in terms of ingesting and analyzing large amounts of data? Would said flat files be put on a SAN/NAS and accessed through an NFS share for computational purposes?
> > >>>
> > >>> Regards,
> > >>> Jonathan
> > >>
> > >> Good morning,
> > >>
> > >> Around here, we're using HBase for similar purposes. We have a bunch of smaller nodes storing the data, and all the management nodes (Ambari, HDFS namenodes etc.) are VMs.
> > >>
> > >> Our nodes are configured so that we have a maximum of 2 cores per disk spindle and 4 GB of memory for each core. This seems to do the trick and is pretty responsive.
> > >>
> > >> But to be able to provide better advice, you will probably need to go into a bit more detail about what types of data you will be storing and which kinds of calculations you want to perform.
> > >> /tony
> > >>
> > >> --
> > >> Tony Albers - Systems Architect - IT Development
> > >> Royal Danish Library, Victor Albecks Vej 1, 8000 Aarhus C, Denmark
> > >> Tel: +45 2566 2383 - CVR/SE: 2898 8842 - EAN: 5798000792142
> > >
> > > --
> > > Tony Albers - Systems Architect - IT Development
> > > Royal Danish Library, Victor Albecks Vej 1, 8000 Aarhus C, Denmark
> > > Tel: +45 2566 2383 - CVR/SE: 2898 8842 - EAN: 5798000792142
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf