Talking about missing values... Joe Landman is sure to school me again for this one (owwwccchhh) https://docs.julialang.org/en/v1/manual/missing/index.html
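(That link is Julia's `missing` docs; the same idea exists over in Python land, where pandas spells it NaN/NA. A minimal sketch, just to show the behaviour people usually trip over:)

    import numpy as np
    import pandas as pd

    # Missing values propagate through arithmetic and are skipped by
    # reductions by default -- roughly what Julia's `missing` gives you.
    s = pd.Series([1.0, np.nan, 3.0])

    print(s + 1)                # NaN propagates: 2.0, NaN, 4.0
    print(s.sum())              # 4.0 -- reductions skip NaN by default
    print(s.sum(skipna=False))  # nan -- unless told otherwise
    print(s.fillna(0.0))        # impute: 1.0, 0.0, 3.0
    print(s.dropna())           # or drop the gaps entirely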
Going back to the hardware: a 250 GByte data set is not too large to hold in RAM. This might be a good use case for Intel Optane persistent memory - I don't know exactly how it works when used in memory mode as opposed to block-device mode. The Diablo memory was supposed to migrate cold pages down to the lower, slower tier. Does Optane function similarly?

On Tue, 5 Mar 2019 at 01:02, Lux, Jim (337K) via Beowulf <beowulf@beowulf.org> wrote:

> I'm munging through not very much satellite telemetry (a few GByte), using
> sqlite3. Here are some general observations:
>
> 1) If the data is recorded by multiple sensor systems, the clocks will
> *not* align - sure, they may run NTP, but...
>
> 2) Typically there's some sort of raw clock being recorded with the data
> (in ticks of some oscillator, typically) - that's what you can use to put
> data from a particular batch of sources into a time order. And then you
> have the problem of reconciling the different clocks.
>
> 3) Watch out for leap seconds in time stamps - some systems have them
> (UTC), some do not (GPS, TAI) - a time of 23:59:60 may be legal.
>
> 4) You need a way to deal with "missing" data, whether it's time tags or
> actual measurements - as well as gaps in the record.
>
> 5) Be aware of the need to de-dupe data - the same telemetry records may
> arrive from multiple sources.
>
> Jim Lux
> (818)354-2075 (office)
> (818)395-2714 (cell)
>
> -----Original Message-----
> From: Beowulf [mailto:beowulf-boun...@beowulf.org] On Behalf Of Jonathan Aquilina
> Sent: Monday, March 04, 2019 1:24 AM
> To: Fred Youhanaie <f...@anydata.co.uk>; beowulf@beowulf.org
> Subject: Re: [Beowulf] Large amounts of data to store and process
>
> Hi Fred,
>
> My colleague and I did some research and found an extension for postgresql
> called TimescaleDB, but on further research postgres on its own is good
> for such data as well. The thing is, these are not going to be given to us
> as the data comes in, but in bulk at the end from the parent company.
>
> Have you used postgresql for these types of data, and how has it performed?
>
> Regards,
> Jonathan
>
> On 04/03/2019, 10:19, "Beowulf on behalf of Fred Youhanaie"
> <beowulf-boun...@beowulf.org on behalf of f...@anydata.co.uk> wrote:
>
> Hi Jonathan,
>
> It seems you're collecting metrics and time series data. Perhaps a time
> series database (TSDB) is an option for you. There are a few of these out
> there, but I don't have any personal recommendation.
>
> Cheers,
> Fred
>
> On 04/03/2019 07:04, Jonathan Aquilina wrote:
> > These would be numerical data such as integers or floating point
> > numbers.
> >
> > -----Original Message-----
> > From: Tony Brian Albers <t...@kb.dk>
> > Sent: 04 March 2019 08:04
> > To: beowulf@beowulf.org; Jonathan Aquilina <jaquil...@eagleeyet.net>
> > Subject: Re: [Beowulf] Large amounts of data to store and process
> >
> > Hi Jonathan,
> >
> > From my limited knowledge of the technologies, I would say that HBase
> > with file pointers to the files placed on HDFS would suit you well.
> >
> > But if the files are log files, consider tools that are suited for
> > analyzing those, like Kibana.
> >
> > /tony
> >
> > On Mon, 2019-03-04 at 06:55 +0000, Jonathan Aquilina wrote:
> >> Hi Tony,
> >>
> >> Sadly I can't go into much detail, as I am under an NDA. At this point
> >> with the prototype we have around 250 GB of sample data, but again this
> >> data is dependent on the type of aircraft.
> >> Larger aircraft and longer flights will generate a lot more data, as
> >> they have more sensors and will log more than the sample data that I
> >> have. The sample data is 250 GB for 35 aircraft of the same type.
> >>
> >> Regards,
> >> Jonathan
> >>
> >> -----Original Message-----
> >> From: Tony Brian Albers <t...@kb.dk>
> >> Sent: 04 March 2019 07:48
> >> To: beowulf@beowulf.org; Jonathan Aquilina <jaquil...@eagleeyet.net>
> >> Subject: Re: [Beowulf] Large amounts of data to store and process
> >>
> >> On Mon, 2019-03-04 at 06:38 +0000, Jonathan Aquilina wrote:
> >>> Good morning all,
> >>>
> >>> I am working on a project that I sadly can't go into much detail on,
> >>> but there will be quite large amounts of data ingested by this system,
> >>> and it would need to be efficiently returned as output to the end user
> >>> in around 10 minutes or so. I am in discussions with another partner
> >>> involved in this project about the best way forward.
> >>>
> >>> For me, given the amount of data (and it is a huge amount of data), an
> >>> RDBMS such as postgresql would be a major bottleneck. Another thing
> >>> that was considered was flat files, and I think the best fit there
> >>> would be a Hadoop cluster with HDFS. But in the case of HPC, how can
> >>> such an environment help in terms of ingesting and analyzing large
> >>> amounts of data? Would said flat files be put on a SAN/NAS and
> >>> accessed through an NFS share for computational purposes?
> >>>
> >>> Regards,
> >>> Jonathan
> >>
> >> Good morning,
> >>
> >> Around here we're using HBase for similar purposes. We have a bunch of
> >> smaller nodes storing the data, and all the management nodes (Ambari,
> >> HDFS namenodes, etc.) are VMs.
> >>
> >> Our nodes are configured so that we have a maximum of 2 cores per disk
> >> spindle and 4 GB of memory for each core. This seems to do the trick
> >> and is pretty responsive.
> >>
> >> But to be able to provide better advice, you will probably need to go
> >> into a bit more detail about what types of data you will be storing and
> >> which kinds of calculations you want to perform.
> >> /tony
> >>
> >> --
> >> Tony Albers - Systems Architect - IT Development
> >> Royal Danish Library, Victor Albecks Vej 1, 8000 Aarhus C, Denmark
> >> Tel: +45 2566 2383 - CVR/SE: 2898 8842 - EAN: 5798000792142
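PS - to make Jim's points 2, 3 and 5 concrete, here's a rough Python/sqlite3 sketch of tick conversion, GPS-to-UTC offset handling, and de-duping. The table and column names are made up, the tick rate is an assumption, and the 18 s GPS-UTC offset is only valid from 2017-01-01 onward - real code should take it from a leap-second table rather than hard-coding it:

    import sqlite3

    # Sketch only: table and column names are hypothetical.
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE telemetry (
            source      TEXT,     -- which ground station / recorder sent it
            raw_ticks   INTEGER,  -- free-running oscillator count (point 2)
            gps_seconds REAL,     -- seconds since the GPS epoch (1980-01-06)
            value       REAL
        )
    """)

    # Point 2: turn raw oscillator ticks into seconds with a per-source
    # calibration; rate and offset must be reconciled clock by clock.
    TICK_HZ = 65536.0  # assumed tick rate, purely for illustration
    def ticks_to_seconds(ticks, offset=0.0):
        return ticks / TICK_HZ + offset

    # Point 3: GPS time carries no leap seconds, UTC does. GPS-UTC has
    # been 18 s since 2017-01-01; look it up by date in real code.
    GPS_UTC_OFFSET = 18.0
    def gps_to_utc_seconds(gps_seconds):
        return gps_seconds - GPS_UTC_OFFSET

    # Point 5: de-dupe identical records that arrived via several sources
    # by selecting DISTINCT on the fields that define record identity
    # (everything except the source).
    rows = [
        ("gs1", 1000, 1.23e9, 42.0),
        ("gs2", 1000, 1.23e9, 42.0),  # same record via a second station
        ("gs1", 1001, 1.23e9 + 1.0, 43.0),
    ]
    conn.executemany("INSERT INTO telemetry VALUES (?, ?, ?, ?)", rows)

    deduped = conn.execute("""
        SELECT DISTINCT raw_ticks, gps_seconds, value
        FROM telemetry ORDER BY raw_ticks
    """).fetchall()
    print(deduped)                     # the gs1/gs2 duplicate collapses
    print(ticks_to_seconds(1000))      # ~0.0153 s into this clock's epoch
    print(gps_to_utc_seconds(1.23e9))  # UTC seconds, leap offset removed

SELECT DISTINCT is fine at a few GByte; at 250 GByte you would rather put a UNIQUE constraint on the identity columns so duplicates are rejected at insert time, or de-dupe each batch before loading.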
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf