Re: [Beowulf] Large amounts of data to store and process

2019-03-04 Thread Lux, Jim (337K) via Beowulf
I'm munging through not very much satellite telemetry (a few GB) using sqlite3. Here are some general observations: 1) if the data is recorded by multiple sensor systems, the clocks will *not* align - sure, they may run NTP, but 2) typically there's some sort of raw clock being recorded
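
For illustration, a minimal sketch (not from Jim's post) of what a sqlite3 telemetry store along these lines might look like, keeping both the sensor's raw clock and a derived UTC estimate; the schema, sensor name, and drift model are invented:

    import sqlite3

    conn = sqlite3.connect("telemetry.db")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS samples (
            sensor_id TEXT    NOT NULL,  -- which recording system
            raw_clock INTEGER NOT NULL,  -- the sensor's own tick counter
            utc_est   REAL,              -- estimated UTC, filled in after correlation
            value     REAL    NOT NULL
        )
    """)

    # Hypothetical per-sensor (epoch_at_tick_zero, seconds_per_tick) model,
    # derived by correlating events seen by several sensors, since the raw
    # clocks of independent systems will not line up exactly.
    clock_model = {"imu0": (1_546_300_800.0, 0.01)}

    def to_utc(sensor_id, raw):
        epoch, scale = clock_model[sensor_id]
        return epoch + raw * scale

    conn.execute("INSERT INTO samples VALUES (?, ?, ?, ?)",
                 ("imu0", 12345, to_utc("imu0", 12345), 0.98))
    conn.commit()

Keeping the raw clock around means the UTC mapping can be recomputed later if a better drift model turns up.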

Re: [Beowulf] New job for Scientific HPC Engineer

2019-03-04 Thread Lux, Jim (337K) via Beowulf
Machine Learning, perhaps? Norbert Wiener described the possibility of future intelligent computers needing instruction in the same way as humans. Jim Lux (818)354-2075 (office) (818)395-2714 (cell) From: Beowulf [mailto:beowulf-boun...@beowulf.org] On Behalf Of John Hearns via Beowulf Sent

[Beowulf] Application independent checkpoint/resume?

2019-03-04 Thread Christopher Samuel
Hi folks, Just wondering if folks here have recent experience with application-independent checkpoint/resume mechanisms like DMTCP or CRIU? Especially interested in MPI uses, and extra bonus points for experiences on Cray. :-) From what I can see CRIU doesn't seem to support MPI at a
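
As a hedged sketch of the DMTCP workflow: dmtcp_launch, dmtcp_command, and the generated dmtcp_restart_script.sh are DMTCP's standard entry points, but the exact flags and the application name below are assumptions to check against your installed version:

    import subprocess
    import time

    # Launch the application under DMTCP's coordinator; -i sets an automatic
    # checkpoint interval in seconds. Popen keeps control while the job runs.
    # "./my_app" is a placeholder.
    job = subprocess.Popen(["dmtcp_launch", "-i", "3600", "./my_app"])

    time.sleep(60)
    # Ask the coordinator for an immediate checkpoint of the whole process tree.
    subprocess.run(["dmtcp_command", "--checkpoint"], check=True)

    job.wait()
    # After a crash, restart from the script DMTCP writes next to the
    # checkpoint images:  ./dmtcp_restart_script.sh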

Re: [Beowulf] Large amounts of data to store and process

2019-03-04 Thread Michael Di Domenico
On Mon, Mar 4, 2019 at 8:18 AM Jonathan Aquilina wrote: > As previously mentioned, we don't really need to have anything indexed, so I am thinking flat files are the way to go; my only concern is the performance of large flat files. Potentially, there are many factors in the workflow that u

Re: [Beowulf] Large amounts of data to store and process

2019-03-04 Thread Ellis H. Wilson III
On 3/4/19 1:38 AM, Jonathan Aquilina wrote: Good Morning all, I am working on a project that I sadly can't go into much detail on, but there will be quite large amounts of data ingested by this system, and it would need to be efficiently returned as output to the end user in around 10 min

Re: [Beowulf] Large amounts of data to store and process

2019-03-04 Thread Douglas Eadline
> I have read, though, that Postgres can handle time-series data no problem. I am just concerned that the clients may want to do complex big data analytics on the data. At this stage we are just prototyping, but things are very much up in the air at this point. I am wondering, though, if sticking with HDFS a

Re: [Beowulf] Large amounts of data to store and process

2019-03-04 Thread Douglas Eadline
> Good Morning all, I am working on a project that I sadly can't go into much detail on, but there will be quite large amounts of data ingested by this system that would need to be efficiently returned as output to the end user in around 10 min or so. I am in discussions with anot

Re: [Beowulf] Large amounts of data to store and process

2019-03-04 Thread Jonathan Engwall
I think you are asking more than one question. I think you need real-time communication, fast reliable storage, analytics, and presentation for investors. Making your needs clear will help people help you. On Mon, Mar 4, 2019, 6:28 AM Joe Landman wrote: > On 3/4/19 1:55 AM, Jonathan Aquilina wr

Re: [Beowulf] Large amounts of data to store and process

2019-03-04 Thread Joe Landman
On 3/4/19 1:55 AM, Jonathan Aquilina wrote: Hi Tony, Sadly I can't go into much detail due to being under an NDA. At this point with the prototype we have around 250 GB of sample data, but again this data is dependent on the type of aircraft. Larger aircraft and longer flights will generate

Re: [Beowulf] Large amounts of data to store and process

2019-03-04 Thread Jonathan Engwall
What does your overall design look like? On Mon, Mar 4, 2019, 5:19 AM Jonathan Aquilina wrote: > Hi Michael, As previously mentioned, we don't really need to have anything indexed, so I am thinking flat files are the way to go; my only concern is the performance of large flat files. Isn't th

Re: [Beowulf] Large amounts of data to store and process

2019-03-04 Thread Jonathan Aquilina
Hi Michael, As previously mentioned, we don't really need to have anything indexed, so I am thinking flat files are the way to go; my only concern is the performance of large flat files. Isn't that what HDFS is for, to deal with large flat files? On 04/03/2019, 14:13, "Beowulf on behalf of Michael
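
If the flat files do end up in HDFS, one simple way to stream them out is the third-party Python "hdfs" (hdfscli) package; a hedged sketch, where the namenode address, file path, and process() function are placeholders:

    from hdfs import InsecureClient

    def process(chunk):
        pass  # hypothetical: parse records out of this chunk

    client = InsecureClient("http://namenode:9870")  # WebHDFS endpoint
    # Stream the file sequentially in ~1 MiB chunks rather than loading it whole.
    with client.read("/data/flight/2019-03-04.bin", chunk_size=1 << 20) as reader:
        for chunk in reader:
            process(chunk)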

Re: [Beowulf] Large amounts of data to store and process

2019-03-04 Thread Rémy Dernat
Hi, I don't know exactly what you would like to do with this data, but if I were you, I would take a close look at Elasticsearch, Spark, or even HDF5, depending on what your analysis software looks like (what it is coded with...), to see if these technologies could save some time. Moreover, I won
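
A minimal sketch of the HDF5 option via h5py, with one chunked, compressed dataset per channel, which suits write-once-in-bulk, scan-end-to-end workloads; the group, channel names, and shapes are invented:

    import h5py
    import numpy as np

    t = np.arange(1_000_000, dtype=np.float64) / 100.0  # fake 100 Hz timestamps
    v = np.sin(t)                                       # fake channel data

    with h5py.File("flight.h5", "w") as f:
        grp = f.create_group("engine1")
        grp.create_dataset("time", data=t, chunks=True, compression="gzip")
        grp.create_dataset("egt", data=v, chunks=True, compression="gzip")

    with h5py.File("flight.h5", "r") as f:
        egt = f["engine1/egt"]
        print(egt.shape, egt[:10])  # a slice reads only the chunks it touches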

Re: [Beowulf] Large amounts of data to store and process

2019-03-04 Thread Michael Di Domenico
Even though you've alluded to this being time-series data: is there a requirement that you have to index into the data, or do you just read the data end-to-end and do some calculations? I routinely face these kinds of issues, but we're not indexing into the data, so having things in HDFS or an RDBMS doesn
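
The end-to-end case is worth spelling out, since it needs no database at all; a minimal sketch, assuming a flat file of fixed-size binary records (the record layout here is invented):

    import struct

    RECORD = struct.Struct("<IQd")   # uint32 sensor id, uint64 raw clock, float64 value
    CHUNK = RECORD.size * (1 << 16)  # read roughly 1 MiB at a time

    def mean_value(path):
        total, count = 0.0, 0
        with open(path, "rb") as f:
            while True:
                buf = f.read(CHUNK)
                if not buf:
                    break
                for _sensor, _clock, value in RECORD.iter_unpack(buf):
                    total += value
                    count += 1
        return total / count if count else 0.0

    print(mean_value("samples.dat"))

Read this way, throughput is bounded by sequential disk speed, which is the main thing that matters when there is no requirement to seek into the middle of the data.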

Re: [Beowulf] Large amounts of data to store and process

2019-03-04 Thread John Hearns via Beowulf
Jonathan, I am going to stick my neck out here. I feel that HDFS was a 'thing of its time', and people are slavishly building clusters with local SATA drives to follow that recipe. Current parallel filesystems have adapters which make them behave like HDFS: http://docs.ceph.com/docs/master/cephfs/hadoo

Re: [Beowulf] Large amounts of data to store and process

2019-03-04 Thread Jonathan Aquilina
I have read, though, that Postgres can handle time-series data no problem. I am just concerned that the clients may want to do complex big data analytics on the data. At this stage we are just prototyping, but things are very much up in the air at this point. I am wondering, though, whether sticking with HDFS and Had

Re: [Beowulf] Large amounts of data to store and process

2019-03-04 Thread Fred Youhanaie
Hi Jonathan, I have used PostgreSQL for collecting data, but there's nothing there that would be of use to you! A few years ago I set up a similar system (in a hurry) at a small company. The bulk data was compressed and made available to the applications via NFS (IPoIB). The applications

Re: [Beowulf] Large amounts of data to store and process

2019-03-04 Thread Jonathan Aquilina
Hi Fred, My colleague and I did some research and found an extension for PostgreSQL called TimescaleDB, but upon further research Postgres on its own is good for such data as well. The thing is, these are not going to be given to us as the data comes in, but in bulk at the end from
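
For reference, a hedged sketch of the TimescaleDB route: create_hypertable() is TimescaleDB's documented call, while the psycopg2 driver, connection string, and schema below are assumptions for illustration:

    import psycopg2

    conn = psycopg2.connect("dbname=flights user=ingest")
    with conn, conn.cursor() as cur:
        cur.execute("CREATE EXTENSION IF NOT EXISTS timescaledb")
        cur.execute("""
            CREATE TABLE IF NOT EXISTS readings (
                time     TIMESTAMPTZ NOT NULL,
                aircraft TEXT        NOT NULL,
                channel  TEXT        NOT NULL,
                value    DOUBLE PRECISION
            )
        """)
        # Partition the table into time chunks under the hood; queries
        # against it stay plain SQL.
        cur.execute("SELECT create_hypertable('readings', 'time', "
                    "if_not_exists => TRUE)")

Bulk end-of-flight loads fit this model too, e.g. via COPY into the hypertable rather than row-at-a-time inserts.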

Re: [Beowulf] Large amounts of data to store and process

2019-03-04 Thread Fred Youhanaie
Hi Jonathan, It seems you're collecting metrics and time series data. Perhaps a time series database (TSDB) is an option for you. There are a few of these out there, but I don't have any personal recommendation. Cheers, Fred On 04/03/2019 07:04, Jonathan Aquilina wrote: These would be numeri