I think you are asking more than one question: you seem to need real-time communication, fast reliable storage, analytics, and presentation for investors. Making your needs clear will help people help you.
On Mon, Mar 4, 2019, 6:28 AM Joe Landman <joe.land...@gmail.com> wrote:
>
> On 3/4/19 1:55 AM, Jonathan Aquilina wrote:
> > Hi Tony,
> >
> > Sadly I can't go into much detail due to being under an NDA. At this point with the prototype we have around 250GB of sample data, but again this data is dependent on the type of aircraft. Larger aircraft and longer flights will generate a lot more data, as they have more sensors and will log more data than the sample data that I have. The sample data is 250GB for 35 aircraft of the same type.
>
> You need to return your answers in ~10m or 600s, with an assumed data set size of 250GB or more (assuming you meant GB and not Gb). How you approach that depends upon the nature of the calculation: whether or not you can perform the calculations on subsets, or whether it requires multiple passes through the data.
>
> I've noticed some recommendations popping up ahead of any understanding of what the rate-limiting factors are for returning results from calculations on this data set. I'd suggest focusing on the analysis needs to start, as this will provide some level of guidance on the system(s) design required to meet your objectives.
>
> First off, do you know whether your code will meet this 600s response time with this 250GB data set? I am assuming this is unknown at the moment, but if you have response-time data for smaller data sets, you could construct a rough scaling study and build a simple predictive model.
>
> Second, do you need the entire bolus of data, all 250GB, in order to generate a response to within the required accuracy? If not, great; what size do you need?
>
> Third, will this data set grow over time (looking at your writeup, it looks like this is a definite "yes")?
>
> Fourth, does the code require physical access to all of the data bolus (whatever is needed for the calculation) locally in order to operate correctly?
>
> Fifth, will the data access patterns for the code be streaming, searching, or random? In only one of these cases would a database (SQL or NoSQL) be a viable option.
>
> Sixth, is your working data set size comparable to the bolus size (e.g. 250GB)?
>
> Seventh, can your code work correctly with sharded data (a variation on the second point)?
>
> Now some brief "data physics".
>
> a) (data on durable storage) 250GB @ 1GB/s -> 250s to read, once, assuming large-block sequential reads. For a 600s response time, that leaves you with 350s to calculate. Is this enough time? Is a single pass (streaming) workable?
>
> b) (data in RAM) 250GB @ 100GB/s -> 2.5s to walk through once, in parallel amongst multiple cores. If multiple/many passes through the data are required, this strongly suggests a large-memory machine (512GB or larger).
>
> c) If your data is shardable, and you can distribute it amongst N machines, the above analyses still hold, replacing the 250GB with the size of the shards. If you can do this, how much information does your code need to share amongst the worker nodes in order to effect the calculation? This will provide guidance on interconnect choices.
>
> Basically, I am advocating focusing on the analysis needs, how they scale/grow, and your near/medium/long-term goals with this, before you commit to a specific design/implementation. Avoid the "if all you have is a hammer, every problem looks like a nail" view as much as possible.
>
> --
> Joe Landman
> e: joe.land...@gmail.com
> t: @hpcjoe
> w: https://scalability.org
> g: https://github.com/joelandman
> l: https://www.linkedin.com/in/joelandman
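
To make the scaling-study suggestion above ("First off...") concrete, here is a minimal sketch in Python, assuming you already have wall-clock timings for a few smaller slices of the sample; the sizes and times below are invented placeholders, not real measurements:

import numpy as np

# Hypothetical timings: (data set size in GB, observed wall-clock seconds).
# Replace with real measurements from runs on smaller slices of the sample.
sizes_gb = np.array([10.0, 25.0, 50.0, 100.0])
runtimes_s = np.array([18.0, 47.0, 98.0, 205.0])

# Fit a power law runtime = a * size**b as a straight line in log-log space.
b, log_a = np.polyfit(np.log(sizes_gb), np.log(runtimes_s), 1)
a = float(np.exp(log_a))

predicted = a * 250.0 ** b
print(f"fitted model: t ~= {a:.2f} * size_GB**{b:.2f}")
print(f"predicted runtime at 250 GB: {predicted:.0f} s "
      f"({'within' if predicted <= 600.0 else 'over'} the 600 s budget)")

If the fitted exponent comes out near 1, the work scales roughly linearly with data size; anything noticeably larger is an early warning that a single node will fall out of the 600s budget as the fleet and flight lengths grow.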
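
The "data physics" in points (a) and (b) above is simple enough to keep as a reusable back-of-envelope check. A small sketch, using the 1 GB/s and 100 GB/s bandwidth figures from the post; swap in measured numbers for your own hardware:

def one_pass_seconds(data_gb: float, bandwidth_gb_per_s: float) -> float:
    """Time to stream data_gb once at the given sequential bandwidth."""
    return data_gb / bandwidth_gb_per_s

DATA_GB = 250.0   # sample data set size from the thread
BUDGET_S = 600.0  # required response time

for label, bw in [("durable storage @ 1 GB/s", 1.0),
                  ("memory @ 100 GB/s", 100.0)]:
    t = one_pass_seconds(DATA_GB, bw)
    print(f"{label}: one pass = {t:.1f} s, "
          f"{BUDGET_S - t:.1f} s of the {BUDGET_S:.0f} s budget left to compute")

The point of (b) is that once the working set fits in memory, the read cost all but disappears and the budget goes almost entirely to compute.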
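
Point (c) on sharding is the same arithmetic with the shard size in place of the full 250GB. A tiny sketch, where the node count and per-node bandwidth are hypothetical placeholders:

DATA_GB = 250.0        # total data set size from the thread
NODES = 8              # hypothetical number of worker nodes
NODE_BW_GB_S = 1.0     # assumed per-node sequential read bandwidth, GB/s

shard_gb = DATA_GB / NODES
print(f"{shard_gb:.2f} GB per node; one pass over a shard takes "
      f"{shard_gb / NODE_BW_GB_S:.1f} s at {NODE_BW_GB_S:.0f} GB/s per node")

What the sketch leaves out is the follow-on question in (c): how much data the workers must exchange to complete the calculation. A few aggregates per shard puts almost no pressure on the interconnect; exchanging a sizeable fraction of each shard on every pass makes the interconnect the design driver.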
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf