On Wednesday, August 15, 2012 4:06:38 PM UTC-7, Dave Mandelin wrote:
> On Wednesday, August 15, 2012 2:03:38 PM UTC-7, Taras Glek wrote:
> > > Hi,
> > >
> > > According to metrics we have about 1TB of telemetry data in hadoop. This
> > > is almost a year worth of telemetry data. Our telemetry ping packets
> > > keep growing as we add more probes. As the hadoop database gets bigger,
> > > query times get worse, etc. We need to decide on what data we can throw
> > > away and when.
>
> Most of this sounds fine to me, except I wanna ask: how much does it cost to
> store 1TB of data? Next to nothing, right? I'd say move it out of the primary
> database to an archive area if you need to for performance, but why not keep
> all the archives?
>
> Dave

There is a cost beyond the cost of raw storage. There is also the cost of maintaining whatever archive we use, and of making sure people can retrieve and make use of that data if it is asked for.

More important to me, though, is that keeping everything is potentially at odds with the privacy principle of keeping data only for as long as it is known to have value. Saving it for a rainy day isn't good enough. We should have a clear understanding of how it might be used in the future, and then come to an agreement on whether the potential value is worth the cost of keeping it.