On Wednesday, August 15, 2012 4:06:38 PM UTC-7, Dave Mandelin wrote:
> On Wednesday, August 15, 2012 2:03:38 PM UTC-7, Taras Glek wrote:
> > Hi,
> >
> > According to metrics we have about 1TB of telemetry data in Hadoop. This
> > is almost a year's worth of telemetry data. Our telemetry ping packets
> > keep growing as we add more probes. As the Hadoop database gets bigger,
> > query times get worse, etc. We need to decide what data we can throw
> > away and when.
>
> Most of this sounds fine to me, except I wanna ask: how much does it cost to 
> store 1TB of data? Next to nothing, right? I'd say move it out of the primary 
> database to an archive area if you need to for performance, but why not keep 
> all the archives?
>
> Dave

There is a cost beyond the cost of raw storage. There is the cost of
maintaining whatever archive we set up, and of making sure people can
retrieve and make use of that data if it is asked for.

More important to me, though, is that it is potentially at odds with the
privacy principle of keeping data only for as long as it is known to have
value. Saving it for a rainy day isn't good enough; we should have a clear
understanding of how it might be used in the future and then come to an
agreement over whether the potential value is worth the cost of keeping it.
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
