Someone somewhere in Beowulf land has certainly dealt with this before ... I'm looking for a clue, tip or URL pointer if possible.
I'm trying to stay on top of capacity planning and generate useful reports on how an expensive scale-out NAS volume is being used month-to-month. The 200TB namespace is filling up fast, and I'm particularly interested in tracking about 30TB of genomic data that is growing at about 1.5TB/mo with both instrument- and human-generated data.

For well-curated directories that use year/month/day in the directory names, a simple "du -mcs" recursing to a certain depth works fine for printing out a CSV with a directory name, a size in MB and the year/month it was last modified. That's all I'm really looking for. I want to show growth month-by-month and year-by-year and break it down by top-level directories that match either genome sequencing platform types or big project names...

The manual / hacked / du methods are starting to fall over and/or just take too much human time to deal with.

Anyone aware of an open source or freely available system for reporting on NAS usage trends? Something that can dump into a database so I could do custom reporting off of the results? I figure someone somewhere has written a smart and efficient filesystem trawler that can dump into a MySQL table or similar.

I could hack something together myself, but even a quick look at the CLI tools and misc Perl/Python modules for file and dir statistics seems to indicate that there are a lot of possibilities to make dumb novice mistakes in the traversal, the size summing or the reporting. I'd like to avoid my own bad coding if at all possible.

Anyone aware of systems that do something like this?

Regards,
Chris

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
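
For reference, the kind of trawler described above could be sketched roughly as follows. This is only a minimal illustration, not an existing tool: it assumes Python 3 and uses SQLite (from the standard library) as a stand-in for MySQL, and the names scan_tree, save_totals and the nas_usage table are made up for the example. It sums apparent file sizes (st_size), which can differ from the disk-block usage that du reports, and it buckets each file by its own mtime rather than by the directory's.

```python
import os
import sqlite3
import stat
import time


def scan_tree(root, depth=1):
    """Sum file sizes under root, keyed by (bucket directory, 'YYYY-MM').

    The bucket is the first `depth` path components below root; files that
    sit directly in root are bucketed under '.'.
    """
    root = os.path.abspath(root)
    totals = {}
    seen = set()  # (device, inode) pairs, so hard links are counted once
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        rel = os.path.relpath(dirpath, root)
        parts = [] if rel == "." else rel.split(os.sep)
        bucket = os.sep.join(parts[:depth]) or "."
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue  # ignore symlinks, sockets, devices
            if st.st_nlink > 1:
                key = (st.st_dev, st.st_ino)
                if key in seen:
                    continue
                seen.add(key)
            month = time.strftime("%Y-%m", time.localtime(st.st_mtime))
            totals[(bucket, month)] = totals.get((bucket, month), 0) + st.st_size
    return totals


def save_totals(db_path, totals, scan_date=None):
    """Append one scan's totals to a SQLite table (hypothetical schema)."""
    scan_date = scan_date or time.strftime("%Y-%m-%d")
    con = sqlite3.connect(db_path)
    with con:  # commits on success, rolls back on error
        con.execute(
            "CREATE TABLE IF NOT EXISTS nas_usage "
            "(scan_date TEXT, directory TEXT, month TEXT, bytes INTEGER)"
        )
        con.executemany(
            "INSERT INTO nas_usage VALUES (?, ?, ?, ?)",
            [(scan_date, d, m, b) for (d, m), b in sorted(totals.items())],
        )
    con.close()
```

Calling save_totals("usage.db", scan_tree("/path/to/volume", depth=1)) once a month would leave one set of rows per scan, so month-by-month growth per top-level directory becomes a GROUP BY query. On a 200TB namespace a single-threaded walk like this may well be too slow, which is part of why an existing tool would be preferable.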