Hi,

One of the arguments for keeping more Apache logs around is that web log analysis tools need them. Here is a survey of the analyzers I found in Debian and what they might need. First, some general comments:

* Ideally, an analyzer scans the log, generates statistics, stores the statistics, and doesn't need the log anymore. There might still be sensitive data in the results, but the amount of data stored is greatly reduced.

* Even if an analyzer stores its results and is run often, we need to consider the case of machines with power management that might be asleep at times. Hopefully anacron will help deal with that, but we probably want to be conservative and make sure analyzers have plenty of opportunity to scan the logs before they are rotated; a sketch of such a rotation policy follows.
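To make the rotation point concrete, here is a minimal sketch of a conservative logrotate policy for Apache. This is an illustration, not what any package currently ships: the path and the reload stanza roughly follow the Debian apache2 package, and the "rotate 7" value matches the 7-day retention suggested at the end of this message.

  # /etc/logrotate.d/apache2 (sketch)
  /var/log/apache2/*.log {
          daily           # rotate once a day
          rotate 7        # keep 7 old logs around for slow analyzers
          missingok       # don't complain if a log is absent
          notifempty      # skip empty logs
          compress        # gzip rotated logs...
          delaycompress   # ...but leave the newest one uncompressed,
                          # so a late-running analyzer can read it cheaply
          sharedscripts
          postrotate
                  # ask Apache to reopen its log files
                  if invoke-rc.d apache2 status > /dev/null 2>&1; then
                          invoke-rc.d apache2 reload > /dev/null 2>&1
                  fi
          endscript
  }

With daily rotation and "rotate 7", even an analyzer that only gets to run once a week (the anacron case above) still sees every log entry at least once before it is deleted.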
analog
======

Can do incremental processing using a cache file:
http://www.analog.cx/docs/cache.html
debian/rules sets CACHEDIR (is that sufficient? is it only for DNS?). It doesn't seem to run via cron.

awffull
=======

A fork of webalizer; see below.

awstats
=======

Gathers stats from the logs every 10 minutes and updates the HTML once a day.

goaccess
========

An interactive user tool. It doesn't appear to store results, so it probably needs the logs to be useful.

lire
====

Appears to use a database and has daily/weekly/monthly cron jobs. Need more info on whether it stores stats.

visitors
========

Doesn't appear to store stats, so it probably requires the logs.

webalizer
=========

Keeps track of where it is in the log files, maintains an incremental history file, and runs via cron once a day.

For the tools that do store data, I think 7 days should be enough to ensure that they have a chance to process the logs before they get rotated.

The above might be interesting for nginx log retention as well.

--
Matt Taggart
tagg...@debian.org