Hello,

On Monday 25 August 2008 17:33:44, Kjetil Torgrim Homme wrote:
> I wanted to make a du(1)-style report for Bacula, but was a bit
> surprised to see that this information is not readily available in the
> File table -- it's encoded as quasi-base64 in the LStat column. I
> modified base64.sql[1] to support Bacula's format, but it's running
> too slow to be useful, i.e. less than 10k file sizes extracted per
> second on my relatively beefy database server. For a largish fileset
> this means a couple of minutes of CPU time will be spent on that alone,
> so it's impractical to do it without heavy caching -- and then we
> might as well add the information while doing the backup.
The bweb interface has a view similar to the "filelight" tool, which
displays used space as circular graphs:
http://www.methylblue.com/filelight/ (the bfileview module from bweb
isn't as pretty :) To get good performance, I've used a new table that
stores directory sizes, and I also have a PL procedure that extracts
the file size from the "base64" field (much faster than a piece of C
code that retrieves all rows). I can do something like:

SELECT SUM(extract_lstat('size', LStat)), filename FROM File ...

> Does anyone else think it's a good idea to extend the table?

Extending the File table isn't always possible; in this case you would
add 8 bytes per row -- in my case 500,000,000 * 8 bytes (close to
4 GB). We will modify the File table in the next major version (to
increase the FileId size), and perhaps we will add a Size field; that
needs some testing and reflection.

> A graphical disk usage browser would make it easier to visualise which
> directories are big, or are growing the fastest -- or to spot files
> which should be omitted, or even directories which are inadvertently
> missing.
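By the way, the LStat decoding itself is simple enough to sketch. Here is a rough Python version, under the following assumptions (verify against encode_stat() in your Bacula version): fields are space-separated, each is a base-64 integer written most-significant digit first with the standard A-Z/a-z/0-9/+/ alphabet, and st_size is the 8th field; the function names are mine, not Bacula's.

```python
# Sketch of decoding Bacula's LStat column. Assumptions (check against
# encode_stat() in src/lib/attribs.c for your Bacula version): fields
# are space-separated base-64 integers, most significant digit first,
# and st_size is the 8th field.

B64_ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                "abcdefghijklmnopqrstuvwxyz"
                "0123456789+/")
B64_INDEX = {c: i for i, c in enumerate(B64_ALPHABET)}

def bacula_b64_to_int(field):
    """Decode one base-64 encoded integer field ('A' = 0, 'BA' = 64)."""
    value = 0
    for ch in field:
        value = value * 64 + B64_INDEX[ch]
    return value

def lstat_size(lstat):
    """Extract st_size (assumed to be the 8th field) from an LStat string."""
    return bacula_b64_to_int(lstat.split()[7])
```

A PL version of the same loop is what my extract_lstat() does; keeping it in the database avoids shipping every row to the client.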
> Here's the current definition:
>
> +------------+------------------+------+-----+---------+----------------+
> | Field      | Type             | Null | Key | Default | Extra          |
> +------------+------------------+------+-----+---------+----------------+
> | FileId     | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
> | FileIndex  | int(10) unsigned | YES  |     | 0       |                |
> | JobId      | int(10) unsigned | NO   | MUL | NULL    |                |
> | PathId     | int(10) unsigned | NO   | MUL | NULL    |                |
> | FilenameId | int(10) unsigned | NO   | MUL | NULL    |                |
> | MarkId     | int(10) unsigned | YES  |     | 0       |                |
> | LStat      | tinyblob         | NO   |     | NULL    |                |
> | MD5        | tinyblob         | YES  |     | NULL    |                |
> +------------+------------------+------+-----+---------+----------------+

I see that you are using MySQL; you won't be able to use the bweb
module :( (at this time MySQL doesn't permit this kind of feature).

> As you can see, the MD5 sum is already stored there, and as a bonus
> the combination of the file size and MD5 would make it possible to
> implement incremental storage of files which grow by appending (logs,
> mbox). Just calculate the MD5 sum of the first N bytes, and if it
> matches, don't store the start of the file (this needs a new record
> type, too). You don't even waste CPU time, since the calculation has
> to be done anyway. Append-only files may be too rare to be worth the
> special case code, though.

IMHO, it's a bit too specific, but quite interesting :)

Bye

_______________________________________________
Bacula-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-devel
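PS: for the record, the prefix-MD5 check described above is a few lines. A rough Python sketch (the helper names are mine, and I assume prev_size and prev_md5 come from the catalog -- the size decoded from LStat and the stored MD5 of the previous backup of that file):

```python
# Sketch of the prefix-MD5 check for append-only files: if the first
# prev_size bytes of the file still hash to the MD5 stored at the last
# backup, only the bytes past prev_size need to be stored.
# Hypothetical helper names; prev_size/prev_md5 would come from the
# catalog (LStat size and the stored MD5 column).
import hashlib

def md5_prefix(path, nbytes):
    """MD5 of the first nbytes of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        remaining = nbytes
        while remaining > 0:
            chunk = f.read(min(65536, remaining))
            if not chunk:
                break
            h.update(chunk)
            remaining -= len(chunk)
    return h.hexdigest()

def grew_by_appending(path, prev_size, prev_md5):
    """True if the file still starts with the previously backed-up
    content, so only the tail past prev_size is new."""
    return md5_prefix(path, prev_size) == prev_md5
```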
