Hello,

First, thanks! I've been using GNU tar (and most of the rest of the GNU 
toolset) for almost two decades now (I'm 36). Even though GNU's not Unix, many 
of the best things about Unix are really thanks to GNU.

Now...


Problem:
I frequently run into situations where I need to update existing archives. I 
also want to conserve space, so I use compression. The current version of tar 
does not support the two together: a compressed archive cannot be updated in 
place.

I am clearly not alone here:
http://www.google.com/search?q=tar+update+compressed+archive


I understand that the limitation lies more with the compression programs, 
which do not support updating, than with tar itself. But that is only because 
we have boxed ourselves into a corner by assuming that compression is 
something we pipe the finished tar output through. There is another way for 
tar to leverage compression.

Solution:
The solution requires two changes to the code:
 1) Compress each file before adding it to the archive.
 2) Extend the per-file metadata (the header) that tar stores for each file in 
the archive so that it can record which compression algorithm/program was used 
for that file (if any).
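
To make the two steps concrete, here is a minimal Python sketch (not a patch 
against tar itself). It assumes the per-file compression tag can live in a pax 
extended header, which Python's tarfile module exposes as 
TarInfo.pax_headers. The keyword "PRECOMPRESS.algorithm" and both helper 
functions are names I made up for illustration; nothing in GNU tar 
understands them today.

```python
import gzip
import io
import tarfile

# Hypothetical pax keyword; not defined by POSIX or by GNU tar.
ALGO_KEY = "PRECOMPRESS.algorithm"

# Algorithm name -> (compress, decompress). "none" stores the bytes unchanged.
CODECS = {
    "gzip": (gzip.compress, gzip.decompress),
    "none": (lambda data: data, lambda data: data),
}


def add_precompressed(tar, name, data, algorithm="gzip"):
    """Step 1: compress one file. Step 2: record the algorithm in its header."""
    compress, _ = CODECS[algorithm]
    payload = compress(data)
    info = tarfile.TarInfo(name)
    info.size = len(payload)
    info.pax_headers = {ALGO_KEY: algorithm}
    tar.addfile(info, io.BytesIO(payload))


def read_member(tar, name):
    """Return the original bytes of a member, undoing its recorded compression."""
    info = tar.getmember(name)
    payload = tar.extractfile(info).read()
    _, decompress = CODECS[info.pax_headers.get(ALGO_KEY, "none")]
    return decompress(payload)


# The archive itself stays an ordinary uncompressed tar, so it can be reopened
# in append mode later even though its members are compressed.
archive = io.BytesIO()
with tarfile.open(fileobj=archive, mode="w", format=tarfile.PAX_FORMAT) as tar:
    add_precompressed(tar, "notes.txt", b"hello " * 100)

archive.seek(0)
with tarfile.open(fileobj=archive, mode="a", format=tarfile.PAX_FORMAT) as tar:
    add_precompressed(tar, "later.txt", b"added after the fact")
```

Because the outer container is never piped through a compressor, appending 
works exactly as it does for a plain tar file.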

Interface changes:
 * There would need to be a new flag (-p, --pre_compress)

Cons
 * The resulting archives would not be quite as small as if the whole archive 
were compressed, since files compressed one at a time cannot share redundancy 
with each other.
 * This is not a small code change.
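
The size cost in the first point is easy to see with a small experiment. The 
Python sketch below is only an illustration under simplifying assumptions: it 
uses zlib on made-up, highly repetitive sample data and treats plain 
concatenation as a stand-in for a tar stream, so the gap it shows is larger 
than you would see on typical real files.

```python
import zlib


def per_file_size(files):
    """Total size when every file is compressed on its own."""
    return sum(len(zlib.compress(data)) for data in files)


def whole_archive_size(files):
    """Size when the files are joined first and compressed as one stream."""
    return len(zlib.compress(b"".join(files)))


# Made-up sample data: 50 small, similar files.
files = [(b"log entry %d: service started, status ok\n" % i) * 20
         for i in range(50)]

print("per-file total:", per_file_size(files))
print("whole archive: ", whole_archive_size(files))
```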

Pros
 * Tar could then support the full set of update operations (adding, 
replacing, removing) on individual files in a space-efficient archive.
 * Not all files need to be compressed. I frequently create back-ups of 
directories that contain compressed files. Tar could detect that files ending 
in .t?gz, or .bzip\d? are already compressed and store them as they are.
 * Different files could be compressed using different algorithms.
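
The detection could be as simple as a suffix check. A rough Python sketch, 
using the two patterns above plus a few extra suffixes (.bz2, .xz, .zip) that 
are my own assumption; choose_algorithm is a hypothetical helper name, and a 
real implementation might prefer to look at file contents rather than names:

```python
import re

# .t?gz and .bzip\d? come from the proposal; .bz2, .xz and .zip are assumed extras.
ALREADY_COMPRESSED = re.compile(r"\.(t?gz|bzip\d?|bz2|xz|zip)$", re.IGNORECASE)


def choose_algorithm(filename, default="gzip"):
    """Return None for files that look compressed already, else `default`."""
    if ALREADY_COMPRESSED.search(filename):
        return None
    return default
```

The default argument is where a per-file choice of algorithm would plug in.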

I have a hackish Perl package/script that we use at Bee for creating archives 
that work this way. I also happen to know that at least a couple of tech 
groups within IBM have their own conventions/code for working with archives in 
this fashion.

I imagine that the toughest part would be the change to the headers. I would 
strongly suggest moving to a named-element header format, so that the headers 
can be extended later without much work.

Thanks,

-Carl

 
Carl Eklof
President @ BeeSoftware
[email protected] | p: 424.888.4BEE | f: 801.439.4213 | http://beesw.com/
