Calculating checksums for every file in a tar archive was my original intention when I first scratched this itch. But when I decided I wanted the script in Bash for simplicity, I forfeited the idea and settled for extracting the files first and then going over them to calculate their checksum values.
So when Jon Flowers asked in the comments of the original
tarsum post about the possibility of getting the checksums of files in a tar file without extracting the whole archive, I decided to re-tackle the problem.
This time I’ve chose python and by using the
hashlib modules I came up with a solution that allowed me to go over tar files to calculate the checksum values without extracting all of them to the disk. However some sacrifices where made in the form of back-compatibility of the output. I’ve tried to make the interface similar to the old one, and have kept all the command line options. Instead of specifying a program name to calculate the checksum values (such as
sha1sum) as argument to
--checksum you specify the name of the checksum algorithm such as md5, sha1, sha256, sha512 (or any other supported by
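The core idea can be sketched roughly like this (a minimal sketch, not the actual script; the function name and chunk size are my own):

```python
import hashlib
import tarfile

def tar_checksums(path, algorithm="md5", chunk_size=64 * 1024):
    """Yield (checksum, member name) pairs for regular files in a tar archive."""
    # "r:*" lets tarfile auto-detect gzip/bzip2 compression.
    with tarfile.open(path, "r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            # hashlib.new() accepts any algorithm name hashlib supports,
            # e.g. "md5", "sha1", "sha256", "sha512".
            digest = hashlib.new(algorithm)
            fileobj = tar.extractfile(member)
            # Read the member in chunks so large files don't fill memory.
            for chunk in iter(lambda: fileobj.read(chunk_size), b""):
                digest.update(chunk)
            yield digest.hexdigest(), member.name
```

Each member is read straight out of the archive, so nothing ever touches the disk.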
Other changes were made so tar files can be piped directly into
tarsum (which also works transparently with bzip2 and gzip compression).
tarsum < sometarfile.tar.gz > sometarfile.tar.gz.md5
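Reading from a pipe can be sketched like this (my own sketch, not the actual script: for a non-seekable stream such as stdin, tarfile's streaming mode "r|*" is the one that works, since it never seeks and still detects gzip/bzip2 transparently):

```python
import hashlib
import sys
import tarfile

def checksum_stream(fileobj, algorithm="md5"):
    """Print md5sum-style lines for a tar read from a non-seekable stream."""
    # "r|*" is tarfile's streaming mode: it reads members strictly in
    # order without seeking, so it works on pipes, and it transparently
    # handles gzip and bzip2 compression.
    with tarfile.open(fileobj=fileobj, mode="r|*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            digest = hashlib.new(algorithm)
            extracted = tar.extractfile(member)
            for chunk in iter(lambda: extracted.read(64 * 1024), b""):
                digest.update(chunk)
            print(f"{digest.hexdigest()}  {member.name}")

if __name__ == "__main__":
    checksum_stream(sys.stdin.buffer)
```

The one constraint of streaming mode is that members must be read in archive order, which the loop above already does.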
Performance-wise, according to some tests I carried out, the new version is faster than the old one with big tar files, but slower with small archives (which I find less important).
Update 2009-08-12: Removed an excess argument to
tarsum() and switched the tarfile open mode to
r:*. Bumped the version string.