<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Guy Rutenberg &#187; tarsum</title>
	<atom:link href="http://www.guyrutenberg.com/tag/tarsum/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.guyrutenberg.com</link>
	<description>Keeping track of what I do</description>
	<lastBuildDate>Sat, 14 Jan 2012 11:30:25 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>tarsum-0.2 &#8211; A read only version of tarsum</title>
		<link>http://www.guyrutenberg.com/2009/04/29/tarsum-02-a-read-only-version-of-tarsum/</link>
		<comments>http://www.guyrutenberg.com/2009/04/29/tarsum-02-a-read-only-version-of-tarsum/#comments</comments>
		<pubDate>Wed, 29 Apr 2009 06:58:31 +0000</pubDate>
		<dc:creator>Guy</dc:creator>
				<category><![CDATA[Projects]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[tarsum]]></category>

		<guid isPermaLink="false">http://www.guyrutenberg.com/?p=315</guid>
		<description><![CDATA[When I first scratched the itch of calculating checksums for every file in a tar archive, this was my original intention. When I decided I wanted the script in bash for simplicity, I gave up on the idea and settled for extracting the files and then going over all of them to calculate their checksums. So [...]]]></description>
			<content:encoded><![CDATA[<p>When I first scratched the itch of calculating checksums for every file in a tar archive, this was my original intention. When I decided I wanted the script in bash for simplicity, I gave up on the idea and settled for extracting the files and then going over all of them to calculate their checksums.</p>
<p>So when <a href="/2008/10/24/tarsum-calculate-checksum-for-files-inside-tar-archive/#comment-19087">Jon Flowers asked</a> in the comments of the original <a href="/2008/10/24/tarsum-calculate-checksum-for-files-inside-tar-archive/"><code>tarsum</code> post</a> about the possibility of getting the checksums of files in a tar file without extracting the whole archive, I decided to tackle the problem again.</p>
<p><span id="more-315"></span></p>
<p>This time I chose Python, and by using the <code>tarfile</code> and <code>hashlib</code> modules I came up with a solution that goes over a tar file and calculates the checksum values without extracting its contents to disk. However, some sacrifices were made in the backward compatibility of the output. I&#8217;ve tried to make the interface similar to the old one, and have kept all the command line options. Instead of specifying a program name to calculate the checksum values (such as <code>sha1sum</code>) as the argument to <code>--checksum</code>, you specify the name of the checksum algorithm, such as md5, sha1, sha256 or sha512 (or any other supported by <code>hashlib</code>).</p>
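<p>As a rough illustration of what <code>--checksum</code> now accepts (a sketch for a current Python 3 interpreter, not part of <code>tarsum</code> itself): any name that <code>hashlib.new()</code> recognizes should work, and <code>hashlib.algorithms_available</code> lists what the local build supports. The helper name <code>digest_of</code> is made up for this example.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">import hashlib

def digest_of(data, algorithm="md5"):
    """Return the hex digest of a bytes object using the named algorithm."""
    h = hashlib.new(algorithm)  # raises ValueError for an unknown name
    h.update(data)
    return h.hexdigest()

if __name__ == "__main__":
    for name in ("md5", "sha1", "sha256", "sha512"):
        print(name, digest_of(b"hello", name))
    # Names supported by this interpreter (varies with the OpenSSL build).
    print(sorted(hashlib.algorithms_available))</pre></div></div>
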
<p>Other changes were made so tar files can be piped directly into <code>tarsum</code> (which also works transparently with bzip2 and gzip compression).</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">tarsum &lt; sometarfile.tar.gz &gt; sometarfile.tar.gz.md5</pre></div></div>

<p>Performance-wise, according to some tests I&#8217;ve carried out, the new version is faster with big tar files than the old one, but it&#8217;s the other way around with small archives (which I find less important).</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #808080; font-style: italic;">#! /usr/bin/env python</span>
&nbsp;
<span style="color: #808080; font-style: italic;"># Copyright (C) 2008-2009 by Guy Rutenberg</span>
&nbsp;
<span style="color: #808080; font-style: italic;"># This program is free software; you can redistribute it and/or modify</span>
<span style="color: #808080; font-style: italic;"># it under the terms of the GNU General Public License as published by</span>
<span style="color: #808080; font-style: italic;"># the Free Software Foundation; either version 2 of the License, or</span>
<span style="color: #808080; font-style: italic;"># (at your option) any later version.</span>
<span style="color: #808080; font-style: italic;">#</span>
<span style="color: #808080; font-style: italic;"># This program is distributed in the hope that it will be useful,</span>
<span style="color: #808080; font-style: italic;"># but WITHOUT ANY WARRANTY; without even the implied warranty of</span>
<span style="color: #808080; font-style: italic;"># MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the</span>
<span style="color: #808080; font-style: italic;"># GNU General Public License for more details.</span>
<span style="color: #808080; font-style: italic;">#</span>
<span style="color: #808080; font-style: italic;"># You should have received a copy of the GNU General Public License</span>
<span style="color: #808080; font-style: italic;"># along with this program; if not, write to the</span>
<span style="color: #808080; font-style: italic;"># Free Software Foundation, Inc.,</span>
<span style="color: #808080; font-style: italic;"># 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">import</span> hashlib
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">tarfile</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> tarsum<span style="color: black;">&#40;</span>input_file, <span style="color: #008000;">hash</span>, output_file<span style="color: black;">&#41;</span>:
        <span style="color: #483d8b;">&quot;&quot;&quot;
        input_file  - A FILE object to read the tar file from.
        hash - The name of the hash to use. Must be supported by hashlib.
        output_file - A FILE to write the computed signatures to.
        &quot;&quot;&quot;</span>
        tar = <span style="color: #dc143c;">tarfile</span>.<span style="color: #008000;">open</span><span style="color: black;">&#40;</span>mode=<span style="color: #483d8b;">&quot;r|*&quot;</span>, fileobj=input_file<span style="color: black;">&#41;</span>
&nbsp;
        chunk_size = <span style="color: #ff4500;">100</span><span style="color: #66cc66;">*</span><span style="color: #ff4500;">1024</span>
        store_digests = <span style="color: black;">&#123;</span><span style="color: black;">&#125;</span>
&nbsp;
        <span style="color: #ff7700;font-weight:bold;">for</span> member <span style="color: #ff7700;font-weight:bold;">in</span> tar:
            <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #ff7700;font-weight:bold;">not</span> member.<span style="color: black;">isfile</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
                <span style="color: #ff7700;font-weight:bold;">continue</span>
            f = tar.<span style="color: black;">extractfile</span><span style="color: black;">&#40;</span>member<span style="color: black;">&#41;</span>
            h = hashlib.<span style="color: #dc143c;">new</span><span style="color: black;">&#40;</span><span style="color: #008000;">hash</span><span style="color: black;">&#41;</span>
            data = f.<span style="color: black;">read</span><span style="color: black;">&#40;</span>chunk_size<span style="color: black;">&#41;</span>
            <span style="color: #ff7700;font-weight:bold;">while</span> data:
                h.<span style="color: black;">update</span><span style="color: black;">&#40;</span>data<span style="color: black;">&#41;</span>
                data = f.<span style="color: black;">read</span><span style="color: black;">&#40;</span>chunk_size<span style="color: black;">&#41;</span>
            output_file.<span style="color: black;">write</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;%s  %s<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span> <span style="color: #66cc66;">%</span> <span style="color: black;">&#40;</span>h.<span style="color: black;">hexdigest</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>, member.<span style="color: black;">name</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> main<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
    <span style="color: #dc143c;">parser</span> = OptionParser<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
    version=<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;%prog 0.2.1<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span>
             <span style="color: #483d8b;">&quot;Copyright (C) 2008-2009 Guy Rutenberg &lt;http://www.guyrutenberg.com/contact-me&gt;&quot;</span><span style="color: black;">&#41;</span>
    usage=<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;%prog [options] TARFILE<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span>
           <span style="color: #483d8b;">&quot;Print a checksum signature for every file in TARFILE.<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span>
           <span style="color: #483d8b;">&quot;With no TARFILE, or when TARFILE is -, read standard input.&quot;</span><span style="color: black;">&#41;</span>
    <span style="color: #dc143c;">parser</span> = OptionParser<span style="color: black;">&#40;</span>usage=usage, version=version<span style="color: black;">&#41;</span>
    <span style="color: #dc143c;">parser</span>.<span style="color: black;">add_option</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;-c&quot;</span>, <span style="color: #483d8b;">&quot;--checksum&quot;</span>, dest=<span style="color: #483d8b;">&quot;checksum&quot;</span>, <span style="color: #008000;">type</span>=<span style="color: #483d8b;">&quot;string&quot;</span>,
        <span style="color: #008000;">help</span>=<span style="color: #483d8b;">&quot;use HASH for calculating the checksums. [default: %default]&quot;</span>, metavar=<span style="color: #483d8b;">&quot;HASH&quot;</span>,
        default=<span style="color: #483d8b;">&quot;md5&quot;</span><span style="color: black;">&#41;</span>
    <span style="color: #dc143c;">parser</span>.<span style="color: black;">add_option</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;-o&quot;</span>, <span style="color: #483d8b;">&quot;--output&quot;</span>, dest=<span style="color: #483d8b;">&quot;output&quot;</span>, <span style="color: #008000;">type</span>=<span style="color: #483d8b;">&quot;string&quot;</span>,
        <span style="color: #008000;">help</span>=<span style="color: #483d8b;">&quot;save signatures to FILE.&quot;</span>, metavar=<span style="color: #483d8b;">&quot;FILE&quot;</span><span style="color: black;">&#41;</span>
&nbsp;
    <span style="color: black;">&#40;</span>option, args<span style="color: black;">&#41;</span> = <span style="color: #dc143c;">parser</span>.<span style="color: black;">parse_args</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
    output_file = <span style="color: #dc143c;">sys</span>.<span style="color: black;">stdout</span>
    <span style="color: #ff7700;font-weight:bold;">if</span> option.<span style="color: black;">output</span>:
        output_file = <span style="color: #008000;">open</span><span style="color: black;">&#40;</span>option.<span style="color: black;">output</span>, <span style="color: #483d8b;">&quot;w&quot;</span><span style="color: black;">&#41;</span>
&nbsp;
    input_file = <span style="color: #dc143c;">sys</span>.<span style="color: black;">stdin</span>
    <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>args<span style="color: black;">&#41;</span>==<span style="color: #ff4500;">1</span> <span style="color: #ff7700;font-weight:bold;">and</span> args<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span><span style="color: #66cc66;">!</span>=<span style="color: #483d8b;">&quot;-&quot;</span>:
        input_file = <span style="color: #008000;">open</span><span style="color: black;">&#40;</span>args<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>, <span style="color: #483d8b;">&quot;rb&quot;</span><span style="color: black;">&#41;</span>
&nbsp;
    tarsum<span style="color: black;">&#40;</span>input_file, option.<span style="color: black;">checksum</span>, output_file<span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">if</span> __name__ == <span style="color: #483d8b;">&quot;__main__&quot;</span>:
    <span style="color: #ff7700;font-weight:bold;">from</span> <span style="color: #dc143c;">optparse</span> <span style="color: #ff7700;font-weight:bold;">import</span> OptionParser
    <span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">sys</span>
    main<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #808080; font-style: italic;"># vim: ai ts=4 sts=4 et sw=4</span></pre></div></div>

<p>You can get a wget&#8217;able version here: <a href="/wp-content/uploads/2009/04/tarsum-0.2.bz2">tarsum-0.2.bz2</a>.</p>
<p><strong>Update 2009-08-12:</strong> Removed excess argument to <code>tarsum()</code> and switched the <code>filemode</code> to <code>r|*</code> (from <code>r:*</code>). Bumped version string.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.guyrutenberg.com/2009/04/29/tarsum-02-a-read-only-version-of-tarsum/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>tarsum &#8211; Calculate Checksum for Files inside Tar Archive</title>
		<link>http://www.guyrutenberg.com/2008/10/24/tarsum-calculate-checksum-for-files-inside-tar-archive/</link>
		<comments>http://www.guyrutenberg.com/2008/10/24/tarsum-calculate-checksum-for-files-inside-tar-archive/#comments</comments>
		<pubDate>Fri, 24 Oct 2008 20:02:05 +0000</pubDate>
		<dc:creator>Guy</dc:creator>
				<category><![CDATA[Bash]]></category>
		<category><![CDATA[Projects]]></category>
		<category><![CDATA[tarsum]]></category>

		<guid isPermaLink="false">http://www.guyrutenberg.com/?p=149</guid>
		<description><![CDATA[Update: I&#8217;ve released tarsum-0.2, a new version of tarsum. Some time ago, I got a hard disk back from data recovery. One of the annoying issues I encountered with the recovered data was corrupted files. Some files looked like they had been recovered successfully, but their content was corrupted. The ones that were configuration files, [...]]]></description>
			<content:encoded><![CDATA[<p><b>Update</b>: I&#8217;ve released <a href="/2009/04/29/tarsum-02-a-read-only-version-of-tarsum/">tarsum-0.2</a>, a new version of <code>tarsum</code>.</p>
<p>Some time ago, I got a hard disk back from data recovery. One of the annoying issues I encountered with the recovered data was corrupted files. Some files looked like they had been recovered successfully, but their content was corrupted. The ones that were configuration files were usually easy to detect, as they raised errors in programs that tried to use them. But when such an error occurs in some general text file (or inside the data of an SQL dump), the file may seem perfectly fine unless closely inspected.</p>
<p>I have a habit of storing old backups on CDs (they are initially made to online storage); I do it in order to reduce backup costs. But the recovered/corrupted data issue raised some concerns about my ability to recover using these discs. Assuming that I have a disk failure and can&#8217;t recover from my online backups for some reason, how can I check the integrity of my CD backups?</p>
<p>Only storing and comparing a hash signature for the whole archive is almost useless. It allows you to validate whether all the files are probably fine, but it can&#8217;t tell apart one corrupted file in the archive from a completely corrupted archive. My idea was to calculate a checksum (hash) for each file in the data and store the signatures in a way that would allow me to see which individual files are corrupted.</p>
<p>This is where <code>tarsum</code> comes to the rescue. As its name implies, it calculates a checksum for each file in the archive. You can download <code>tarsum</code> from <a href="/wp-content/uploads/2008/10/tarsum.gz">here</a>.</p>
<p>Using <code>tarsum</code> is pretty straightforward.</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">tarsum backup.tar &gt; backup.tar.md5</pre></div></div>

<p>This calculates the MD5 checksums of the files. You can specify other hashes as well by passing a tool that calculates them (it must work like <code>md5sum</code>).</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">tarsum --checksum=sha256sum backup.tar &gt; backup.tar.sha256</pre></div></div>

<p>To verify the integrity of the files inside the archive we use the <code>diff</code> command:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">tarsum backup.tar | diff backup.tar.md5 -</pre></div></div>

<p>where <code>backup.tar.md5</code> is the original signature file we created. This is possible because the signatures are sorted alphabetically by the file name inside the archive, so the order of the files is always the same.</p>
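<p>When the <code>diff</code> output gets long, it can help to reduce the two signature listings to just the affected file names. The following is a minimal Python 3 sketch, not part of <code>tarsum</code>; the function names are made up for this example, and it assumes every line looks like <code>DIGEST  NAME</code>, the layout <code>md5sum</code> prints in text mode.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">def parse_listing(text):
    """Map file name to digest for lines of the form 'DIGEST  NAME'."""
    digests = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the first run of whitespace only, so names may contain spaces.
        digest, name = line.split(None, 1)
        digests[name] = digest
    return digests

def compare_listings(original, current):
    """Return (changed, missing, added) file names, each list sorted."""
    old = parse_listing(original)
    new = parse_listing(current)
    changed = sorted(n for n in old if n in new and old[n] != new[n])
    missing = sorted(n for n in old if n not in new)
    added = sorted(n for n in new if n not in old)
    return changed, missing, added</pre></div></div>

<p>Calling <code>compare_listings()</code> with the contents of <code>backup.tar.md5</code> and the fresh <code>tarsum</code> output returns three sorted lists: files whose checksum changed, files missing from the new listing, and files that appear only in the new listing.</p>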
<p>Note that if you use an updated version of GNU tar, <code>tarsum</code> can also operate directly on compressed archives (e.g. tar.bz2, tar.gz).</p>
]]></content:encoded>
			<wfw:commentRss>http://www.guyrutenberg.com/2008/10/24/tarsum-calculate-checksum-for-files-inside-tar-archive/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
	</channel>
</rss>


