<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Guy Rutenberg &#187; optimization</title>
	<atom:link href="http://www.guyrutenberg.com/tag/optimization/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.guyrutenberg.com</link>
	<description>Keeping track of what I do</description>
	<lastBuildDate>Wed, 16 Jun 2010 19:53:40 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Optimizing for Loops: Reverse Loops</title>
		<link>http://www.guyrutenberg.com/2007/11/07/optimizing-for-loops-reverse-loops/</link>
		<comments>http://www.guyrutenberg.com/2007/11/07/optimizing-for-loops-reverse-loops/#comments</comments>
		<pubDate>Wed, 07 Nov 2007 14:26:24 +0000</pubDate>
		<dc:creator>Guy</dc:creator>
				<category><![CDATA[C/C++]]></category>
		<category><![CDATA[optimization]]></category>

		<guid isPermaLink="false">http://www.guyrutenberg.com/2007/11/07/optimizing-for-loops-reverse-loops/</guid>
		<description><![CDATA[for loops are basic language constructs in many languages. One of the first things to look at when optimizing code is the loops, as they do a considerable amount of work (like going through a very large amount of data) in very little code.
If you use a for loop, but you don&#8217;t really care about the [...]]]></description>
			<content:encoded><![CDATA[<p><code>for</code> loops are basic language constructs in many languages. One of the first things to look at when optimizing code is the loops, as they do a considerable amount of work (like going through a very large amount of data) in very little code.</p>
<p>If you use a <code>for</code> loop but don&#8217;t really care about the order in which it is executed (more precisely, if you can afford to reverse the loop), you can save quite some time. By reversing the loop I mean that instead of running the index from 0 to 10, for example, you go from 10 down to 0. This doesn&#8217;t seem like a big change, but when carefully implemented it can easily improve the performance of your <code>for</code> loops.<br />
<span id="more-27"></span><br />
Take a look at the following <code>for</code> loop implementation:</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">for</span> <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">int</span> i<span style="color: #000080;">=</span><span style="color: #0000dd;">0</span><span style="color: #008080;">;</span> i<span style="color: #000080;">&lt;</span><span style="color: #0000dd;">10</span><span style="color: #008080;">;</span> i<span style="color: #000040;">++</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
	<span style="color: #666666;">//some work</span>
<span style="color: #008000;">&#125;</span></pre></div></div>

<p>This loop goes from 0 to 9 and does some work. If the kind of work done allows one to go from 9 down to 0, a <code>for</code> loop with the same functionality can be implemented like this:</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">for</span> <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">int</span> i<span style="color: #000080;">=</span><span style="color: #0000dd;">10</span><span style="color: #008080;">;</span> i<span style="color: #000040;">--</span><span style="color: #008080;">;</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
	<span style="color: #666666;">//some work</span>
<span style="color: #008000;">&#125;</span></pre></div></div>

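<p>Notice the difference between the two implementations. The first one needs to compare the index to the stopping value and then increment the index for the next iteration. The second implementation, on the other hand, just checks that the index isn&#8217;t zero and decrements it in the same expression. Because <code>i--</code> tests the old value of <code>i</code> before decrementing it, the body sees values from one less than the starting value down to 0.</p>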
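<p>As a rough sketch of what the two loop forms do, the helpers below record the index values each loop visits. The names <code>forward_indices</code> and <code>reverse_indices</code> are made up for this illustration and are not part of the benchmark that follows.</p>

```cpp
#include <vector>

// Records the indices visited by the traditional counting-up loop.
std::vector<unsigned int> forward_indices(unsigned int n)
{
	std::vector<unsigned int> visited;
	for (unsigned int i = 0; i < n; i++)
		visited.push_back(i);
	return visited;
}

// Records the indices visited by the reverse loop. The condition "i--"
// tests the old value of i and then decrements it, so the body sees
// n-1, n-2, ..., 0 and the loop stops once the tested value is zero.
std::vector<unsigned int> reverse_indices(unsigned int n)
{
	std::vector<unsigned int> visited;
	for (unsigned int i = n; i--; )
		visited.push_back(i);
	return visited;
}
```

<p>For the same <code>n</code>, both helpers visit the same set of indices, only in opposite order.</p>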
<p>To check whether the theory behind this optimization is right, I&#8217;ve put together a short piece of code to test it.</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #666666;">// fortest.cpp</span>
<span style="color: #339900;">#include &lt;iostream&gt;</span>
<span style="color: #339900;">#include &lt;time.h&gt;</span>
<span style="color: #0000ff;">using</span> <span style="color: #0000ff;">namespace</span> std<span style="color: #008080;">;</span>
&nbsp;
timespec diff<span style="color: #008000;">&#40;</span>timespec start, timespec end<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
<span style="color: #0000ff;">int</span> main<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
	timespec time1, time2<span style="color: #008080;">;</span>
	<span style="color: #0000ff;">unsigned</span> <span style="color: #0000ff;">int</span> i, temp <span style="color: #000080;">=</span> <span style="color: #0000dd;">1</span><span style="color: #008080;">;</span>
	clock_gettime<span style="color: #008000;">&#40;</span>CLOCK_PROCESS_CPUTIME_ID, <span style="color: #000040;">&amp;</span>time1<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
	<span style="color: #0000ff;">for</span> <span style="color: #008000;">&#40;</span>i <span style="color: #000080;">=</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span> i <span style="color: #000080;">&lt;</span> <span style="color: #0000dd;">2420000000</span><span style="color: #008080;">;</span> i<span style="color: #000040;">++</span><span style="color: #008000;">&#41;</span>
		temp<span style="color: #000040;">+</span><span style="color: #000080;">=</span>temp<span style="color: #008080;">;</span>
	clock_gettime<span style="color: #008000;">&#40;</span>CLOCK_PROCESS_CPUTIME_ID, <span style="color: #000040;">&amp;</span>time2<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
	<span style="color: #0000dd;">cout</span><span style="color: #000080;">&lt;&lt;</span>diff<span style="color: #008000;">&#40;</span>time1,time2<span style="color: #008000;">&#41;</span>.<span style="color: #007788;">tv_sec</span><span style="color: #000080;">&lt;&lt;</span><span style="color: #FF0000;">&quot;:&quot;</span><span style="color: #000080;">&lt;&lt;</span>diff<span style="color: #008000;">&#40;</span>time1,time2<span style="color: #008000;">&#41;</span>.<span style="color: #007788;">tv_nsec</span><span style="color: #000080;">&lt;&lt;</span>endl<span style="color: #008080;">;</span>
	temp <span style="color: #000080;">=</span> <span style="color: #0000dd;">1</span><span style="color: #008080;">;</span>
	clock_gettime<span style="color: #008000;">&#40;</span>CLOCK_PROCESS_CPUTIME_ID, <span style="color: #000040;">&amp;</span>time1<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
	<span style="color: #0000ff;">for</span> <span style="color: #008000;">&#40;</span>i <span style="color: #000080;">=</span> <span style="color: #0000dd;">2420000000</span><span style="color: #008080;">;</span> i<span style="color: #000040;">--</span><span style="color: #008080;">;</span> <span style="color: #008000;">&#41;</span>
		temp<span style="color: #000040;">+</span><span style="color: #000080;">=</span>temp<span style="color: #008080;">;</span>
	clock_gettime<span style="color: #008000;">&#40;</span>CLOCK_PROCESS_CPUTIME_ID, <span style="color: #000040;">&amp;</span>time2<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
	<span style="color: #0000dd;">cout</span><span style="color: #000080;">&lt;&lt;</span>diff<span style="color: #008000;">&#40;</span>time1,time2<span style="color: #008000;">&#41;</span>.<span style="color: #007788;">tv_sec</span><span style="color: #000080;">&lt;&lt;</span><span style="color: #FF0000;">&quot;:&quot;</span><span style="color: #000080;">&lt;&lt;</span>diff<span style="color: #008000;">&#40;</span>time1,time2<span style="color: #008000;">&#41;</span>.<span style="color: #007788;">tv_nsec</span><span style="color: #000080;">&lt;&lt;</span>endl<span style="color: #008080;">;</span>
	<span style="color: #0000ff;">return</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span>
&nbsp;
timespec diff<span style="color: #008000;">&#40;</span>timespec start, timespec end<span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
	timespec temp<span style="color: #008080;">;</span>
	<span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span><span style="color: #008000;">&#40;</span>end.<span style="color: #007788;">tv_nsec</span><span style="color: #000040;">-</span>start.<span style="color: #007788;">tv_nsec</span><span style="color: #008000;">&#41;</span><span style="color: #000080;">&lt;</span><span style="color: #0000dd;">0</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
		temp.<span style="color: #007788;">tv_sec</span> <span style="color: #000080;">=</span> end.<span style="color: #007788;">tv_sec</span><span style="color: #000040;">-</span>start.<span style="color: #007788;">tv_sec</span><span style="color: #000040;">-</span><span style="color: #0000dd;">1</span><span style="color: #008080;">;</span>
		temp.<span style="color: #007788;">tv_nsec</span> <span style="color: #000080;">=</span> <span style="color: #0000dd;">1000000000</span><span style="color: #000040;">+</span>end.<span style="color: #007788;">tv_nsec</span><span style="color: #000040;">-</span>start.<span style="color: #007788;">tv_nsec</span><span style="color: #008080;">;</span>
	<span style="color: #008000;">&#125;</span> <span style="color: #0000ff;">else</span> <span style="color: #008000;">&#123;</span>
		temp.<span style="color: #007788;">tv_sec</span> <span style="color: #000080;">=</span> end.<span style="color: #007788;">tv_sec</span><span style="color: #000040;">-</span>start.<span style="color: #007788;">tv_sec</span><span style="color: #008080;">;</span>
		temp.<span style="color: #007788;">tv_nsec</span> <span style="color: #000080;">=</span> end.<span style="color: #007788;">tv_nsec</span><span style="color: #000040;">-</span>start.<span style="color: #007788;">tv_nsec</span><span style="color: #008080;">;</span>
	<span style="color: #008000;">&#125;</span>
	<span style="color: #0000ff;">return</span> temp<span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span></pre></div></div>

<p>To compile it use <code>g++ fortest.cpp -o fortest -lrt</code> (don&#8217;t turn on any kind of compiler optimization yet). The program prints two lines, one for each kind of <code>for</code> loop. Each line states the time it took the loop to complete, in a seconds:nanoseconds format.</p>
<p>A typical run of the program on my machine resulted in:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">10:338986608
9:728866372</pre></div></div>

<p>The difference is about 0.6 seconds, which isn&#8217;t negligible. On the other hand, it&#8217;s pretty small when taking into account that we did very little work in every iteration. Nonetheless, it is a speed gain you can easily achieve by just reversing the <code>for</code> loop code.</p>
<p>By the way, if you do use optimization, the absolute speed gain is smaller, but relative to the runtime of the loop it can improve runtime by up to 70% (all my tests showed an improvement of at least 50%).</p>
<p>For example, with the same code compiled with the &#8220;-O2&#8221; optimization flag, I got the following output:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">0:873
0:255</pre></div></div>

<p>This is a big improvement when considering the total runtime of the traditional loop.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.guyrutenberg.com/2007/11/07/optimizing-for-loops-reverse-loops/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>The Revised String Iteration Benchmark</title>
		<link>http://www.guyrutenberg.com/2007/09/26/the-revised-string-iteration-benchmark/</link>
		<comments>http://www.guyrutenberg.com/2007/09/26/the-revised-string-iteration-benchmark/#comments</comments>
		<pubDate>Wed, 26 Sep 2007 16:44:42 +0000</pubDate>
		<dc:creator>Guy</dc:creator>
				<category><![CDATA[C/C++]]></category>
		<category><![CDATA[optimization]]></category>

		<guid isPermaLink="false">http://www.guyrutenberg.com/2007/09/26/the-revised-string-iteration-benchmark/</guid>
		<description><![CDATA[In this post I&#8217;m going to discuss again the string benchmark I did before to find out what is the fastest way to iterate over an std::string. If you haven&#8217;t read the previous post on this subject go ahead and read it as it covers the basic idea behind this benchmark. As I did [...]]]></description>
			<content:encoded><![CDATA[<p>In this post I&#8217;m going to discuss again the <a href="/2007/08/30/what-is-the-fastest-method-to-iterate-over-a-string/">string benchmark</a> I did before to find out what is the fastest way to iterate over an <code>std::string</code>. If you haven&#8217;t read the previous post on this subject, go ahead and read it, as it covers the basic idea behind this benchmark. As in the previous benchmark, I check 5 ways of iteration:<br />
<span id="more-20"></span></p>
<ul>
<li>Iterating using the native indexes.</li>
<li>Iterating using the <code>string::at</code> method.</li>
<li>Iterating using indexes over the C string representation of the string.</li>
<li>Iterating using pointers over the C string representation of the string.</li>
<li>Iterating using the STL iterators.</li>
</ul>
<p>The basic operation that tests the performance of each kind of iteration hasn&#8217;t changed, and as in the last benchmark I used a test string of 50,000,000 characters. What I&#8217;ve altered is the timing mechanism. In the first test, each of the iteration methods had its own executable, and performance was measured as the execution time reported by the <code>time</code> command for each executable. This method has a problematic overhead: the time it takes to load each of the 50MB binaries into memory. In this test I&#8217;ve fixed this problem, and the tests are now timed using <a href="/2007/09/22/profiling-code-using-clock_gettime/"><code>clock_gettime</code></a>. This allowed me to perform all tests from within the same executable, without the overhead of the loading time. As a bonus, the timer in <code>clock_gettime</code> has a much higher resolution, so the results should be much more accurate.</p>
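<p>The idea of timing every test from within one executable can be sketched roughly as follows. This is not the code from the benchmark archive; <code>elapsed_ns</code>, <code>time_test</code> and <code>sample_test</code> are hypothetical names used only for this illustration, and the workload is a stand-in for a real string iteration.</p>

```cpp
#include <time.h>

// Hypothetical helper: time elapsed between two readings, in nanoseconds.
long long elapsed_ns(const timespec &start, const timespec &end)
{
	return (end.tv_sec - start.tv_sec) * 1000000000LL
		+ (end.tv_nsec - start.tv_nsec);
}

// Hypothetical helper: runs one test function and returns the CPU time
// it consumed, so several tests can be timed inside the same process.
long long time_test(void (*test)())
{
	timespec start, end;
	clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start);
	test();
	clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end);
	return elapsed_ns(start, end);
}

// Stand-in workload for the illustration; a real benchmark would
// iterate over the test string here. The volatile sink keeps the
// compiler from removing the loop.
void sample_test()
{
	volatile unsigned int sink = 0;
	for (unsigned int i = 0; i < 1000000; i++)
		sink = sink + i;
}
```

<p>Calling <code>time_test(sample_test)</code> once per iteration method keeps the program load time out of every measurement, because the clock is only read right before and right after the work itself.</p>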
<p>To perform the test, download the benchmark&#8217;s source code &#8211; <a href='http://www.guyrutenberg.com/wp-content/uploads/2007/09/stringbenchmark2.tar.gz' title='stringbenchmark2.tar.gz'>stringbenchmark2.tar.gz</a>. After you download it, go to the directory where you saved the archive and execute<br />
<code><br />
tar -zxvf stringbenchmark2.tar.gz<br />
cd stringbenchmark2<br />
make<br />
./benchmark<br />
</code><br />
The last command is the actual benchmark. It will run each of the tests and print its timing in the format of seconds:nanoseconds. For example, a typical output of the benchmark on my system looks like this:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">String iteration benchmark
String length: 50000000
iteration using indexes
0:588484755
&nbsp;
iteration using indexes over C string
0:617691596
&nbsp;
iteration using string::at()
0:615612772
&nbsp;
iteration using pointers over C string
0:515342342
&nbsp;
iteration using iterators
2:166078723</pre></div></div>

<p>As you can see, the results (in terms of the relative speeds of the iteration methods) remained the same as in the last benchmark. Iterating using pointers over the C string representation is the fastest method by a small margin, and using iterators is the slowest by far. This time the results are much more accurate, and the difference between closely performing methods, such as <code>string::at</code> and using indexes over the C string representation, is easily noticeable.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.guyrutenberg.com/2007/09/26/the-revised-string-iteration-benchmark/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Profiling Code Using clock_gettime</title>
		<link>http://www.guyrutenberg.com/2007/09/22/profiling-code-using-clock_gettime/</link>
		<comments>http://www.guyrutenberg.com/2007/09/22/profiling-code-using-clock_gettime/#comments</comments>
		<pubDate>Sat, 22 Sep 2007 08:55:16 +0000</pubDate>
		<dc:creator>Guy</dc:creator>
				<category><![CDATA[C/C++]]></category>
		<category><![CDATA[Tutorials]]></category>
		<category><![CDATA[optimization]]></category>

		<guid isPermaLink="false">http://www.guyrutenberg.com/2007/09/22/profiling-code-using-clock_gettime/</guid>
		<description><![CDATA[After raising the issue of the low resolution problem of the timer provided by clock() in Resolution Problems in clock(), I&#8217;ve ended the post by mentioning two more functions that should provide high-resolution timers suitable for profiling code. In this post I will discuss one of them, clock_gettime().

The clock_gettime() provides access to several useful [...]]]></description>
			<content:encoded><![CDATA[<p>After raising the issue of the low resolution problem of the timer provided by <code>clock()</code> in <a href="">Resolution Problems in <code>clock()</code></a>, I ended the post by mentioning two more functions that should provide high-resolution timers suitable for profiling code. In this post I will discuss one of them, <code>clock_gettime()</code>.<br />
<span id="more-18"></span><br />
The <code>clock_gettime()</code> function provides access to several useful timers with nanosecond resolution. First, the prototype for the function is as follows:</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;"><span style="color: #993333;">int</span> clock_gettime<span style="color: #009900;">&#40;</span>clockid_t clk_id<span style="color: #339933;">,</span> <span style="color: #993333;">struct</span> timespec <span style="color: #339933;">*</span>tp<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p>The <code>clk_id</code> argument allows us to select a specific clock from the several offered by the system, including:</p>
<ul>
<li>
	<code>CLOCK_REALTIME</code>, a system-wide realtime clock.
	</li>
<li>
	<code>CLOCK_PROCESS_CPUTIME_ID</code>, high-resolution timer provided by the CPU for each process.
	</li>
<li>
	<code>CLOCK_THREAD_CPUTIME_ID</code>, high-resolution timer provided by the CPU for each of the threads.
	</li>
</ul>
<p>Usually there are more clocks provided, but I find these three the most useful, as they let you measure the execution time spent at the system level, the process level and the thread level.</p>
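<p>The timers report their values in nanoseconds, but the resolution a given clock actually delivers depends on the system. POSIX provides <code>clock_getres()</code> for querying it. The sketch below wraps it in a helper whose name, <code>clock_resolution_ns</code>, is made up for this illustration.</p>

```cpp
#include <time.h>

// Returns the resolution of the given clock in nanoseconds,
// or -1 if the system does not support that clock.
long long clock_resolution_ns(clockid_t clk_id)
{
	timespec res;
	if (clock_getres(clk_id, &res) != 0)
		return -1;
	return res.tv_sec * 1000000000LL + res.tv_nsec;
}
```

<p>Calling it with <code>CLOCK_REALTIME</code>, <code>CLOCK_PROCESS_CPUTIME_ID</code> or <code>CLOCK_THREAD_CPUTIME_ID</code> shows what each clock can resolve on a particular machine.</p>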
<p>The current time of the chosen clock is stored in the struct pointed to by <code>tp</code>. The <code>timespec</code> struct is defined as follows:</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;"><span style="color: #993333;">struct</span> timespec <span style="color: #009900;">&#123;</span>
	time_t tv_sec<span style="color: #339933;">;</span> <span style="color: #808080; font-style: italic;">/* seconds */</span>
	<span style="color: #993333;">long</span> tv_nsec<span style="color: #339933;">;</span> <span style="color: #808080; font-style: italic;">/* nanoseconds */</span>
<span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span></pre></div></div>

<p>To measure the processing time some function took, call <code>clock_gettime()</code> twice, once before the function call and once right after it, and subtract the first reading from the second to get the actual runtime.</p>
<p>Getting the difference between two <code>timespec</code> structs isn&#8217;t very complicated and can be achieved using the function <code>diff()</code> defined below:</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;">timespec diff<span style="color: #009900;">&#40;</span>timespec start<span style="color: #339933;">,</span> timespec end<span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
	timespec temp<span style="color: #339933;">;</span>
	<span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span>end.<span style="color: #202020;">tv_nsec</span><span style="color: #339933;">-</span>start.<span style="color: #202020;">tv_nsec</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">&lt;</span><span style="color: #0000dd;">0</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
		temp.<span style="color: #202020;">tv_sec</span> <span style="color: #339933;">=</span> end.<span style="color: #202020;">tv_sec</span><span style="color: #339933;">-</span>start.<span style="color: #202020;">tv_sec</span><span style="color: #339933;">-</span><span style="color: #0000dd;">1</span><span style="color: #339933;">;</span>
		temp.<span style="color: #202020;">tv_nsec</span> <span style="color: #339933;">=</span> <span style="color: #0000dd;">1000000000</span><span style="color: #339933;">+</span>end.<span style="color: #202020;">tv_nsec</span><span style="color: #339933;">-</span>start.<span style="color: #202020;">tv_nsec</span><span style="color: #339933;">;</span>
	<span style="color: #009900;">&#125;</span> <span style="color: #b1b100;">else</span> <span style="color: #009900;">&#123;</span>
		temp.<span style="color: #202020;">tv_sec</span> <span style="color: #339933;">=</span> end.<span style="color: #202020;">tv_sec</span><span style="color: #339933;">-</span>start.<span style="color: #202020;">tv_sec</span><span style="color: #339933;">;</span>
		temp.<span style="color: #202020;">tv_nsec</span> <span style="color: #339933;">=</span> end.<span style="color: #202020;">tv_nsec</span><span style="color: #339933;">-</span>start.<span style="color: #202020;">tv_nsec</span><span style="color: #339933;">;</span>
	<span style="color: #009900;">&#125;</span>
	<span style="color: #b1b100;">return</span> temp<span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>
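<p>As a worked example of the borrow in <code>diff()</code>, take a start reading of 1 s and 900000000 ns and an end reading of 3 s and 100000000 ns. The nanosecond difference is negative, so the first branch applies: the seconds become 3 - 1 - 1 = 1 and the nanoseconds become 1000000000 + 100000000 - 900000000 = 200000000, that is 1.2 seconds. When printing such a value it helps to zero-pad the nanoseconds to nine digits. The helper below is a small sketch of that; <code>format_timespec</code> is a name invented for this illustration.</p>

```cpp
#include <iomanip>
#include <sstream>
#include <string>
#include <time.h>

// Hypothetical helper: formats a timespec as "seconds.nanoseconds",
// zero-padding the nanoseconds to nine digits so that 873 ns is
// printed as 0.000000873 and cannot be misread as 0.873 seconds.
std::string format_timespec(const timespec &t)
{
	std::ostringstream out;
	out << t.tv_sec << "." << std::setfill('0') << std::setw(9) << t.tv_nsec;
	return out.str();
}
```

<p>Passing the result of the worked example above yields <code>1.200000000</code>.</p>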

<p>Now let&#8217;s move on to a real example:</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;"><span style="color: #339933;">#include &lt;iostream&gt;</span>
<span style="color: #339933;">#include &lt;time.h&gt;</span>
using namespace std<span style="color: #339933;">;</span>
&nbsp;
timespec diff<span style="color: #009900;">&#40;</span>timespec start<span style="color: #339933;">,</span> timespec end<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #993333;">int</span> main<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
	timespec time1<span style="color: #339933;">,</span> time2<span style="color: #339933;">;</span>
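	<span style="color: #993333;">unsigned</span> <span style="color: #993333;">int</span> temp <span style="color: #339933;">=</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">;</span>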
	clock_gettime<span style="color: #009900;">&#40;</span>CLOCK_PROCESS_CPUTIME_ID<span style="color: #339933;">,</span> <span style="color: #339933;">&amp;</span>time1<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	<span style="color: #b1b100;">for</span> <span style="color: #009900;">&#40;</span><span style="color: #993333;">int</span> i <span style="color: #339933;">=</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span> i<span style="color: #339933;">&lt;</span> <span style="color: #0000dd;">242000000</span><span style="color: #339933;">;</span> i<span style="color: #339933;">++</span><span style="color: #009900;">&#41;</span>
		temp<span style="color: #339933;">+=</span>temp<span style="color: #339933;">;</span>
	clock_gettime<span style="color: #009900;">&#40;</span>CLOCK_PROCESS_CPUTIME_ID<span style="color: #339933;">,</span> <span style="color: #339933;">&amp;</span>time2<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	cout<span style="color: #339933;">&lt;&lt;</span>diff<span style="color: #009900;">&#40;</span>time1<span style="color: #339933;">,</span>time2<span style="color: #009900;">&#41;</span>.<span style="color: #202020;">tv_sec</span><span style="color: #339933;">&lt;&lt;</span><span style="color: #ff0000;">&quot;:&quot;</span><span style="color: #339933;">&lt;&lt;</span>diff<span style="color: #009900;">&#40;</span>time1<span style="color: #339933;">,</span>time2<span style="color: #009900;">&#41;</span>.<span style="color: #202020;">tv_nsec</span><span style="color: #339933;">&lt;&lt;</span>endl<span style="color: #339933;">;</span>
	<span style="color: #b1b100;">return</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
&nbsp;
timespec diff<span style="color: #009900;">&#40;</span>timespec start<span style="color: #339933;">,</span> timespec end<span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
	timespec temp<span style="color: #339933;">;</span>
	<span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span>end.<span style="color: #202020;">tv_nsec</span><span style="color: #339933;">-</span>start.<span style="color: #202020;">tv_nsec</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">&lt;</span><span style="color: #0000dd;">0</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
		temp.<span style="color: #202020;">tv_sec</span> <span style="color: #339933;">=</span> end.<span style="color: #202020;">tv_sec</span><span style="color: #339933;">-</span>start.<span style="color: #202020;">tv_sec</span><span style="color: #339933;">-</span><span style="color: #0000dd;">1</span><span style="color: #339933;">;</span>
		temp.<span style="color: #202020;">tv_nsec</span> <span style="color: #339933;">=</span> <span style="color: #0000dd;">1000000000</span><span style="color: #339933;">+</span>end.<span style="color: #202020;">tv_nsec</span><span style="color: #339933;">-</span>start.<span style="color: #202020;">tv_nsec</span><span style="color: #339933;">;</span>
	<span style="color: #009900;">&#125;</span> <span style="color: #b1b100;">else</span> <span style="color: #009900;">&#123;</span>
		temp.<span style="color: #202020;">tv_sec</span> <span style="color: #339933;">=</span> end.<span style="color: #202020;">tv_sec</span><span style="color: #339933;">-</span>start.<span style="color: #202020;">tv_sec</span><span style="color: #339933;">;</span>
		temp.<span style="color: #202020;">tv_nsec</span> <span style="color: #339933;">=</span> end.<span style="color: #202020;">tv_nsec</span><span style="color: #339933;">-</span>start.<span style="color: #202020;">tv_nsec</span><span style="color: #339933;">;</span>
	<span style="color: #009900;">&#125;</span>
	<span style="color: #b1b100;">return</span> temp<span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>To use <code>clock_gettime</code> you need to include <code>time.h</code> and link against <code>librt</code>. If you use <code>gcc</code>, just make sure you add <code>-lrt</code> to your list of arguments (with glibc 2.17 and later the function is part of the main C library, so the flag is no longer required).</p>
<p>Play a bit with the length of the <code>for</code> loop. As you can see, <code>clock_gettime</code> provides much more accurate results and can register very short processing times too. Just remember that, as is the case with any profiling function, it adds a little overhead to your program, so make sure you disable the profiling code in the production release, using preprocessor directives for example.</p>
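<p>One possible way to switch the profiling code off with the preprocessor is sketched below. The macro names <code>ENABLE_PROFILING</code>, <code>PROFILE_START</code> and <code>PROFILE_END</code>, as well as the example function <code>sum_to</code>, are assumptions made for this illustration and not an established convention.</p>

```cpp
#include <iostream>
#include <time.h>

// Define ENABLE_PROFILING (for example with -DENABLE_PROFILING) to
// compile the timing code in; leave it undefined for production builds.
#ifdef ENABLE_PROFILING
#define PROFILE_START(name) \
	timespec name##_start; \
	clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &name##_start)
#define PROFILE_END(name) \
	do { \
		timespec name##_end; \
		clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &name##_end); \
		std::cerr << #name << ": " \
			<< ((name##_end.tv_sec - name##_start.tv_sec) * 1000000000LL \
				+ (name##_end.tv_nsec - name##_start.tv_nsec)) \
			<< " ns" << std::endl; \
	} while (0)
#else
// With profiling disabled both macros expand to empty statements.
#define PROFILE_START(name) do { } while (0)
#define PROFILE_END(name) do { } while (0)
#endif

// Example function instrumented with the hypothetical macros above.
unsigned int sum_to(unsigned int n)
{
	PROFILE_START(sum_to);
	unsigned int total = 0;
	for (unsigned int i = 1; i <= n; i++)
		total += i;
	PROFILE_END(sum_to);
	return total;
}
```

<p>Built without <code>-DENABLE_PROFILING</code>, the function contains no calls to <code>clock_gettime</code> at all, so the release binary pays nothing for the instrumentation.</p>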
<h4>26/9/2007 &#8211; Update</h4>
<p>You may want to take a look at <a href="/2007/09/26/the-revised-string-iteration-benchmark/">The Revised String Iteration Benchmark</a> post for another, larger, example of using <code>clock_gettime</code> to time performance of code.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.guyrutenberg.com/2007/09/22/profiling-code-using-clock_gettime/feed/</wfw:commentRss>
		<slash:comments>22</slash:comments>
		</item>
		<item>
		<title>What is the Fastest Method to Iterate Over a String?</title>
		<link>http://www.guyrutenberg.com/2007/08/30/what-is-the-fastest-method-to-iterate-over-a-string/</link>
		<comments>http://www.guyrutenberg.com/2007/08/30/what-is-the-fastest-method-to-iterate-over-a-string/#comments</comments>
		<pubDate>Thu, 30 Aug 2007 15:57:18 +0000</pubDate>
		<dc:creator>Guy</dc:creator>
				<category><![CDATA[C/C++]]></category>
		<category><![CDATA[optimization]]></category>

		<guid isPermaLink="false">http://www.guyrutenberg.com/2007/08/30/what-is-the-fastest-method-to-iterate-over-a-string/</guid>
		<description><![CDATA[A few days ago I decided to check what is really the fastest method to iterate over strings in C++. As a string class I chose the string class from the STL, as it is very popular and provides a couple of ways to iterate over it. So how can one iterate over an std::string?

By using indexes. E.g. str[i] [...]]]></description>
			<content:encoded><![CDATA[<p>A few days ago I decided to check what is really the fastest method to iterate over strings in C++. As a string class I chose the string class from the STL, as it is very popular and provides a couple of ways to iterate over it. So how can one iterate over an <code>std::string</code>?</p>
<ol>
<li>By using indexes. E.g. <code>str[i]</code> and running <code>i</code> from zero to the length of the string.</li>
<li>By using the <code>at</code> method. <code>string::at(size_t pos)</code> provides an interface similar to indexes, with the exception that it checks whether the given position is past the end of the string and, if so, throws an exception. One may see it as the safe version of the regular index.</li>
<li>Treating the string as a sequence of characters and iterating over it using iterators.</li>
<li>Using <code>string::c_str()</code> to get a pointer to a regular C string representation of the string stored in the <code>std::string</code> and treating it as array, e.g. using indexes to go over it.</li>
<li>The last way to iterate over the string is to get a pointer to a C string representation using <code>string::c_str()</code> and advancing the pointer itself to iterate over the string.</li>
</ol>
<p>The third method is the native method of iterating over objects in the STL, and like the last two it can&#8217;t be used if the iteration changes the string itself (e.g. inserting or deleting characters). The first and second methods are similar to the fourth (treating the pointer to the C string as an array), except that they aren&#8217;t as problematic as the latter when changing the string. The second method is the safest, as it&#8217;s the only one that does range checks and throws an exception when trying to access positions outside the string.</p>
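<p>To make the five methods concrete, here is a rough sketch of each one applied to a letter-counting task. It is not the benchmark&#8217;s actual source; the function names are invented for this illustration, and the code assumes the string contains only the lowercase letters &#8216;a&#8217; to &#8216;z&#8217;.</p>

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Each function counts how often each letter 'a'..'z' occurs in str,
// assuming str holds only lowercase ASCII letters.

// 1. Indexes over the std::string.
std::vector<unsigned int> count_index(const std::string &str)
{
	std::vector<unsigned int> counts(26, 0);
	for (std::size_t i = 0; i < str.length(); i++)
		counts[str[i] - 'a']++;
	return counts;
}

// 2. The range-checked string::at method.
std::vector<unsigned int> count_at(const std::string &str)
{
	std::vector<unsigned int> counts(26, 0);
	for (std::size_t i = 0; i < str.length(); i++)
		counts[str.at(i) - 'a']++;
	return counts;
}

// 3. STL iterators.
std::vector<unsigned int> count_iterator(const std::string &str)
{
	std::vector<unsigned int> counts(26, 0);
	for (std::string::const_iterator it = str.begin(); it != str.end(); ++it)
		counts[*it - 'a']++;
	return counts;
}

// 4. Indexes over the C string representation.
std::vector<unsigned int> count_cstr_index(const std::string &str)
{
	std::vector<unsigned int> counts(26, 0);
	const char *cstr = str.c_str();
	for (std::size_t i = 0; cstr[i] != '\0'; i++)
		counts[cstr[i] - 'a']++;
	return counts;
}

// 5. A pointer advanced over the C string representation.
std::vector<unsigned int> count_cstr_pointer(const std::string &str)
{
	std::vector<unsigned int> counts(26, 0);
	for (const char *p = str.c_str(); *p != '\0'; ++p)
		counts[*p - 'a']++;
	return counts;
}
```

<p>All five functions return the same counts for the same input; only the way they walk over the characters differs.</p>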
<p>To find out which method is the fastest, I&#8217;ve created a huge string of random characters ranging from &#8216;a&#8217; to &#8216;z&#8217; and five executables, each one implementing one of the above iteration methods to do a simple task (counting the number of occurrences of each letter). The string is fifty million characters long, since the longer the string, the less important the fixed overhead becomes.</p>
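<p>One possible way to build such a test string is sketched below. It is an assumption rather than the generator shipped in the archive: the function name, the seed parameter and the use of the C++11 <code>&lt;random&gt;</code> facilities are all choices made for this example.</p>

```cpp
#include <cstddef>
#include <random>
#include <string>

// Hypothetical helper: build a string of `length` random letters 'a'..'z'.
std::string make_random_letters(std::size_t length, unsigned seed) {
    std::mt19937 engine(seed);                          // seeded for repeatable runs
    std::uniform_int_distribution<int> letter(0, 25);   // offset from 'a'
    std::string result;
    result.reserve(length);
    for (std::size_t i = 0; i < length; ++i) {
        result.push_back(static_cast<char>('a' + letter(engine)));
    }
    return result;
}
```

<p>Calling it with a length of fifty million would give a string of the size used here; a fixed seed keeps the input identical across the five executables.</p>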
<p>The executables for every version were compiled with the default settings of <code>g++</code> (without optimization, as the compiler might change the iteration methods when optimizing). The benchmark executables were timed using the <code>time</code> command, with each executable&#8217;s output redirected to <code>/dev/null</code>. The tests were run both on 64-bit Gentoo (with 1 GB RAM) and on 32-bit Kubuntu (with 512 MB RAM), to make sure the overall results (which method is better, not the runtimes themselves) aren&#8217;t system dependent.</p>
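<p>The measurements in this post come from the external <code>time</code> command. If you would rather time a piece of code from inside the program, a rough alternative is sketched below. It assumes C++11, the helper name <code>seconds_taken</code> is made up, and it reports wall-clock time only (the equivalent of <code>real</code>, not <code>user</code> or <code>sys</code>).</p>

```cpp
#include <chrono>

// Hypothetical helper: run `work` once and return the elapsed wall-clock seconds.
template <typename Work>
double seconds_taken(Work work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

<p>Passing a lambda that runs one of the iteration methods over the test string gives a figure comparable to the <code>real</code> line, minus process start-up.</p>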
<p><span id="more-11"></span></p>
<p>Now to the results themselves. After running the benchmark a couple of times I came to the following conclusion: don&#8217;t use iterators for string iteration. Iterators came last in every test and usually took up to three times longer than the slowest of the other methods. Iterators may be the native STL way to iterate over STL containers, but in these unoptimized builds they provide a very slow way to do so. While iterators can be called STL pointers, as they work and behave much the same way pointers do, they didn&#8217;t perform even close to pointers.</p>
<p>Iterating using pointers came out as the fastest way to iterate over a string, with a small margin over <code>string::at()</code> and indexes (both over the usual C string representation and over <code>std::string</code>). Looking at the rest of the methods, <code>string::at()</code> and the indexes came up with very close timings, but in most tests the indexes over <code>std::string</code> were faster. It is obvious why they should perform better than <code>string::at()</code>: they don&#8217;t do the range checks that the latter does. For some reason, iterating using indexes over the <code>std::string</code> directly turned out faster (by a very minor margin) than iterating over the C string representation, and the C string timed roughly the same as <code>string::at()</code>.</p>
<p>To conclude the benchmark: pointers are the fastest way to iterate, by a small margin, but using the <code>std::string</code> indexes or <code>string::at()</code> is preferable in my opinion, as the performance difference isn&#8217;t that big and these methods are safer than pointers (they can handle string manipulation that may cause the string data to be copied to another place in memory). Indexes over the C string representation suffer from the same disadvantages as pointers but aren&#8217;t as fast. Stay away from iterators at all costs! Iterators suffer from disadvantages similar to those of pointers (regarding string manipulation) and give nothing in return except a horrible run time.</p>
<h2>My timings</h2>
<p>Here is the output of one of the runs of the benchmark. The results were typical of almost all of the tests. To try the benchmark on your computer, see the instructions below.</p>
<pre>
guy@Guy_Computer ~/temp/stringbenchmark $ ./benchmark
iteration using indexes

real    0m0.690s
user    0m0.644s
sys     0m0.048s
iteration using indexes over C string

real    0m0.752s
user    0m0.720s
sys     0m0.028s
iteration using string::at()

real    0m0.762s
user    0m0.708s
sys     0m0.036s
iteration using pointers over C string

real    0m0.642s
user    0m0.604s
sys     0m0.036s
iteration using iterators

real    0m2.323s
user    0m2.272s
sys     0m0.052s
</pre>
<h2>Running the Tests on Your System</h2>
<p>If you want, you can run these tests on your own system to see the exact results. To do so, download the <a href='http://www.guyrutenberg.com/wp-content/uploads/2007/08/stringbenchmarktar.gz' title='stringbenchmark.tar.gz'>tar archive</a> and go to the directory you downloaded it into. Open a command line in that directory and execute:<br />
<code>tar -zxvf stringbenchmark.tar.gz<br />
cd stringbenchmark<br />
make<br />
./benchmark</code><br />
You will see the test results printed on your screen. While exact timings might differ from run to run, the overall trends should be clear.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.guyrutenberg.com/2007/08/30/what-is-the-fastest-method-to-iterate-over-a-string/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>

