What is the Fastest Method to Iterate Over a String?

A few days ago I decided to check which method is really the fastest for iterating over strings in C++. For the string class I chose std::string from the STL, as it is very popular and provides several ways to iterate over it. So how can one iterate over an std::string?

  1. By using indexes, e.g. str[i], running i from zero to the length of the string.
  2. By using the at method. string::at(size_t pos) provides an interface similar to indexing, except that it checks whether the given position is past the end of the string and throws an exception if it is. One may see it as the safe version of the regular index.
  3. By treating the string as a sequence of characters and iterating over it using iterators.
  4. By using string::c_str() to get a pointer to a regular C string representation of the string stored in the std::string and treating it as an array, i.e. using indexes to go over it.
  5. By getting a pointer to a C string representation using string::c_str() and advancing the pointer itself to iterate over the string.

The third method is the native way of iterating over objects in the STL and, like the last two, it can’t be used if the iteration changes the string itself (e.g. by inserting or deleting characters). The first and second methods are similar to the fourth (treating the pointer to the C string as an array), except that they are less problematic than the fourth when the string changes. The second method is the safest, as it’s the only one that does range checks and throws an exception on an attempt to access a position outside the string.
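To make the five methods concrete, here is a minimal sketch of how each one might look when counting letter occurrences. It is not the code used in the benchmark: the function names, the Counts type and the slot helper are made up for illustration, and it assumes a C++11 compiler and input that contains only the characters ‘a’ to ‘z’.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Letter histogram: counts[0] holds the number of 'a's, counts[25] the number of 'z's.
// (Counts and slot are illustrative names, not part of the original benchmark.)
using Counts = std::array<long, 26>;

// Maps a letter to its histogram slot; the input is assumed to be 'a'..'z' only.
inline std::size_t slot(char c) {
    assert(c >= 'a' && c <= 'z');
    return static_cast<std::size_t>(c - 'a');
}

// 1. Plain indexing with operator[] (no range check).
Counts count_by_index(const std::string& str) {
    Counts counts{};
    for (std::size_t i = 0; i < str.size(); ++i)
        ++counts[slot(str[i])];
    return counts;
}

// 2. string::at(), which throws std::out_of_range for a bad position.
Counts count_by_at(const std::string& str) {
    Counts counts{};
    for (std::size_t i = 0; i < str.size(); ++i)
        ++counts[slot(str.at(i))];
    return counts;
}

// 3. Iterators, the usual STL way of walking a container.
Counts count_by_iterator(const std::string& str) {
    Counts counts{};
    for (std::string::const_iterator it = str.begin(); it != str.end(); ++it)
        ++counts[slot(*it)];
    return counts;
}

// 4. c_str() treated as a plain array and indexed.
Counts count_by_cstr_index(const std::string& str) {
    Counts counts{};
    const char* chars = str.c_str();
    const std::size_t length = str.size();
    for (std::size_t i = 0; i < length; ++i)
        ++counts[slot(chars[i])];
    return counts;
}

// 5. c_str() pointer advanced until the terminating '\0'.
Counts count_by_cstr_pointer(const std::string& str) {
    Counts counts{};
    for (const char* p = str.c_str(); *p != '\0'; ++p)
        ++counts[slot(*p)];
    return counts;
}
```

All five functions should return the same histogram for the same input, so any difference between them is purely a matter of speed and safety.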

To find out which method is the fastest, I created a huge string of random characters ranging from ‘a’ to ‘z’ and five executables, each implementing one of the above iteration methods to do a simple task (counting the number of occurrences of each letter). The string is fifty million characters long, since the longer the string, the less important the fixed overhead becomes.

The executables for every version were compiled with the default settings of g++ (without optimization, as the compiler might change the iteration methods when optimizing). The benchmark executables were timed using the time command, with their output redirected to /dev/null. The tests were run both on 64-bit Gentoo (with 1 GB RAM) and on 32-bit Kubuntu (with 512 MB RAM), to make sure the overall results (which method is better, not the runtime itself) aren’t system dependent.

Continue reading What is the Fastest Method to Iterate Over a String?

radio.py – a Wrapper Script for Listening to Radio in Linux

Download radio-0.3.tar.gz.

Update: radio.py-0.4 is now available.

I like listening to music and radio while working, and fortunately there are numerous ways to do that. Unfortunately, most ways of listening to radio are either resource hogs (such as listening to streaming media via a web browser) or very unfriendly to users (listening via mplayer, for example). So I set out to find a way that would use as few system resources as possible while staying user-friendly. One other requirement I had was to be able to do all of this from the command line, so that it would work well with GNU Screen and wouldn’t require an X server (in case I work without one).

For some time I used mplayer for listening to radio. I had a file with a list of web-radio stream URLs, which I would copy and pass to mplayer -playlist. This method met two of the requirements (minimal resources and a command-line interface), but wasn’t really user-friendly. So I wrote a little Python wrapper script around mplayer, radio.py. After a quick installation (download and extract the tar archive and copy radio.py to somewhere in your PATH), radio.py will let you listen to stations easily, and it will also do a couple more things for you.

To listen to a station, just call radio.py with the station’s name, e.g. enter radio.py BBC1 on the command line to listen to BBC radio channel 1. To view a list of known stations, run radio.py --list. Currently there aren’t many stations (just the ones I thought were needed or that I listen to). You can easily edit radio.py to add new stations (the script is documented and very clear). If you do so, please write a comment or email me so I can add those stations to the next release by default.

So, as you have seen, radio.py makes listening to radio as easy as writing the station’s name. But, as I said, it can do more things that I thought should be in a radio script. It has both a sleep feature (which turns off the radio after a specified amount of time) and a wake-up feature (which starts the radio after a specified amount of time). These two features can be used together, which in practice lets you use radio.py as an alarm clock.

You can find more information about radio.py’s options by calling radio.py --help. I hope you will find this script as useful as I do.

Download:
radio-0.3.tar.gz.

Introduction to C++ CGI

In this post and its follow-ups I intend to cover the basics of CGI programming in C++. Writing CGIs in C++ offers a large performance gain over interpreted languages such as PHP, and a C++ CGI is usually faster even than PHP scripts run through mod_php. On the other hand, PHP and other traditional web development languages are well suited to the task in terms of available libraries and development time. However, developing small, highly efficient CGI programs in C++ is easier than you think.
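As a taste of how little is needed, here is a minimal sketch of a CGI response written in C++. It is not taken from the full post: build_response and run_cgi are hypothetical names, and the sketch only assumes the standard CGI convention that the web server passes the query string in the QUERY_STRING environment variable and expects headers, a blank line, and then the body on standard output.

```cpp
#include <cstdlib>
#include <iostream>
#include <string>

// Builds a complete CGI response: the header block, a blank line, then the body.
// (build_response is an illustrative helper, not a library function.)
std::string build_response(const std::string& query) {
    std::string response = "Content-Type: text/plain\r\n\r\n";
    response += "Hello from a C++ CGI program\n";
    response += "Query string: ";
    response += query.empty() ? std::string("(none)") : query;
    response += "\n";
    return response;
}

// Reads the query string from the environment, as CGI prescribes,
// and writes the response to the given stream (normally std::cout).
void run_cgi(std::ostream& out) {
    const char* raw_query = std::getenv("QUERY_STRING");
    out << build_response(raw_query ? raw_query : "");
}
```

A main function that only calls run_cgi(std::cout) and returns 0 would turn this into a complete program; once compiled, the executable goes wherever the web server is configured to run CGIs. The response is sent as text/plain, so echoing the query string back does not need HTML escaping.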
Continue reading Introduction to C++ CGI

Convert KDevelop’s Source Archive to Source Package

I use KDevelop as my main IDE and I’m pretty satisfied with it. KDevelop can automatically create a source archive of the project’s source code, which simplifies distributing the project. Unfortunately, the archive it creates isn’t ready for distribution: the user can’t just run ./configure ; make, because all the automake tools have to be run first. So you need to convert this source archive into a source package that the user can compile immediately.

Continue reading Convert KDevelop’s Source Archive to Source Package

Tracking MediaWiki External Links Statistics using Google Analytics

When you track MediaWiki statistics, you usually track only internal page statistics, but the external links that lead out of your site are not something you can ignore. Unfortunately, we usually can’t put tracking code in the pages our site links to. Fortunately, we can track the actual clicks on the links that lead out of the site, and it’s quite easy to do when tracking statistics with Google Analytics. If you don’t already use Google Analytics with your MediaWiki site, open a new account in Google Analytics and see my previous post: Track MediaWiki Statistics using Google Analytics.

Continue reading Tracking MediaWiki External Links Statistics using Google Analytics

Track MediaWiki Statistics using Google Analytics

Google Analytics is one of the best free web-statistics services available. It’s also quite easy to use with MediaWiki. To install Google Analytics in your MediaWiki, you should put the tracking code, which looks something like this:

<script src="http://www.google-analytics.com/urchin.js" type="text/javascript">
</script>
<script type="text/javascript">
_uacct="UA-xxxx-x";
urchinTracker();
</script>

in every page, preferably just above the </body> tag. The best way to do so is to put the tracking code inside the base skin PHP file. That means that, unless you changed the default skin for MediaWiki, you need to edit /wiki/skins/MonoBook.php. You will find the </body> tag towards the bottom of this file. Insert the tracking code just above it, save the file, and you’re done, as all pages will now include the script. Google Analytics usually starts gathering statistics within about 24-28 hours.

Update: If you also want to track external links to files and other websites, take a look at Tracking MediaWiki External Links Statistics using Google Analytics.

Installing IvriTex-1.2.1 on teTex-3.0

A few days ago I finally decided to install Ivritex-1.2.1 on my system, which runs tetex-3.0. The new version of Ivritex includes some very important improvements; for me, at least, the most important one is support for the Culmus fonts. tetex-3.0 introduced a major directory change, which causes many problems when installing packages that are unaware of it. In this post I will try to walk through the installation process.

TEXMF will be the directory of your local TeX tree (usually /usr/share/texmf). Before beginning the installation process, make sure you have the Culmus fonts installed. Apparently Culmus is not optional; it’s a requirement. I’ll assume that your Culmus fonts are installed in /usr/share/fonts/culmus.

  1. Download the ivritex-1.2.1 source code from here.
  2. Extract the archive into a temporary directory.
  3. Save the diff below to a file named “Makefile_patch” inside ivritex-1.2.1/fonts/culmus.
  4. Apply the patch by going to the ivritex-1.2.1/fonts/culmus directory (under the directory where you extracted the source archive) and executing “patch < Makefile_patch”. The patch alters the places where some files will be installed.
  5. As root, execute “updmap --enable Map culmus.map”.
  6. Still as root, execute “mktexlsr”.
  7. Ivritex 1.2.1 should be installed now.
--- Makefile    2007-02-14 19:59:52.000000000 +0200
+++ Makefilenew 2007-02-16 10:11:07.000000000 +0200
@@ -20,8 +20,8 @@
 vf_target     = $(TEX_ROOT)/fonts/vf/culmus
 # this is where ivritex will eventually be:
 tex_target    = $(TEX_ROOT)/tex/generic/babel
-encode_dir    = $(TEX_ROOT)/dvips/base
-dvips_cfg_dir = $(TEX_ROOT)/dvips/config
+encode_dir    = $(TEX_ROOT)/fonts/enc/dvips/base
+map_dir       = $(TEX_ROOT)/fonts/map/
 sysconf       = $(DESTDIR)/etc
 updmap_dir    = $(sysconf)/texmf/updmap
 #culmus_target = $(PREFIX)/fonts/culmus
@@ -137,11 +137,11 @@
    mkdir -p $(sysconf)/texmf/updmap.d
    echo "Map culmus.map" >$(sysconf)/texmf/updmap.d/10culmus.cfg
 else
-   mkdir -p $(dvips_cfg_dir)
-   cp culmus.map $(dvips_cfg_dir)/
+   mkdir -p $(map_dir)
+   cp culmus.map $(map_dir)/
   ifeq ($(tetex_ver),2)
    # this should run mktexlsr as well
-   $(updmap) --enable Map $(dvips_cfg_dir)/culmus.map
+   $(updmap) --enable Map $(map_dir)/culmus.map
   else # for tetex-1
     ifeq ($(tetex_ver),1)
    # TODO: fill in sed line here

Samba and Firewall Configuration

I’ve been using Guarddog as a GUI for iptables for some time. I had configured it to allow connections to Samba network shares, but for some reason it wouldn’t let me connect to the shares without disabling the firewall first. The blockage happened despite the proper configuration in Guarddog. So today I decided to look at the problem again and fix it.

After inspecting the output of ‘dmesg’ I found out that my machine was trying to connect to 192.168.2.255 (192.168.2.* is my network), which is the broadcast address of the network. I tried allowing connections to that address and, to my surprise, this fixed the problem. I guess Samba needs access to the broadcast address for some name/address lookup of hosts on the network.
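For anyone wondering why 192.168.2.255 is the broadcast address: it is the host address with every bit outside the netmask set to 1. The following is a small sketch of that arithmetic only. It assumes a netmask of 255.255.255.0 (the usual one for a 192.168.2.* network, but an assumption here), the host address 192.168.2.17 is made up, and the function names are hypothetical.

```cpp
#include <cstdint>
#include <string>

// Packs four octets a.b.c.d into one 32-bit IPv4 address.
std::uint32_t make_ipv4(unsigned a, unsigned b, unsigned c, unsigned d) {
    return (static_cast<std::uint32_t>(a) << 24) |
           (static_cast<std::uint32_t>(b) << 16) |
           (static_cast<std::uint32_t>(c) << 8) |
            static_cast<std::uint32_t>(d);
}

// The broadcast address keeps the network bits of the address
// and sets every host bit (the bits not covered by the netmask) to 1.
std::uint32_t broadcast_address(std::uint32_t address, std::uint32_t netmask) {
    return address | ~netmask;
}

// Formats a 32-bit IPv4 address in the usual dotted form.
std::string to_dotted_quad(std::uint32_t address) {
    return std::to_string((address >> 24) & 0xFF) + "." +
           std::to_string((address >> 16) & 0xFF) + "." +
           std::to_string((address >> 8) & 0xFF) + "." +
           std::to_string(address & 0xFF);
}
```

Under these assumptions, to_dotted_quad(broadcast_address(make_ipv4(192, 168, 2, 17), make_ipv4(255, 255, 255, 0))) gives 192.168.2.255, the address that had to be allowed through the firewall.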

Prevent Line Breaking Inline Formula in Tex/Latex

If you have ever written a document in LaTeX (or TeX) that uses inline formulas, you know how frustrating it is when LaTeX insists on breaking your inline formula across two lines. The easiest solution to this problem, in my opinion, is to prevent line breaks inside inline formulas altogether, except in extreme cases. To do so, just add the following two lines to your preamble:

\relpenalty=9999
\binoppenalty=9999

Here is what these two lines do. \relpenalty=[number] sets the penalty for breaking a math formula after a relation when the formula appears in a paragraph; plain TeX sets \relpenalty to 500. \binoppenalty=[number] sets the penalty for breaking a math formula after a binary operator when the formula appears in a paragraph; plain TeX sets \binoppenalty to 700. Both parameters can be set anywhere from 0 to 10000. At 10000, inline formulas will never be broken, even in extreme cases. A slightly lower value, such as the 9999 used above, prevents line breaks except in extreme cases where TeX has no acceptable alternative.

Using Hebrew TrueType fonts with pdfTeX

This guide is based on a guide published by Dekel Tsur that can be found here. Dekel Tsur’s guide was very good, but it is now outdated, since it doesn’t work with teTeX 3.0. In this guide I address this issue and update the instructions and scripts so that they work with teTeX 3.0.

Since the quality of the Hebrew metafonts that come with Hebrew LaTeX is quite poor, alternative fonts are needed. The best-quality free Hebrew fonts are TrueType fonts (for example, the Times New/Arial/Courier New fonts). Using TrueType fonts with TeX is somewhat complicated, but it is quite easy with pdfTeX, as pdfTeX has native support for TrueType fonts. This document explains how to use TrueType fonts with pdfTeX. Since Hebrew requires the use of the eTeX engine, you need to have the pdfelatex program. It is available in teTeX 1.0 (which comes with recent Linux distributions). The instructions below allow using nikud, although the result is quite poor, as the nikud glyphs are not aligned correctly (but it is better than nothing).

Continue reading Using Hebrew TrueType fonts with pdfTeX