Guy Rutenberg

The Revised String Iteration Benchmark

In this post I revisit the string benchmark I ran earlier to find out the fastest way to iterate over a std::string. If you haven’t read the previous post on this subject, go ahead and read it, as it covers the basic idea behind this benchmark. As in the previous benchmark, I check five ways of iterating:
Continue reading The Revised String Iteration Benchmark

Profiling Code Using `clock_gettime`

After raising the low-resolution problem of the timer provided by clock() in Resolution Problems in clock(), I ended that post by mentioning two more functions that should provide high-resolution timers suitable for profiling code. In this post I will discuss one of them, clock_gettime().
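As a rough illustration of the idea, here is a minimal sketch of timing a piece of work with clock_gettime(). The helper names elapsed_seconds and time_work are made up for this example, and the choice of CLOCK_MONOTONIC as the default clock is an assumption; the excerpt does not say which clock the full post uses.

```cpp
#include <time.h>   // clock_gettime, timespec, clockid_t (POSIX)

// Hypothetical helper: difference (end - start) in seconds.
double elapsed_seconds(const timespec& start, const timespec& end) {
    return static_cast<double>(end.tv_sec - start.tv_sec)
         + static_cast<double>(end.tv_nsec - start.tv_nsec) / 1e9;
}

// Hypothetical helper: run `work` once and return how long it took.
// CLOCK_MONOTONIC measures wall-clock time that never jumps backwards;
// CLOCK_PROCESS_CPUTIME_ID could be passed instead to measure CPU time.
template <typename Work>
double time_work(Work work, clockid_t clock_id = CLOCK_MONOTONIC) {
    timespec start{};
    timespec end{};
    clock_gettime(clock_id, &start);
    work();
    clock_gettime(clock_id, &end);
    return elapsed_seconds(start, end);
}
```

Calling time_work with a lambda that wraps the code under test returns the elapsed time in seconds. On older glibc versions you may need to link with -lrt.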
Continue reading Profiling Code Using clock_gettime

`Random` – A Random Number Generator Class

After dealing with the seeding of srand(), I realized that rand() just doesn’t give strong enough random numbers for some of my needs (e.g. a strong password generator), so I decided to find a better solution. The solution came in the form of Random, a cryptographically strong pseudo-random number generator class.
Continue reading Random – A Random Number Generator Class

Resolution Problems in `clock()`

While playing recently with clock() to time the performance of different kinds of code and algorithms, I found an annoying bug: clock() just can’t register work that takes less than 0.01 seconds. This is pretty unexpected, as clock() should return the processor time used by the program. The man page for clock() states:

The clock() function returns an approximation of processor time used by the program.
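One way to see the resolution for yourself is to spin until the value returned by clock() changes and look at the size of the step. The sketch below does that; the function name observed_clock_step is invented for this example, and the step you observe depends on your system, so it may well differ from the 0.01 seconds reported above.

```cpp
#include <ctime>   // std::clock, std::clock_t, CLOCKS_PER_SEC

// Hypothetical helper: busy-wait until clock() reports a new value and
// return the size of that step in seconds (or -1.0 if clock() is unavailable).
double observed_clock_step() {
    const std::clock_t first = std::clock();
    if (first == static_cast<std::clock_t>(-1)) {
        return -1.0;  // processor time is not available on this system
    }
    std::clock_t next = first;
    while (next == first) {
        next = std::clock();  // spinning consumes CPU time, so this terminates
    }
    return static_cast<double>(next - first) / CLOCKS_PER_SEC;
}
```

Any work that finishes in less than the returned step can show up as zero elapsed time when measured with clock().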

Continue reading Resolution Problems in clock()

Introduction to C++ CGI – Processing Forms

In this post I will show you how to process HTML forms easily using CGIs in C++. I assume you already have basic knowledge of writing CGIs in C++. If you don’t, go ahead and read Introduction to C++ CGI.

Processing forms is the basic function of any CGI script and the main purpose of CGIs. As you probably know, there are two common ways to send form data back to the web server: “post” and “get”. When form data is sent with the “get” method, it is appended to the form submission URL as a query string. The “post” method is much like “get”, except that the data is transmitted in the body of the HTTP request rather than in the URL itself. Because “get” carries the data in the URL, the user can easily bookmark the query created by the form. The “post” method, on the other hand, allows much more data to be sent and spares the user from seeing the data in the URL.

Getting the “post” and “get” data is relatively easy. To get the data sent by “get”, just call getenv("QUERY_STRING") and you will receive a pointer to a null-terminated string containing the “get” data. Reading the “post” data is a bit more complicated. The data needs to be read from the standard input, but the program won’t receive an EOF when it reaches the end of the data. Instead, it should stop after reading a specified number of bytes, which is given in the environment variable “CONTENT_LENGTH”. So you should read the number of bytes given by getenv("CONTENT_LENGTH") from the standard input to receive the “post” data.
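The two steps described above can be sketched roughly as follows. The function names read_get_data and read_post_data are made up for this example, and taking the input stream as a parameter is a design choice of the sketch (it makes the function easy to try out with a string stream); it is not necessarily how the full post does it.

```cpp
#include <cstddef>   // std::size_t
#include <cstdlib>   // std::getenv, std::strtoul
#include <istream>   // std::istream
#include <sstream>   // std::istringstream, handy for feeding test input
#include <string>

// Hypothetical helper: the "get" data is simply the QUERY_STRING variable.
std::string read_get_data() {
    const char* query = std::getenv("QUERY_STRING");
    return query ? std::string(query) : std::string();
}

// Hypothetical helper: read exactly CONTENT_LENGTH bytes of "post" data.
// No EOF arrives at the end of the data, so the byte count is the only limit.
std::string read_post_data(std::istream& in) {
    const char* length_text = std::getenv("CONTENT_LENGTH");
    if (!length_text) {
        return std::string();
    }
    const unsigned long length = std::strtoul(length_text, nullptr, 10);
    if (length == 0) {
        return std::string();
    }
    // A real script should reject absurdly large lengths before allocating.
    std::string data(length, '\0');
    in.read(&data[0], static_cast<std::streamsize>(length));
    data.resize(static_cast<std::size_t>(in.gcount()));  // keep only what arrived
    return data;
}
```

In an actual CGI program you would call read_post_data(std::cin), since the web server passes the request body on the standard input.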

Continue reading Introduction to C++ CGI – Processing Forms

Seeding `srand()`

As any C/C++ programmer knows, calling rand() on its own won’t give you numbers that look random from one run to the next: each time the program runs, the same sequence of numbers is generated. To overcome this, you seed the random number generator of rand() with a number that creates a different random number sequence. For every seed there is a corresponding random number sequence, and the same seed generates the same sequence every time. This can be used to recreate a random number sequence when needed, provided the seed used to create it is known.

To randomize the random number generator, most programmers pass to srand() the time in seconds since the epoch, e.g. they do something like this:

srand( (unsigned)time(NULL) );

It’s a very common way to seed the random number generator, and it’s also shown in many books that teach programming. It looks sufficient for most uses (and it is), but it’s also often used where it just isn’t random enough. Consider a program with a very short runtime that relies on the above method for seeding the random number generator. If someone writes a script that runs this program in a loop, so that it runs several times within a second, the same random number sequence will be generated multiple times, because the same seed is used. This behavior is probably not what was intended.
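The effect is easy to reproduce without waiting for the clock: two runs that start within the same second get the same value from time(NULL), which is equivalent to calling srand() twice with the same number. A minimal sketch, where first_values is a helper name invented for this example:

```cpp
#include <cstdlib>  // std::srand, std::rand
#include <vector>

// Hypothetical helper: seed rand() and return the first `count` values.
// Two calls with the same seed stand in for two program runs that
// started within the same second.
std::vector<int> first_values(unsigned seed, int count) {
    std::srand(seed);
    std::vector<int> values;
    for (int i = 0; i < count; ++i) {
        values.push_back(std::rand());
    }
    return values;
}
```

Comparing first_values(seed, n) for the same seed always yields identical sequences, which is exactly what happens to the fast-running program in the loop.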

Another problem comes up if we use this method for, let’s say, password generation. Say Joe wrote a small program that seeds its random number generator in the above way and used it to generate a strong, 8-character alphanumeric password for himself. Joe thinks his password is secure, and that even if somebody knew its length they would need to try 62^8 = 218,340,105,584,896 combinations to crack it. Now Sally wants to crack Joe’s “secure” password. Instead of attacking the password directly, Sally attacks the password generator. Sally can easily find out the day Joe created the password, and we shall assume she has access to the source code of the password generator Joe used. During the day Joe generated his password, the time(NULL) function returned 86,400 different values. Let’s also assume that Sally knows the length of Joe’s password. Sally just modifies the password generator and seeds the random number generator with each of the possible values of time(). Sally now has at most 86,400 candidate passwords, and one of them is guaranteed to be Joe’s. If you think 86,400 is a lot, remember that Sally came down from 218,340,105,584,896 possible combinations under very weak assumptions. If Sally knew the exact 10 minutes in which Joe generated the password (this isn’t very hard to find out), the number would drop to only 600 combinations.
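Sally’s attack can be sketched in a few lines. Everything here is hypothetical: generate_password stands in for Joe’s generator (its alphabet and its use of rand() % 62 are assumptions made for the example), and find_seed is an invented helper that tries every seed in a time window.

```cpp
#include <cstdlib>  // std::srand, std::rand
#include <string>

// 26 lowercase + 26 uppercase + 10 digits = 62 characters.
const char kAlphabet[] =
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
const int kAlphabetSize = sizeof(kAlphabet) - 1;  // exclude the trailing '\0'

// Stand-in for Joe's generator: the seed fully determines the password.
std::string generate_password(unsigned seed, int length) {
    std::srand(seed);
    std::string password;
    for (int i = 0; i < length; ++i) {
        password += kAlphabet[std::rand() % kAlphabetSize];
    }
    return password;
}

// Try every seed in [first_seed, last_seed]; on success store it in found_seed.
bool find_seed(const std::string& target, unsigned first_seed,
               unsigned last_seed, unsigned& found_seed) {
    if (first_seed > last_seed) {
        return false;
    }
    for (unsigned seed = first_seed; ; ++seed) {
        if (generate_password(seed, static_cast<int>(target.size())) == target) {
            found_seed = seed;
            return true;
        }
        if (seed == last_seed) {
            break;  // checked here so that last_seed == UINT_MAX cannot wrap
        }
    }
    return false;
}
```

With a 10-minute window the loop runs at most 600 times, and with a full day at most 86,400 times, which is trivial compared to searching all 62^8 passwords.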
Continue reading Seeding srand()

What is the Fastest Method to Iterate Over a String?

A few days ago I decided to check what is really the fastest method to iterate over strings in C++. As the string class I chose the string class from the STL, as it is very popular and provides several ways to iterate over it. So how can one iterate over a std::string?

By using indexes. E.g. str[i] and running i from zero to the length of the string.
By using the at method. string::at(size_t pos) provides an interface similar to indexing, except that it checks whether the given position is past the end of the string and throws an exception if it is. One may see it as the safe version of the regular index.
By treating the string as a sequence of characters and iterating over it using iterators.
By using string::c_str() to get a pointer to a regular C string representation of the string stored in the std::string and treating it as an array, e.g. using indexes to go over it.
By getting a pointer to a C string representation using string::c_str() and advancing the pointer itself to iterate over the string.

The third method is the native method of iterating over objects in the STL, and like the last two it can’t be used if the iteration changes the string itself (e.g. by inserting or deleting characters). The first and second methods are similar to the fourth (treating the pointer to the C string as an array), except that they aren’t as problematic as the latter when the string changes. The second method is the safest, as it’s the only one that does range checks and throws an exception on an attempt to access a position outside the string.
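To make the five methods concrete, here is a sketch of each one applied to the same task of counting letters. The function names and the Counts type are invented for this example, and the code assumes the string contains only the characters ‘a’ to ‘z’; the actual benchmark code in the full post may look different.

```cpp
#include <array>
#include <cstddef>
#include <string>

typedef std::array<std::size_t, 26> Counts;  // one counter per letter 'a'..'z'

// 1. Plain indexes.
Counts count_by_index(const std::string& s) {
    Counts counts{};
    for (std::size_t i = 0; i < s.size(); ++i) ++counts[s[i] - 'a'];
    return counts;
}

// 2. The range-checked at() method (throws std::out_of_range past the end).
Counts count_by_at(const std::string& s) {
    Counts counts{};
    for (std::size_t i = 0; i < s.size(); ++i) ++counts[s.at(i) - 'a'];
    return counts;
}

// 3. Iterators.
Counts count_by_iterator(const std::string& s) {
    Counts counts{};
    for (std::string::const_iterator it = s.begin(); it != s.end(); ++it)
        ++counts[*it - 'a'];
    return counts;
}

// 4. c_str() treated as an array and indexed until the terminating '\0'.
Counts count_by_cstr_index(const std::string& s) {
    Counts counts{};
    const char* p = s.c_str();
    for (std::size_t i = 0; p[i] != '\0'; ++i) ++counts[p[i] - 'a'];
    return counts;
}

// 5. c_str() with the pointer itself advanced.
Counts count_by_cstr_pointer(const std::string& s) {
    Counts counts{};
    for (const char* p = s.c_str(); *p != '\0'; ++p) ++counts[*p - 'a'];
    return counts;
}
```

All five functions return the same counts for the same input; only the way they walk over the characters differs.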

To find out which method is the fastest, I created a huge string of random characters ranging from ‘a’ to ‘z’ and five executables, each implementing one of the above iteration methods to do a simple task (counting the number of occurrences of each letter). The string is fifty million characters long, as the longer the string, the less important the overhead becomes.
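A test string like the one described could be produced along these lines. This is only a sketch: the function name make_random_lowercase is invented, and using rand() with a fixed seed is an assumption made so that every executable would see the same input.

```cpp
#include <cstddef>
#include <cstdlib>  // std::srand, std::rand
#include <string>

// Hypothetical helper: build a string of `length` random characters 'a'..'z'.
// A fixed seed makes the string reproducible across benchmark runs.
std::string make_random_lowercase(std::size_t length, unsigned seed) {
    std::srand(seed);
    std::string text;
    text.reserve(length);  // avoid repeated reallocation for huge strings
    for (std::size_t i = 0; i < length; ++i) {
        text += static_cast<char>('a' + std::rand() % 26);
    }
    return text;
}
```

For the benchmark described here, the length would be fifty million.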

The executables for every version of the benchmark were compiled with the default settings of g++ (without optimization, as the compiler might change the iteration methods when optimizing). The benchmark executables were timed using the time command, with their output redirected to /dev/null. The tests were run both on 64-bit Gentoo (with 1 GB RAM) and on 32-bit Kubuntu (with 512 MB RAM), to make sure the overall results (which method is better, not the runtime itself) aren’t system dependent.

Continue reading What is the Fastest Method to Iterate Over a String?

radio.py – a Wrapper Script for Listening to Radio in Linux

Download radio-0.3.tar.gz.

Update: radio.py-0.4 is now available.

I like listening to music and radio while working, and fortunately there are numerous ways to do that. Unfortunately, most ways of listening to radio are either resource and memory hogs (such as listening to streaming media via web browsers) or very unfriendly to users (listening via mplayer, for example). So I set out to find a way that uses as few system resources as possible while staying user-friendly. One other requirement I had was to be able to do all that from the command line, so that it works well with GNU Screen and doesn’t require an X server (if I work without one).

For some time I used mplayer for listening to radio. I had a file with a list of web radio stream URLs, which I would copy and pass to mplayer -playlist. This method met two of the requirements (minimal resources and a command-line interface), but wasn’t really user-friendly. So I wrote a little wrapper script in Python around mplayer: radio.py. After a quick installation (download and extract the tar archive and copy radio.py to somewhere in your PATH), radio.py lets you listen to stations easily, and it also does a couple more things for you.

To listen to a station, just call radio.py with the station’s name, e.g. enter radio.py BBC1 on the command line to listen to BBC radio channel 1. To view a list of known stations, run radio.py --list. Currently there aren’t many stations (just stations I thought were needed or that I listen to). You can easily edit radio.py to add new stations (the script is documented and very clear). If you do so, please write a comment or email me so I can add those stations to the next release by default.

So, as you have seen, radio.py makes listening to radio as easy as writing the station’s name. But, as I said, it can do more things that I thought should be in a radio script. It has both a sleep feature (which turns off the radio after a specified amount of time) and a wake-up feature (which starts the radio after a specified amount of time). These two features can be used together, and practically allow you to use radio.py as an alarm clock.

You can find more information about radio.py’s options by calling radio.py --help. I hope you will find this script as useful as I do.


Introduction to C++ CGI

In this post and its follow-ups I intend to cover the basics of CGI programming in C++. There is a great performance gain in writing CGIs in C++ compared to interpreted languages such as PHP, and they are usually even faster than PHP scripts interpreted via mod_php. On the other hand, PHP and other traditional web development languages are well suited for the task in terms of libraries and development time. However, developing small, highly efficient CGI scripts in C++ is easier than you think.
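To give a taste of how little is needed, a CGI program only has to write a header, a blank line and the page body to its standard output. The sketch below builds such a response as a string; the function name cgi_response is invented for this example, and it assumes a web server already configured to run the compiled program as a CGI.

```cpp
#include <string>

// Hypothetical helper: a complete CGI response is the Content-Type header,
// an empty line that ends the headers, and then the document itself.
std::string cgi_response(const std::string& html_body) {
    return "Content-Type: text/html\r\n\r\n" + html_body;
}
```

A full program would simply print the result of cgi_response to std::cout from main and exit.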
Continue reading Introduction to C++ CGI

Convert KDevelop’s Source Archive to Source Package

I use KDevelop as my main IDE and I’m pretty satisfied with it. KDevelop can automatically create a source archive of the project’s source code for you, which simplifies distributing the project. Unfortunately, the archive it creates isn’t ready for distribution. The user can’t just run ./configure ; make, as they first need to run all the automake tools. That’s not ideal for distribution, so you need to convert this source archive to a source package which is ready for the user to compile immediately.

Continue reading Convert KDevelop’s Source Archive to Source Package