Seeding srand()

As any C/C++ programmer know, just using rand() won’t return random-numbers, and not even pseudo random numbers, as each time the program will run the same random numbers sequence will be generated. To overcome this you seed the random number generator of rand() with a number that creates a different random number sequence. For every seed there is a corresponding random number sequence, and for the same seed the same sequence will be generated every time. This can be used to recreate a random numbers sequence if needed for some reason if the seed used to create it is known.

To randomize the random numbers generator, most programmers pass to strand() the time in seconds since epoch, e.g. they do something like this:

srand( (unsigned)time(NULL) );

It’s a very common way to seed the random number generator, and it’s also shown in many books that teach programming. This may look sufficient for most uses (and it does) but nonetheless it’s also used many times where it’s just isn’t random enough. Let’s consider a program with a very fast runtime which depends on the above mentioned method for seeding the random number generator. If someone wrote a script that runs this program in a loop, causing the program to run several times in a second, the same random number sequence will be generated multiple times as the same seed was used. One can see that this behavior may not be what was intended.

Another problem might come up if we will use this method for let’s say password generation. Let’s say Joe wrote a small program that seeds the password generator in the above mentioned way and now generated for himself a strong 8 character long alphanumeric password. Joe thought that his password was secured and that even is somebody knew its length they will need to try 62^8=218,340,105,584,896 combinations in order to crack it. Now Sally want to crack Joe’s “secure” password. Instead of attacking directly on the password Sally will attack the password generator. Sally can easily know the day Joe created the password, and we shall assume Sally got access to the password generator’s source code Joe used. During the day Joe generated his password, the time(NULL) function returned 86400 different values. Let’s also assume that Sally knows the length of Joe’s password. Sally just modifies the password generator and seeds the random number generator with each one of the possible values of time(). Sally will get now 86,400 different combinations of password, and one of them is guaranteed to be Joe’s. If you think 86400 is many, remember that Sally went down from 218,340,105,584,896 possible combinations and under very weak assumptions, if sally knew the exact 10 minutes in which Joe generated the password (this isn’t a very hard thing to find out) the number will drop to only 600 combinations.
Continue reading Seeding srand()

What is the Fastest Method to Iterate Over a String?

Few days ago I decided to check what is really the fastest method to iterate over strings in C++. As a string class I chose string class from STL as it is very popular and provides a couple of ways to iterate it. So how can one iterate over an std::string?

  1. By using indexes. E.g. str[i] and running i from zero to the length of the string.
  2. By using the at method. string::at(size_t pos) provides similar interface to indexes with the exceptions that it checks whether the given position is past the end of the string and then throws an exception. One may see it as the safe version of the regular index.
  3. Treating the string as a sequence of characters and and iterate over it using iterators.
  4. Using string::c_str() to get a pointer to a regular C string representation of the string stored in the std::string and treating it as array, e.g. using indexes to go over it.
  5. The last way to iterate over the string is to get a pointer to a C string representation using string::c_str() and advancing the pointer itself to iterate over the string.

The third method is the native method of iterating over objects in STL, and like the last two it can’t be used if the iteration changes the string itself (e.g. inserting or deleting characters). The first and second method are similar to the fourth (treating the pointer to the C string as an array), except that they aren’t so problematic as the latter when changing the string. The second method is the safest as it’s the only one that does range checks and throws exception if trying to access positions which are outside the string.

To benchmark and find out which method is the fastest method to iterate over a string I’ve created a huge string of random characters ranging from ‘a’ to ‘z’ and five executables, each one implementing one of the above iteration methods to do a simple task
(count the number of occurrences of each letter). The string is fifty million characters long which, as the longer the string the less important the overhead becomes.

The executables for the benchmark of every version were compiled with the default setting of g++ (without optimization as the compiler might change the iteration methods when optimizing). The benchmark executables where timed by using the time command and redirecting the executables output to /dev/null. The tests were run both on 64bit Gentoo (with 1 GB RAM) and on 32bit Kubuntu (with 512 MB RAM), to make sure the overall results (which method it better not the runtime itself) isn’t system depended.

Continue reading What is the Fastest Method to Iterate Over a String?

Introduction to C++ CGI

In this post and its follow ups I intend to cover the basics of CGI programming in C++. There are great performance gain in writing CGIs in C++ compared to interpreted languages such as PHP and it’s usually it’s even faster than PHP scripts which are interpreted via mod_php. On the other hand PHP and other traditional web development languages are well suited for the task, by means of libraries and development time. However developing small highly efficient CGI scripts in C++ is easier than you think.
Continue reading Introduction to C++ CGI