Introduction to C++ CGI – Processing Forms

In this post I will show you how to process HTML forms easily using CGIs in C++. I assume you already have basic knowledge of writing CGIs in C++. If you don’t, go ahead and read Introduction to C++ CGI.

Processing forms is the most basic task of a CGI script and often its main purpose. As you probably know, there are two common ways to send form data back to the web server: “post” and “get.” When form data is sent with the “get” method, it is appended to the form’s submission URL as a query string. The “post” method is much like the “get” method, except that the data is transmitted in the body of the HTTP request rather than in the URL itself. Because a “get” form puts the data in the URL, the user can easily bookmark the query created by the form. On the other hand, the “post” method allows you to send much more data and spares the user from seeing the data in the URL.
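
Whichever method is used, a browser submitting an ordinary form (the default enctype) encodes the fields as application/x-www-form-urlencoded: name=value pairs joined by “&”, spaces written as “+”, and other special characters written as %XX escapes. The following is a minimal sketch of decoding that format, not code from the full post. The function names are hypothetical, and the std::map means a repeated field name (e.g. several checkboxes with the same name) keeps only its last value.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Value of one hexadecimal digit, or -1 if c is not a hex digit.
int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decodes '+' as a space and "%XX" as the byte 0xXX.
// A malformed escape (e.g. "%zz") is copied through unchanged.
std::string url_decode(const std::string& s) {
    std::string out;
    out.reserve(s.size());
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == '+') {
            out += ' ';
        } else if (s[i] == '%' && i + 2 < s.size() &&
                   hex_value(s[i + 1]) >= 0 && hex_value(s[i + 2]) >= 0) {
            out += static_cast<char>(hex_value(s[i + 1]) * 16 + hex_value(s[i + 2]));
            i += 2;  // skip the two hex digits just consumed
        } else {
            out += s[i];
        }
    }
    return out;
}

// Splits "a=1&b=2" into fields. A name without '=' maps to an empty value;
// if a name repeats, the last value wins.
std::map<std::string, std::string> parse_form_data(const std::string& data) {
    std::map<std::string, std::string> fields;
    std::size_t start = 0;
    while (start <= data.size()) {
        std::size_t end = data.find('&', start);
        if (end == std::string::npos) end = data.size();
        std::string pair = data.substr(start, end - start);
        if (!pair.empty()) {
            std::size_t eq = pair.find('=');
            std::string key = url_decode(pair.substr(0, eq));
            std::string value =
                (eq == std::string::npos) ? std::string() : url_decode(pair.substr(eq + 1));
            fields[key] = value;
        }
        start = end + 1;
    }
    return fields;
}
```

The same parser works for both methods, since “get” and “post” data from a regular form share this encoding; only the place the raw string comes from differs.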

Getting the “post” and “get” data is relatively easy. To get the data sent by “get,” you can just call getenv("QUERY_STRING"), which returns a pointer to a null-terminated string containing the “get” data (or a null pointer if the variable isn’t set). Reading the “post” data is a bit more complicated. The data needs to be read from standard input, but the program isn’t guaranteed to receive an EOF when it reaches the end of the data. Instead, it should stop after reading the number of bytes given in the environment variable “CONTENT_LENGTH.” So you should convert the string returned by getenv("CONTENT_LENGTH") to a number and read exactly that many bytes from standard input to receive the “post” data.
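
The two cases can be sketched roughly as follows. This is an illustration under stated assumptions, not the post’s own code. The function names are hypothetical, kMaxPostBytes is an arbitrary size limit added here so a bogus CONTENT_LENGTH can’t trigger a huge allocation, and REQUEST_METHOD is the standard CGI variable used to pick between the two sources.

```cpp
#include <cstddef>
#include <cstdlib>
#include <iostream>
#include <string>

// Arbitrary upper bound on accepted "post" bodies (an assumption of this sketch).
const std::size_t kMaxPostBytes = 1024 * 1024;

// "get": the data is in QUERY_STRING; getenv() returns NULL if it is not set.
std::string get_query_data() {
    const char* qs = std::getenv("QUERY_STRING");
    return qs != nullptr ? std::string(qs) : std::string();
}

// Converts CONTENT_LENGTH to a byte count. Rejects a missing value, anything
// that is not a plain decimal number, and values above kMaxPostBytes.
bool parse_content_length(const char* s, std::size_t& len) {
    if (s == nullptr || *s < '0' || *s > '9') return false;
    char* end = nullptr;
    unsigned long value = std::strtoul(s, &end, 10);
    if (*end != '\0' || value > kMaxPostBytes) return false;
    len = static_cast<std::size_t>(value);
    return true;
}

// "post": read exactly CONTENT_LENGTH bytes from `in` instead of waiting for EOF.
std::string get_post_data(std::istream& in) {
    std::size_t len = 0;
    if (!parse_content_length(std::getenv("CONTENT_LENGTH"), len) || len == 0)
        return std::string();
    std::string body(len, '\0');
    in.read(&body[0], static_cast<std::streamsize>(len));
    body.resize(static_cast<std::size_t>(in.gcount()));  // keep only the bytes that arrived
    return body;
}

// Chooses the data source according to the REQUEST_METHOD variable.
std::string get_form_data(std::istream& in) {
    const char* method = std::getenv("REQUEST_METHOD");
    if (method != nullptr && std::string(method) == "POST") return get_post_data(in);
    return get_query_data();
}
```

In an actual CGI program, main() would call get_form_data(std::cin), then print a Content-Type header, a blank line, and the response body to std::cout.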


Seeding srand()

As any C/C++ programmer knows, rand() on its own isn’t enough: if the generator is never seeded, rand() behaves as if srand(1) had been called, so the same pseudo-random sequence is generated every time the program runs. To overcome this, you seed the random number generator with srand(), and different seeds produce different sequences. For every seed there is a corresponding random number sequence, and the same seed always generates the same sequence (with a given C library). If the seed is known, this can be used to recreate a random number sequence when needed.
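
A small sketch of the seed-to-sequence relationship described above; the helper name is hypothetical. Because an unseeded program behaves as if srand(1) had been called, rand_sequence(1, n) returns the same values an unseeded program would print on every run.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Returns the first `count` values rand() produces after srand(seed).
std::vector<int> rand_sequence(unsigned seed, std::size_t count) {
    std::srand(seed);
    std::vector<int> values;
    values.reserve(count);
    for (std::size_t i = 0; i < count; ++i)
        values.push_back(std::rand());
    return values;
}
```

Calling rand_sequence(42, 5) twice yields identical vectors, which is exactly what lets a known seed recreate a sequence.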

To get a different sequence on each run, most programmers pass srand() the current time in seconds since the epoch, doing something like this:

srand( (unsigned)time(NULL) );

It’s a very common way to seed the random number generator, and it’s also shown in many books that teach programming. It may look sufficient for most uses (and it is), but it’s nonetheless often used where it just isn’t random enough. Consider a program with a very short run time that seeds the generator this way. If someone wrote a script that ran this program in a loop, several times a second, every run within the same second would use the same seed and generate the same random number sequence. This is probably not what was intended.
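
One common alternative, sketched here as an assumption rather than as the fix the full post proposes, is to take the seed from C++11’s std::random_device, which on most platforms draws from an operating-system entropy source, so runs started within the same second normally get different seeds. This addresses the repeated-runs problem only: rand() is still a poor fit for security-sensitive values, since its seed is an unsigned int and there can be no more distinct sequences than there are seed values (typically 2^32).

```cpp
#include <cstdlib>
#include <random>

// Returns a seed taken from std::random_device rather than from time(NULL).
// Most implementations use the OS entropy source; some older ones were
// deterministic, so check your toolchain before relying on this.
unsigned make_seed() {
    std::random_device rd;
    return rd();
}

// Seeds rand() once at program start-up (hypothetical helper).
void seed_rand() {
    std::srand(make_seed());
}
```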

Another problem comes up if we use this method for, say, password generation. Suppose Joe wrote a small program that seeds the generator this way and used it to generate a strong 8-character alphanumeric password for himself. Joe thought his password was secure: even if somebody knew its length, they would have to try 62^8 = 218,340,105,584,896 combinations to crack it. Now Sally wants to crack Joe’s “secure” password. Instead of attacking the password directly, Sally attacks the password generator. Sally can easily find out the day Joe created the password, and we shall assume she got access to the source code of the password generator Joe used and knows the length of his password. During the day Joe generated his password, time(NULL) returned 86,400 different values (one per second). Sally just modifies the password generator to seed the random number generator with each of those possible values of time(). She gets 86,400 candidate passwords, and one of them is guaranteed to be Joe’s.

If 86,400 sounds like a lot, remember that Sally started from 218,340,105,584,896 possible combinations, and she got there under very weak assumptions. If Sally knew the 10-minute window in which Joe generated the password (not a very hard thing to find out), the number would drop to only 600 candidates.
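
Sally’s attack is easy to sketch. Joe’s actual generator isn’t shown, so generate_password() below is a hypothetical stand-in that seeds rand() with the given time value and draws characters from the 62 alphanumerics; find_seed() simply tries every second in a window, as described above.

```cpp
#include <cstddef>
#include <cstdlib>
#include <ctime>
#include <string>

// Hypothetical stand-in for Joe's generator. rand() % 62 has a slight
// modulo bias, which does not matter for this attack.
std::string generate_password(unsigned seed, std::size_t length) {
    static const char alphabet[] =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789";  // 62 characters
    std::srand(seed);
    std::string password;
    for (std::size_t i = 0; i < length; ++i)
        password += alphabet[std::rand() % 62];
    return password;
}

// Sally's attack: try every time() value in [from, to) as the seed and
// return true (storing the seed) when one reproduces `target`.
bool find_seed(const std::string& target, std::time_t from, std::time_t to,
               std::time_t& seed_out) {
    for (std::time_t t = from; t < to; ++t) {
        if (generate_password(static_cast<unsigned>(t), target.size()) == target) {
            seed_out = t;
            return true;
        }
    }
    return false;
}
```

Searching a whole day means calling find_seed(pw, day_start, day_start + 86400, seed), at most 86,400 iterations that finish almost instantly; narrowing the window to 10 minutes cuts that to 600.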

What Is the Fastest Method to Iterate Over a String?

A few days ago, I decided to check which method of iterating over strings in C++ is really the fastest. As the string class, I chose std::string from the STL, as it is very popular and provides several ways to iterate over it. So how can one iterate over an std::string?

  1. By using indexes, e.g. str[i], and running i from zero to the length of the string.
  2. By using the at method. string::at(size_t pos) provides an interface similar to indexes, except that it checks whether the given position is past the end of the string and, if so, throws an std::out_of_range exception. One may see it as the safe version of the regular index.
  3. Treating the string as a sequence of characters and iterating over it using iterators.
  4. Using string::c_str() to get a pointer to a regular C string representation of the contents of the std::string and treating it as an array, i.e. using indexes to go over it.
  5. The last way to iterate over the string is to get a pointer to a C string representation using string::c_str() and advance the pointer itself to iterate over the string.
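
The five methods above can be sketched as five small functions that perform the benchmark’s task of counting each letter. These are illustrations with hypothetical names, not the post’s benchmark code; characters outside ‘a’ to ‘z’ are simply ignored.

```cpp
#include <array>
#include <cstddef>
#include <string>

using LetterCounts = std::array<std::size_t, 26>;

// Adds one occurrence of c; characters outside 'a'..'z' are ignored.
inline void tally(LetterCounts& counts, char c) {
    if (c >= 'a' && c <= 'z') ++counts[static_cast<std::size_t>(c - 'a')];
}

// 1. operator[] with an index
LetterCounts count_by_index(const std::string& str) {
    LetterCounts counts{};
    for (std::size_t i = 0; i < str.size(); ++i) tally(counts, str[i]);
    return counts;
}

// 2. at(), which throws std::out_of_range for a position past the end
LetterCounts count_by_at(const std::string& str) {
    LetterCounts counts{};
    for (std::size_t i = 0; i < str.size(); ++i) tally(counts, str.at(i));
    return counts;
}

// 3. iterators
LetterCounts count_by_iterator(const std::string& str) {
    LetterCounts counts{};
    for (std::string::const_iterator it = str.begin(); it != str.end(); ++it)
        tally(counts, *it);
    return counts;
}

// 4. c_str() treated as an array; the length is taken once from size(),
//    since recomputing strlen() in the loop condition would be quadratic
LetterCounts count_by_cstr_index(const std::string& str) {
    LetterCounts counts{};
    const char* p = str.c_str();
    const std::size_t n = str.size();
    for (std::size_t i = 0; i < n; ++i) tally(counts, p[i]);
    return counts;
}

// 5. advancing a pointer until the terminating '\0'
//    (stops early if the string contains an embedded '\0')
LetterCounts count_by_pointer(const std::string& str) {
    LetterCounts counts{};
    for (const char* p = str.c_str(); *p != '\0'; ++p) tally(counts, *p);
    return counts;
}
```

All five return identical counts for an ordinary string, so any difference in a benchmark comes from the iteration method itself.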

The third method is the native way of iterating over STL containers, and, like the last two, it can’t be used naively if the iteration changes the string itself (e.g. inserting or deleting characters), because such changes can invalidate the iterators or the pointer. The first and second methods are similar to the fourth (treating the pointer to the C string as an array), except that they aren’t as problematic as the latter when the string changes. The second method is the safest, as it’s the only one that does range checks and throws an exception when trying to access positions outside the string.

To benchmark the methods and find out which one is the fastest, I created a huge string of random characters ranging from ‘a’ to ‘z’ and five executables, each implementing one of the above iteration methods to do a simple task (count the number of occurrences of each letter). The string is fifty million characters long, because the longer the string, the less the fixed overhead matters.
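
The post doesn’t show how its test string was generated, so the following is only one plausible sketch using rand(). The function name is hypothetical, and the seed is a parameter so that every executable can rebuild exactly the same string.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Builds a string of `length` random letters from 'a' to 'z' using rand().
// Passing the same seed reproduces the same string.
std::string make_test_string(std::size_t length, unsigned seed) {
    std::srand(seed);
    std::string s(length, 'a');
    for (std::size_t i = 0; i < length; ++i)
        s[i] = static_cast<char>('a' + std::rand() % 26);
    return s;
}
```

For the benchmark described above, the call would be make_test_string(50000000, seed), which needs about 50 MB of memory.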

The executables for every version were compiled with g++’s default settings (no optimization, as the compiler might change the iteration methods when optimizing). They were timed with the time command, with their output redirected to /dev/null. The tests were run both on 64-bit Gentoo (1 GB RAM) and on 32-bit Kubuntu (512 MB RAM) to make sure the overall results (which method is faster, not the run times themselves) aren’t system-dependent.
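
The post times separate executables with the time command. If you would rather time each method inside a single process, a minimal helper based on std::chrono::steady_clock might look like this. This is a different setup from the one described, so its numbers are not directly comparable, and the helper name is hypothetical.

```cpp
#include <chrono>
#include <utility>

// Runs f once and returns the elapsed wall-clock time in seconds.
// steady_clock is used because it never jumps backwards.
template <typename F>
double time_seconds(F&& f) {
    auto start = std::chrono::steady_clock::now();
    std::forward<F>(f)();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Each counting loop would be wrapped in a lambda and passed in. In unoptimized builds, like the ones above, the loop is not removed. With optimization enabled, store the result somewhere the compiler can’t discard.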


Introduction to C++ CGI

In this post and its follow-ups, I intend to cover the basics of CGI programming in C++. Writing CGIs in C++ can bring large performance gains compared to interpreted languages such as PHP; a C++ CGI is usually faster even than PHP scripts run through mod_php. On the other hand, PHP and other traditional web development languages are better suited to the task in terms of available libraries and development time. Still, developing small, highly efficient CGI scripts in C++ is easier than you think.
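
At its core, a CGI program writes its response to standard output. For an ordinary document that means a Content-Type header, a blank line ending the headers, and then the body. A minimal sketch with a hypothetical helper name:

```cpp
#include <string>

// Builds a complete CGI document response: the Content-Type header,
// the blank line that ends the headers, and the body.
std::string cgi_response(const std::string& content_type, const std::string& body) {
    return "Content-Type: " + content_type + "\r\n\r\n" + body;
}
```

A complete “hello” CGI then only needs a main() that does std::cout << cgi_response("text/html", "<p>Hello from C++</p>"); and the web server forwards that output to the browser.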