Guy Rutenberg

Introduction to C++ CGI – Processing Forms

In this post I will show you how to process HTML forms easily using CGIs in C++. I assume you already have basic knowledge of writing CGIs in C++. If you don’t, go ahead and read Introduction to C++ CGI.

Processing forms is the basic function of any CGI script and the main purpose of CGIs. As you probably know, there are two common ways to send form data back to the web server: “post” and “get.” When form data is sent with the “get” method, it is appended to the form submission URL as a query string. The “post” method is much like the “get” method, except the data is transmitted in the body of the HTTP request and not in the URL itself. When a form uses “get,” it allows the user to easily bookmark the query created by the form, as the data is transmitted in the URL itself. On the other hand, the “post” method allows you to send much more data and spares the user from seeing the data in the URL.

Getting the “post” and “get” data is relatively easy. To get the data sent by “get” you can just call getenv("QUERY_STRING"), and you will receive a pointer to a null-terminated string containing the “get” data. Reading the “post” data is a bit more complicated. The data needs to be read from the standard input, but the program won’t receive an EOF when it reaches the end of the data. Instead, it should stop after reading a specified number of bytes, which is given in the environment variable “CONTENT_LENGTH.” So you should convert the string returned by getenv("CONTENT_LENGTH") to a number and read that many bytes from the standard input to receive the “post” data.


Seeding srand()

As any C/C++ programmer knows, just calling rand() isn’t enough to get numbers that look random from one run to the next: the numbers are pseudo-random, but each time the program runs the same sequence will be generated. To overcome this, you seed the random number generator of rand() with a number that selects a different sequence. For every seed, there is a corresponding random number sequence, and for the same seed the same sequence will be generated every time. This can be used to recreate a random number sequence later, if the seed used to create it is known.
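The seed-to-sequence relationship can be checked with a small sketch like the one below. The helper name sequence_for_seed is made up for illustration.

```cpp
#include <cstdlib>
#include <vector>

// Returns the first `count` values rand() produces after seeding with `seed`.
std::vector<int> sequence_for_seed(unsigned seed, int count) {
    std::srand(seed);
    std::vector<int> values;
    for (int i = 0; i < count; ++i) {
        values.push_back(std::rand());
    }
    return values;
}
```

Calling sequence_for_seed(42, 5) twice returns the same five numbers both times, which is exactly the property that makes a predictable seed a problem.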

To randomize the random number generator, most programmers pass to srand() the time in seconds since the epoch, e.g. they do something like this:

srand( (unsigned)time(NULL) );

It’s a very common way to seed the random number generator, and it’s also shown in many books that teach programming. This may look sufficient for most uses (and it is), but nonetheless it’s also used many times where it just isn’t random enough. Let’s consider a program with a very short runtime that depends on the above-mentioned method for seeding the random number generator. If someone wrote a script that runs this program in a loop, causing the program to run several times within one second, the same random number sequence would be generated multiple times, as the same seed was used. One can see that this behavior may not be what was intended.

Another problem might come up if we use this method for, let’s say, password generation. Let’s say Joe wrote a small password generator that seeds the random number generator in the above-mentioned way, and generated for himself a strong 8-character-long alphanumeric password. Joe thought that his password was secure and that even if somebody knew its length, they would need to try 62^8 = 218,340,105,584,896 combinations in order to crack it. Now Sally wants to crack Joe’s “secure” password. Instead of attacking the password directly, Sally will attack the password generator. Sally can easily find out the day Joe created the password, and we shall assume Sally got access to the source code of the password generator Joe used. Let’s also assume that Sally knows the length of Joe’s password. During the day Joe generated his password, time(NULL) returned 86,400 different values, one for each second. Sally just modifies the password generator and seeds the random number generator with each one of those possible values of time(). Sally will now get at most 86,400 candidate passwords, and one of them is guaranteed to be Joe’s. If you think 86,400 is a lot, remember that Sally went down from 218,340,105,584,896 possible combinations under very weak assumptions. If Sally knew the exact 10 minutes in which Joe generated the password (this isn’t a very hard thing to find out), the number would drop to only 600 candidates.
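Sally’s attack can be sketched in a few lines. This is only an illustration: generate_password is a hypothetical stand-in for Joe’s generator (his actual code isn’t shown here), and find_seed and the window parameters are names I made up.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical stand-in for Joe's generator: seed rand(), then pick
// `length` characters from a 62-character alphanumeric alphabet.
std::string generate_password(unsigned seed, std::size_t length) {
    static const char alphabet[] =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789";
    const std::size_t alphabet_size = sizeof(alphabet) - 1;  // drop the '\0'
    std::srand(seed);
    std::string password;
    for (std::size_t i = 0; i < length; ++i) {
        password += alphabet[std::rand() % alphabet_size];
    }
    return password;
}

// Sally's attack: try every second in [window_start, window_end) as a seed.
// Returns a seed that reproduces `target`, or -1 if none does.
long long find_seed(const std::string& target,
                    long long window_start, long long window_end) {
    for (long long t = window_start; t < window_end; ++t) {
        if (generate_password(static_cast<unsigned>(t), target.size()) == target) {
            return t;
        }
    }
    return -1;
}
```

With a one-day window, find_seed makes at most 86,400 calls to the generator. In practice Sally would not know the target password in advance and would instead test each candidate against whatever the password protects, but the size of the search space is the same.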

What Is the Fastest Method to Iterate Over a String?

A few days ago, I decided to check what really is the fastest method to iterate over strings in C++. As a string class, I chose the string class from the STL, as it is very popular and provides a couple of ways to iterate over it. So how can one iterate over a std::string?

By using indexes, e.g. str[i], and running i from zero to the length of the string.
By using the at method. string::at(size_t pos) provides an interface similar to indexes, with the exception that it checks whether the given position is past the end of the string and then throws an exception. One may see it as the safe version of the regular index.
Treating the string as a sequence of characters and iterating over it using iterators.
Using string::c_str() to get a pointer to a regular C string representation of the string stored in the std::string and treating it as an array, e.g. using indexes to go over it.
The last way to iterate over the string is to get a pointer to a C string representation using string::c_str() and advance the pointer itself to iterate over the string.

The third method is the native method of iterating over objects in STL, and like the last two, it can’t be used if the iteration changes the string itself (e.g. inserting or deleting characters). The first and second methods are similar to the fourth (treating the pointer to the C string as an array), except that they aren’t as problematic as the latter when changing the string. The second method is the safest, as it’s the only one that does range checks and throws an exception when trying to access positions that are outside the string.

To benchmark and find out which method is the fastest way to iterate over a string, I’ve created a huge string of random characters ranging from ‘a’ to ‘z’ and five executables, each one implementing one of the above iteration methods to do a simple task (count the number of occurrences of each letter). The string is fifty million characters long, because the longer the string, the less important the overhead becomes.
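To make the five methods concrete, here is a sketch of how each one could implement the letter-counting task. This is not the benchmark code itself. The function names and the Counts type are made up for illustration, and the code assumes the string contains only characters from ‘a’ to ‘z’.

```cpp
#include <array>
#include <cstddef>
#include <string>

typedef std::array<long, 26> Counts;  // one counter per letter 'a'..'z'

// Method 1: plain indexes.
Counts count_by_index(const std::string& s) {
    Counts c{};
    for (std::size_t i = 0; i < s.length(); ++i) ++c[s[i] - 'a'];
    return c;
}

// Method 2: at(), which range-checks every access.
Counts count_by_at(const std::string& s) {
    Counts c{};
    for (std::size_t i = 0; i < s.length(); ++i) ++c[s.at(i) - 'a'];
    return c;
}

// Method 3: iterators.
Counts count_by_iterator(const std::string& s) {
    Counts c{};
    for (std::string::const_iterator it = s.begin(); it != s.end(); ++it)
        ++c[*it - 'a'];
    return c;
}

// Method 4: c_str() treated as an array.
Counts count_by_cstr_index(const std::string& s) {
    Counts c{};
    const char* p = s.c_str();
    for (std::size_t i = 0; p[i] != '\0'; ++i) ++c[p[i] - 'a'];
    return c;
}

// Method 5: c_str() with the pointer itself advanced.
Counts count_by_cstr_pointer(const std::string& s) {
    Counts c{};
    for (const char* p = s.c_str(); *p != '\0'; ++p) ++c[*p - 'a'];
    return c;
}
```

All five functions return the same counts for the same input; only the way they walk the string differs. Note that the two c_str() variants stop at the first null character, which is fine for a string of lowercase letters.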

The executables for the benchmark of every version were compiled with the default settings of g++ (without optimization, as the compiler might change the iteration methods when optimizing). The benchmark executables were timed by using the time command and redirecting the executables’ output to /dev/null. The tests were run both on 64-bit Gentoo (with 1 GB RAM) and on 32-bit Kubuntu (with 512 MB RAM), to make sure the overall results (which method is better, not the runtime itself) aren’t system-dependent.

radio.py – a Wrapper Script for Listening to Radio in Linux

Download radio-0.3.tar.gz.

Update: radio.py-0.4 is now available.

I like listening to music and radio while working, and fortunately there are numerous ways to do that. Unfortunately, most ways that allow you to listen to radio are very resource-consuming memory hogs (such as listening to streaming media via web browsers) or very unfriendly to users (listening via mplayer, for example). So, I set out to find a way that would use as few system resources as possible while keeping it user-friendly. One other requirement I had was being able to do all that from the command line, so it would work great with GNU Screen and wouldn’t require an X server (if I worked without one).

I used mplayer for some time for listening to radio. I had a file with a list of web radio stream URLs, which I would copy and pass to mplayer -playlist. This method met two of the requirements (minimal resources and command-line interface), but wasn’t really user-friendly. So, I wrote a little wrapper script in Python around mplayer – radio.py. After a quick installation (download and extract the tar archive and copy radio.py somewhere in your PATH), radio.py will allow you to listen to stations easily, and it will also do a couple more things for you.

To listen to a station, just call radio.py with the station’s name; e.g., in the command line enter radio.py BBC1 to listen to BBC Radio 1. To view a list of known stations, run radio.py --list. Currently there aren’t many stations (just stations I thought were needed or that I listen to). You can easily edit radio.py to add new stations (the script is documented and very clear). If you do so, please write a comment or email me so I will be able to add those stations to the next release by default.

So, as you’ve seen, radio.py allows you to easily listen to radio, as easily as writing the station’s name. But, as I said, it can do more things that I thought should be in a radio script. It has both a sleep feature (that turns off the radio after a specified amount of time) and a wake-up feature (that starts the radio after a specified amount of time). These two features can be used together, and practically allow you to use radio.py as an alarm clock.

You can find more information about radio.py options by calling radio.py --help. I hope you will find this script as useful as I do.


Introduction to C++ CGI

In this post and its follow-ups, I intend to cover the basics of CGI programming in C++. There are great performance gains in writing CGIs in C++ compared to interpreted languages such as PHP, and usually a C++ CGI is even faster than PHP scripts that are run via mod_php. On the other hand, PHP and other traditional web development languages are well suited for the task in terms of available libraries and development time. However, developing small, highly efficient CGI scripts in C++ is easier than you think.

Convert KDevelop’s Source Archive to a Source Package

I use KDevelop as my main IDE, and I’m pretty satisfied. KDevelop can create a source archive of the project’s source code automatically for you, which simplifies distribution of the project. Unfortunately, the archive created isn’t ready for distribution. The user can’t just run ./configure ; make, as they need to run all the automake tools first. That’s not ideal for distribution. So you need to convert this source archive to a source package that is ready for the user to compile immediately.


Tracking MediaWiki External Links Statistics Using Google Analytics

When you track MediaWiki statistics, you usually track only internal page statistics, but tracking external links that lead out of your site is not something you can ignore. Unfortunately, we probably can’t put actual tracking code in the pages linked to by our site’s external links. Fortunately, we can track the actual clicks on those links that lead out of the site, and it’s quite easy to do when tracking statistics with Google Analytics. If you don’t already use Google Analytics with your MediaWiki site, open a new account in Google Analytics and see my previous post: Track MediaWiki Statistics using Google Analytics.


Track MediaWiki Statistics using Google Analytics

Google Analytics is one of the best free web-statistics services available. It’s also quite easy to use with MediaWiki. To install Google Analytics in your MediaWiki, you should put the tracking code, which is something that looks like:

<script src="http://www.google-analytics.com/urchin.js" type="text/javascript">
</script>
<script type="text/javascript">
_uacct="UA-xxxx-x";
urchinTracker();
</script>

in every page, preferably just above the </body> tag. The best way to do so is to put the tracking code inside the base skin PHP file. That means that unless you changed the default skin for MediaWiki, you need to edit /wiki/skins/MonoBook.php. In this file, you will find the </body> tag towards the bottom of the file. Insert the tracking code just above it, save the file, and you’re done, as all pages will now show the script. Google Analytics will start gathering statistics usually in about 24-28 hours.

Update: If you also want to track external links to files and other websites, take a look at Tracking MediaWiki External Links Statistics Using Google Analytics.

Installing IvriTeX-1.2.1 on teTeX-3.0

A few days ago, I finally decided to install IvriTeX-1.2.1 on my system. I’m running teTeX-3.0. The new version of IvriTeX includes some very important improvements and, at least for me, the most important thing is support for the Culmus fonts. teTeX-3.0 introduced a major directory change, which causes many problems when installing packages that are unaware of the changes. In this post, I will try to walk through the installation process.

TEXMF will be the directory of your local TeX tree (usually /usr/share/texmf). Before beginning the installation process, make sure you have the Culmus fonts installed. Apparently, Culmus is not optional; it’s a requirement. I’ll assume that your Culmus fonts are installed in /usr/share/fonts/culmus.

Download the IvriTeX-1.2.1 source code from here.
Extract the archive into a temporary directory.
Save the diff file below to a file named “Makefile_patch” and save it inside ivritex-1.2.1/fonts/culmus.
Apply the patch by going to the ivritex-1.2.1/fonts/culmus directory (under the directory where you extracted the source archive) and executing “patch < Makefile_patch”. The patch will alter the places where some files will be installed.
As root, execute “updmap --enable Map culmus.map”.
Still as root, execute “mktexlsr”.
IvriTeX 1.2.1 should be installed now.

--- Makefile    2007-02-14 19:59:52.000000000 +0200
+++ Makefilenew 2007-02-16 10:11:07.000000000 +0200
@@ -20,8 +20,8 @@
 vf_target     = $(TEX_ROOT)/fonts/vf/culmus
 # this is where ivritex will eventually be:
 tex_target    = $(TEX_ROOT)/tex/generic/babel
-encode_dir    = $(TEX_ROOT)/dvips/base
-dvips_cfg_dir = $(TEX_ROOT)/dvips/config
+encode_dir    = $(TEX_ROOT)/fonts/enc/dvips/base
+map_dir       = $(TEX_ROOT)/fonts/map/
 sysconf       = $(DESTDIR)/etc
 updmap_dir    = $(sysconf)/texmf/updmap
 #culmus_target = $(PREFIX)/fonts/culmus
@@ -137,11 +137,11 @@
    mkdir -p $(sysconf)/texmf/updmap.d
    echo "Map culmus.map" >$(sysconf)/texmf/updmap.d/10culmus.cfg
 else
-   mkdir -p $(dvips_cfg_dir)
-   cp culmus.map $(dvips_cfg_dir)/
+   mkdir -p $(map_dir)
+   cp culmus.map $(map_dir)/
   ifeq ($(tetex_ver),2)
    # this should run mktexlsr as well
-   $(updmap) --enable Map $(dvips_cfg_dir)/culmus.map
+   $(updmap) --enable Map $(map_dir)/culmus.map
   else # for tetex-1
     ifeq ($(tetex_ver),1)
    # TODO: fill in sed line here

Samba and Firewall Configuration

I’ve been using Guarddog as a GUI for iptables for some time. I’ve configured it to allow connections to Samba network shares, but for some reason it wouldn’t let me connect to the shares without disabling the firewall first. The connections were blocked despite the proper configuration in Guarddog. So today I decided to look at the problem again and fix it.

After inspecting the output of ‘dmesg’, I found out that Samba tries to connect to 192.168.2.255 (192.168.2.* is my network), which is the broadcast address for the network. I tried enabling connections to that address, and to my surprise this fixed the problem. I guess Samba, for some reason, requires access to the broadcast address for some name/address lookup of hosts in the network.