Expanding Macros into String Constants in C

Today I came across an annoying problem, how do I expand a C macro into a string?

One of C’s preprocessor operators is the # which surrounds the token that follows it in the replacement text with double quotes (“). So, at first the solution sounds pretty simple, just define

#define STR(tok) #tok

and things will work. However, there is one caveat: it will not work if passed another macro. For example,

#define BUF_LEN 100
#define STR(tok) #tok

STR(BUF_LEN)

will produce after going through the preprocessor

"BUF_LEN"

instead of "100", which is undesired. This behavior is due to the C standard noting that no macro expansions should happen to token preceded by #.

However, after reconsidering the source of the problem, I’ve found the following workaround: define another macro which will expand the argument and only then call the macro which does the quoting.

#define STR_EXPAND(tok) #tok
#define STR(tok) STR_EXPAND(tok)

#define BUF_LEN 100

STR(BUF_LEN)

will produce

"100"

as desired.

Explanation: The STR macro calls the STR_EXPAND macro with its argument. Unlike in the first example, this time the parameter is checked for macro expansions and evaluated by the preprocessor before being passed to STR_EXPAND which quotes it, thus giving the desired behavior.

Damerau-Levenshtein Distance in Python

Damerau-Levenshtein Distance is a metric for measuring how far two given strings are, in terms of 4 basic operations:

  • deletion
  • insertion
  • substitution
  • transposition

The distance of two strings are the minimal number of such operations needed to transform the first string to the second. The algorithm can be used to create spelling correction suggestions, by finding the closest word from a given list to the users input. See Damerau–Levenshtein distance (Wikipedia) for more info on the subject.

Here is an of the algorithm (restricted edit distance version) in Python. While this implementation isn’t perfect (performance wise) it is well suited for many applications.

"""
Compute the Damerau-Levenshtein distance between two given
strings (s1 and s2)
"""
def damerau_levenshtein_distance(s1, s2):
    d = {}
    lenstr1 = len(s1)
    lenstr2 = len(s2)
    for i in xrange(-1,lenstr1+1):
        d[(i,-1)] = i+1
    for j in xrange(-1,lenstr2+1):
        d[(-1,j)] = j+1

    for i in xrange(lenstr1):
        for j in xrange(lenstr2):
            if s1[i] == s2[j]:
                cost = 0
            else:
                cost = 1
            d[(i,j)] = min(
                           d[(i-1,j)] + 1, # deletion
                           d[(i,j-1)] + 1, # insertion
                           d[(i-1,j-1)] + cost, # substitution
                          )
            if i and j and s1[i]==s2[j-1] and s1[i-1] == s2[j]:
                d[(i,j)] = min (d[(i,j)], d[i-2,j-2] + cost) # transposition

    return d[lenstr1-1,lenstr2-1]

Update 24 Mar, 2012: Fixed the error in computing transposition at the beginning of the strings.

Backup a SourceForge hosted SVN repository – sf-svn-backup

SourceForge urges their users to backup the code repositories of their projects. As I have several projects hosted with SourceForge, I should do it too. Making the backups isn’t complicated at all, but because it isn’t automated properly, I’ve been lazy with it.

sf-svn-backup was written in order to simply automate the process. The script is pretty simple to use, just pass as the first argument the project name and the script will write down to stdout the dump file.

For example:

sf-svn-backup openyahtzee > openyahtzee.dump

The project name should be it’s UNIX name (e.g. openyahtzee and not Open Yahtzee). Because the script writes the dump file directly to stdout it’s easy to pipe the output first through a compression program such as gzip to create compressed SVN dump files.

Question Marks Instead of Non-ASCII Chars when using Gettext in PHP

Yesterday I’ve ported a PHP website to use Gettext for localizations (l10n). After reading through the Gettext documentation and going through the documentation in the PHP site, I’ve manged to get everything working (almost). I had one problem, all the non-ASCII characters (accented Latin chars, Japanese and Chinese) where displayed as question marks (?) instead of the correct form. This happened despite me using UTF-8 encoded files.

While some people (e.g. this one) suggested that it’s not possible to use non-ASCII characters when using a UTF-8 encoded message files, there is a solution and it’s quiet simple one. All you have to do is to call bind_textdomain_codset and pass it UTF-8 as charset.

InfiniteTTT 0.6 Released

InfiniteTTT 0.6 was released today. The main change in the new version is that the game is now multi-threaded.

InfiniteTTT is a variation of Tic-Tac-Toe which is played on an infinite board.

The new version has new multi-threaded AI engine, and several minor fixes and improvements. The changes improved the user experience and made the game more responsive. The new release contains binaries for Windows, source package and a Gentoo ebuild. Packages for other Linux distributions will follow soon (help will be appreciated).

To download the new version visit InfiniteTTT’s download page.

radio.py Station List Patch

Some of the stations in radio.py-0.5 changed the URLs or their streams. The patch updates the stream URLs of three stations: Galgalatz, Galatz and Radius.

To apply the patch and update radio.py, open a terminal and cd to the directory where you installed it. Type the following commands in the terminal (If you installed as root, you’ll need to run the commands as root too).

$ wget "http://www.guyrutenberg.com/wp-content/uploads/2008/11/radio.py.patch"
$ patch radio.py < radio.py.patch
$ rm radio.py.patch

Drawing Finite Automata and State Machines

I had to draw couple of Finite Automata and Turing Machines for some university assignments. Usually I would have done it using Inkscape (as it is my favorite tool for creating figures for my LaTeX documents), but doing it manually is pretty tedious work. Inkscape diagram tool is currently sub par, so everything have to be done by hand. It’s OK if you need to draw one State Machine once in a while, but not suitable for larger quantities. I’ve also tried using Dia, but it also required lots of manual tweaking and tuning.

To my surprise, Graphviz (and especially the dot utility) turned out to be the (almost) perfect tool for the job. It lets you describe the graph in a simple text-based way, and it handles the graph layout by himself. This is somewhat like LaTeX but for graphs (you concentrate on content not layout).

My Finite Automata needed no manual tweaking and resulted in a very nice graphs. For more complicated State Machines it’s sometimes necessary to do some manual tuning. The commands I found most useful to tweak the graph were:

  • Grouping nodes to be in the same level – { rank="same"; "q1"; "q2"; "q3"}. The other options for rank can affect how the group is positioned relative to the other nodes in the graph (source, above all, sink bellow all).
  • Adding weight to edges – q1 -> q2 [weight="10"]. This affects the cost of strecting the edge. The higher the weight the straighter the edges will be.
  • Adding invisible edges – q1 -> q3 [style="invis"]. This allowed me to control the order of the nodes in the same rank (height).

Last but not least Graphivz can generate the graphs in variety for formats including eps, pdf and svg (which allows post-processing with inkscape).

tarsum – Calculate Checksum for Files inside Tar Archive

Update: I’ve released tarsum-0.2, a new version of tarsum.

Some time ago, I got back a hard disk back from data recovery. One of the annoying issues I encountered with the recovered data was corrupted files. Some files looked like they were recovered successfully but their content was corrupted. The ones that were configuration files, where usually easy to detect, as it raised errors in programs that tried to use them. But when such error occurs in some general text file, (or inside the data of an SQL dump), the file may seem correctly fine unless closely inspected.

I have an habit of storing old backups on CDs (they are initially made to online storage), I do it in order to reduce backup costs. But the recovered/corrupted data issue raised some concerns about my ability to recover using this disks. Assuming that I have a disk failure, and I couldn’t recover from my online backups for reason, how can I check the integrity of my CD backups?

Only storing and comparing hash signature for the whole archive, is almost useless. It allows you to validate whether all the files are probably fine, but it can’t tell apart one corrupted file in the archive from a completed corrupted archive. My idea was to calculate checksum (hash) for each file in the data and store the signature in a way that would allow me to see which individual files are corrupted.

This is where tarsum comes to the rescue. As it’s name applies it calculate checksum for each file in the archive. You can download tarsum from here.

Using tarsum is pretty straight forward.

tarsum backup.tar > backup.tar.md5

Calculates the MD5 checksums of the files. You can specify other hashes as well, by passing a tool that calculates it (it must work like md5sum).

tarsum --checksum=sha256sum backup.tar > backup.tar.sha256

To verify the integrity of the files inside the archive we use the diff command:

tarsum backup.tar | diff backup.tar.md5 -

where backup.tar.md5 is the original signature file we created. This is possible because the signatures are sorted alphabetically by the file name inside the archive, so it the order of the files is always the same.

Note that if you use an updated version of GNU tar, tarsum can also operate directly on compressed archives (e.g. tar.bz2, tar.gz).

s3backup – Easy backups of Folders to Amazon S3

This is an updated version of my previous backups script – Backup Directories to Amazon S3 Script. The new script works much better and safer. Unlike the old script, the new one creates the tarballs in a temporary file under /tmp, and allows more control over the backup process.

Continue reading s3backup – Easy backups of Folders to Amazon S3

Kernel Configuration for acpid Issue

I’ve installed acpid on my system some time ago (Gentoo package: sys-power/acpid. However each time I tried to start it it complained:

acpid: can't open /proc/acpi/event: No such file or directory

Apparently acpid requires you to enable ACPI_PROC_EVENT, which in it’s labels states “Deprecated /proc/acpi/event support”. I really wonder why such tool only support the deprecated way to receive the ACPI events.