Batch Renaming Using sed

I was reorganizing my music library and decided to change the naming convention I’d used. This task was just asking to be automated. Since the filename change could be described using a regular expression, I looked for a way to use sed for the renaming process.

The files I had followed the filename pattern ARTIST - SONG - TRACK - ALBUM:

James Brown - I Got You (I Feel Good) - 01 - Classic James Brown.ogg

I wanted to rename them to ARTIST - ALBUM - TRACK - SONG:

James Brown - Classic James Brown - 01 - I Got You (I Feel Good).ogg

Describing the change as a sed program is easy:

s/\(.*\) - \(.*\) - \(.*\) - \(.*\)\.ogg/\1 - \4 - \3 - \2.ogg/

Now all that has to be done is to pass each filename to mv twice: once as is, and once after it has gone through the sed script. This can be done like this:

for i in *; do
  mv "$i" "`echo "$i" | sed "s/\(.*\) - \(.*\) - \(.*\) - \(.*\)\.ogg/\1 - \4 - \3 - \2.ogg/"`";
done

The important part is

`echo "$i" | sed "s/\(.*\) - \(.*\) - \(.*\) - \(.*\)\.ogg/\1 - \4 - \3 - \2.ogg/"`

which pipes the filename to sed and returns it as an argument for mv.

To see what renaming will be done, one can alter the above command a bit and get

for i in *; do
  echo "$i" "->" "`echo "$i" | sed "s/\(.*\) - \(.*\) - \(.*\) - \(.*\)\.ogg/\1 - \4 - \3 - \2.ogg/"`";
done

which will effectively print a list of lines in the form oldname -> newname.

Of course, this technique isn’t limited to the renaming I’ve done. By changing the pattern given to sed, one can do any kind of renaming that can be described as a regular expression replacement. Also, one can change the globbing (the *) in the for loop to operate only on specific files that match a given pattern in the directory, instead of all of them.
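A quick way to sanity-check a pattern before touching any files is to try it on a single sample name. The following is only a sketch that restates the substitution with Python's re module instead of sed; the function name new_name is made up for illustration, and Python writes the groups as (...) where sed's basic syntax uses \(...\).

```python
import re

# Same four capture groups as the sed program, in Python regex syntax.
# (Illustrative restatement; the post itself uses sed.)
PATTERN = re.compile(r"(.*) - (.*) - (.*) - (.*)\.ogg")


def new_name(old_name):
    """Reorder ARTIST - SONG - TRACK - ALBUM into ARTIST - ALBUM - TRACK - SONG.

    Names that do not match the pattern are returned unchanged.
    """
    return PATTERN.sub(r"\1 - \4 - \3 - \2.ogg", old_name)


old = "James Brown - I Got You (I Feel Good) - 01 - Classic James Brown.ogg"
print(old, "->", new_name(old))
```

Because each .* is greedy, a name containing more than three " - " separators will be split differently from what you might expect, so it is worth checking a few awkward names this way first.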

Deleting a Range of Tickets in Trac

Recently, the Open Yahtzee website, which runs Trac, has fallen victim to several spam attacks. The spammers submit a large number of tickets containing links to various sites. This post was written mainly to allow me to copy and paste a command to delete a range of tickets at once, but I thought it might be useful to others as well.

WordPress Backup to FTP

Update: A newer version of the script is available.

This script allows you to easily back up your WordPress blog to an FTP server. It’s actually a modification of my WordPress Backup to Amazon S3 Script, but instead of saving the backup to Amazon S3, it uploads it to an FTP server. Another update is that now the SQL dump includes the database creation instructions, so you don’t need to create it manually before restoring from the backup.

Although I’ve written it with WordPress in mind (to create backups of my blog), it isn’t WordPress-specific. It can be used to back up any website that consists of a MySQL database and files. I’ve successfully used it to back up a MediaWiki installation.

Extract Public Key from X.509 Certificate as Hex

X.509 certificates are a common way to exchange and distribute public key information. For example, most OpenSocial containers use the OAuth RSA-SHA1 signature method and distribute their public keys in the X.509 format.

While working on an AppEngine application, I needed to verify requests from such containers. However, there is (currently) no pure Python library capable of parsing the certificates. This meant that I needed to extract the public key out of the certificate manually and store it in some parsed way inside the Python code.

Fortunately, extracting the public key from an X.509 certificate and representing it as a hex number turned out to be simple.

Expanding Macros into String Constants in C

Today I came across an annoying problem: how do I expand a C macro into a string?

One of C’s preprocessor operators is #, which surrounds the token that follows it in the replacement text with double quotes ("). So, at first the solution sounds pretty simple: just define

#define STR(tok) #tok

and things will work. However, there is one caveat: it will not work if it is passed another macro. For example,

#define BUF_LEN 100
#define STR(tok) #tok

STR(BUF_LEN)

will produce, after going through the preprocessor,

"BUF_LEN"

instead of "100", which is undesired. This happens because the C standard specifies that a macro parameter preceded by # is not macro-expanded before it is turned into a string.

However, after reconsidering the source of the problem, I’ve found the following workaround: define another macro that will expand the argument and only then call the macro that does the quoting.

#define STR_EXPAND(tok) #tok
#define STR(tok) STR_EXPAND(tok)

#define BUF_LEN 100

STR(BUF_LEN)

will produce

"100"

as desired.

Explanation: The STR macro calls the STR_EXPAND macro with its argument. Unlike in the first example, this time the parameter is checked for macro expansions and evaluated by the preprocessor before being passed to STR_EXPAND, which quotes it, thus giving the desired behavior.

Damerau-Levenshtein Distance in Python

Damerau-Levenshtein distance is a metric for measuring how far apart two given strings are, in terms of four basic operations:

  • deletion
  • insertion
  • substitution
  • transposition

The distance between two strings is the minimal number of such operations needed to transform the first string into the second. The algorithm can be used to create spelling correction suggestions by finding the closest word from a given list to the user’s input. See Damerau–Levenshtein distance (Wikipedia) for more info on the subject.

Here is an implementation of the algorithm (restricted edit distance version) in Python. While this implementation isn’t perfect (performance-wise), it is well suited for many applications.

def damerau_levenshtein_distance(s1, s2):
    """
    Compute the Damerau-Levenshtein distance between two given
    strings (s1 and s2)
    """
    d = {}
    lenstr1 = len(s1)
    lenstr2 = len(s2)
    # Index -1 stands for the empty prefix, so d[(i,-1)] is the cost
    # of deleting the first i+1 characters of s1 (likewise for s2).
    for i in range(-1,lenstr1+1):
        d[(i,-1)] = i+1
    for j in range(-1,lenstr2+1):
        d[(-1,j)] = j+1

    for i in range(lenstr1):
        for j in range(lenstr2):
            if s1[i] == s2[j]:
                cost = 0
            else:
                cost = 1
            d[(i,j)] = min(
                           d[(i-1,j)] + 1, # deletion
                           d[(i,j-1)] + 1, # insertion
                           d[(i-1,j-1)] + cost, # substitution
                          )
            if i and j and s1[i]==s2[j-1] and s1[i-1] == s2[j]:
                d[(i,j)] = min (d[(i,j)], d[i-2,j-2] + cost) # transposition

    return d[lenstr1-1,lenstr2-1]

Update 24 Mar, 2012: Fixed the error in computing transposition at the beginning of the strings.
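The spelling-correction use mentioned earlier can be sketched roughly as follows. The names osa_distance and closest_word and the word list are made up for illustration. To keep the block runnable on its own, the distance is restated in a compact list-based form (restricted version, charging a transposition as one operation) rather than reusing the function above.

```python
def osa_distance(s1, s2):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(s1) + 1, len(s2) + 1
    # d[i][j] is the distance between the first i chars of s1
    # and the first j chars of s2.
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters
    for j in range(cols):
        d[0][j] = j  # insert all j characters

    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
            # Two adjacent characters swapped: count as one operation.
            if (i > 1 and j > 1
                    and s1[i - 1] == s2[j - 2]
                    and s1[i - 2] == s2[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def closest_word(word, candidates):
    """Return the candidate with the smallest distance to word."""
    return min(candidates, key=lambda c: osa_distance(word, c))


# Hypothetical word list, for illustration only.
words = ["yahtzee", "python", "macro"]
print(closest_word("pyhton", words))
```

For a large dictionary, comparing the input against every word this way gets slow, so a real spell checker would likely narrow down the candidates first.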

Back Up a SourceForge-Hosted SVN Repository – sf-svn-backup

SourceForge urges its users to back up their projects’ code repositories. As I have several projects hosted on SourceForge, I should do it too. Making the backups isn’t complicated at all, but because it isn’t properly automated, I’ve been lazy about it.

sf-svn-backup was written to automate the process. The script is simple to use: just pass the project name as the first argument, and the script will write the dump file to stdout.

For example:

sf-svn-backup openyahtzee > openyahtzee.dump

The project name should be its UNIX name (e.g. openyahtzee and not Open Yahtzee). Because the script writes the dump file directly to stdout, it’s easy to pipe the output through a compression program such as gzip to create compressed SVN dump files.

Question Marks Instead of Non-ASCII Chars When Using Gettext in PHP

Yesterday I ported a PHP website to use Gettext for localization (l10n). After reading through the Gettext documentation and going through the documentation on the PHP site, I managed to get almost everything working. I had one problem: all the non-ASCII characters (accented Latin chars, Japanese, and Chinese) were displayed as question marks (?) instead of in the correct form. This happened despite my using UTF-8 encoded files.

While some people (e.g. this one) suggested that it’s not possible to use non-ASCII characters when using UTF-8 encoded message files, there is a solution, and it’s quite simple. All you have to do is call bind_textdomain_codeset and pass it your text domain and UTF-8 as the codeset.

InfiniteTTT 0.6 Released

InfiniteTTT 0.6 was released today. The main change in the new version is that the game is now multithreaded.

InfiniteTTT is a variation of Tic-Tac-Toe that is played on an infinite board.

The new version has a new multithreaded AI engine, and several minor fixes and improvements. The changes improve the user experience and make the game more responsive. The new release contains binaries for Windows, a source package, and a Gentoo ebuild. Packages for other Linux distributions will follow soon (help would be appreciated).

To download the new version, visit InfiniteTTT’s download page.

radio.py Station List Patch

Some of the stations in radio.py-0.5 changed their URLs or streams. The patch updates the stream URLs of three stations: Galgalatz, Galatz, and Radius.

To apply the patch and update radio.py, open a terminal and cd to the directory where you installed it. Type the following commands in the terminal (if you installed it as root, you’ll need to run the commands as root too).

$ wget "http://www.guyrutenberg.com/wp-content/uploads/2008/11/radio.py.patch"
$ patch radio.py < radio.py.patch
$ rm radio.py.patch