SQL Dump for MS Access databases (.mdb files) on Linux

I recently had to work with some data that came in a huge Microsoft Access database. Because I like SQLite (and despise Access), I decided to export the data to an SQLite file. The first thing I needed to do was somehow get all the data out of the db. Being a Linux user complicates things a bit, but thanks to mdb-tools, it’s possible to process .mdb files without resorting to Windows and buying Access. Using mdb-tools directly can be tedious if you want to export a large db with multiple tables, so when I looked for a way to automate it, I came across Liberating data from Microsoft Access “.mdb” files. This post shows a nice script that dumps every table in a .mdb file to a separate CSV file.

While useful, I wanted something that I could easily import into SQLite. So I modified their script to generate an SQL dump of the db. Given a db file, it writes SQL statements describing the schema of the DB to stdout, followed by INSERTs for each table. Actually, because mdb-tools doesn’t support SQLite as a backend, the dump uses a MySQL dialect, but it should work fine with SQLite as well (SQLite will mostly ignore the parts it can’t process, such as COMMENTs). The easiest way to use the script is

$ python AccessDump.py access.mdb | sqlite3 new.db

If the original db contains non-ASCII characters and isn’t encoded in UTF-8, you should set the MDB_JET3_CHARSET environment variable to the correct charset. The dump itself will be UTF-8 encoded.

$ MDB_JET3_CHARSET="cp1255" python AccessDump.py access.mdb | sqlite3 new.db

Continue reading SQL Dump for MS Access databases (.mdb files) on Linux

A Note About Open Sound System (OSS)

A while ago I wrote about creating random numbers out of noise gathered from an audio device and also created a password generator based on the idea. The implementation was based on Open Sound System (commonly known as OSS). OSS was the de facto way to access audio devices a couple of years ago, until it hit licensing issues and was subsequently replaced by ALSA. As Ubuntu no longer supports OSS (and even the ALSA wrapper for it is in Universe), I’ve decided to re-write the code using some modern alternative.
Continue reading A Note About Open Sound System (OSS)

Debugging File Type (MIME) Associations

I’m having less and less time to blog and write stuff lately, so it’s a good opportunity to catch up with old things I did. Back in the happy days when I used Gentoo, one of the irritating issues I faced was messed-up file type associations. The MIME type for some files was recognized incorrectly, and as a result, KDE offered to open files with unsuitable applications. In order to debug it, I wrote a small Python script that would help me debug the way KDE applications are associated with MIME types and what MIME type is inferred from each file.

The script does so by querying KMimeType and KMimeTypeTrader. The script does 3 things:

  • Given a MIME type, show its hierarchy and a list of applications associated with it.
  • Given an application, list all MIME types it’s associated with.
  • Given a file, show its MIME type (and also the accuracy, which allows one to know why that MIME type was selected, although I admit that in the two years since I wrote it, I forgot how it works :))

The script is pasted below. I hope someone who still fiddles with less-than-standard installations will find it helpful.
Continue reading Debugging File Type (MIME) Associations

Installing culmus-latex on Ubuntu 11.10

After someone complained to me that he can’t install culmus-latex on Ubuntu 11.10, I decided to check the issue. Apparently, culmus-latex can’t be installed as-is on Ubuntu 11.10 (and probably other new versions of Debian and Ubuntu). The problem has been reported in a few places such as Whatsup, but as I don’t frequent the forum lately, I wasn’t aware of it. Skip below if you’re just interested in the workaround.

Technical Details

The problem manifests itself as:

sudo make install
... snipped for brevity ...
mktexlsr: Done.
updmap-sys --enable Map=culmus.map
updmap: This is updmap, version $Id: updmap 14402 2009-07-23 17:09:15Z karl $
updmap: using transcript file `/var/lib/texmf/web2c/updmap.log'
updmap: initial config file is `/var/lib/texmf/web2c/updmap.cfg'
make: *** [install] Error 2

But if you look at updmap’s man page, there is no documentation for the return codes. Also, there is no explicit place where it exits with return code 2 in the code. After some strace’ing, I found the culprit in the combination of the set -e at the top of /usr/bin/updmap and the function pickLocalFile in /usr/share/tex-common/debianize-updmap, which overrides certain behaviors in updmap. The pickLocalFile function uses the following lines

localfile=""
localfile="`ls $debDirname/*local*cfg 2>/dev/null`"
if [ -n "$localfile" ]; then

To check if there is a local configuration file under /etc/texmf/updmap.d. If such a file doesn’t exist, instead of creating one (as the maintainers of debianize-updmap intended), it fails due to the set -e in /usr/bin/updmap. Thus, updmap exits with error code 2 instead of completing the installation.

Meanwhile, until the bug is fixed, there is a simple workaround.

Workaround

Before installing, execute

sudo touch /etc/texmf/updmap.d/10local.cfg

And now the regular sudo make install installation should finish successfully.

As the problem is a result of a Debian bug, I don’t expect to release a new version of culmus-latex. Instead, I’ll report the bug to the Debian team.

Solving Sudoku using Python and Prolog

Two weeks ago, I came up with an interesting algorithm for solving Hidato, which basically involves decomposing the board grid (which can be square, hexagonal, or any other shape) into classes of pieces and then arranging them (maybe I’ll write a detailed post on it in the future). So while pondering whether it would be interesting enough to go forward and actually implement the algorithm, compared to the work it would require, I started thinking about what would be the simplest way to solve such puzzles, as opposed to the most efficient way.

At first I looked at general-purpose constraint solvers and decided to tackle Sudoku instead, as it’s a bit simpler to define in terms of constraints. I considered several libraries, but in the end I settled on simply using Prolog. I chose Prolog because, as a logic programming language, constraints are its bread and butter. I also kind of liked it, as I haven’t done anything in Prolog for quite a few years.

Describing Sudoku in terms of constraints is extremely simple. You need to state that every cell is in a given range and that all rows, columns, and sub-grids contain different integers. Since mangling with lists in Prolog isn’t fun, I wrote a Python program that outputs all the Prolog statements with hardcoded references to the variables that build up the board. It’s ugly but dead simple. The script gets the dimensions of the sub-grid.
Continue reading Solving Sudoku using Python and Prolog

passha – A hashapass variant

I like the idea of hashapass, but I’m unwilling to use an online tool because I fear that someday it might be compromised. So I wrote my own variant of hashapass. It uses slightly longer passwords and SHA-256 as the hash function.

#! /usr/bin/python

"""
passha.py - Generate passwords from a master password and a parameter.

Based on hashapass (http://hashapass.com)
"""
import hmac
import hashlib

def main(passwd, param):
    hm = hmac.HMAC(passwd, param, hashlib.sha256)
    print hm.digest().encode("base64")[:10]
    
if __name__=="__main__":
    import getpass
    passwd = getpass.getpass()
    param = raw_input("Parameter: ")
    main(passwd, param)

CSS Compactor – Reduces CSS File Size

This is a script I wrote back in 2006 that reduces the file size of CSS files by removing unnecessary whitespace and comments. It’s also capable of taking such a compacted CSS file and re-indenting it to make it readable. For example, it would take the following CSS:

/* sample css */
* {
    margin: 0px;
    padding: 0px;
}

/* define style for the logo */
#header .logo {
    float: left;
    /* another comment */
}

and turn it into:

*{ margin:0px; padding:0px;}#header .logo{ float:left;}

which is equivalent but much shorter CSS code. It can also re-indent it back to:

*{
     margin: 0px;
     padding: 0px;
}
#header .logo{
     float: left;
}

Continue reading CSS Compactor – Reduces CSS File Size

Building CookieJar out of Firefox’s cookies.sqlite

Firefox 3 started storing its cookies in a SQLite database instead of the old plain-text cookie.txt. While Python’s cookielib module could read the old cookie.txt file, it doesn’t handle the new format. The following Python snippet takes a CookieJar object and the path to Firefox cookies.sqlite (or a copy of it) and fills the CookieJar with the cookies from cookies.sqlite.

import sqlite3
import cookielib

def get_cookies(cj, ff_cookies):
    con = sqlite3.connect(ff_cookies)
    cur = con.cursor()
    cur.execute("SELECT host, path, isSecure, expiry, name, value FROM moz_cookies")
    for item in cur.fetchall():
        c = cookielib.Cookie(0, item[4], item[5],
            None, False,
            item[0], item[0].startswith('.'), item[0].startswith('.'),
            item[1], False,
            item[2],
            item[3], item[3]=="",
            None, None, {})
        print c
        cj.set_cookie(c)

It works well for me, except that apparently Firefox doesn’t save session cookies to disk at all.

spass-2.0 – Secure Password Generator

This is a complete rewrite of my secure password generator. The new version uses my true random number generator (and here).

The major change was using the new true random number generator in order to ensure strong passwords. Less significant changes include an easy way to specify password strips and some calling convention changes.

Usage examples:

$ ./spass
E5pT35Fg
$ ./spass -l 14
R$tfOm4g_yRQ2J
$ ./spass -s 0-9a-f -l 32
8b5f14a1eeaabe58c2878ab5416a9ebb

Download the tarball spass-2.0.tar.bz2. The program depends on Boost‘s program_options (it was tested against versions 1.37 and 1.42 and should work with other versions too).

Statistical Tests for My Audio Based Random Number Generator

In May, I’ve written about a way to generate random numbers from audio noise. Basically, it went like this:

  1. Get an audio sample from the microphone.
  2. Push the least significant bit to a buffer.
  3. Repeat steps 1-2 until the buffer is full (buffer size == block size for the hash function).
  4. Apply the hash function to the buffer.
  5. Get random bits from the digest.

In order to continue developing this random number generator (RNG), I’ve written a C++ class that simplifies working with it.
Continue reading Statistical Tests for My Audio Based Random Number Generator