SQL Dump for MS Access databases (.mdb files) on Linux

I recently had to work with some data that came in a huge Microsoft Access database. Because I like SQLite (and despise Access), I decided to export the data to an SQLite file. The first step was getting all the data out of the db. Being a Linux user complicates things a bit, but thanks to mdb-tools, it’s possible to process .mdb files without resorting to Windows and buying Access. Using mdb-tools directly can be tedious if you want to export a large db with multiple tables, so I looked for a way to automate it and came across the post “Liberating data from Microsoft Access ‘.mdb’ files”. It shows a nice script that dumps every table in a .mdb file to a separate CSV file.

While that is useful, I wanted something that I could easily import into SQLite, so I modified the script to generate an SQL dump of the db. Given a db file, it writes SQL statements describing the schema to stdout, followed by INSERTs for each table. Because mdb-tools doesn’t support SQLite as a backend, the dump uses the MySQL dialect, but it should work fine with SQLite as well (SQLite will mostly ignore the parts it can’t process, such as COMMENTs). The easiest way to use the script is:

$ python AccessDump.py access.mdb | sqlite3 new.db

If the original db contains non-ASCII characters and isn’t encoded in UTF-8, you should set the MDB_JET3_CHARSET environment variable to the correct charset. The dump itself will be UTF-8 encoded.

$ MDB_JET3_CHARSET="cp1255" python AccessDump.py access.mdb | sqlite3 new.db
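
To make the approach concrete, here is a minimal sketch of how such a dump could be driven from Python. It is not the actual AccessDump.py; the function names are made up for illustration, and it assumes the mdb-tools commands mdb-schema, mdb-tables and mdb-export are on the PATH and accept the arguments shown (these may differ between mdb-tools versions).

```python
import subprocess
import sys


def schema_command(mdb_path, backend="mysql"):
    # mdb-schema prints CREATE TABLE statements for the chosen backend.
    return ["mdb-schema", mdb_path, backend]


def tables_command(mdb_path):
    # "-1" asks mdb-tables to print one table name per line.
    return ["mdb-tables", "-1", mdb_path]


def export_command(mdb_path, table, backend="mysql"):
    # "-I <backend>" asks mdb-export for INSERT statements instead of CSV.
    return ["mdb-export", "-I", backend, mdb_path, table]


def parse_table_names(output):
    # Table names may contain spaces, so split on lines only.
    return [line.strip() for line in output.splitlines() if line.strip()]


def run(command):
    # Assumption: the tools emit UTF-8 (see MDB_JET3_CHARSET above).
    return subprocess.check_output(command, encoding="utf-8")


def dump(mdb_path, out=sys.stdout):
    # Schema first, then the INSERTs for every table.
    out.write(run(schema_command(mdb_path)))
    for table in parse_table_names(run(tables_command(mdb_path))):
        out.write(run(export_command(mdb_path, table)))
```

Calling dump("access.mdb") would write the whole dump to stdout, so it could be piped into sqlite3 in the same way as the commands above.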


Starting Django upon Reboot using Cron

Three years ago I wrote about starting services as a user via cron instead of init.d (specifically Trac). However, this method has a serious downside: it has no support for dependencies. This really bothered me when I used cron to start a Django server, as it would attempt to start before MySQL was running. This made the cron job useless: after every reboot I would receive an error email saying it couldn’t connect to the MySQL server, and I would have to log in and start the Django server manually. Yesterday I got sick of it and decided to hack together something that works properly. My crontab had the following line:

@reboot python ./manage.py runfcgi ...options...

which I changed to:

@reboot until /sbin/status mysql | grep start/running ; do echo "Mysql isn't running yet..." >&2; sleep 1; done; python ./manage.py runfcgi ...options...

Basically, it loops and checks whether the MySQL service has started. If so, it starts Django as it did before. On the other hand, if MySQL isn’t running, it just sleeps for a second and repeats the check.

A small issue is that if, for some reason, MySQL won’t start at all, the loop will run forever. If that happens, I’ll have to kill the cron job manually, but I would have to log in anyway to see what’s wrong with MySQL. So while this method can’t support dependencies the way init.d does, it provides a good-enough solution.
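
If the endless loop is a concern, the wait could be given a deadline. The following is only a sketch of that idea, not what my crontab does: wait_until and mysql_running are hypothetical helper names, the 300-second timeout is an arbitrary choice, and mysql_running assumes an Upstart system where /sbin/status mysql prints "start/running" once the service is up.

```python
import subprocess
import time


def mysql_running():
    # Assumption: Upstart's status command reports "start/running" when ready.
    try:
        output = subprocess.check_output(["/sbin/status", "mysql"],
                                         encoding="utf-8")
    except (OSError, subprocess.CalledProcessError):
        # Command missing or failed: treat the service as not running.
        return False
    return "start/running" in output


def wait_until(check, timeout=300.0, interval=1.0):
    # Poll check() until it returns true or the timeout (seconds) expires.
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A small wrapper could call wait_until(mysql_running) and start Django only when it returns True, exiting with an error (and therefore a cron email) otherwise.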

Update 2012-11-23: Fixed the crontab line (it would fail when mysql was in the start/post-start state).