Tag Archives: Python

SQL Dump for MS Access databases (.mdb files) on Linux

I recently had to work with some data that came in a huge Microsoft Access database. Because I like SQLite (and despise Access), I’ve decided to export the data to an SQLite file. The first thing I needed to do was to somehow get all the data out of the db. Being a Linux user, complicates things a bit, but thanks to mdb-tools it’s possible to process the .mdb files without resorting to Windows and buying Access. Using mdb-tools directly can be tedious if you want to export a large db with multiple tables, so when I’ve looked for a way to automate it, I came across Liberating data from Microsoft Access “.mdb” files. This post shows a nice script that dumps every table in a .mdb file to separate CSV file.

While useful, I wanted something that I could easily import into SQLite. So I’ve modified their script to generate an SQL dump of the db. Given a db file, it writes to stdout SQL statements describing the schema of the DB followed by INSERTs for each table. Actually because mdb-tools doesn’t support SQLite as a backend, the dump uses a MySQL dialect, but it should be fine with SQLite as well (SQLite will mostly ignore the parts it can’t process such as COMMENTs). The easiest way to use the script is

$ python AccessDump.py access.mdb | sqlite3 new.db

If the original db contains non-ascii characters, and isn’t encoded in UTF-8, you should set the MDB_JET3_CHARSET environment variable to the correct charset. The dump itself will be UTF-8 encoded.

$ MDB_JET3_CHARSET="cp1255" python AccessDump.py access.mdb | sqlite3 new.db

Continue reading

Fixing virtualenv after Upgrading Your Distribution/Python

After you upgrade your python/distribution (specifically this happened to me after upgrading from Ubuntu 11.10 to 12.04), your existing virtualenv environments may stop working. This manifests itself by reporting that some modules are missing. For example when I tried to open a Django shell, it complained that urandom was missing from the os module. I guess almost any module will be broken.

Apparently, the solution is dead simple. Just re-create the virtualenv environment:



virtualenv --system-site-packages /PATH/TO/EXISTING/ENVIRONMENT

(depending on how you created it in the same place). All the modules you’ve already installed should keep working as before (at least it was that way for me).

Debugging File Type (MIME) Associations

I’m having less and less time to blog and write stuff lately, so it’s a good oppertunity to catch up with old thing I did. Back in the happy days I used Gentoo, one of irritating issues I faced was messed up file type associations. MIME type for some files was recognized incorrectly, and as a result, KDE offered to open files with unsuitable applications. In order to debug it I wrote a small python script which would help me debug the way KDE applications are associated with MIME types and what MIME type is inferred form each file.

The script does so by querying the KMimeType and KMimeTypeTrader. The script does 3 things:

  • Given a MIME type, show it’s hierarchy and a list of applications associated with it.
  • Given an applications, list all MIME types it’s associated with
  • Given a file, show its MIME type (and also the accuracy, which allows one to know why that MIME type was selected, although I admit that in the two years since I wrote it, I forgot how it works :))

The script is pasted below. I hope someone that still fiddles with less than standard installations, will find it helpful.
Continue reading

Building CookieJar out of Firefox’s cookies.sqlite

Firefox 3 started to store it’s cookies in a SQLite database instead of the old plain-text cookie.txt. While Python’s cookielib module could read the old cookie.txt file, it doesn’t handle the new format. The following python snippet takes a CookieJar object and the path to Firefox cookies.sqlite (or a copy of it) and fills the CookieJar with the cookies from cookies.sqlite.

import sqlite3
import cookielib
def get_cookies(cj, ff_cookies):
    con = sqlite3.connect(ff_cookies)
    cur = con.cursor()
    cur.execute("SELECT host, path, isSecure, expiry, name, value FROM moz_cookies")
    for item in cur.fetchall():
        c = cookielib.Cookie(0, item[4], item[5],
            None, False,
            item[0], item[0].startswith('.'), item[0].startswith('.'),
            item[1], False,
            item[3], item[3]=="",
            None, None, {})
        print c

It works well for me, except that apperantly Firefox doesn’t save session cookies to the disk at all.

Audio Based True Random Number Generator POC

Few days ago I came up with an idea to create a true random number generator based on noise gathered from a cheap microphone attached to my computer. Tests showed that when sampling the microphone, the least significant bit behaves pretty randomly. This lead me to think it might be good source for gathering entropy for a true random number generator.
Continue reading

Python’s base64 Module Fails to Decode Unicode Strings

If you’ve got a base64 string as a unicode object and you try to use Python’s base64 module with altchars set, it fails with the following error:

TypeError: character mapping must return integer, None or unicode

This is pretty unhelpful error message also occurs if you try any method that indirectly use altchars. For example:


both fail while the following works:


While it’s not complicated to fix it (just convert any unicode string to ascii string), it’s still annoying.

URL-Safe Timestamps using Base64

Passing around timestamps in URLs is a common task. We usually want our URLs to be as shortest as possible. I’ve found using Base64 to result in the shortest URL-safe representation, just 6 chars. This compares with the 12 chars of the naive way, and 8 chars when using hex representation.

The following Python functions allow you to build and read these 6 chars URL-safe timestamps:
Continue reading

An Early Release of the New cssrtl.py-2.0

It has been three years since I’ve released the original version of cssrtl.py (and two since it’s re-release). The old version did a nice job, but experience gained during that time led me to write from scratch a new version. I’ve detailed more than a month ago, the basic principles and ideas that guided me to design a better tool to help adapting CSS files from left-to-right to right-to-left.

The guidelines weren’t just empty words, they were written while working on the Hebrew adaptation to the Fusion theme and in the same time writing a new proof-of-concept version of cssrtl.py. The original intent was to release a more mature version of that code when it will be completed. However, due to the apparent shortage of time in the present and foreseeable future, I can’t see myself complete the project any time soon. So following the “release early” mantra, I’ve decided to release the code as-is. As I said, the code is in working state, but not polished, so it may be of benefit but may contain bugs. If you find any bugs or have any suggestions, I would be glad to hear.
Continue reading

Scanning Documents Written in Blue Ink – biscan

After writing the post on converting PNMs to DjVu I’ve ran into some trouble scanning documents written in blue ink. The problem: XSane didn’t allow me to set the threshold for converting the scanned image to line-art (B&W). So, I tried scanning the document in grayscale and in color and convert it afterwards to bitonal using imagemagick. This ended up with two results. When I used the -monochrome command line switch, the conversion looked good, but it used halftones (dithering), when I tried to convert it to DjVu it resulted in a document size twice as large as normal B&W would. The other thing that I tried is using the -threshold switch. The DjVu compressed document size was much better now, but the document was awful looking, either it was too dark, or some of the text disappeared. After giving it some thought I knew I can find a better solution.
Continue reading