Guy Rutenberg

Keeping track of what I do

Archive for the ‘Python’ Category

Fixing virtualenv after Upgrading Your Distribution/Python

with one comment

After you upgrade your python/distribution (specifically this happened to me after upgrading from Ubuntu 11.10 to 12.04), your existing virtualenv environments may stop working. This manifests itself by reporting that some modules are missing. For example when I tried to open a Django shell, it complained that urandom was missing from the os module. I guess almost any module will be broken.

Apparently, the solution is dead simple. Just re-create the virtualenv environment:

virtualenv /PATH/TO/EXISTING/ENVIRONMENT

or

virtualenv --system-site-packages /PATH/TO/EXISTING/ENVIRONMENT

(depending on how you created it in the same place). All the modules you’ve already installed should keep working as before (at least it was that way for me).

Written by Guy

May 30th, 2012 at 10:36 pm

Posted in Python,Tips

Tagged with , ,

mechanize – Writing Bots in Python Made Simple

with 4 comments

I’ve been using python to write various bots and crawler for a long time. Few days ago I needed to write some simple bot to remove some 400+ spam pages in Sikumuna, I took an old script of mine (from 2006) in order to modify it. The script used ClientForm, a python module that allows you to easily parse and fill html forms using python. I quickly found that ClientForm is now deprecated in favor of mechanize. In the beginning I was partly set back by the change, as ClientForm was pretty easy to use, and mechanize‘s documentation could use some improvement. However, I quickly changed my mind about mechanize. The basic interface for mechanize is a simple browser object, that litteraly allows you to browse using python. It takes care of handling cookies and such and it got similar form-filling abilities to ClientForm, but this time they are integrated into the browser object.

For future reference for myself, and as another code example to mechanizes sparse documentation I’m giving below the gist of the simple bot I wrote:

Read the rest of this entry »

Written by Guy

March 29th, 2012 at 11:36 pm

Posted in Python

Audio Based True Random Number Generator POC

with 3 comments

Few days ago I came up with an idea to create a true random number generator based on noise gathered from a cheap microphone attached to my computer. Tests showed that when sampling the microphone, the least significant bit behaves pretty randomly. This lead me to think it might be good source for gathering entropy for a true random number generator.
Read the rest of this entry »

Written by Guy

May 14th, 2010 at 3:18 pm

Posted in Projects,Python

Tagged with ,

An Early Release of the New cssrtl.py-2.0

with 5 comments

It has been three years since I’ve released the original version of cssrtl.py (and two since it’s re-release). The old version did a nice job, but experience gained during that time led me to write from scratch a new version. I’ve detailed more than a month ago, the basic principles and ideas that guided me to design a better tool to help adapting CSS files from left-to-right to right-to-left.

The guidelines weren’t just empty words, they were written while working on the Hebrew adaptation to the Fusion theme and in the same time writing a new proof-of-concept version of cssrtl.py. The original intent was to release a more mature version of that code when it will be completed. However, due to the apparent shortage of time in the present and foreseeable future, I can’t see myself complete the project any time soon. So following the “release early” mantra, I’ve decided to release the code as-is. As I said, the code is in working state, but not polished, so it may be of benefit but may contain bugs. If you find any bugs or have any suggestions, I would be glad to hear.
Read the rest of this entry »

Written by Guy

September 20th, 2009 at 1:00 pm

Posted in Projects,Python

Tagged with , , ,

tarsum-0.2 – A read only version of tarsum

with 6 comments

When I first scratched the itch of calculating checksums for every file in a tar archive, this was my original intention. When I decided I want the script in bash for simplicity, I forfeited the idea and settled for extracting the files and then going over all the files to calculate their checksum value.

So when Jon Flowers asked in the comments of the original tarsum post about the possibility of getting the checksums of files in the tar file without extracting all the archive, I’ve decided to re-tackle the problem.

Read the rest of this entry »

Written by Guy

April 29th, 2009 at 9:58 am

Posted in Projects,Python

Tagged with ,

Damerau-Levenshtein Distance in Python

with 12 comments

Damerau-Levenshtein Distance is a metric for measuring how far two given strings are, in terms of 4 basic operations:

  • deletion
  • insertion
  • substitution
  • transposition

The distance of two strings are the minimal number of such operations needed to transform the first string to the second. The algorithm can be used to create spelling correction suggestions, by finding the closest word from a given list to the users input. See Damerau–Levenshtein distance (Wikipedia) for more info on the subject.

Here is an of the algorithm (restricted edit distance version) in Python. While this implementation isn’t perfect (performance wise) it is well suited for many applications.

"""
Compute the Damerau-Levenshtein distance between two given
strings (s1 and s2)
"""
def damerau_levenshtein_distance(s1, s2):
    d = {}
    lenstr1 = len(s1)
    lenstr2 = len(s2)
    for i in xrange(-1,lenstr1+1):
        d[(i,-1)] = i+1
    for j in xrange(-1,lenstr2+1):
        d[(-1,j)] = j+1
 
    for i in xrange(lenstr1):
        for j in xrange(lenstr2):
            if s1[i] == s2[j]:
                cost = 0
            else:
                cost = 1
            d[(i,j)] = min(
                           d[(i-1,j)] + 1, # deletion
                           d[(i,j-1)] + 1, # insertion
                           d[(i-1,j-1)] + cost, # substitution
                          )
            if i and j and s1[i]==s2[j-1] and s1[i-1] == s2[j]:
                d[(i,j)] = min (d[(i,j)], d[i-2,j-2] + cost) # transposition
 
    return d[lenstr1-1,lenstr2-1]

Update 24 Mar, 2012: Fixed the error in computing transposition at the beginning of the strings.

Written by Guy

December 15th, 2008 at 3:08 pm

Posted in Python

Retrieving Google’s Cache for a Whole Website

with 68 comments

Some time ago, as some of you noticed, the web server that hosts my blog went down. Unfortunately, some of the sites had no proper backup, so some thing had to be done in case the hard disk couldn’t be recovered. My efforts turned to Google’s cache. Google keeps a copy of the text of the web page in it’s cache, something that is usually useful when the website is temporarily unavailable. The basic idea is to retrieve a copy of all the pages of a certain site that Google has a cache of.
Read the rest of this entry »

Written by Guy

October 2nd, 2008 at 11:07 pm

Posted in Python

Tagged with ,

Start Trac on Startup – Init.d Script for tracd

with 16 comments

As part of a server move, I went on to reinstall Trac. I’ve tried to install it as FastCGI but I failed to configure the clean URLs properly. I got the clean URLs to work if the user access them, but Trac insisted on addeing trac.fcgi to the beginning of every link it generated. So I’ve decided to use the Trac standalone server, tracd.

The next problem I faced was how to start the Trac automatically upon startup. The solution was to use an init.d script for stating Trac. After some searching, I didn’t find an init.d script for tracd that were satisfactory (mostly poorly written). So I went on an wrote my own init.d script for tracd.
Read the rest of this entry »

Written by Guy

June 4th, 2008 at 10:12 am

Posted in Linux,Python,Tips,Uncategorized

Tagged with

Convert CSS layout to RTL – cssrtl.py

with 17 comments

This is a re-release of a script of mine that helps convert CSS layouts to RTL. I originally released it about a year ago but it was lost when I moved to the new blog. The script, cssrtl.py, utilizes a bunch of regular expressions to translate a given CSS layout to RTL.

Read the rest of this entry »

Written by Guy

December 28th, 2007 at 11:28 am

Posted in Projects,Python

Tagged with , ,

Conditional Expressions in Python 2.4

without comments

Python 2.5 introduced new syntax structure: the conditional expressions. For programmers in languages such as C these structures seem very basic and fundamental but Python lacked them for many years. As I said Python 2.5 introduced such syntax structure, one may use it in the following form:

x =  a if condition else b

As you probably guessed a is assigned to x if condition evaluates to true and b is assigned otherwise. This is pretty much equivalent to the C conditional expressions. But as I said, this structure was only introduced in 2.5. Previous versions of Python are still widely deployed and in use, so how do you achieve the same thing in older version of Python?
Read the rest of this entry »

Written by Guy

October 12th, 2007 at 12:52 pm

Posted in Python