Archive for the ‘Python’ Category
Fixing virtualenv after Upgrading Your Distribution/Python
After you upgrade your python/distribution (specifically this happened to me after upgrading from Ubuntu 11.10 to 12.04), your existing virtualenv environments may stop working. This manifests itself by reporting that some modules are missing. For example when I tried to open a Django shell, it complained that urandom was missing from the os module. I guess almost any module will be broken.
Apparently, the solution is dead simple. Just re-create the virtualenv environment:
virtualenv /PATH/TO/EXISTING/ENVIRONMENT |
or
virtualenv --system-site-packages /PATH/TO/EXISTING/ENVIRONMENT |
(depending on how you created it in the same place). All the modules you’ve already installed should keep working as before (at least it was that way for me).
mechanize – Writing Bots in Python Made Simple
I’ve been using python to write various bots and crawler for a long time. Few days ago I needed to write some simple bot to remove some 400+ spam pages in Sikumuna, I took an old script of mine (from 2006) in order to modify it. The script used ClientForm, a python module that allows you to easily parse and fill html forms using python. I quickly found that ClientForm is now deprecated in favor of mechanize. In the beginning I was partly set back by the change, as ClientForm was pretty easy to use, and mechanize‘s documentation could use some improvement. However, I quickly changed my mind about mechanize. The basic interface for mechanize is a simple browser object, that litteraly allows you to browse using python. It takes care of handling cookies and such and it got similar form-filling abilities to ClientForm, but this time they are integrated into the browser object.
For future reference for myself, and as another code example to mechanizes sparse documentation I’m giving below the gist of the simple bot I wrote:
Audio Based True Random Number Generator POC
Few days ago I came up with an idea to create a true random number generator based on noise gathered from a cheap microphone attached to my computer. Tests showed that when sampling the microphone, the least significant bit behaves pretty randomly. This lead me to think it might be good source for gathering entropy for a true random number generator.
Read the rest of this entry »
An Early Release of the New cssrtl.py-2.0
It has been three years since I’ve released the original version of cssrtl.py (and two since it’s re-release). The old version did a nice job, but experience gained during that time led me to write from scratch a new version. I’ve detailed more than a month ago, the basic principles and ideas that guided me to design a better tool to help adapting CSS files from left-to-right to right-to-left.
The guidelines weren’t just empty words, they were written while working on the Hebrew adaptation to the Fusion theme and in the same time writing a new proof-of-concept version of cssrtl.py. The original intent was to release a more mature version of that code when it will be completed. However, due to the apparent shortage of time in the present and foreseeable future, I can’t see myself complete the project any time soon. So following the “release early” mantra, I’ve decided to release the code as-is. As I said, the code is in working state, but not polished, so it may be of benefit but may contain bugs. If you find any bugs or have any suggestions, I would be glad to hear.
Read the rest of this entry »
tarsum-0.2 – A read only version of tarsum
When I first scratched the itch of calculating checksums for every file in a tar archive, this was my original intention. When I decided I want the script in bash for simplicity, I forfeited the idea and settled for extracting the files and then going over all the files to calculate their checksum value.
So when Jon Flowers asked in the comments of the original tarsum post about the possibility of getting the checksums of files in the tar file without extracting all the archive, I’ve decided to re-tackle the problem.
Damerau-Levenshtein Distance in Python
Damerau-Levenshtein Distance is a metric for measuring how far two given strings are, in terms of 4 basic operations:
- deletion
- insertion
- substitution
- transposition
The distance of two strings are the minimal number of such operations needed to transform the first string to the second. The algorithm can be used to create spelling correction suggestions, by finding the closest word from a given list to the users input. See Damerau–Levenshtein distance (Wikipedia) for more info on the subject.
Here is an of the algorithm (restricted edit distance version) in Python. While this implementation isn’t perfect (performance wise) it is well suited for many applications.
""" Compute the Damerau-Levenshtein distance between two given strings (s1 and s2) """ def damerau_levenshtein_distance(s1, s2): d = {} lenstr1 = len(s1) lenstr2 = len(s2) for i in xrange(-1,lenstr1+1): d[(i,-1)] = i+1 for j in xrange(-1,lenstr2+1): d[(-1,j)] = j+1 for i in xrange(lenstr1): for j in xrange(lenstr2): if s1[i] == s2[j]: cost = 0 else: cost = 1 d[(i,j)] = min( d[(i-1,j)] + 1, # deletion d[(i,j-1)] + 1, # insertion d[(i-1,j-1)] + cost, # substitution ) if i and j and s1[i]==s2[j-1] and s1[i-1] == s2[j]: d[(i,j)] = min (d[(i,j)], d[i-2,j-2] + cost) # transposition return d[lenstr1-1,lenstr2-1] |
Update 24 Mar, 2012: Fixed the error in computing transposition at the beginning of the strings.
Retrieving Google’s Cache for a Whole Website
Some time ago, as some of you noticed, the web server that hosts my blog went down. Unfortunately, some of the sites had no proper backup, so some thing had to be done in case the hard disk couldn’t be recovered. My efforts turned to Google’s cache. Google keeps a copy of the text of the web page in it’s cache, something that is usually useful when the website is temporarily unavailable. The basic idea is to retrieve a copy of all the pages of a certain site that Google has a cache of.
Read the rest of this entry »
Start Trac on Startup – Init.d Script for tracd
As part of a server move, I went on to reinstall Trac. I’ve tried to install it as FastCGI but I failed to configure the clean URLs properly. I got the clean URLs to work if the user access them, but Trac insisted on addeing trac.fcgi to the beginning of every link it generated. So I’ve decided to use the Trac standalone server, tracd.
The next problem I faced was how to start the Trac automatically upon startup. The solution was to use an init.d script for stating Trac. After some searching, I didn’t find an init.d script for tracd that were satisfactory (mostly poorly written). So I went on an wrote my own init.d script for tracd.
Read the rest of this entry »
Convert CSS layout to RTL – cssrtl.py
This is a re-release of a script of mine that helps convert CSS layouts to RTL. I originally released it about a year ago but it was lost when I moved to the new blog. The script, cssrtl.py, utilizes a bunch of regular expressions to translate a given CSS layout to RTL.
Conditional Expressions in Python 2.4
Python 2.5 introduced new syntax structure: the conditional expressions. For programmers in languages such as C these structures seem very basic and fundamental but Python lacked them for many years. As I said Python 2.5 introduced such syntax structure, one may use it in the following form:
x = a if condition else b |
As you probably guessed a is assigned to x if condition evaluates to true and b is assigned otherwise. This is pretty much equivalent to the C conditional expressions. But as I said, this structure was only introduced in 2.5. Previous versions of Python are still widely deployed and in use, so how do you achieve the same thing in older version of Python?
Read the rest of this entry »