Guy Rutenberg

Keeping track of what I do

Damerau-Levenshtein Distance in Python

with 5 comments

Damerau-Levenshtein Distance is a metric for measuring how far two given strings are, in terms of 4 basic operations:

  • deletion
  • insertion
  • substitution
  • transposition

The distance of two strings are the minimal number of such operations needed to transform the first string to the second. The algorithm can be used to create spelling correction suggestions, by finding the closest word from a given list to the users input. See Damerau–Levenshtein distance (Wikipedia) for more info on the subject.

Here is an of the algorithm (restricted edit distance version) in Python. While this implementation isn’t perfect (performance wise) it is well suited for many applications.

"""
Compute the Damerau-Levenshtein distance between two given
strings (s1 and s2)
"""
def DamerauLevenshteinDistance(s1, s2):
    d = {}
    lenstr1 = len(s1)
    lenstr2 = len(s2)
    for i in xrange(-1,lenstr1+1):
        d[(i,-1)] = i+1
    for j in xrange(-1,lenstr2+1):
        d[(-1,j)] = j+1
 
    for i in xrange(0,lenstr1):
        for j in xrange(0,lenstr2):
            if s1[i] == s2[j]:
                cost = 0
            else:
                cost = 1
            d[(i,j)] = min(
                           d[(i-1,j)] + 1, # deletion
                           d[(i,j-1)] + 1, # insertion
                           d[(i-1,j-1)] + cost, # substitution
                          )
            if i>1 and j>1 and s1[i]==s2[j-1] and s1[i-1] == s2[j]:
                d[(i,j)] = min (d[(i,j)], d[i-2,j-2] + cost) # transposition
 
    return d[lenstr1-1,lenstr2-1]
Share and Enjoy:
  • del.icio.us
  • StumbleUpon
  • Digg
  • Facebook
  • Mixx
  • Google Bookmarks
  • Simpy

Written by Guy

December 15th, 2008 at 3:08 pm

Posted in Python

5 Responses to 'Damerau-Levenshtein Distance in Python'

Subscribe to comments with RSS or TrackBack to 'Damerau-Levenshtein Distance in Python'.

  1. אני אהיה אוף-טופיק לגמרי, כיצד עיצבת את מראה הקוד הנ”ל, האם אתה משתמש בתוסף כלשהו שעושה את העבודה?

    אריאל

    17 Dec 08 at 12:52

  2. אני משתמש בתוסף בשם
    wp-syntax
    הוא תומך בהרבה שפות תכנות והוא נוח לשימוש

    Guy

    17 Dec 08 at 13:06

  3. incorrect algorithm
    refer to wikipedia
    they say that whatever you have written is worng

    Sloan

    17 Mar 09 at 14:18

  4. @Sloan, I’ve just check the English wikipedia article about Damerau-Levenshtein distance (and the talk page) and didn’t see anyone referring to a mistake in my implementation.

    Can you please point out the error, if it really exists, so I can correct it?

    Guy

    17 Mar 09 at 14:26

  5. Thanks for this, however I found another which is faster. http://mwh.geek.nz/2009/04/26/python-damerau-levenshtein-distance/

    Joseph Reagle

    9 Oct 09 at 16:47

Leave a Reply