# Damerau-Levenshtein Distance in Python

Damerau-Levenshtein Distance is a metric for measuring how far two given strings are, in terms of 4 basic operations:

• deletion
• insertion
• substitution
• transposition

The distance of two strings are the minimal number of such operations needed to transform the first string to the second. The algorithm can be used to create spelling correction suggestions, by finding the closest word from a given list to the users input. See Damerau–Levenshtein distance (Wikipedia) for more info on the subject.

Here is an of the algorithm (restricted edit distance version) in Python. While this implementation isn’t perfect (performance wise) it is well suited for many applications.

``````"""
Compute the Damerau-Levenshtein distance between two given
strings (s1 and s2)
"""
def damerau_levenshtein_distance(s1, s2):
d = {}
lenstr1 = len(s1)
lenstr2 = len(s2)
for i in xrange(-1,lenstr1+1):
d[(i,-1)] = i+1
for j in xrange(-1,lenstr2+1):
d[(-1,j)] = j+1

for i in xrange(lenstr1):
for j in xrange(lenstr2):
if s1[i] == s2[j]:
cost = 0
else:
cost = 1
d[(i,j)] = min(
d[(i-1,j)] + 1, # deletion
d[(i,j-1)] + 1, # insertion
d[(i-1,j-1)] + cost, # substitution
)
if i and j and s1[i]==s2[j-1] and s1[i-1] == s2[j]:
d[(i,j)] = min (d[(i,j)], d[i-2,j-2] + cost) # transposition

return d[lenstr1-1,lenstr2-1]
``````

Update 24 Mar, 2012: Fixed the error in computing transposition at the beginning of the strings.

## 16 thoughts on “Damerau-Levenshtein Distance in Python”

1. אריאל says:

אני אהיה אוף-טופיק לגמרי, כיצד עיצבת את מראה הקוד הנ”ל, האם אתה משתמש בתוסף כלשהו שעושה את העבודה?

2. Guy says:

אני משתמש בתוסף בשם
wp-syntax
הוא תומך בהרבה שפות תכנות והוא נוח לשימוש

3. Sloan says:

incorrect algorithm
refer to wikipedia
they say that whatever you have written is worng

4. Guy says:

@Sloan, I’ve just check the English wikipedia article about Damerau-Levenshtein distance (and the talk page) and didn’t see anyone referring to a mistake in my implementation.

Can you please point out the error, if it really exists, so I can correct it?

5. numerix says:

Yes, the algorithm shown above is not correct, but the other one J. Reagle gave a link to, is also incorrect.
Example: The Levenshtein-Distance between HURQBOHP and QKHOZ is 7, but both algorithms give 6 as result.
You can check it e.g. here: http://www-igm.univ-mlv.fr/~lecroq/seqcomp/node2.html

It is true, that the second algorithm is (by far!) faster than the one shown above (I checked it with two strings of length 2000 and it was about 100 times faster), but both algorithms are unefficient implementations. My own Python implementation is about 1000 times faster than the one above. I won’t show the code, because I used it for SPOJ problem EDIST (https://www.spoj.pl/problems/EDIST/). You can use that problem to check correctness and performance of Levenshtein-algorithms.

6. Michael says:

numerix, note the difference between Levenshtein Distance and Damerau-Levenshtein distance. The latter allows transpositions. The former is what is described in the SPOJ problem, and in the calculator you link to.

Both algorithms give the correct answer for that pair: delete H, U, R, replace B with K, transpose O and H, replace P with Z. Six steps.

7. Michael says:

Oh, and to Guy, yes, there’s a problem in the algorithm – try it on “AB” and “BA”. It should give one but instead says two. I didn’t figure out why that was.

8. numerix says:

Thanks Michael, for pointing that out. I was not aware of that difference!

9. Cris Stringfellow says:

@Michael deleting
“i>1 and j>1” from the if clause corrects the extra cost at position 0.

Now transposition at position 0 and 1 counts correctly. Best

Cris

10. Michele Filannino says:

The name of the function contains a typo:
– “damerau_lenenshtein_distance” -> “damerau_levenshtein_distance”

Bye,
michele.

11. Guy says:

@Michele: thanks, I corrected this.

12. Tyler Crompton says:

This algorithm is incorrect, because it is actually computes the optimal string alignment distance, which is just the Levenshtein distance with a simple transposition check. A slightly different approach must be taken for the Damerau-Levenshtein distance. As others have suggested, see the Wikipedia article for further information. It goes into more detail about the differences.

13. Elsa says:

‘zx’ to ‘xyz’, the answer should be 2 (z-x transpose, insert ‘y’) .