Damerau-Levenshtein Distance is a metric for measuring how far two given strings are, in terms of 4 basic operations:
- deletion
- insertion
- substitution
- transposition
The distance of two strings are the minimal number of such operations needed to transform the first string to the second. The algorithm can be used to create spelling correction suggestions, by finding the closest word from a given list to the users input. See Damerau–Levenshtein distance (Wikipedia) for more info on the subject.
Here is an of the algorithm (restricted edit distance version) in Python. While this implementation isn’t perfect (performance wise) it is well suited for many applications.
"""
Compute the Damerau-Levenshtein distance between two given
strings (s1 and s2)
"""
def damerau_levenshtein_distance(s1, s2):
d = {}
lenstr1 = len(s1)
lenstr2 = len(s2)
for i in xrange(-1,lenstr1+1):
d[(i,-1)] = i+1
for j in xrange(-1,lenstr2+1):
d[(-1,j)] = j+1
for i in xrange(lenstr1):
for j in xrange(lenstr2):
if s1[i] == s2[j]:
cost = 0
else:
cost = 1
d[(i,j)] = min(
d[(i-1,j)] + 1, # deletion
d[(i,j-1)] + 1, # insertion
d[(i-1,j-1)] + cost, # substitution
)
if i and j and s1[i]==s2[j-1] and s1[i-1] == s2[j]:
d[(i,j)] = min (d[(i,j)], d[i-2,j-2] + cost) # transposition
return d[lenstr1-1,lenstr2-1] |
"""
Compute the Damerau-Levenshtein distance between two given
strings (s1 and s2)
"""
def damerau_levenshtein_distance(s1, s2):
d = {}
lenstr1 = len(s1)
lenstr2 = len(s2)
for i in xrange(-1,lenstr1+1):
d[(i,-1)] = i+1
for j in xrange(-1,lenstr2+1):
d[(-1,j)] = j+1
for i in xrange(lenstr1):
for j in xrange(lenstr2):
if s1[i] == s2[j]:
cost = 0
else:
cost = 1
d[(i,j)] = min(
d[(i-1,j)] + 1, # deletion
d[(i,j-1)] + 1, # insertion
d[(i-1,j-1)] + cost, # substitution
)
if i and j and s1[i]==s2[j-1] and s1[i-1] == s2[j]:
d[(i,j)] = min (d[(i,j)], d[i-2,j-2] + cost) # transposition
return d[lenstr1-1,lenstr2-1]
Update 24 Mar, 2012: Fixed the error in computing transposition at the beginning of the strings.