To determine the minimum number of operations required to convert one string into another, we can employ the concept of the Edit Distance, commonly known as the Levenshtein Distance. This algorithm considers three types of operations: insertion, deletion, and substitution of a single character.
Below is a Python function that implements this logic using dynamic programming. The function calculates the minimum number of operations needed to transform one string (str1) into another (str2).
Python Function to Calculate Edit Distance
def min_edit_distance(str1, str2):
    len_str1 = len(str1)
    len_str2 = len(str2)

    # Create a 2D array to store the edit distances
    dp = [[0 for _ in range(len_str2 + 1)] for _ in range(len_str1 + 1)]

    # Fill the first column and first row of the dp array
    for i in range(len_str1 + 1):
        dp[i][0] = i  # Deleting all characters from str1
    for j in range(len_str2 + 1):
        dp[0][j] = j  # Adding all characters to str1 to make str2

    # Fill the dp array
    for i in range(1, len_str1 + 1):
        for j in range(1, len_str2 + 1):
            if str1[i - 1] == str2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]  # No operation needed
            else:
                dp[i][j] = min(
                    dp[i - 1][j] + 1,      # Deletion
                    dp[i][j - 1] + 1,      # Insertion
                    dp[i - 1][j - 1] + 1   # Substitution
                )

    return dp[len_str1][len_str2]


# Example usage
str1 = "kitten"
str2 = "sitting"
result = min_edit_distance(str1, str2)
print(f"Minimum edit distance from '{str1}' to '{str2}' is: {result}")
Explanation of the Code
1. Initialization: A 2D list dp is created to store the minimum edit distances. Its size is (len_str1 + 1) x (len_str2 + 1), so that dp[i][j] can hold the edit distance between the first i characters of str1 and the first j characters of str2, including the empty prefix of each string.
2. Base Cases:
- The first row is initialized to reflect the cost of converting an empty string to str2 by inserting all characters.
- The first column represents the cost of converting str1 to an empty string by deleting all characters.
3. Filling the DP Table:
- Iterate over every pair of positions (i, j) in the two strings:
  - If str1[i - 1] and str2[j - 1] match, copy the value from the diagonal cell dp[i - 1][j - 1] (no additional cost).
  - If they do not match, take the smallest of the three neighboring cells (dp[i - 1][j] for deletion, dp[i][j - 1] for insertion, dp[i - 1][j - 1] for substitution), add 1, and store the result in dp[i][j].
4. Final Result: The bottom-right cell of the table (dp[len_str1][len_str2]) contains the minimum edit distance between the two strings.
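The steps above can be summarized as a single recurrence. This is only a restatement of what the code does; the symbols \(D\), \(a\), and \(b\) are introduced here for illustration, with \(D(i, j)\) standing for dp[i][j], \(a\) for str1, and \(b\) for str2 (characters indexed from 1).

\[
D(i, 0) = i, \qquad D(0, j) = j,
\]
\[
D(i, j) =
\begin{cases}
D(i-1, j-1) & \text{if } a_i = b_j, \\
1 + \min\bigl( D(i-1, j),\; D(i, j-1),\; D(i-1, j-1) \bigr) & \text{otherwise.}
\end{cases}
\]

The three arguments of the minimum correspond to deletion, insertion, and substitution, in that order.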
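This dynamic programming approach runs in \(O(m \times n)\) time and uses \(O(m \times n)\) space, where \(m\) and \(n\) are the lengths of str1 and str2. Because each subproblem is solved only once, it avoids the exponential running time of a naive recursive solution, which makes it practical for strings of moderate length.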
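Since each row of the table depends only on the row directly above it, the full (len_str1 + 1) x (len_str2 + 1) table is not strictly required when only the distance is needed. The following is a minimal sketch of that idea, not part of the original function: the name min_edit_distance_two_rows and the variables previous and current are hypothetical, and the swap at the top relies on the edit distance being symmetric when all three operations cost 1.

```python
def min_edit_distance_two_rows(str1, str2):
    """Edit distance using two rows instead of the full table (illustrative sketch)."""
    # Put the shorter string along the row so each row is as small as possible.
    # This is safe because insert/delete/substitute all cost 1, so the
    # distance is the same in both directions.
    if len(str2) > len(str1):
        str1, str2 = str2, str1

    # Row 0 of the table: turning an empty prefix into str2[:j] takes j insertions.
    previous = list(range(len(str2) + 1))

    for i, ch1 in enumerate(str1, start=1):
        # Column 0 of row i: deleting the first i characters of str1.
        current = [i]
        for j, ch2 in enumerate(str2, start=1):
            if ch1 == ch2:
                current.append(previous[j - 1])      # match, no extra cost
            else:
                current.append(1 + min(
                    previous[j],       # deletion
                    current[j - 1],    # insertion
                    previous[j - 1],   # substitution
                ))
        previous = current

    return previous[-1]


print(min_edit_distance_two_rows("kitten", "sitting"))
```

This keeps the same \(O(m \times n)\) running time while storing only two rows at once. The trade-off is that the sequence of edits can no longer be recovered by walking back through the table, so the full-table version remains the better fit when the actual operations are needed.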