Given two words word1 and word2, find the minimum number of steps required to convert word1 to word2. (Each operation counts as 1 step.)
You have the following 3 operations permitted on a word:
a) Insert a character
b) Delete a character
c) Replace a character
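A basic problem in natural language processing (NLP) is computing the minimal edit distance between two strings, also known as the Levenshtein distance. Inspired by an introductory article on edit distance, this post uses dynamic programming to compute the minimal edit distance between two strings. The dynamic programming recurrence is explained below.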
1. What is minimal edit distance?
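Simply put, it is the minimum number of steps needed to transform a string s1 into another string s2 using only three operations: insert, delete, and substitute. Readers familiar with algorithms will recognize this as a dynamic programming problem.
A substitution can be viewed as one delete plus one insert, so we define the weights as follows. (With this weighting the result can differ from the problem statement above, where every operation counts as 1 step.)
I (insert): 1
D (delete): 1
S (substitute): 2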
2. Example:
intention->execution
Minimal edit distance:
delete i ; n->e ; t->x ; insert c ; n->u, which sums to cost = 1 + 2 + 2 + 1 + 2 = 8
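The arithmetic above can be checked mechanically. The following is a minimal sketch that replays the five operations on "intention" and adds up their weights. The names `OpKind`, `EditOp`, `opCost`, and `applyOps` are hypothetical and introduced only for this illustration, and the 0-based positions are my own reading of where each operation applies.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical types for illustration only.
enum class OpKind { Insert, Delete, Substitute };

struct EditOp {
    OpKind kind;
    std::size_t pos;  // 0-based index in the current string
    char ch;          // character to insert or substitute (unused for Delete)
};

// Weights from section 1: insert = 1, delete = 1, substitute = 2.
int opCost(OpKind kind) {
    return kind == OpKind::Substitute ? 2 : 1;
}

// Applies the operations to s in order and returns the total cost.
int applyOps(std::string& s, const std::vector<EditOp>& ops) {
    int total = 0;
    for (const EditOp& op : ops) {
        switch (op.kind) {
            case OpKind::Insert:     s.insert(op.pos, 1, op.ch); break;
            case OpKind::Delete:     s.erase(op.pos, 1);         break;
            case OpKind::Substitute: s[op.pos] = op.ch;          break;
        }
        total += opCost(op.kind);
    }
    return total;
}

// The operation sequence from the example, with assumed positions.
std::vector<EditOp> intentionToExecution() {
    return {
        {OpKind::Delete,     0, '\0'},  // delete i : "ntention"
        {OpKind::Substitute, 0, 'e'},   // n->e     : "etention"
        {OpKind::Substitute, 1, 'x'},   // t->x     : "exention"
        {OpKind::Insert,     3, 'c'},   // insert c : "execntion"
        {OpKind::Substitute, 4, 'u'},   // n->u     : "execution"
    };
}
```

Calling `applyOps` on a string holding "intention" with `intentionToExecution()` should leave the string equal to "execution" and return 8.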
3. Calculate minimal edit distance dynamically
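The idea is described in the comments. Here D(i,j) is the minimal edit distance between the first i characters of s1 and the first j characters of s2, with base cases D(i,0) = i and D(0,j) = j.
The three operations update the table as follows:
D(i,j) = min { D(i-1, j) + 1, D(i, j-1) + 1, D(i-1, j-1) + (s1[i] == s2[j] ? 0 : 2) }, where the three terms correspond to D, I, and S respectively. (See my classmate's blog for details.)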
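#include <algorithm>
#include <string>
#include <vector>
using namespace std;

class Solution {
public:
    int minDistance(string word1, string word2) {
        int len1 = word1.length();
        int len2 = word2.length();
        // If either word is empty, the answer is the length of the other.
        if (len1 == 0) return len2;
        if (len2 == 0) return len1;
        // dp[i][j]: edit distance between the first i characters of word1
        // and the first j characters of word2.
        vector<vector<int> > dp(len1 + 1, vector<int>(len2 + 1));
        for (int i = 0; i <= len1; ++i) dp[i][0] = i;  // delete i characters
        for (int j = 0; j <= len2; ++j) dp[0][j] = j;  // insert j characters
        for (int i = 1; i <= len1; ++i) {
            for (int j = 1; j <= len2; ++j) {
                // Substitution costs 1 here, as the problem statement requires.
                int cost = (word1[i - 1] == word2[j - 1]) ? 0 : 1;
                dp[i][j] = min(dp[i - 1][j - 1] + cost,
                               min(dp[i][j - 1] + 1, dp[i - 1][j] + 1));
            }
        }
        return dp[len1][len2];
    }
};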
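The listing above charges 1 per substitution, as the problem statement asks, while the recurrence in section 3 charges 2. As a rough sketch of how the two fit together, the function below takes the substitution cost as a parameter. The name `weightedEditDistance` and the parameter `subCost` are hypothetical, chosen for this illustration; this is one possible way to write the recurrence, not necessarily how the referenced article implements it.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: edit distance with insert = 1, delete = 1,
// and a configurable substitution cost (2 matches the weights in section 1).
int weightedEditDistance(const std::string& s1, const std::string& s2,
                         int subCost = 2) {
    const std::size_t n = s1.size();
    const std::size_t m = s2.size();
    // d[i][j]: cost of turning the first i chars of s1 into the first j of s2.
    std::vector<std::vector<int>> d(n + 1, std::vector<int>(m + 1, 0));
    for (std::size_t i = 0; i <= n; ++i) d[i][0] = static_cast<int>(i);
    for (std::size_t j = 0; j <= m; ++j) d[0][j] = static_cast<int>(j);

    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            const int del = d[i - 1][j] + 1;  // D: drop s1[i-1]
            const int ins = d[i][j - 1] + 1;  // I: add s2[j-1]
            const int sub = d[i - 1][j - 1] +
                            (s1[i - 1] == s2[j - 1] ? 0 : subCost);  // S or match
            d[i][j] = std::min({del, ins, sub});
        }
    }
    return d[n][m];
}
```

With the default `subCost` of 2, `weightedEditDistance("intention", "execution")` gives 8, matching the example in section 2; passing `subCost = 1` gives 5, the unit-cost distance that `minDistance` computes.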