2014 Amazon Online Written Test: Problem and Solution, 9/24 (String Edit Distance, Upgraded Version)

Posted by 菜鳥加貝的爬升 on 2013-10-05

Amazon Campus (2013-Sep-24) Question 2 / 2 (Amazon Campus (17): Find the differences of items in amazon)

Amazon has millions of different items in different categories right now, so when sellers want to sell items on our website, they want to find the right categories their items belong to. Suppose we want to build a system that helps sellers find the items with the minimal difference and then the right category. The difference index is the minimum number of single-word edits (insertion, deletion, substitution) required to change one phrase into the other.
For example, we get two lines from standard input:
Hadoop in practice
Hadoop operations
The difference index of ‘Hadoop in practice’ and ‘Hadoop operations’ is 2, because we can remove ‘practice’ and substitute ‘in’ with ‘operations’ to convert ‘Hadoop in practice’ to ‘Hadoop operations’.

For example, we get two lines from standard input:
Hadoop cookbook
Hadoop operations
The difference index of ‘Hadoop cookbook’ and ‘Hadoop operations’ is 1, because we can substitute ‘cookbook’ with ‘operations’ to convert ‘Hadoop cookbook’ to ‘Hadoop operations’.

For example, we get two lines from standard input:
Ruby in action
Hadoop operations
The difference index of ‘Ruby in action’ and ‘Hadoop operations’ is 3, because we can substitute ‘Ruby’ with ‘Hadoop’, substitute ‘in’ with ‘operations’, and remove ‘action’ to convert ‘Ruby in action’ to ‘Hadoop operations’.
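
The three examples above follow the standard edit distance (Levenshtein) recurrence, applied to whole words instead of characters. As a sketch, under notation that is assumed here and does not come from the problem statement, let $a_1 \dots a_m$ and $b_1 \dots b_n$ be the words of the two phrases and let $D(i,j)$ be the difference index between the first $i$ words of one phrase and the first $j$ words of the other. The second display traces the first example, with rows for the prefixes of ‘Hadoop in practice’ and columns for the prefixes of ‘Hadoop operations’:

```latex
\[
D(i,0) = i, \qquad D(0,j) = j,
\]
\[
D(i,j) = \min\bigl\{\, D(i-1,j) + 1,\;\; D(i,j-1) + 1,\;\; D(i-1,j-1) + [a_i \neq b_j] \,\bigr\},
\]
where $[a_i \neq b_j]$ is $1$ if the two words differ and $0$ otherwise.
The difference index of the two phrases is $D(m,n)$.

\[
\begin{array}{l|ccc}
                 & \varepsilon & \text{Hadoop} & \text{operations} \\ \hline
\varepsilon      & 0 & 1 & 2 \\
\text{Hadoop}    & 1 & 0 & 1 \\
\text{in}        & 2 & 1 & 1 \\
\text{practice}  & 3 & 2 & 2
\end{array}
\]
```

The bottom-right entry is 2, which matches the difference index given for that example.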

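// This problem is really a test of the edit distance between two strings, computed over words instead of characters. For details, see my other article 《詳解字串編輯距離求解》 (a detailed explanation of solving string edit distance); I will not repeat it here. The code is as follows: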

 

#include <cstring>
#include <iostream>
#include <string>
#include <vector>

using namespace std;

// Split a space-separated phrase into words and append them to vec.
void CutStr2Vec(const char *str, vector<string> &vec)
{
    size_t nLen = strlen(str);
    // strtok modifies the string it scans; we do not want the source changed, so work on a copy
    char *strTmp = new char[nLen + 1];
    memcpy(strTmp, str, nLen + 1);   // nLen + 1 also copies the terminating '\0'

    char *tokenStr = strtok(strTmp, " ");
    while (tokenStr != NULL)
    {
        vec.push_back(string(tokenStr));
        tokenStr = strtok(NULL, " ");
    }

    delete[] strTmp;   // allocated with new[], so it must be released with delete[]
}

int Min3Values(int a, int b, int c)
{
    int tmp = (a <= b ? a : b);
    return (tmp <= c ? tmp : c);
}


// The parameters are const char* so that string literals can be passed in.
int nDiffOf2Strings(const char *strA, const char *strB)
{
    // Split each phrase into words and store them in two string vectors
    vector<string> VecStrA;
    vector<string> VecStrB;
    CutStr2Vec(strA, VecStrA);
    CutStr2Vec(strB, VecStrB);
    // Get the number of words in each phrase
    int nLenA = static_cast<int>(VecStrA.size());
    int nLenB = static_cast<int>(VecStrB.size());
    // Allocate the cost matrix used by the dynamic programming
    int **matrix = new int *[nLenA + 1];
    int i, j;
    for (i = 0; i != nLenA + 1; i++)
    {
        matrix[i] = new int[nLenB + 1];
    }

    // Base cases: turning i words into an empty phrase (or the reverse) costs i edits
    matrix[0][0] = 0;
    for (i = 0; i != nLenA + 1; i++)
    {
        matrix[i][0] = i;
    }
    for (i = 0; i != nLenB + 1; i++)
    {
        matrix[0][i] = i;
    }
    // Fill the rest of the matrix row by row
    for (i = 1; i != nLenA + 1; i++)
    {
        for (j = 1; j != nLenB + 1; j++)
        {
            int Fij = 0;
            // If the two words differ, the substitution costs 1; otherwise it stays 0
            if (VecStrA[i-1].compare(VecStrB[j-1]) != 0)
            {
                Fij = 1;
            }
            matrix[i][j] = Min3Values(matrix[i][j-1] + 1, matrix[i-1][j] + 1, matrix[i-1][j-1] + Fij);
        }
    }


    int nDis = matrix[nLenA][nLenB];
    for (i = 0; i != nLenA + 1; i++)
    {
        delete[] matrix[i];
    }
    delete[] matrix;

    return nDis;
}



int main()
{
    int n1 = nDiffOf2Strings("Ruby in action", "Hadoop operations");
    int n2 = nDiffOf2Strings("Hadoop in practice", "Hadoop operations");
    int n3 = nDiffOf2Strings("Hadoop cookbook", "Hadoop operations");
    int n4 = nDiffOf2Strings("Kindle Fire HD Tablet", "Kindle Fire HD 8.9\" 4G LTE Wireless Tablet");

    cout << n1 << endl << n2 << endl << n3 << endl << n4 << endl;

    return 0;
}

 

The results are as follows:

3
2
1
4
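
The problem statement says the two phrases arrive as two lines on standard input, whereas the program in this post hard-codes its test phrases. The following is a minimal sketch of the input-reading variant, not the original author's code. The names `SplitWords`, `WordEditDistance`, and `DifferenceIndexFromStream` are hypothetical, words are assumed to be separated by any whitespace, and a missing line is assumed to count as an empty phrase. It also keeps only two rows of the cost table, a common space optimization that the listing in this post does not use.

```cpp
#include <algorithm>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a phrase into words on whitespace.
std::vector<std::string> SplitWords(const std::string &phrase)
{
    std::istringstream in(phrase);
    std::vector<std::string> words;
    std::string word;
    while (in >> word)
    {
        words.push_back(word);
    }
    return words;
}

// Word-level edit distance that keeps only the previous and current rows
// of the cost table instead of the full (m+1) x (n+1) matrix.
int WordEditDistance(const std::vector<std::string> &a,
                     const std::vector<std::string> &b)
{
    std::vector<int> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j)
    {
        prev[j] = static_cast<int>(j);   // empty prefix of a: j insertions
    }
    for (std::size_t i = 1; i <= a.size(); ++i)
    {
        cur[0] = static_cast<int>(i);    // empty prefix of b: i deletions
        for (std::size_t j = 1; j <= b.size(); ++j)
        {
            int cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            cur[j] = std::min({cur[j - 1] + 1,       // insertion
                               prev[j] + 1,          // deletion
                               prev[j - 1] + cost}); // substitution or match
        }
        prev.swap(cur);   // the row just computed becomes the previous row
    }
    return prev[b.size()];
}

// Read two lines from the stream and return their difference index.
// If a line is missing, it is treated as an empty phrase (an assumption).
int DifferenceIndexFromStream(std::istream &in)
{
    std::string lineA, lineB;
    std::getline(in, lineA);
    std::getline(in, lineB);
    return WordEditDistance(SplitWords(lineA), SplitWords(lineB));
}
```

Calling `DifferenceIndexFromStream(std::cin)` from `main` and printing the return value would give the behaviour the problem statement describes. With the two input lines ‘Hadoop in practice’ and ‘Hadoop operations’, the function returns 2.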

If you have any suggestions or questions, I would appreciate your help. Discussion is welcome. Thanks~
