Concepts and Types of Duplicate Code (Code Clones)

Posted by Liuwei-Sunny on 2013-07-14

       The content of this article is drawn from the following two references:

       [1] Chanchal K. Roy, James R. Cordy, Rainer Koschke. Comparison and Evaluation of Code Clone Detection Techniques and Tools: A Qualitative Approach. Science of Computer Programming, 2009, 74(7): 470-495.

       [2] Hamid Abdul Basit, Stan Jarzabek. A Data Mining Approach for Detecting Higher-level Clones in Software. IEEE Transactions on Software Engineering, 2009, 35(4): 497-513.

 

Code Fragment: A code fragment (CF) is any sequence of code lines, with or without comments. It can be of any granularity, e.g., a function definition, a begin-end block, or a sequence of statements. A CF is identified by its file name and its begin and end line numbers in the original code base, and is denoted as a triple (CF.FileName, CF.BeginLine, CF.EndLine).
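The triple that identifies a code fragment maps naturally onto a small record type. The following is a minimal sketch; the class name `CodeFragment`, the `extract` helper, and the sample file content are illustrative assumptions, not taken from the cited papers.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CodeFragment:
    """A code fragment identified by (FileName, BeginLine, EndLine)."""
    file_name: str
    begin_line: int  # 1-based, inclusive
    end_line: int    # 1-based, inclusive

    def extract(self, source_lines: List[str]) -> List[str]:
        """Return the lines of the file that this fragment covers."""
        return source_lines[self.begin_line - 1:self.end_line]


# Hypothetical file content, held in memory as a list of lines.
source = [
    "def add(a, b):",
    "    # sum two values",
    "    return a + b",
    "",
    "print(add(1, 2))",
]

cf = CodeFragment("demo.py", 1, 3)
fragment_lines = cf.extract(source)
```

Making the record frozen keeps it hashable, so fragments can be used as dictionary keys or set members when grouping clones later.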

 

Code Clone: A code fragment CF2 is a clone of another code fragment CF1 if they are similar by some given definition of similarity, that is, f(CF1) = f(CF2), where f is the similarity function. Two fragments that are similar to each other form a clone pair (CF1, CF2), and when many fragments are similar, they form a clone class or clone group.
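The definition f(CF1) = f(CF2) suggests a simple way to build clone classes: compute f for every fragment and group the fragments that share a value. The sketch below assumes a deliberately crude f that only strips whitespace; the function names and the sample fragments are hypothetical, and a real detector would use a far richer similarity function.

```python
from collections import defaultdict
from itertools import combinations


def f(fragment_text):
    """Toy similarity function (an assumption): drop all whitespace."""
    return "".join(fragment_text.split())


def clone_classes(fragments):
    """Group fragment names whose f-values are equal; keep groups of 2+."""
    groups = defaultdict(list)
    for name, text in fragments.items():
        groups[f(text)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


def clone_pairs(fragments):
    """Every unordered pair inside a clone class is a clone pair."""
    pairs = []
    for cls in clone_classes(fragments):
        pairs.extend(combinations(cls, 2))
    return pairs


fragments = {
    "CF1": "x = a + b",
    "CF2": "x = a+b",
    "CF3": "x  =  a + b",
    "CF4": "y = a * b",
}

classes = clone_classes(fragments)
pairs = clone_pairs(fragments)
```

Here CF1, CF2, and CF3 end up in one clone class, which yields three clone pairs, while CF4 belongs to no class.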

 

Clone Type: There are two main kinds of similarity between code fragments. Fragments can be similar in their program text, or they can be similar in their functionality, independent of their text. The first kind of clone is often the result of copying a code fragment and pasting it into another location. The four clone types below cover both textual similarity (Types 1 to 3) and functional similarity (Type 4):

Type-1: Identical code fragments except for variations in whitespace, layout and comments.

Type-2: Syntactically identical fragments except for variations in identifiers, literals, types, whitespace, layout and comments.

Type-3: Copied fragments with further modifications such as changed, added or removed statements, in addition to variations in identifiers, literals, types, whitespace, layout and comments.

Type-4: Two or more code fragments that perform the same computation but are implemented by different syntactic variants.
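The three textual clone types can be illustrated with a small token-based comparison. This is only a sketch under stated assumptions, not the method of any tool surveyed in [1]: it uses Python's standard `tokenize` module, treats Python snippets as the fragments, replaces identifiers with `ID` and literals with `LIT` for the Type-2 check, and uses an arbitrary `difflib` similarity threshold of 0.7 for Type-3. The function names and sample snippets are hypothetical.

```python
import difflib
import io
import keyword
import tokenize

# Token kinds that carry only comments or blank-line layout.
SKIP = {tokenize.COMMENT, tokenize.NL, tokenize.ENDMARKER}
STRUCTURE = {tokenize.INDENT, tokenize.DEDENT, tokenize.NEWLINE}


def tokens(src, abstract=False):
    """Token list without comments/layout; optionally abstract names and literals."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(src).readline):
        if tok.type in SKIP:
            continue
        if tok.type in STRUCTURE:
            out.append(tokenize.tok_name[tok.type])  # ignore the exact whitespace
        elif abstract and tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            out.append("ID")
        elif abstract and tok.type in (tokenize.NUMBER, tokenize.STRING):
            out.append("LIT")
        else:
            out.append(tok.string)
    return out


def classify(a, b, threshold=0.7):
    """Return 'Type-1', 'Type-2', 'Type-3', or None for no textual clone."""
    if tokens(a) == tokens(b):
        return "Type-1"
    abs_a, abs_b = tokens(a, True), tokens(b, True)
    if abs_a == abs_b:
        return "Type-2"
    if difflib.SequenceMatcher(None, abs_a, abs_b).ratio() >= threshold:
        return "Type-3"
    return None


original = "def total(values):\n    result = 0\n    for v in values:\n        result += v\n    return result\n"
# Same code with an added comment, a blank line, and extra spaces.
copy1 = "def total(values):\n    # accumulate the sum\n    result  =  0\n\n    for v in values:\n        result += v\n    return result\n"
# Renamed identifiers and a different literal.
copy2 = "def sum_all(items):\n    acc = 0.0\n    for x in items:\n        acc += x\n    return acc\n"
# Renamed identifiers plus two inserted statements.
copy3 = "def sum_all(items):\n    acc = 0\n    for x in items:\n        if x is None:\n            continue\n        acc += x\n    return acc\n"
# Same computation, different syntax (a Type-4 clone).
copy4 = "def total(values):\n    return sum(values)\n"

results = {name: classify(original, src) for name, src in
           [("copy1", copy1), ("copy2", copy2), ("copy3", copy3), ("copy4", copy4)]}
```

In this sketch `copy4` is reported as no clone at all even though it computes the same sum, which shows why Type-4 clones cannot be found by textual comparison alone and need semantic analysis.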

 

       The four clone types above are collectively called simple clones. Higher-level, coarse-grained clones formed by combining simple clones are called structural clones.
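One way to picture a structural clone is as a group of simple clone classes that recur together in several files. The sketch below is a simplification and not the data mining technique of [2]: it only compares files pairwise and reports those that share at least `min_shared` simple clone classes. The file names, the class IDs `SC1` to `SC3`, and the threshold are hypothetical.

```python
from itertools import combinations


def structural_clone_candidates(classes_by_file, min_shared=2):
    """Pairs of files that contain at least min_shared common simple clone classes."""
    candidates = []
    for file_a, file_b in combinations(sorted(classes_by_file), 2):
        shared = classes_by_file[file_a] & classes_by_file[file_b]
        if len(shared) >= min_shared:
            candidates.append((file_a, file_b, sorted(shared)))
    return candidates


# For each file, the IDs of the simple clone classes that have a member in it.
classes_by_file = {
    "OrderDao.java": {"SC1", "SC2", "SC3"},
    "UserDao.java": {"SC1", "SC2", "SC3"},
    "Report.java": {"SC2"},
}

candidates = structural_clone_candidates(classes_by_file)
```

Only the two DAO files are reported, because they share three simple clone classes, while `Report.java` shares just one with each of them.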

 

[Author: Liu Wei http://blog.csdn.net/lovelion]
