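Translator's preface: Last weekend a classmate of mine mentioned in our group chat that he had found a rather interesting article (the one below, on the psychology of code readability) and wanted to translate it, so I volunteered to work on the translation with him. While reading it, a question came to mind: why do so many people care about code readability? I certainly agree that readability matters and that it benefits collaborative development. It reminded me of a book I read in my spare time a while ago, Sapiens: A Brief History of Humankind (《人類簡史》). While recounting the history of our species, the book also explains why humans came to dominate the planet and sit at the top of the food chain. One reason it gives is that humans evolved the ability to imagine: people can talk, without batting an eye, about things they have never seen as if those things really existed, for example gods, the technology in science fiction, and concepts such as nation, sovereignty, science, democracy and all kinds of -isms. They can also get everyone else to believe in these things, and use such imagined concepts to build frameworks within which even strangers can cooperate. The key point is cooperation. The ability of millions of people to work toward the same goal is one of the reasons humans prevailed over other species. Going any further would be off topic. What I want to stress is how important the ability to cooperate is: it is what got humanity to where it is today. I want to apply the same idea to our company's engineering team: individual ability does not have to be outstanding, but the ability to develop collaboratively must be. How do we improve it? One way is code readability. I see readable code as the foundation of our collaboration: if nobody can understand the code, there is nothing to collaborate on, and no way to raise the team's productivity. That is why I want to share the following article with everyone.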
Translation collaborators: 隨諭 (github.com/a1023293003), lwhile (github.com/lwhile)
Psychology of Code Readability
How the brain makes sense of things
By no means should this be regarded as truth, but rather as a model that I’ve found extremely helpful in understanding and finding better ways of writing code.
I think one of the things every programmer strives for is writing better code. Readability is one aspect of “good code”. Many papers and books have been written on the topic, but I find many of them lacking, not because of their recommendations but because of the analysis. (Translator's note: as I understand it, the books on the market tell you what to do without explaining why.)
What makes one piece of code more readable than another? It’s one thing to say that it uses better variable names, but what makes a certain variable name easier to read? I really mean digging deeper into the human psyche. It is our brain that is doing all the processing, after all.
Psychology Primer
As any programmer knows, we have limited capacity to think about things. This is our working memory limit. There’s an old myth going around that we can hold 7±2 objects in our head. It is known as “The Magical Number Seven” and it isn’t entirely accurate. This number has been refined to 4±1, and some even suggest there isn’t a limit, but rather a degradation of ideas over time. For all intents and purposes we can assume that we can process only a small number of ideas in our head at a given time. The exact number isn’t that important.
But some would still confidently say that they can handle problems involving more than 4 ideas. Luckily there’s another process going on in our brain called chunking. Our brain automatically groups pieces of information into larger pieces (chunks).
Dates and phone numbers are good examples of this: we tend to remember a phone number as a few groups of digits rather than as a string of single digits.
From these chunks we build up our long-term memory. I like to imagine it as a large web consisting of many chunks, chunk sequences and groupings. (Translator's note: chunked memory, for example remembering a phone number in segments. This carries over to writing methods: a method should do only one thing, and the way the brain stores information in chunks explains why.)
You might guess from this web-like picture of memory that moving from one place to another in memory is slow. And you would be right. In UX there’s a concept called singular focus of attention, which means that we can focus on only a single thing at a time. It has a companion called locus of attention, which says that our attention is also localized in space. (Translator's note: when an unrelated piece of information is suddenly inserted into a body of information, the brain spends time building the connection, which again suggests that a method should do only one thing.)
You might think this is the same thing as the working memory limit; however, there is a slight difference. Working memory capacity describes how big our focusing area is, while the focus and locus of attention say that we can only focus when there is a place in our brain that contains the ideas.
The focus and locus of attention are important to know, because the switching cost is significant. Switching is even slower when we need to create new chunks and groupings. It also goes the other way: the more familiar something is, the less time it takes to make it our focus. (Translator's note: this passage makes the same point as above. Unrelated information forces the brain to spend time building connections, so a method should do only one thing.)
We also remember things better when we are in a similar context. This is called the encoding specificity principle. It means that by designing the conditions in which we encode and recall, we can better design what we remember.
Translator's note: context similarity is what psychology calls the encoding specificity principle: memory works best when the setting at recall matches the setting at memorization. A familiar scene stirs up feelings, and an object brings its owner to mind.
In an experiment, divers were assigned to memorize words on land and under water, and then to recall them on land or under water. The best results were for people who memorized and recalled on land. Surprisingly, the second best were the people who memorized and recalled under water. This showed that the context where you learn things has an impact on how well you can remember them.
To make things shorter, I’ll use context to refer to “focus and locus of attention” and how it relates to other chunks and loci. Effectively our brain is moving from one context to another. When we move our focus of attention we also remember what our previous contexts were, until our memory fades.
Translator's note: this description reminds me very much of processes and threads competing for the CPU.
From these contexts and chunks we build up mental representations and a mental model. There’s a slight difference between these two things. A mental representation is our internal cognitive symbol for representing the external world or a mental process. A mental model can be thought of as an explanation of a mental representation. Often these terms are used interchangeably.
Mental models are vitally important to our ability to precisely describe a solution to a problem. Many different mental models are possible for a single problem, each with its own benefits and problems.
All of these ideas sound nice and precise; however, our brains are quite imprecise. There are many other problems with our brain.
Our brains need to do more work when dealing with abstractions.
When ideas are similar, their chunks are related and linked in our brains in a similar way. This leads to our brain being unable to “rebuild the contexts properly”, because we are uncertain which chunk is the right one. Example: I and 1; O and 0.
Ambiguity is another source of uncertainty. When a thing is ambiguous, there are multiple interpretations of the same thing. Homonyms are the best example of this property. Example: crane, the bird or the machine. (Translator's note: do not give variables ambiguous names! The reason is explained below.)
Uncertainty causes us to slow down. It might be only a few milliseconds, but that can be enough to disrupt our state of flow or make us use more working memory than necessary.
There are of course interruptions that can disrupt our working memory, but there are also “smaller interruptions” called noise. If someone is saying random numbers while you are trying to calculate, you can end up accidentally processing them and using up some of your working memory. This can also happen visually on screen when there are many irrelevant things between the important things. (Translator's note: this too helps explain why a method should do only one thing. Irrelevant things take up the brain's working memory, and once that memory is full, it crashes.)
Our brains also have trouble processing negation, as many studies support. The effect of negation depends on the context, but negation should be used with care.
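As a rough illustration of how negation shows up in code, here is a small Go sketch. The predicate names hasNoItems and hasItems are hypothetical, chosen only for this example; both branches test the same condition, but the first forces the reader to undo two negatives.

```go
package main

import "fmt"

// hasNoItems is a negatively phrased predicate (hypothetical name).
func hasNoItems(items []string) bool {
	return len(items) == 0
}

// hasItems is the positively phrased counterpart (hypothetical name).
func hasItems(items []string) bool {
	return len(items) > 0
}

func main() {
	cart := []string{"book"}

	// Double negation: the reader has to cancel "!" against "No".
	if !hasNoItems(cart) {
		fmt.Println("checkout (negated form)")
	}

	// The same condition stated positively.
	if hasItems(cart) {
		fmt.Println("checkout (positive form)")
	}
}
```

Both conditions are equivalent, so the choice between them is purely about how much work the reader has to do.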
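All of these together add up to cognitive load: the total amount of mental effort being used. Our processing capacity decreases with prolonged cognitive load and is restored with rest. With prolonged cognitive load our minds also start to wander. (Translator's note: take a short break of five to ten minutes, use the Pomodoro technique, and so on.)
Translator's note: cognitive load theory assumes that human cognitive architecture consists of working memory and long-term memory. Working memory, also called short-term memory, has limited capacity and can store only 3 to 5 basic items or chunks of information at a time. When information has to be processed, working memory can handle only 2 to 3 items at a time, because the interactions between the stored elements also take up working memory space, which reduces the number of items that can be processed simultaneously. (So: a method should do only one thing.)
If this is new information to you, then I highly suggest taking a break now. These form the fundamental properties that the code analysis will rest upon.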
Applying this to code
I’m going to use the term programming artifact. By that I mean everything that is created as a result of programming. It might be a method you write, type declarations for a function, variable names, comments, Unreal Engine Blueprints, UML diagrams etc. Effectively, anything that is a direct result of programming.
Here are a few recommendations, rules of thumb and paradigms analyzed in the context of psychology. By no means is this an exhaustive list or even a guide on what exactly to do. There are probably many places where the analysis could be better, but this is more about showing how we can gain deeper insight into code readability by using psychology.
Scope of a name
Length is not a virtue in a name; clarity of expression is. (Rob Pike)
Let’s take a simple for loop:
A. for(i=0 to N)
B. for(theElementIndex=0 to theNumberOfElementsInTheList)
(Translator's note: I often use long names at work, and overly long ones really do take time to read.)
Most programmers would recommend A. Why?
B uses longer names, which prevents us from recognizing the loop as a single chunk. The longer names also don’t help create a better context; effectively they are just noise.
However, let’s imagine different ways of writing packages / units / modules / namespaces:
A. strings.IndexOf(x, y)
B. s.IndexOf(x, y)
C. std.utils.strings.IndexOf(x, y)
D. IndexOf(x, y)
In example B the namespace s is too short and doesn’t help “to find the right chunk”.
In example C the namespace std.utils.strings is too long and most of it is unnecessary, because strings itself is descriptive enough (unless you need to use several of them).
In example D, without a namespace, the call becomes ambiguous: you might be unsure where IndexOf comes from and what it is related to. (Translator's note: in other words, you cannot tell what the IndexOf method is for.)
It’s important to mention that if all of the code is dealing with strings, it will be quite easy to assume that IndexOf is some string-related function. In such cases, even the strings part might be too noisy. For example, int16.Add(a, b) would be much harder to read than a + b. (Translator's note: there is no universal rule for naming variables, so it depends heavily on personal judgment: some people find a name sufficient, others do not. That is why I think agreeing on a naming convention within our team would be a good idea.)
State of a variable
With variables it would be easy to conclude that “modification is bad, because it makes it harder to track what is happening”. But let’s take these examples:
// A.
func foo(values []int) (int, int) {
    sum, sumOfSquares := 0, 0
    for _, v := range values {
        sum += v
        sumOfSquares += v * v
    }
    return sum, sumOfSquares
}
// B.
func GCD(a, b int) int {
    for b != 0 {
        a, b = b, a % b
    }
    return a
}
// C.
func GCD(a, b int) int {
    if b == 0 {
        return a
    }
    return GCD(b, a % b)
}
Here foo is probably the easiest to understand. Why? The problem isn’t modifying the variables, but rather how they are modified. A doesn’t have any complex interactions, which both B and C do. I would also guess that even though C doesn’t have modifications, our brain still processes it as if it did.
// D.
sum = sum + v.x
sum = sum + v.y
sum = sum + v.z
sum = sum + v.w
// E.
sum1 := v.x
sum2 := sum1 + v.y
sum3 := sum2 + v.z
sum4 := sum3 + v.w
Here is another example where the modification-based version (D) is easier to follow. E introduces new variables for the same idea; effectively, the different variables become noise.
Idioms
Let’s take another for loop:
A. for(i = 0; i < N; i++)
B. for(i = 0; N > i; i++)
C. for(i = 0; i <= N-1; i += 1)
D. for(i = 0; N-1 >= i; i += 1)
How long did it take you to figure out what each line is doing? For anyone who has been coding for a while, A probably took the least time. Why is that?
The main reason is familiarity. To be more precise, we have a chunk in our long-term memory for A, but not for any of the others. This means that for the others we need to do more processing before we can extract the meaning and concept. (Translator's note: A is the form everybody already knows.)
For a complete beginner, all of these would be processed quite similarly. They wouldn’t notice that one is “better” than any other.
A proficient programmer reads A as a single chunk or idea: “i is looped for N items”. A beginner, however, reads it as “We initialize i to zero. Then we test each time whether i is still smaller than N. Then we add one to i.”
A is what you call the “idiomatic way” of writing the for loop. It’s not really better in terms of intrinsic complexity. However, most programmers can read it more easily, because it is part of our common vocabulary.
Most languages have an idiomatic way of writing things. There are even papers and books about them, starting with APL idioms and C++ idioms and extending to more structural idioms such as the GoF Design Patterns. These books can be regarded as a vocabulary for writing sentences and paragraphs in a way that other people will recognize.
There’s however a downside to all of this. The more idioms there are, the bigger the vocabulary you need in order to understand something. Languages with unlimited flexibility often suffer because of this. People end up creating “idioms” that help them write more concise code, but everybody else is slowed down by them. (Translator's note: these are unwritten rules of the code base, and they are hard on newcomers, who have to memorize many of them. Some are unavoidable, so it is best to document them and have every new hire read the document first.)
Consistency
With regard to repeated structures, names such as “model” and “controller” act as chunks that remind us how these structures relate to each other.
Frameworks, micro-architectures and game engines all try to create and enforce such relations. This means people have to spend less time figuring out how things communicate and are wired up. Once you grok the structures it becomes easier to jump from one code base to another.
However, the main factor in all of this is consistency. The more consistent the code base is in naming, formatting, structure, interaction etc., the easier it is to jump into arbitrary code and understand it. (Translator's note: consistency means following one common rule, for example always writing variable names in camel case.)
Uncertainty
As previously mentioned, uncertainty can cause stuttering when reading or writing code.
Let’s take ambiguity as our first example. The simplest example would be [1,2,3].filter(v => v >= 2). The question is what this will return: is it “2 and 3” or “1”? (In JavaScript, filter keeps the elements for which the callback returns true, so the result is [2, 3].) It’s a simple question, but it can cause a reading/writing stutter when you don’t use it day in, day out.
Translator's note: does it keep the elements that are greater than or equal to 2, or filter them out? (That is, do we get [1] or [2, 3]?)
The source of the stutter is ambiguity. In the real world a filter has two uses: one is to keep the part that gets stuck in the filter, the other is to keep the part that passes through it. For example, when you have gold in water, you want to get rid of the water. When you have dirt in the water, you probably want to get rid of the dirt.
Even if we precisely define what filter does, it can still cause stutter because it’s hardwired with two meanings in our brain. The common solution is to use functions such as select, discard, keep.
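A minimal sketch of the keep/discard idea in Go, assuming Go 1.18 or later for generics. Keep and Discard are hypothetical helper names written for this example, not functions from any standard library; the point is only that each name carries a single meaning.

```go
package main

import "fmt"

// Keep returns the elements for which pred reports true.
// (Hypothetical helper, defined here for illustration.)
func Keep[T any](items []T, pred func(T) bool) []T {
	var out []T
	for _, item := range items {
		if pred(item) {
			out = append(out, item)
		}
	}
	return out
}

// Discard returns the elements for which pred reports false.
func Discard[T any](items []T, pred func(T) bool) []T {
	return Keep(items, func(item T) bool { return !pred(item) })
}

func main() {
	atLeastTwo := func(v int) bool { return v >= 2 }
	values := []int{1, 2, 3}

	fmt.Println(Keep(values, atLeastTwo))    // [2 3]
	fmt.Println(Discard(values, atLeastTwo)) // [1]
}
```

With these names the call site answers the "which part survives?" question on its own, without the reader having to recall a definition.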
We can also attach meaning in different ways, such as types. For example, instead of GetUser(string) you can declare type CustomerID string and use GetUser(CustomerID) to make clear that the interpretation is “get user using a customer id” rather than other possibilities such as “get user by name”.
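The CustomerID idea can be sketched in Go roughly as follows. The User struct, the in-memory users map and the sample id "c-42" are assumptions made up for this example; only the CustomerID type and the GetUser name come from the text.

```go
package main

import (
	"errors"
	"fmt"
)

// CustomerID is a distinct named type, so a string variable holding,
// say, a user name cannot be passed where a customer id is expected.
type CustomerID string

// User and the users map are hypothetical, for illustration only.
type User struct {
	ID   CustomerID
	Name string
}

var users = map[CustomerID]User{
	"c-42": {ID: "c-42", Name: "Ada"},
}

// GetUser looks a user up by customer id; the parameter type states
// which interpretation of "get user" is meant.
func GetUser(id CustomerID) (User, error) {
	user, ok := users[id]
	if !ok {
		return User{}, errors.New("unknown customer id")
	}
	return user, nil
}

func main() {
	user, err := GetUser(CustomerID("c-42"))
	fmt.Println(user.Name, err) // Ada <nil>

	// var name string = "Ada"
	// GetUser(name) // would not compile: name is a string, not a CustomerID
}
```

One caveat: Go still accepts an untyped string constant such as GetUser("c-42"), so the type guards against mixing up variables rather than literals.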
Similarity is also easy to understand conceptually. For example, having variables such as total1, total2, total3 can lead to situations where you make copy-paste mistakes or, over a longer piece of code, lose track of what each one meant. Names such as sum, sum_of_squares, total_error can provide more meaning.
Having multiple names for the same thing can also be a source of confusion when moving between packages. For example, in one package you use the variable names c and cl, in another client, and in a third source. It’s interesting to think about special variables such as this and self.
Ambiguity and similarity are not a problem just at the source level. Eric Evans noted this in DDD with the Ubiquitous Language pattern. The notion is that in different contexts such as billing and shipping, words such as “client” can have widely different usages and meanings, so it’s helpful to keep a vocabulary around to ensure that everyone communicates clearly.
Comments
We have all seen the “stupid beginner examples” of commenting:
// makes variable i go from 0 to 99
for(var i = 0; i < 100; i++) {
// sets value 4 to variable a
var a = 4;
(Translator's note: “stupid” presumably refers to putting a comment on every line.)
While it may look stupid, it might have some purpose. Think about learning a second or third language. You usually learn the new language by understanding the translation in your primary language. These comments are the “chunks” written out explicitly.
Once you have learned the “chunk”, the comments become noise, because you already know that information from looking at the second line.
As programmers get better, the intent of comments becomes to condense information and to provide a context for understanding code: why a particular approach was taken when doing X, or what needs to be considered when modifying the code.
Effectively, it’s for setting up the right mental model for reading the code. (Translator's note: in other words, comment only the code that is critical, hard to understand, or governed by unwritten rules.)
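To make the "why, not what" kind of comment concrete, here is a small Go sketch. The binary search is an example chosen for this illustration, not something from the article; the one comment inside the loop explains a decision the code alone does not reveal.

```go
package main

import "fmt"

// search returns the index of target in the sorted slice, or -1 if absent.
func search(sorted []int, target int) int {
	lo, hi := 0, len(sorted)-1
	for lo <= hi {
		// Why not (lo+hi)/2: the sum can overflow for very large
		// indices, while lo + (hi-lo)/2 stays within range.
		mid := lo + (hi-lo)/2
		switch {
		case sorted[mid] == target:
			return mid
		case sorted[mid] < target:
			lo = mid + 1
		default:
			hi = mid - 1
		}
	}
	return -1
}

func main() {
	fmt.Println(search([]int{1, 3, 5, 7, 9}, 7)) // 3
	fmt.Println(search([]int{1, 3, 5, 7, 9}, 4)) // -1
}
```

A comment such as "compute the middle index" would only restate the line below it; the comment above instead tells a future editor what to watch out for when changing it.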
Contexts
Working memory limitation leads us to decompose and partition our code into different interacting pieces. We must be mindful of how we relate the different pieces and how they interact.
For example, when we have a very deep inheritance chain and we use things from all the different inheritance levels, the class might be too complicated, even if each class has maybe two methods and each method is five lines of code. The class and all its parents form a single “whole”. As an illustration, you can count each “inheritance step” as a “single idea” that you need to remember when you use that particular class.
The other side of contexts is moving between function calls. Each call is a “context in our mental model”, so we need to remember where we came from and how it relates to the current situation. The deeper the call stack, the more stuff we have to keep in mind.
One way to reduce the depth of our mental model contexts is to clearly separate them. One such example is early return:
public void SomeFunction(int age)
{
    if (age >= 0) {
        // Do Something
    } else {
        System.out.println("invalid age");
    }
}

public void SomeFunction(int age)
{
    if (age < 0) {
        System.out.println("invalid age");
        return;
    }
    // Do Something
}
In the first version, when we read the “Do Something” part we understand that it only happens when the age is non-negative. However, when we reach the “else” part we may have forgotten what the condition was, because at that point the condition can be quite far away.
The second version is somewhat nicer. We no longer need to keep multiple “contexts” in our head, and can focus instead on a single context that is set up and verified by multiple checks at the beginning.
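The same early-return shape with several checks up front might look like this in Go. The register function, its parameters and the upper age limit of 150 are hypothetical, invented only to show more than one guard.

```go
package main

import (
	"errors"
	"fmt"
)

// register is a hypothetical function; its checks are made up for illustration.
func register(name string, age int) error {
	// Each guard handles one invalid case and leaves immediately.
	if name == "" {
		return errors.New("empty name")
	}
	if age < 0 {
		return errors.New("invalid age")
	}
	if age > 150 {
		return errors.New("implausible age")
	}

	// From here on a single context holds: name is set and 0 <= age <= 150.
	fmt.Printf("registered %s (%d)\n", name, age)
	return nil
}

func main() {
	fmt.Println(register("Ada", 36))
	fmt.Println(register("", 36))
	fmt.Println(register("Bob", -1))
}
```

After the guards, the reader no longer has to carry any of the conditions around; the remaining code runs under one known set of assumptions.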
Rules of thumb
One of the usual recommendations is “don’t have global variables”. But when a variable is set during startup and never changed again, is that a problem? The problem isn’t in the “variableness” or “globalness” of something, but rather in how it affects our capability to understand code. When something is modified at a distance, we cannot build a contained model of it. The “globalness” of course clutters the namespace (depending on the language) and means there are more places it can be accessed from. Of course there are many other things that have the same properties, such as “Singleton”. So why is it considered better than a global variable?
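A rough Go sketch of the difference between a global that is set once and a global that is modified at a distance. All names here (maxRetries, requestCount and the handler functions) are hypothetical and exist only for this illustration.

```go
package main

import "fmt"

// maxRetries is set once during startup and only read afterwards.
var maxRetries int

// requestCount is changed from several functions, so no single
// function shows the full story of its value.
var requestCount int

func configure() { maxRetries = 3 }

func handleLogin() { requestCount++ }

func handleLogout() { requestCount++ }

func resetStats() { requestCount = 0 }

func main() {
	configure()

	handleLogin()
	resetStats()
	handleLogout()

	fmt.Println(maxRetries)   // 3
	fmt.Println(requestCount) // 1
}
```

Both variables are equally global, but to predict requestCount the reader has to know every function that touches it and the order in which they ran, whereas maxRetries can be understood from configure alone.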
The single responsibility principle (SRP) is easy to understand with these concepts. It tries to ensure that we have proper chunks for a thing. This constraint often makes chunks smaller. Having a single responsibility also means that we end up with things that need less working memory. However, we need to consider that when we separate a class or function into multiple pieces we introduce many new artifacts. When these artifacts are deeply bound together we may not even gain the benefits of SRP.
Carmack’s comments on inlined functions are a good example of this. The three examples he gave were these:
// A
void MinorFunction1( void ) {
}
void MinorFunction2( void ) {
}
void MinorFunction3( void ) {
}
void MajorFunction( void ) {
    MinorFunction1();
    MinorFunction2();
    MinorFunction3();
}
// B
void MajorFunction( void ) {
    MinorFunction1();
    MinorFunction2();
    MinorFunction3();
}
void MinorFunction1( void ) {
}
void MinorFunction2( void ) {
}
void MinorFunction3( void ) {
}
// C.
void MajorFunction( void ) {
    { // MinorFunction1
    }
    { // MinorFunction2
    }
    { // MinorFunction3
    }
}
By making pieces smaller we made the chunks smaller; however, understanding the system became harder. We cannot read our code from top to bottom and understand what it does, but instead have to jump around in the code base to read it. Version C preserves the linear ordering while still maintaining the conceptual chunks.
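Version C translates fairly directly to Go, which also allows bare blocks inside a function. The report function and its three steps below are hypothetical, made up to give the blocks something to do; this is a sketch of the style, not Carmack's code.

```go
package main

import "fmt"

// report is a hypothetical "major function" in the style of version C:
// each step is a commented block, read from top to bottom.
func report(values []int) string {
	var sum, largest int
	var out string

	{ // step 1: sum the values
		for _, v := range values {
			sum += v
		}
	}

	{ // step 2: find the largest value
		for i, v := range values {
			if i == 0 || v > largest {
				largest = v
			}
		}
	}

	{ // step 3: format the result
		out = fmt.Sprintf("sum=%d max=%d", sum, largest)
	}

	return out
}

func main() {
	fmt.Println(report([]int{3, 9, 4})) // sum=16 max=9
}
```

The blocks keep each step visually separate, and any temporaries declared inside a block stay local to it, while the function can still be read in one pass.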
Summary
Overall we can summarize code readability as trying to balance different aspects:
1. Names help us retrieve the right chunks from memory and help us figure out their meaning. Too long a name can end up being noisy in our code. Too short a name may not help us figure out its true meaning. Bad names are misleading and confusing.
2. To minimize the cost of shifting attention, we try to write all related code close together. To minimize the burden on our working memory, we try to split the code into smaller and more fathomable units.
3. Using common vocabulary allows the author as well as the team to rely on previous code-reading experience. That means reading, understanding and contributing to code is easier. Using a unique solution where a common one would do can slow down new readers of that code.
In practice there is no “perfect” way of organizing code, only many trade-offs. While I focused on readability, it is never the end goal; there are many other things to consider, like reliability, maintainability, performance and speed of prototyping.