The Art of Readable Code

Posted by beiyuu on 2013-03-24

  These are my reading notes on The Art of Readable Code, plus a few observations of my own. I strongly recommend the book.

Why code should be easy to understand

“Code should be written to minimize the time it would take for someone else to understand it.”

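  The reality of day-to-day work:

  • Thinking before writing code, and reading code, take far more time than the writing itself
  • Reading code is routine, whether it is someone else's or your own; code you wrote six months ago may as well be someone else's
  • When code is readable, you can grasp the program's logic quickly and get to work
  • Fewer lines of code are not necessarily easier to understand
  • Readability does not conflict with efficiency, good architecture, or testability

  The whole book is organized around one goal: making code more readable. Readability is also one of the key criteria for good code.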

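 How to name things

  Pack more information into variable names

  Use words with a precise meaning, for example download rather than get. Some possible replacements: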

	 send -> deliver, dispatch, announce, distribute, route
	 find -> search, extract, locate, recover
	start -> launch, create, begin, open
	 make -> create, set up, build, generate, compose, add, new

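  Avoid generic words

  Names like tmp and retval say nothing beyond "this is a temporary variable" or "this is the return value". Adding a meaningful word makes them clear: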

	tmp_file = tempfile.NamedTemporaryFile() 
	...
	SaveData(tmp_file, ...)

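  Instead of retval, use a name that says what the value really is. With a name like sum_squares, a mistake such as the following is easy to spot: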

	sum_squares += v[i]; // Where's the "square" that we're summing? Bug!

  In nested for loops, i, j, and k can be just as confusing:

	for (int i = 0; i < clubs.size(); i++)
	    for (int j = 0; j < clubs[i].members.size(); j++)
	        for (int k = 0; k < users.size(); k++)
	            if (clubs[i].members[k] == users[j])
	                cout << "user[" << j << "] is in club[" << i << "]" << endl;

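  The loop above indexes members with k and users with j, which is the wrong way round and hard to notice. With more descriptive index names the code is much clearer, and such a mistake would stand out:

	 if (clubs[ci].members[mi] == users[ui]) # OK. First letters match.

  So use a generic name only when you have a good reason to.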

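  Use concrete names

  CanListenOnPort is better than ServerCanStart: "can start" is vague, while "listen on port" says exactly what the method checks.

  --run_locally is less clear than --extra_logging.

  Add important details, such as a unit suffix like _ms, or _raw for an unprocessed string

  If a detail about a variable matters, adding a word or two to its name makes the code easier to read, for example changing string id; // Example: "af84ef845cd8" to string hex_id;.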

	 Start(int delay) --> delay → delay_secs
	 CreateCache(int size) --> size → size_mb
	ThrottleDownload(float limit) --> limit → max_kbps
	 Rotate(float angle) --> angle → degrees_cw

  More examples:

	password -> plaintext_password
	 comment -> unescaped_comment
	 html -> html_utf8
	 data -> data_urlenc

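  Use longer names for variables with a larger scope

  Short names are fine in a small scope; variables used across a larger scope deserve longer names, and editor auto-completion keeps the typing cost low. For abbreviations and prefixes, prefer widely known ones (such as str). A useful test: would a new team member understand the abbreviation without asking anyone?

  Use symbols such as _ and - deliberately to carry meaning, for example a _ prefix on private variables.

	var x = new DatePicker(); // DatePicker() is a "constructor" function, so it starts with a capital letter
	var y = pageHeight(); // pageHeight() is an ordinary function

	var $all_images = $("img"); // $all_images is a jQuery object
	var height = 250; // height is not

	// Use different conventions for id and class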
	<div id="middle_column" class="main-content"> ...

  Names must not be ambiguous

  When choosing a name, first ask whether the word could be read with a different meaning. For example:

	results = Database.all_objects.filter("year <= 2011")

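  Does results now contain the objects with year <= 2011, or the ones without? "filter" can mean either selecting or excluding.

  Use min and max instead of limit

	CART_TOO_BIG_LIMIT = 10
	if shopping_cart.num_items() >= CART_TOO_BIG_LIMIT:
	    Error("Too many items in cart.")

	MAX_ITEMS_IN_CART = 10
	if shopping_cart.num_items() > MAX_ITEMS_IN_CART:
	    Error("Too many items in cart.")

  Compare CART_TOO_BIG_LIMIT with MAX_ITEMS_IN_CART: which one makes it clearer whether the boundary value itself is allowed?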

  Use first and last for inclusive ranges

	print integer_range(start=2, stop=4)
	# Does this print [2,3] or [2,3,4] (or something else)?

	set.PrintKeys(first="Bart", last="Maggie")

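  first and last have an unambiguous meaning and suit inclusive ranges.

  Use begin and end for half-open ranges (inclusive start, exclusive end, such as [2, 9))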

	PrintEventsInRange("OCT 16 12:00am", "OCT 17 12:00am")

	PrintEventsInRange("OCT 16 12:00am", "OCT 16 11:59:59.9999pm")

  The first call is much more comfortable to read and write than the second.
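
  The difference between the two conventions can be sketched in Python. The function names inclusive_range and half_open_range are hypothetical and chosen only for this illustration:

```python
def inclusive_range(first, last):
    # Both endpoints are part of the result: [first, last].
    return list(range(first, last + 1))


def half_open_range(begin, end):
    # begin is included, end is excluded: [begin, end).
    return list(range(begin, end))


print(inclusive_range(first=2, last=4))  # [2, 3, 4]
print(half_open_range(begin=2, end=4))   # [2, 3]
```

  With these parameter names the caller no longer has to guess whether 4 is part of the result.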

  Naming Booleans

	bool read_password = true;

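  This name is dangerous: does it mean the password still needs to be read, or that it has already been read? A name such as user_is_authenticated removes the doubt. In general, prefixing Boolean names with is, has, can, or should makes the meaning clearer, and positive names read better than negated ones: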

	 SpaceLeft() --> hasSpaceLeft()
	bool disable_ssl = false --> bool use_ssl = true

  Match the reader's expectations

	public class StatisticsCollector {
	 public void addSample(double x) { ... }
	 public double getMean() {
	 // Iterate through all samples and return total / num_samples
	 }
	 ...
	}

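  In this example getMean iterates over every sample and returns total / num_samples, so it is not the lightweight accessor that a get method usually implies. computeMean would be a more fitting name.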
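
  Another way to meet the expectation is to make the call genuinely cheap. The following is a minimal sketch of that idea and not code from the book; it assumes that keeping a running total is acceptable for the use case:

```python
class StatisticsCollector:
    def __init__(self):
        self._total = 0.0
        self._num_samples = 0

    def add_sample(self, x):
        # Update the running totals so the mean never needs a full pass.
        self._total += x
        self._num_samples += 1

    def get_mean(self):
        # Constant-time, so a "get" name is now appropriate.
        if self._num_samples == 0:
            raise ValueError("no samples have been added")
        return self._total / self._num_samples


collector = StatisticsCollector()
collector.add_sample(1.0)
collector.add_sample(3.0)
print(collector.get_mean())  # 2.0
```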

 Beautiful formatting

  Well-formatted code is pleasing to look at and much easier to read. Compare the following two examples:

	class StatsKeeper {
	 public:
	 // A class for keeping track of a series of doubles
	 void Add(double d); // and methods for quick statistics about them
	 private: int count; /* how many so far
	 */ public:
	 double Average();
	 private: double minimum;
	 list<double>
	 past_items
	 ;double maximum;
	};

  And this is what pleasing formatting looks like:

	// A class for keeping track of a series of doubles
	// and methods for quick statistics about them.
	class StatsKeeper {
	 public:
	    void Add(double d);
	    double Average();
	 private:
	    list<double> past_items;
	    int count; // how many so far
	    double minimum;
	    double maximum;
	};

  Make line breaks consistent and compact

  This code needs line breaks to stay within 80 characters per line, and the arguments need explanatory comments:

	public class PerformanceTester {
	    public static final TcpConnectionSimulator wifi = new TcpConnectionSimulator(
	        500, /* Kbps */
	        80, /* millisecs latency */
	        200, /* jitter */
	        1 /* packet loss % */);

	    public static final TcpConnectionSimulator t3_fiber = new TcpConnectionSimulator(
	        45000, /* Kbps */
	        10, /* millisecs latency */
	        0, /* jitter */
	        0 /* packet loss % */);

	    public static final TcpConnectionSimulator cell = new TcpConnectionSimulator(
	        100, /* Kbps */
	        400, /* millisecs latency */
	        250, /* jitter */
	        5 /* packet loss % */);
	}

  For the sake of consistency, first rearrange it like this:

	public class PerformanceTester {
	    public static final TcpConnectionSimulator wifi =
	        new TcpConnectionSimulator(
	            500, /* Kbps */
	            80, /* millisecs latency */
	            200, /* jitter */
	            1 /* packet loss % */);

	    public static final TcpConnectionSimulator t3_fiber =
	        new TcpConnectionSimulator(
	            45000, /* Kbps */
	            10, /* millisecs latency */
	            0, /* jitter */
	            0 /* packet loss % */);

	    public static final TcpConnectionSimulator cell =
	        new TcpConnectionSimulator(
	            100, /* Kbps */
	            400, /* millisecs latency */
	            250, /* jitter */
	            5 /* packet loss % */);
	}

  That is more consistent, but still verbose and wasteful of vertical space. Moving the parameter comments to the top makes it compact:

	public class PerformanceTester {
	    // TcpConnectionSimulator(throughput, latency, jitter, packet_loss)
	    //                            [Kbps]    [ms]    [ms]    [percent]
	    public static final TcpConnectionSimulator wifi =
	        new TcpConnectionSimulator(500, 80, 200, 1);

	    public static final TcpConnectionSimulator t3_fiber =
	        new TcpConnectionSimulator(45000, 10, 0, 0);

	    public static final TcpConnectionSimulator cell =
	        new TcpConnectionSimulator(100, 400, 250, 5);
	}

  Wrap repetition in a helper function

	// Turn a partial_name like "Doug Adams" into "Mr. Douglas Adams".
	// If not possible, 'error' is filled with an explanation.
	string ExpandFullName(DatabaseConnection dc, string partial_name, string* error);

	DatabaseConnection database_connection;
	string error;
	assert(ExpandFullName(database_connection, "Doug Adams", &error)
	 == "Mr. Douglas Adams");
	assert(error == "");
	assert(ExpandFullName(database_connection, " Jake Brown ", &error)
	 == "Mr. Jacob Brown III");
	assert(error == "");
	assert(ExpandFullName(database_connection, "No Such Guy", &error) == "");
	assert(error == "no match found");
	assert(ExpandFullName(database_connection, "John", &error) == "");
	assert(error == "more than one result");

  The code above looks messy and is full of repetition. A helper function cleans it up:

	CheckFullName("Doug Adams", "Mr. Douglas Adams", "");
	CheckFullName(" Jake Brown ", "Mr. Jake Brown III", "");
	CheckFullName("No Such Guy", "", "no match found");
	CheckFullName("John", "", "more than one result");

	void CheckFullName(string partial_name,
	                   string expected_full_name,
	                   string expected_error) {
	    // database_connection is now a class member
	    string error;
	    string full_name = ExpandFullName(database_connection, partial_name, &error);
	    assert(error == expected_error);
	    assert(full_name == expected_full_name);
	}

  Column alignment

  Aligning columns makes a block of code easier to scan:

	CheckFullName("Doug Adams"  , "Mr. Douglas Adams" , "");
	CheckFullName(" Jake Brown ", "Mr. Jake Brown III", "");
	CheckFullName("No Such Guy" , ""                  , "no match found");
	CheckFullName("John"        , ""                  , "more than one result");

	commands[] = {
	    ...
	    { "timeout",      NULL,              cmd_spec_timeout},
	    { "timestamping", &opt.timestamping, cmd_boolean},
	    { "tries",        &opt.ntry,         cmd_number_inf},
	    { "useproxy",     &opt.use_proxy,    cmd_boolean},
	    { "useragent",    NULL,              cmd_spec_useragent},
	    ...
	};

  Group code into blocks

	class FrontendServer {
	 public:
	 FrontendServer();
	 void ViewProfile(HttpRequest* request);
	 void OpenDatabase(string location, string user);
	 void SaveProfile(HttpRequest* request);
	 string ExtractQueryParam(HttpRequest* request, string param);
	 void ReplyOK(HttpRequest* request, string html);
	 void FindFriends(HttpRequest* request);
	 void ReplyNotFound(HttpRequest* request, string error);
	 void CloseDatabase(string location);
	 ~FrontendServer();
	};

  The declaration above is readable, but it can still be improved by grouping related methods:

	class FrontendServer {
	 public:
	    FrontendServer();
	    ~FrontendServer();

	    // Handlers
	    void ViewProfile(HttpRequest* request);
	    void SaveProfile(HttpRequest* request);
	    void FindFriends(HttpRequest* request);

	    // Request/Reply Utilities
	    string ExtractQueryParam(HttpRequest* request, string param);
	    void ReplyOK(HttpRequest* request, string html);
	    void ReplyNotFound(HttpRequest* request, string error);

	    // Database Helpers
	    void OpenDatabase(string location, string user);
	    void CloseDatabase(string location);
	};

  Here is another piece of code:

	# Import the user's email contacts, and match them to users in our system.
	# Then display a list of those users that he/she isn't already friends with.
	def suggest_new_friends(user, email_password):
	    friends = user.friends()
	    friend_emails = set(f.email for f in friends)
	    contacts = import_contacts(user.email, email_password)
	    contact_emails = set(c.email for c in contacts)
	    non_friend_emails = contact_emails - friend_emails
	    suggested_friends = User.objects.select(email__in=non_friend_emails)
	    display['user'] = user
	    display['friends'] = friends
	    display['suggested_friends'] = suggested_friends
	    return render("suggested_friends.html", display)

  Everything runs together, which is tiring to read. Split it into blocks by function:

	def suggest_new_friends(user, email_password):
	    # Get the user's friends' email addresses.
	    friends = user.friends()
	    friend_emails = set(f.email for f in friends)

	    # Import all email addresses from this user's email account.
	    contacts = import_contacts(user.email, email_password)
	    contact_emails = set(c.email for c in contacts)

	    # Find matching users that they aren't already friends with.
	    non_friend_emails = contact_emails - friend_emails
	    suggested_friends = User.objects.select(email__in=non_friend_emails)

	    # Display these lists on the page.
	    display['user'] = user
	    display['friends'] = friends
	    display['suggested_friends'] = suggested_friends

	    return render("suggested_friends.html", display)

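  Making code pleasant to read takes attention while writing and a few good habits. In a team especially, style choices such as brace placement have no right or wrong answer, but ignoring the team's conventions is wrong.

 How to write comments

  You think about many things while writing code, but in the end the reader sees only the code itself and the extra information is lost. The purpose of comments is to give that information back to the reader.

  What to comment

  What not to comment

  Comments like these add no value: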

	// The class definition for Account
	class Account {
	 public:
	 // Constructor
	 Account();
	 // Set the profit member to a new value
	 void SetProfit(double profit);
	 // Return the profit from this Account
	 double GetProfit();
	};

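  Don't write a comment just for the sake of having one. If a function gets a comment, it should add details that the signature alone does not give, as this one does: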

	// Find a Node with the given 'name' or return NULL.
	// If depth <= 0, only 'subtree' is inspected.
	// If depth == N, only 'subtree' and N levels below are inspected.
	Node* FindNodeInSubtree(Node* subtree, string name, int depth);

  Don't comment bad names; fix the names instead

	// Enforce limits on the Reply as stated in the Request,
	// such as the number of items returned, or total byte size, etc. 
	void CleanReply(Request request, Reply reply);

  Most of the comment is explaining what "clean" means. It is better to pick an accurate name:

	// Make sure 'reply' meets the count/byte/etc. limits from the 'request' 
	void EnforceLimitsFromRequest(Request request, Reply reply);

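  Record your thoughts

  We have covered what not to comment, so what should be commented? Comments should record the conclusions you reached while working out how to write the code, for example: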

	// Surprisingly, a binary tree was 40% faster than a hash table for this data.
	// The cost of computing a hash was more than the left/right comparisons.

	// This heuristic might miss a few words. That's OK; solving this 100% is hard.

	// This class is getting messy. Maybe we should create a 'ResourceNode' subclass to
	// help organize things.

  Comments can also record unfinished work and the reasoning behind constants:

	// TODO: use a faster algorithm
	// TODO(dustin): handle other image formats besides JPEG

	NUM_THREADS = 8 # as long as it's >= 2 * num_processors, that's good enough.

	// Impose a reasonable limit - no human can read that much anyway.
	const int MAX_RSS_SUBSCRIPTIONS = 1000;

  Commonly used markers:

  • TODO : Stuff I haven’t gotten around to yet
  • FIXME : Known-broken code here
  • HACK : Admittedly inelegant solution to a problem
  • XXX : Danger! Major problem here

  Put yourself in the reader's shoes

  Whatever part of your code makes a reader stop and wonder is the part that needs a comment.

	struct Recorder {
	 vector<float> data;
	 ...
	 void Clear() {
	 vector<float>().swap(data); // Huh? Why not just data.clear()? 
	 }
	};

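  Many C++ programmers who see this will wonder why the code does not simply call data.clear() instead of swapping with an empty vector, so that line deserves a comment: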

	// Force vector to relinquish its memory (look up "STL swap trick")
	vector<float>().swap(data);

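  Point out likely pitfalls

  While writing code you may use a hack, or know of a trap that readers of the code need to be aware of. Those cases call for a comment: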

	void SendEmail(string to, string subject, string body);

  This email function actually calls an external service and is subject to a timeout, so the comment should say so:

	// Calls an external service to deliver email. (Times out after 1 minute.)
	void SendEmail(string to, string subject, string body);

  Big-picture comments

  Sometimes a whole file needs a comment so that the reader gets an overall idea of what it is for:

	// This file contains helper functions that provide a more convenient interface to our
	// file system. It handles file permissions and other nitty-gritty details.

  Summary comments

  Even inside a function, a comment can summarize a block in the same way a file comment summarizes a file:

	# Find all the items that customers purchased for themselves.
	for customer_id in all_customers:
	    for sale in all_sales[customer_id].sales:
	        if sale.recipient == customer_id:
	            ...

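  Or write a comment for each step of the function:

	def GenerateUserReport():
	    # Acquire a lock for this user
	    ...
	    # Read user's info from the database
	    ...
	    # Write info to a file
	    ...
	    # Release the lock for this user

  Many people are reluctant to write comments, and writing good ones is indeed not easy. One approach is to set aside a dedicated place in the file for notes, where you can write down whatever you want to say.

  Comments should be concise and precise

  The previous section covered what to write in comments; this one covers how to write them. Comments matter, so they must be precise; they also take up screen space, so they must be compact.

  Keep comments compact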

	// The int is the CategoryType.
	// The first float in the inner pair is the 'score',
	// the second is the 'weight'.
	typedef hash_map<int, pair<float, float> > ScoreMap;

  That is too wordy. Compress it as far as possible:

	// CategoryType -> (score, weight)
	typedef hash_map<int, pair<float, float> > ScoreMap;

  Avoid ambiguous pronouns

	// Insert the data into the cache, but check if it's too big first.

  Here "it's" is ambiguous: it could refer to the data or to the cache. Change it to:

	// Insert the data into the cache, but check if the data is too big first.

  An even better fix restructures the sentence so that "it" has only one possible referent:

	// If the data is small enough, insert it into the cache.

  Make sentences short and precise

	# Depending on whether we've already crawled this URL before, give it a different priority.

  That sentence takes effort to understand. This version is much easier:

	# Give higher priority to URLs we've never crawled before.

  Describe the function's behavior precisely

	// Return the number of lines in this file.
	int CountLines(string filename) { ... }

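  A function described like this can leave its users guessing, because "lines" can be interpreted in several ways:

  • "" (an empty file): 0 lines or 1?
  • "hello": 0 or 1?
  • "hello\n": 1 or 2?
  • "hello\n world": 1 or 2?
  • "hello\n\r cruel\n world\r": 2, 3, or 4?

  So the comment should be written like this: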

	// Count how many newline bytes ('\n') are in the file.
	int CountLines(string filename) { ... }
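
  To see how the precise definition settles each of the cases listed above, here is a small Python sketch. The name count_newlines is hypothetical, and it takes the file content as bytes rather than a filename so that it stays self-contained:

```python
def count_newlines(data: bytes) -> int:
    # Count how many newline bytes (0x0A) are in the data; '\r' is ignored.
    return data.count(b"\n")


for content in [b"", b"hello", b"hello\n", b"hello\n world",
                b"hello\n\r cruel\n world\r"]:
    print(repr(content), "->", count_newlines(content))
```

  Under this definition the five inputs give 0, 0, 1, 1, and 2.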

  Use examples to illustrate corner cases

	// Rearrange 'v' so that elements < Pivot come before those >= Pivot;
	// Then return the largest 'i' for which v[i] < Pivot (or -1 if none are < pivot)
	int Partition(vector<int>* v, int pivot);

  This description is precise, but adding an example makes it even better:

	// ...
	// Example: Partition([8 5 9 8 2], 8) might result in [5 2 | 8 9 8] and return 1
	int Partition(vector<int>* v, int pivot);
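
  The documented contract can be checked against a small implementation. This Python sketch is only one possible way to satisfy the comment (it keeps the relative order of elements, which the comment does not require), and the function body is an assumption rather than the book's code:

```python
def partition(v, pivot):
    # Rearrange v in place so that elements < pivot come before those >= pivot.
    smaller = [x for x in v if x < pivot]
    not_smaller = [x for x in v if x >= pivot]
    v[:] = smaller + not_smaller
    # Largest index i with v[i] < pivot, or -1 if no element is < pivot.
    return len(smaller) - 1


values = [8, 5, 9, 8, 2]
index = partition(values, 8)
print(values, index)  # [5, 2, 8, 9, 8] 1
```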

  State the real intent of your code

	void DisplayProducts(list<Product> products) {
	    products.sort(CompareProductByPrice);
	    // Iterate through the list in reverse order
	    for (list<Product>::reverse_iterator it = products.rbegin(); it != products.rend();
	         ++it)
	        DisplayPrice(it->price);
	    ...
	}

  The comment only restates that the loop runs in reverse order, which is not precise enough. It should state the intent:

	// Display each price, from highest to lowest
	for (list<Product>::reverse_iterator it = products.rbegin(); ... )

  Commenting function calls

  A call like this is bound to be puzzling:

	Connect(10, false);

  Naming the arguments at the call site makes it much clearer:

	def Connect(timeout, use_encryption): ...

	# Call the function using named parameters
	Connect(timeout = 10, use_encryption = False)

  Use information-dense words

	// This class contains a number of members that store the same information as in the
	// database, but are stored here for speed. When this class is read from later, those
	// members are checked first to see if they exist, and if so are returned; otherwise the
	// database is read from and that data stored in those fields for next time.

  The long comment above is clear enough, but a single well-known term says the same thing without leaving any doubt:

	// This class acts as a caching layer to the database.

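 Simplifying loops and logic

  Keep control flow simple

  The goal of this section is to make conditionals, loops, and other control flow as natural as possible, so the reader never has to stop and think or go back and reread.

  Order of arguments in conditionals

  Compare these two ways of writing the same conditions: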

	if (length >= 10)
	while (bytes_received < bytes_expected)

	if (10 <= length)
	while (bytes_expected > bytes_received)

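  Should the operands simply follow greater-than/less-than order, or is there another rule? There is: order them by the role each operand plays.

  • Left of the operator: usually the value being examined, the one that changes more
  • Right of the operator: usually the value it is compared against, which is more constant

  This explains why bytes_received < bytes_expected reads better than the reverse.

  Order of if/else

  In general you are free to choose the order of if/else; both of the following work: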

	if (a == b) {
	 // Case One ...
	} else {
	 // Case Two ...
	}

	if (a != b) {
	 // Case Two ...
	} else {
	 // Case One ...
	}

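  You may never have given this much thought, but sometimes one order really is better than the other:

  • Put the positive case first: if (debug) is better than if (!debug)
  • Put the simpler case first, so that the if and the else can both fit on one screen
  • Put the more interesting or conspicuous case first

  An example: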

	if (!url.HasQueryParameter("expand_all")) {
	    response.Render(items);
	    ...
	} else {
	    for (int i = 0; i < items.size(); i++) {
	        items[i].Expand();
	    }
	    ...
	}

  On seeing the if, the first thing you think about is expand_all. It is like being told "don't think of an elephant": you cannot help thinking of it, which causes a moment of confusion. It is better written as:

	if (url.HasQueryParameter("expand_all")) {
	    for (int i = 0; i < items.size(); i++) {
	        items[i].Expand();
	    }
	    ...
	} else {
	    response.Render(items);
	    ...
	}

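  The ternary operator (?:)

	time_str += (hour >= 12) ? "pm" : "am";

	// Avoiding the ternary operator, you might write:
	if (hour >= 12) {
	    time_str += "pm";
	} else {
	    time_str += "am";
	}

  The ternary operator can reduce the number of lines, and the example above is a good case for it. But the real goal is to reduce the time needed to read the code, so the ternary operator is a poor fit in a case like the following, where the if/else form is easier to follow: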

	return exponent >= 0 ? mantissa * (1 << exponent) : mantissa / (1 << -exponent);

	if (exponent >= 0) {
	 return mantissa * (1 << exponent);
	} else {
	 return mantissa / (1 << -exponent);
	}

  So use it only for simple expressions.

  Avoid do/while loops

	do {
	 continue;
	} while (false);

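  How many times does this loop run? It takes a moment to work out (just once: continue jumps to the condition, which is false). A do/while can always be rewritten in another form, so it is best avoided.

  Return early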

	public boolean Contains(String str, String substr) {
	 if (str == null || substr == null) return false;
	 if (substr.equals("")) return true;
	 ...
	}

  Returning as early as possible inside a function makes the logic clearer.
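
  The same guard-clause pattern, sketched in Python. The name contains and the use of Python's in operator for the remaining case are assumptions made for illustration:

```python
def contains(text, substr):
    # Handle the special cases first and leave immediately.
    if text is None or substr is None:
        return False
    if substr == "":
        return True
    # Only the ordinary case remains below the guards.
    return substr in text


print(contains("readable code", "code"))  # True
print(contains(None, "code"))             # False
```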

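  Reduce nesting

	if (user_result == SUCCESS) {
	    if (permission_result != SUCCESS) {
	        reply.WriteErrors("error reading permissions");
	        reply.Done();
	        return;
	    }
	    reply.WriteErrors("");
	} else {
	    reply.WriteErrors(user_result);
	}
	reply.Done();

  This code has only one level of nesting, yet it is still slightly confusing. Does your own code contain anything similar? Approaching it from a different angle and applying the return-early principle gives something much more comfortable to read: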

	if (user_result != SUCCESS) {
	 reply.WriteErrors(user_result);
	 reply.Done();
	 return;
	}
	if (permission_result != SUCCESS) {
	 reply.WriteErrors(permission_result);
	 reply.Done();
	 return;
	}
	reply.WriteErrors("");
	reply.Done();

  Nesting inside loops can be handled the same way:

	for (int i = 0; i < results.size(); i++) {
	    if (results[i] != NULL) {
	        non_null_count++;
	        if (results[i]->name != "") {
	            cout << "Considering candidate..." << endl;
	            ...
	        }
	    }
	}

  Rewrite it in the return-early style; inside a loop the equivalent is continue:

	for (int i = 0; i < results.size(); i++) {
	 if (results[i] == NULL) continue;
	 non_null_count++;

	 if (results[i]->name == "") continue;
	 cout << "Considering candidate..." << endl;
	 ... 
	}

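  Break down complex expressions

  The more complex an expression is, the harder it is to read. Large, complicated expressions should be split into small pieces that are each easy to understand.

  Use variables

  The simplest way to break down a complex expression is to introduce a variable:

	if line.split(':')[0].strip() == "root":

	# Replace the expression with a variable
	username = line.split(':')[0].strip()
	if username == "root":
	    ...

  Or take this example: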

	if (request.user.id == document.owner_id) {
	 // user can edit this document...
	}
	...
	if (request.user.id != document.owner_id) {
	// document is read-only...
	}

	// Replace the repeated expression with a variable
	final boolean user_owns_document = (request.user.id == document.owner_id);
	if (user_owns_document) {
	 // user can edit this document...
	}
	...
	if (!user_owns_document) {
	 // document is read-only...
	}

  Rewriting logic (De Morgan's laws)

  • 1) not (a or b or c) <–> (not a) and (not b) and (not c)
  • 2) not (a and b and c) <–> (not a) or (not b) or (not c)

  So a condition can be rewritten like this:

	if (!(file_exists && !is_protected)) Error("Sorry, could not read file.");

	// After the rewrite
	if (!file_exists || is_protected) Error("Sorry, could not read file.");
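
  Both rules, and the file example, can be checked exhaustively over all truth values. This is a small Python sketch; the two function names are hypothetical:

```python
from itertools import product

# Check both rewriting rules for every combination of three booleans.
for a, b, c in product([False, True], repeat=3):
    assert (not (a or b or c)) == ((not a) and (not b) and (not c))
    assert (not (a and b and c)) == ((not a) or (not b) or (not c))


def cannot_read_original(file_exists, is_protected):
    return not (file_exists and not is_protected)


def cannot_read_rewritten(file_exists, is_protected):
    return (not file_exists) or is_protected


# The two forms agree on all four input combinations.
for file_exists, is_protected in product([False, True], repeat=2):
    assert (cannot_read_original(file_exists, is_protected)
            == cannot_read_rewritten(file_exists, is_protected))
print("both forms are equivalent")
```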

  Don't abuse logical expressions

	assert((!(bucket = FindBucket(key))) || !bucket->IsOccupied());

  Code like this can be replaced by the following, which takes two lines but is easier to understand:

	bucket = FindBucket(key);
	if (bucket != NULL) assert(!bucket->IsOccupied());

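  Expressions like the following are also best avoided unless the idiom is well understood, because in some languages x is assigned the first operand that evaluates to true rather than a Boolean: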

	x = a || b || c
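
  Python is one language that behaves this way: or returns one of its operands, not a Boolean. The helper first_truthy below is a hypothetical spelled-out equivalent, shown only to make the behavior explicit:

```python
def first_truthy(*values):
    # Return the first truthy value; if none is truthy, return the last one,
    # which is what a chain of "or" does as well.
    for value in values:
        if value:
            return value
    return values[-1] if values else None


a, b, c = 0, "", "fallback"
x = a or b or c
print(x)                      # fallback
print(first_truthy(a, b, c))  # fallback
```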

  Break down giant statements

	var update_highlight = function (message_num) {
	    if ($("#vote_value" + message_num).html() === "Up") {
	        $("#thumbs_up" + message_num).addClass("highlighted");
	        $("#thumbs_down" + message_num).removeClass("highlighted");
	    } else if ($("#vote_value" + message_num).html() === "Down") {
	        $("#thumbs_up" + message_num).removeClass("highlighted");
	        $("#thumbs_down" + message_num).addClass("highlighted");
	    } else {
	        $("#thumbs_up" + message_num).removeClass("highighted");
	        $("#thumbs_down" + message_num).removeClass("highlighted");
	    }
	};

  There is a lot of repetition here. Extracting the repeated expressions into variables simplifies it:

	var update_highlight = function (message_num) {
	    var thumbs_up = $("#thumbs_up" + message_num);
	    var thumbs_down = $("#thumbs_down" + message_num);
	    var vote_value = $("#vote_value" + message_num).html();
	    var hi = "highlighted";

	    if (vote_value === "Up") {
	        thumbs_up.addClass(hi);
	        thumbs_down.removeClass(hi);
	    } else if (vote_value === "Down") {
	        thumbs_up.removeClass(hi);
	        thumbs_down.addClass(hi);
	    } else {
	        thumbs_up.removeClass(hi);
	        thumbs_down.removeClass(hi);
	    }
	};

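  Variables and readability

  Eliminating variables

  The previous section used variables to break down large expressions. This section looks at how to get rid of variables that are not needed.

  Useless temporary variables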

	now = datetime.datetime.now()
	root_message.last_view_time = now

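  The variable now can be removed, because:

  • It does not break down a complex expression
  • It does not add clarity, since `datetime.datetime.now()` is already clear
  • It is used only once

  So the code can simply be written as: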

	root_message.last_view_time = datetime.datetime.now()

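  Eliminating control-flow variables

	boolean done = false;
	while (/* condition */ && !done) {
	    ...
	    if (...) {
	        done = true;
	        continue;
	    }
	}

  What done does here can be achieved more directly:

	while (/* condition */) {
	    ...
	    if (...) {
	        break;
	    }
	}

  This example is easy to fix. With more deeply nested loops a single break may not be enough; in that case the code can be moved into a function.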
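
  For the nested case, moving the loops into a function lets a return leave every level at once, with no flag variable. A minimal Python sketch, in which find_position and the grid data are hypothetical:

```python
def find_position(grid, target):
    # Search a list of rows; return (row_index, col_index) of the first match.
    for row_index, row in enumerate(grid):
        for col_index, value in enumerate(row):
            if value == target:
                return (row_index, col_index)  # exits both loops at once
    return None  # no match found


print(find_position([[1, 2], [3, 4]], 3))  # (1, 0)
print(find_position([[1, 2], [3, 4]], 9))  # None
```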

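  Shrink the scope of variables

  We have all heard the advice to avoid global variables. The larger a variable's scope, the harder it is to track, so keep scopes small.

	class LargeClass {
	    string str_;
	    void Method1() {
	        str_ = ...;
	        Method2();
	    }
	    void Method2() {
	        // Uses str_
	    }
	    // Lots of other methods that don't use str_
	    ... ;
	}

  The scope of str_ is larger than it needs to be. The class can be written differently: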

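	class LargeClass {
	    void Method1() {
	        string str = ...;
	        Method2(str);
	    }
	    void Method2(string str) {
	        // Uses str
	    }
	    // Now other methods can't see str.
	};

  Passing str as a function argument narrows its scope and makes the code easier to read. The same idea applies to class design: split a large class into smaller ones.

  Don't rely on nested scopes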

	# No use of example_value up to this point.
	if request:
	    for value in request.values:
	        if value > 0:
	            example_value = value
	            break

	for logger in debug.loggers:
	    logger.log("Example:", example_value)

  Whenever the first block never assigns example_value, this code fails at run time with an error saying that example_value is not defined. The fix is not hard:

	example_value = None
	if request:
	    for value in request.values:
	        if value > 0:
	            example_value = value
	            break

	if example_value:
	    for logger in debug.loggers:
	        logger.log("Example:", example_value)

  But following the earlier advice on eliminating intermediate variables, there is a better way:

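	def LogExample(value):
	    for logger in debug.loggers:
	        logger.log("Example:", value)

	if request:
	    for value in request.values:
	        if value > 0:
	            LogExample(value)  # deal with 'value' immediately
	            break

  Declare variables only when they are needed

  C originally required all variables to be declared up front. When many variables are involved, the reader has to keep track of all of them at once, so it is better to define each variable only where it is first needed. In the following code every variable is defined at the top: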

	def ViewFilteredReplies(original_id):
	    filtered_replies = []
	    root_message = Messages.objects.get(original_id)
	    all_replies = Messages.objects.select(root_id=original_id)
	    root_message.view_count += 1
	    root_message.last_view_time = datetime.datetime.now()
	    root_message.save()

	    for reply in all_replies:
	        if reply.spam_votes <= MAX_SPAM_VOTES:
	            filtered_replies.append(reply)

	    return filtered_replies

  That is too many variables for the reader to take in at once. Move each definition next to its first use:

	def ViewFilteredReplies(original_id):
	    root_message = Messages.objects.get(original_id)
	    root_message.view_count += 1
	    root_message.last_view_time = datetime.datetime.now()
	    root_message.save()

	    all_replies = Messages.objects.select(root_id=original_id)
	    filtered_replies = []
	    for reply in all_replies:
	        if reply.spam_votes <= MAX_SPAM_VOTES:
	            filtered_replies.append(reply)

	    return filtered_replies

  Prefer variables that are written only once

  Too many variables confuse the reader, and so does a single variable that keeps being reassigned. The fewer times a variable changes, the more readable the code.

  An example

  Suppose a page looks like the following, and the first empty input needs to be given a value:

	<input type="text" id="input1" value="Dustin">
	<input type="text" id="input2" value="Trevor">
	<input type="text" id="input3" value="">
	<input type="text" id="input4" value="Melissa">
	...
	var setFirstEmptyInput = function (new_value) {
	    var found = false;
	    var i = 1;
	    var elem = document.getElementById('input' + i);
	    while (elem !== null) {
	        if (elem.value === '') {
	            found = true;
	            break;
	        }
	        i++;
	        elem = document.getElementById('input' + i);
	    }
	    if (found) elem.value = new_value;
	    return elem;
	};

  This code works and uses three variables. Let us improve them one at a time. found is an intermediate variable and can be eliminated completely:

	var setFirstEmptyInput = function (new_value) {
	    var i = 1;
	    var elem = document.getElementById('input' + i);
	    while (elem !== null) {
	        if (elem.value === '') {
	            elem.value = new_value;
	            return elem;
	        }
	        i++;
	        elem = document.getElementById('input' + i);
	    }
	    return null;
	};

  Next is elem. It drives the loop and is assigned in several places, so its value is hard to follow. Rewriting the loop as a for over i fixes both variables:

	var setFirstEmptyInput = function (new_value) {
	    for (var i = 1; true; i++) {
	        var elem = document.getElementById('input' + i);
	        if (elem === null)
	            return null; // Search Failed. No empty input found.
	        if (elem.value === '') {
	            elem.value = new_value;
	            return elem;
	        }
	    }
	};

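 Reorganizing your code

  Extract unrelated subproblems

  Engineering is about breaking a big problem into small ones and solving them one by one, which also helps keep the program robust and readable. Some guidelines for finding subproblems:

  • Look at a function or block of code and ask yourself: "What is the high-level goal of this code?"
  • For each line, ask: "Is it working directly toward that goal, or is it solving an unrelated subproblem?"
  • If enough lines are solving an unrelated subproblem, extract them into a separate function

  An example: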

	ajax_post({
	    url: 'http://example.com/submit',
	    data: data,
	    on_success: function (response_data) {
	        var str = "{\n";
	        for (var key in response_data) {
	            str += " " + key + " = " + response_data[key] + "\n";
	        }
	        alert(str + "}");
	        // Continue handling 'response_data' ...
	    }
	});

  The goal of this code is to send an ajax request, so the string formatting inside it can be extracted:

	var format_pretty = function (obj) {
	    var str = "{\n";
	    for (var key in obj) {
	        str += " " + key + " = " + obj[key] + "\n";
	    }
	    return str + "}";
	};

  An unexpected bonus

  There are many reasons to extract format_pretty: a standalone function is easy to extend with features, to harden, and to adapt to edge cases. So format_pretty can be improved into a more capable function:

	var format_pretty = function (obj, indent) {
	    // Handle null, undefined, strings, and non-objects.
	    if (obj === null) return "null";
	    if (obj === undefined) return "undefined";
	    if (typeof obj === "string") return '"' + obj + '"';
	    if (typeof obj !== "object") return String(obj);
	    if (indent === undefined) indent = "";

	    // Handle (non-null) objects.
	    var str = "{\n";
	    for (var key in obj) {
	        str += indent + " " + key + " = ";
	        str += format_pretty(obj[key], indent + " ") + "\n";
	    }
	    return str + indent + "}";
	};

  This function produces output like:

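	{
	 key1 = 1
	 key2 = true
	 key3 = undefined
	 key4 = null
	 key5 = {
	  key5a = {
	   key5a1 = "hello world"
	  }
	 }
	}

  Doing this regularly builds up reusable code that can grow into your own library or be shared with others.

  Project-specific functions

  Code unrelated to the main goal can be extracted for reuse. Project-specific logic can be extracted too, to keep the code readable. For example: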

	business = Business()
	business.name = request.POST["name"]

	url_path_name = business.name.lower()
	url_path_name = re.sub(r"['\.]", "", url_path_name) 
	url_path_name = re.sub(r"[^a-z0-9]+", "-", url_path_name) 
	url_path_name = url_path_name.strip("-")
	business.url = "/biz/" + url_path_name

	business.date_created = datetime.datetime.utcnow() 
	business.save_to_database()

  Once extracted, it looks much better:

	import re

	CHARS_TO_REMOVE = re.compile(r"['\.]+")
	CHARS_TO_DASH = re.compile(r"[^a-z0-9]+")

	def make_url_friendly(text):
	 text = text.lower()
	 text = CHARS_TO_REMOVE.sub('', text) 
	 text = CHARS_TO_DASH.sub('-', text) 
	 return text.strip("-")

	business = Business()
	business.name = request.POST["name"]
	business.url = "/biz/" + make_url_friendly(business.name) 
	business.date_created = datetime.datetime.utcnow() 
	business.save_to_database()

  Simplify an existing interface

  Consider this code for reading a cookie:

	var max_results;
	var cookies = document.cookie.split(';');
	for (var i = 0; i < cookies.length; i++) {
	    var c = cookies[i];
	    c = c.replace(/^[ ]+/, ''); // remove leading spaces
	    if (c.indexOf("max_results=") === 0)
	        max_results = Number(c.substring(12, c.length));
	}

  This code is ugly. An ideal interface for working with cookies would look more like this:

	set_cookie(name, value, days_to_expire);
	delete_cookie(name);

  When an interface is less than ideal, you can always wrap it in your own functions to make it easier to use.
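
  A wrapper for the reading side might look like the sketch below, written in Python for illustration. get_cookie is a hypothetical helper, and the cookie string is passed in as a parameter instead of being read from document.cookie:

```python
def get_cookie(cookie_string, name):
    # cookie_string has the form "key1=value1; key2=value2".
    for part in cookie_string.split(";"):
        key, separator, value = part.lstrip(" ").partition("=")
        if separator and key == name:
            return value
    return None  # no cookie with that name


cookie_string = "theme=dark; max_results=25"
max_results = int(get_cookie(cookie_string, "max_results"))
print(max_results)  # 25
```

  The parsing details now live in one place, and the calling code states only what it wants.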

  Shape the interface to your needs

	user_info = { "username": "...", "password": "..." }
	user_str = json.dumps(user_info)
	cipher = Cipher("aes_128_cbc", key=PRIVATE_KEY, init_vector=INIT_VECTOR, op=ENCODE)
	encrypted_bytes = cipher.update(user_str)
	encrypted_bytes += cipher.final() # flush out the current 128 bit block
	url = "http://example.com/?user_info=" + base64.urlsafe_b64encode(encrypted_bytes)
	...

  The high-level goal is to put the user's information into a URL, but most of the code is busy turning a Python object into a URL-safe encrypted string. That part can be extracted:

	def url_safe_encrypt(obj):
	    obj_str = json.dumps(obj)
	    cipher = Cipher("aes_128_cbc", key=PRIVATE_KEY, init_vector=INIT_VECTOR, op=ENCODE)
	    encrypted_bytes = cipher.update(obj_str)
	    encrypted_bytes += cipher.final() # flush out the current 128 bit block
	    return base64.urlsafe_b64encode(encrypted_bytes)

  It can then be called from other places as well:

	user_info = { "username": "...", "password": "..." }
	url = "http://example.com/?user_info=" + url_safe_encrypt(user_info)

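  Extracting helper functions is a good habit, but do it in moderation: splitting code into too many tiny functions makes it hard to find things.

  One task at a time

  Code should do only one task at a time.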

	var place = location_info["LocalityName"]; // e.g. "Santa Monica"
	if (!place) {
	 place = location_info["SubAdministrativeAreaName"]; // e.g. "Los Angeles"
	}
	if (!place) {
	 place = location_info["AdministrativeAreaName"]; // e.g. "California"
	}
	if (!place) {
	 place = "Middle-of-Nowhere";
	}
	if (location_info["CountryName"]) {
	 place += ", " + location_info["CountryName"]; // e.g. "USA"
	} else {
	 place += ", Planet Earth";
	}

	return place;

  This code builds a place name. It is full of conditionals and tiring to read. Can the tasks be separated?

	var town = location_info["LocalityName"]; // e.g. "Santa Monica"
	var city = location_info["SubAdministrativeAreaName"]; // e.g. "Los Angeles"
	var state = location_info["AdministrativeAreaName"]; // e.g. "CA"
	var country = location_info["CountryName"]; // e.g. "USA"

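  The first task is to store each value in its own variable, so the long keys no longer need to be remembered later. The second task is to work out the second half of the place name:

	// Start with the default, and keep overwriting with the most specific value.
	var second_half = "Planet Earth";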
	if (country) {
	 second_half = country; 
	}
	if (state && country === "USA") {
	 second_half = state; 
	}

  Then work out the first half:

	var first_half = "Middle-of-Nowhere";
	if (state && country !== "USA") {
	 first_half = state; 
	}
	if (city) {
	 first_half = city;
	}
	if (town) {
	 first_half = town; 
	}

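  And finally put them together:

	return first_half + ", " + second_half;

  Noticing that the conditions keep checking whether the country is "USA", the code can also be written like this: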

	var first_half, second_half;
	if (country === "USA") {
	 first_half = town || city || "Middle-of-Nowhere";
	 second_half = state || "USA";
	} else {
	 first_half = town || city || state || "Middle-of-Nowhere";
	 second_half = country || "Planet Earth";
	}
	return first_half + ", " + second_half;
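
  The same structure carries over to Python, where or plays the role of ||. This is a sketch only: format_place is a hypothetical function name and the sample dictionaries are made-up inputs.

```python
def format_place(location_info):
    # Task 1: extract the values from the dictionary.
    town = location_info.get("LocalityName")
    city = location_info.get("SubAdministrativeAreaName")
    state = location_info.get("AdministrativeAreaName")
    country = location_info.get("CountryName")

    # Task 2: choose each half, preferring the most specific value available.
    if country == "USA":
        first_half = town or city or "Middle-of-Nowhere"
        second_half = state or "USA"
    else:
        first_half = town or city or state or "Middle-of-Nowhere"
        second_half = country or "Planet Earth"
    return first_half + ", " + second_half


print(format_place({"LocalityName": "Santa Monica",
                    "AdministrativeAreaName": "California",
                    "CountryName": "USA"}))  # Santa Monica, California
print(format_place({}))                      # Middle-of-Nowhere, Planet Earth
```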

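  Turn thoughts into code

  When explaining something complicated to another person, it is easy to confuse them with details. So imagine explaining your code to someone in plain language: would they understand it? A few guidelines help make code clearer:

  • Describe what the code needs to do in plain language, as if telling a colleague
  • Pay attention to the key words and phrases in that description
  • Write the code to match the description

  The following code checks a user's permissions: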

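	$is_admin = is_admin_request();
	if ($document) {
	    if (!$is_admin && ($document['username'] != $_SESSION['username'])) {
	        return not_authorized();
	    }
	} else {
	    if (!$is_admin) {
	        return not_authorized();
	    }
	}
	// continue rendering the page ...

  The code is short, but its nested logic is complicated, and as earlier sections pointed out, heavy nesting hurts comprehension. Described in plain language, the logic is:

	There are two ways to be authorized:
	1. You are an admin
	2. You own this document
	Otherwise, you are not authorized

  Write the code to follow the description: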

	if (is_admin_request()) {
	 // authorized
	} elseif ($document && ($document['username'] == $_SESSION['username'])) {
	 // authorized
	} else {
	 return not_authorized();
	}
	// continue rendering the page ...
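
  The plain-language description maps almost word for word onto a small predicate. Below is a Python sketch of the same rule; is_authorized and its parameters are hypothetical names, with the document modeled as a dictionary or None:

```python
def is_authorized(is_admin, document, session_username):
    # Way 1: you are an admin.
    if is_admin:
        return True
    # Way 2: you own this document.
    if document and document["username"] == session_username:
        return True
    # Otherwise, you are not authorized.
    return False


print(is_authorized(False, {"username": "alice"}, "alice"))  # True
print(is_authorized(False, None, "alice"))                   # False
```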

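  Write less code

  The most readable code is no code at all!

  • Drop features that add little value, and don't over-engineer
  • Rethink the requirements: solving the simplest version of the problem is often enough to meet the overall goal
  • Know the libraries you use, and review their APIs periodically

 Finally

  The book also has chapters on testing, which I leave for you to read on your own. Once again, I recommend this book.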
