





1.Matching problem (maximum-weight matching problem)


To be on the safe side, just let me emphasize that this greedy solution would not work in general, with an arbitrary set of weights. The distinct powers of two are key here.


In this case (or the bipartite case, for that matter), greed won’t work in general. However, by some freak coincidence, all the compatibility numbers happen to be distinct powers of two. Now, what happens?

Let’s first consider what a greedy algorithm would look like here and then see why it yields an optimal result. We’ll be building a solution piece by piece—let the pieces be pairs and a partial solution be a set of pairs. Such a partial solution is valid only if no person in it participates in two (or more) of its pairs. The algorithm will then be roughly as follows:

  1. List potential pairs, sorted by decreasing compatibility.
  2. Pick the first unused pair from the list.
  3. Is anyone in the pair already occupied? If so, discard it; otherwise, use it.
  4. Are there any more pairs on the list? If so, go to 2.

As you’ll see later, this is rather similar to Kruskal’s algorithm for minimum spanning trees (although that works regardless of the edge weights). It also is a rather prototypical greedy algorithm. Its correctness is another matter. Using distinct powers of two is sort of cheating, because it would make virtually any greedy algorithm work; that is, you’d get an optimal result as long as you could get a valid solution at all. Even though it’s cheating (see Exercise 7-3), it illustrates the central idea here: making the greedy choice is safe. Using the most compatible of the remaining couples will always be at least as good as any other choice.
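The four steps listed earlier can be sketched as follows. This is only a minimal illustration: the function name greedy_matching and the input format (a dict mapping a pair of people to a compatibility number) are assumptions made for the example, not something given in the text.

```python
def greedy_matching(compat):
    """Greedily pick non-overlapping pairs by decreasing compatibility.

    compat is assumed to map (person_a, person_b) tuples to a number.
    """
    occupied = set()   # people already used in some chosen pair
    chosen = []
    # Step 1: list potential pairs, sorted by decreasing compatibility.
    ranked = sorted(compat, key=compat.get, reverse=True)
    # Steps 2 and 4: walk through the list from the top.
    for a, b in ranked:
        # Step 3: discard the pair if either person is already occupied.
        if a in occupied or b in occupied:
            continue
        chosen.append((a, b))
        occupied.update((a, b))
    return chosen


# Distinct powers of two: the greedy result (8 + 1 = 9) is optimal.
powers = {("a", "b"): 8, ("a", "c"): 4, ("b", "d"): 2, ("c", "d"): 1}
# Arbitrary weights: greed gets 5 + 1 = 6, but (a, c) + (b, d) gives 8.
arbitrary = {("a", "b"): 5, ("a", "c"): 4, ("b", "d"): 4, ("c", "d"): 1}
```

The second data set shows why the distinct powers of two matter: with arbitrary weights the same procedure can lock in a heavy pair that blocks two pairs which together are worth more.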



There is, in fact, one classical matching problem that can be solved (sort of) greedily: the stable marriage problem. The idea is that each person in a group has preferences about whom he or she would like to marry. We’d like to see everyone married, and we’d like the marriages to be stable, meaning that there is no man who prefers a woman outside his marriage who also prefers him. (To keep things simple, we disregard same-sex marriages and polygamy here.)

There’s a simple algorithm for solving this problem, designed by David Gale and Lloyd Shapley. The formulation is quite gender-conservative but will certainly also work if the gender roles are reversed. The algorithm runs for a number of rounds, until there are no unengaged men left. Each round consists of two steps:

  1. Each unengaged man proposes to his favorite of the women he has not yet asked.
  2. Each woman is (provisionally) engaged to her favorite suitor and rejects the rest.

This can be viewed as greedy in that we consider only the available favorites (both of the men and women) right now. You might object that it’s only sort of greedy in that we don’t lock in and go straight for marriage; the women are allowed to break their engagement if a more interesting suitor comes along. Even so, once a man has been rejected, he has been rejected for good, which means that we’re guaranteed progress.

To show that this is an optimal and correct algorithm, we need to know that everyone gets married and that the marriages are stable. Once a woman is engaged, she stays engaged (although she may replace her fiancé). There is no way we can get stuck with an unmarried pair, because at some point the man would have proposed to the woman, and she would have (provisionally) accepted his proposal.

How do we know the marriages are stable? Let’s say Scarlett and Stuart are both married but not to each other. Is it possible they secretly prefer each other to their current spouses? No: if so, Stuart would already have proposed to her. If she accepted that proposal, she must later have found someone she liked better; if she rejected it, she would already have a preferable mate.
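The round-based procedure described above can be sketched like this. It is a hedged illustration rather than the book's code: the name stable_marriage and the input format (dicts mapping each person to a complete preference list, best first) are assumptions, and the sketch assumes equally many men and women.

```python
def stable_marriage(men_prefs, women_prefs):
    """Round-based Gale-Shapley sketch; returns a dict man -> woman.

    Assumes equal numbers of men and women and complete preference lists.
    """
    # rank[w][m] is the position of m in w's list (lower is better).
    rank = {w: {m: i for i, m in enumerate(prefs)}
            for w, prefs in women_prefs.items()}
    next_choice = {m: 0 for m in men_prefs}  # index of the next woman to ask
    fiance = {}                              # woman -> provisional fiance
    free = list(men_prefs)                   # unengaged men
    while free:
        # Step 1: each unengaged man proposes to his favorite unasked woman.
        proposals = {}
        for m in free:
            w = men_prefs[m][next_choice[m]]
            next_choice[m] += 1
            proposals.setdefault(w, []).append(m)
        # Step 2: each woman keeps her favorite suitor and rejects the rest.
        free = []
        for w, suitors in proposals.items():
            if w in fiance:
                suitors.append(fiance[w])    # the current fiance competes too
            best = min(suitors, key=rank[w].__getitem__)
            free.extend(m for m in suitors if m != best)
            fiance[w] = best
    return {m: w for w, m in fiance.items()}
```

Because next_choice only ever increases, a rejected man never proposes to the same woman again, which is the progress guarantee mentioned above.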

Although this problem may seem silly and trivial, it is not. For example, it is used for admission to some colleges and to allocate medical students to hospital jobs. Entire books have, in fact, been written about the problem and its variations (such as those by Donald Knuth and by Dan Gusfield and Robert W. Irving).






There are two important cases of the integer knapsack problem: the bounded and unbounded cases. The bounded case assumes we have a fixed number of objects in each category, and the unbounded case lets us use as many as we want. Sadly, greed won’t work in either case. In fact, these are both unsolved problems, in the sense that no polynomial algorithms are known to solve them. There is hope, however. As you’ll see in the next chapter, we can use dynamic programming to solve the problems in pseudopolynomial time, which may be good enough in many important cases. Also, for the unbounded case, it turns out that the greedy approach ain’t half bad! Or, rather, it’s at least half good, meaning that we’ll never get less than half the optimum value. And with a slight modification, you can get as good results for the bounded version, too. This concept of greedy approximation is discussed in more detail in Chapter 11.
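To make the "at least half good" claim concrete, here is a small sketch for the unbounded case. Both function names and the (value, weight) item format are assumptions for illustration; the exact solver is a plain pseudopolynomial table of the kind hinted at above, included only to compare against the greedy result.

```python
def greedy_unbounded_knapsack(items, capacity):
    """Greedy by value per unit weight; items are (value, weight) pairs.

    Weights are assumed to be positive integers.
    """
    total = 0
    remaining = capacity
    by_density = sorted(items, key=lambda it: it[0] / it[1], reverse=True)
    for value, weight in by_density:
        copies = remaining // weight   # take as many copies as still fit
        total += copies * value
        remaining -= copies * weight
    return total


def exact_unbounded_knapsack(items, capacity):
    """Exact optimum via a table over all capacities (pseudopolynomial)."""
    best = [0] * (capacity + 1)
    for c in range(1, capacity + 1):
        for value, weight in items:
            if weight <= c:
                best[c] = max(best[c], best[c - weight] + value)
    return best[capacity]


items = [(10, 6), (7, 5)]
greedy = greedy_unbounded_knapsack(items, 10)   # one copy of (10, 6)
optimum = exact_unbounded_knapsack(items, 10)   # two copies of (7, 5)
```

Here greed is not optimal (10 versus 14), but it stays above half of the optimum, as the text promises for the unbounded case.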




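The code is fairly simple. It uses the heapq module, and the tree structure is stored in lists. An interesting detail is the use of the zip function, with the counter function count passed as one of its arguments; see the Python docs for details.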
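The code being described is not reproduced in these notes. A sketch consistent with the description might look like the following, assuming that count is itertools.count and that the tree is built Huffman-style by repeatedly merging the two lightest subtrees. The function name and parameters are illustrative.

```python
from heapq import heapify, heappop, heappush
from itertools import count


def huffman(symbols, frequencies):
    """Build a tree as nested two-element lists from symbols and weights."""
    tie_breaker = count()   # unique numbers so the heap never compares trees
    # zip pairs each frequency with a running number and its symbol.
    trees = list(zip(frequencies, tie_breaker, symbols))
    heapify(trees)
    while len(trees) > 1:
        weight_a, _, a = heappop(trees)   # the two lightest subtrees
        weight_b, _, b = heappop(trees)
        # Merge them under a new node whose weight is the sum.
        heappush(trees, (weight_a + weight_b, next(tie_breaker), [a, b]))
    return trees[0][-1]


tree = huffman("abc", [1, 2, 4])
```

The running number from count sits between the weight and the subtree in each tuple, so two entries with equal weight are ordered by that number and the heap never has to compare a string with a list.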

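Now consider another problem, the file-merging problem. Suppose that merging two files of sizes m and n takes m+n time. Given a set of files, find an optimal merge strategy that minimizes the total time required.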


Consider how each leaf contributes to the sum over all nodes: the leaf weight occurs as a summand once in each of its ancestor nodes, which means that the sum is exactly the same! That is, sum(weight(node) for node in nodes), where nodes are the internal (merged) nodes, is exactly the same as sum(depth(leaf)*weight(leaf) for leaf in leaves).
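The identity can be checked numerically with a small sketch. The function name merge_cost_and_depths is an assumption for illustration; it merges the two smallest files at each step, adds up the cost of every merge (the weights of the internal nodes), and tracks how deep each original file ends up in the merge tree.

```python
from heapq import heapify, heappop, heappush


def merge_cost_and_depths(sizes):
    """Return (total merge time, depth of each file in the merge tree)."""
    # Heap entries: (size, tie-breaker, indices of the files inside).
    heap = [(size, i, [i]) for i, size in enumerate(sizes)]
    heapify(heap)
    depth = [0] * len(sizes)
    tie_breaker = len(sizes)
    total = 0
    while len(heap) > 1:
        size_a, _, files_a = heappop(heap)
        size_b, _, files_b = heappop(heap)
        total += size_a + size_b          # weight of the new internal node
        for i in files_a + files_b:
            depth[i] += 1                 # these leaves gain one ancestor
        heappush(heap, (size_a + size_b, tie_breaker, files_a + files_b))
        tie_breaker += 1
    return total, depth


sizes = [1, 2, 4]
total, depth = merge_cost_and_depths(sizes)
weighted_depths = sum(d * s for d, s in zip(depth, sizes))
```

For the sizes 1, 2 and 4, the merges cost 3 and then 7, so total is 10; the depths are 2, 2 and 1, and 2*1 + 2*2 + 1*4 is also 10.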





[If you are interested in the history of the minimum spanning tree problem, the author recommends the paper “On the History of the Minimum Spanning Tree Problem,” by Graham and Hell.]







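Suppose we are deciding whether to add the edge (u,v). The most direct idea is to traverse the trees built so far and check whether v can be reached from u. If it can, we discard the edge and move on to the next one; otherwise, we add it. Clearly, doing a traversal for every edge is too slow.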

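Now suppose we keep a set of the nodes that are already in the trees we have built. When deciding whether to add the edge (u,v), we check whether both nodes are in the set; if they are, adding the edge would supposedly create a cycle. This check takes constant time, but... it is wrong! It is valid only if the partial solution remains a single tree after every edge we add. If the previously added nodes u and v lie in different components, the check cannot tell us that adding the edge would create a cycle (and in fact it would not)! [The strategy used by Prim's algorithm, described later, guarantees that the partial solution is always a single tree.]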
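A tiny counterexample makes the flaw visible. The function below is deliberately the wrong check described above (the name flawed_forest is made up for this illustration): it rejects any edge whose endpoints have both been seen.

```python
def flawed_forest(edges):
    """Deliberately WRONG cycle check: reject if both endpoints were seen."""
    seen = set()
    kept = []
    for u, v in edges:
        if u in seen and v in seen:
            continue               # claims a cycle, which may be false
        kept.append((u, v))
        seen.update((u, v))
    return kept


# a-b and c-d form two separate components; b-c would join them safely.
edges = [("a", "b"), ("c", "d"), ("b", "c")]
kept = flawed_forest(edges)
```

The edge (b, c) is thrown away even though it connects two different components, so the result never becomes a spanning tree.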




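First, during a union we make the “smaller” component point to the “larger” one. With the components balanced this way, the average lookup time certainly goes down. How do we determine the “size” of a component? We can think of it in terms of balanced trees: give every node a weight (rank), although what really matters is the rank of the “representative.” If the representatives of the two components being merged have equal rank, then after pointing the “smaller” component at the “larger” one, we also increase the rank of the “larger” one by 1.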

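Second, during a find we fix up the pointers of the nodes we pass through so that they point directly at the “representative.” How is this done? Recursion is enough: once the recursion has found the representative, it unwinds, and on the way back we can set the representative of every node along the path. This is called path compression, a technique commonly used with Kruskal's algorithm.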
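Putting the two ideas together, a Kruskal sketch with union by rank and path compression might look like this. The graph format (a dict of dicts mapping each node to its neighbors and edge weights) and all names are assumptions for the example.

```python
def find(parent, u):
    """Return the representative of u, compressing the path on the way back."""
    if parent[u] != u:
        parent[u] = find(parent, parent[u])   # set while the recursion unwinds
    return parent[u]


def union(parent, rank, u, v):
    """Point the lower-ranked representative at the higher-ranked one."""
    ru, rv = find(parent, u), find(parent, v)
    if ru == rv:
        return
    if rank[ru] > rank[rv]:
        ru, rv = rv, ru                       # make ru the "smaller" one
    parent[ru] = rv
    if rank[ru] == rank[rv]:
        rank[rv] += 1                         # equal ranks: the winner grows


def kruskal(graph):
    """Return the edges of a minimum spanning forest as (u, v) tuples."""
    edges = sorted((w, u, v) for u in graph for v, w in graph[u].items())
    parent = {u: u for u in graph}
    rank = {u: 0 for u in graph}
    tree = []
    for _, u, v in edges:
        if find(parent, u) != find(parent, v):   # different components
            tree.append((u, v))
            union(parent, rank, u, v)
    return tree


graph = {"a": {"b": 1, "c": 4}, "b": {"a": 1, "c": 2}, "c": {"a": 4, "b": 2}}
mst = kruskal(graph)
```

Each undirected edge appears twice in this representation; the second copy is skipped automatically because its endpoints already share a representative.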


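Next comes Prim's algorithm. It is really just one of the traversal algorithms introduced earlier. The difference is that it orders the to-do list (the “fringe” nodes mentioned before, that is, the nodes directly reachable from the nodes we have already included). When implementing BFS we used the double-ended queue deque; here we only need to replace it with a priority queue, for which we use the heap from the heapq module.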


• We’re using a priority queue, so if a node has been added multiple times, by the time we remove one of its entries, it will be the one with the lowest weight (at that time), which is the one we want.

• We make sure we don’t add the same node to our traversal tree more than once. This can be ensured by a constant-time membership check. Therefore, all but one of the queue entries for any given node will be discarded.

• The multiple additions won’t affect asymptotic running time.

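[Re-adding a node with a lower weight is equivalent to a relaxation (or, put differently, the relaxation is implicit in it). The two approaches are interchangeable. In the later material on graph algorithms, the author uses relax when implementing Dijkstra's algorithm, but we could just as well implement Prim's algorithm with relax and Dijkstra's algorithm without it.]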
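A sketch of Prim's algorithm along these lines, with a heapq-based priority queue and re-adding instead of an explicit relax step, could look as follows. The dict-of-dicts graph format and the names are assumptions for illustration.

```python
from heapq import heappop, heappush


def prim(graph, start):
    """Return a dict mapping each reached node to its parent in the tree."""
    parent = {}
    queue = [(0, None, start)]        # (weight, predecessor, node)
    while queue:
        _, p, u = heappop(queue)      # lightest entry on the fringe
        if u in parent:
            continue                  # stale entry: u was already added
        parent[u] = p
        for v, w in graph[u].items():
            heappush(queue, (w, u, v))   # may add v again with a new weight
    return parent


graph = {"a": {"b": 1, "c": 4}, "b": {"a": 1, "c": 2}, "c": {"a": 4, "b": 2}}
tree = prim(graph, "a")
```

The membership test u in parent is the constant-time check from the second bullet: all but the first (lightest) queue entry for a node are discarded when they are popped.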



In their historical overview of minimum spanning tree algorithms, Ronald L. Graham and Pavol Hell outline three algorithms that they consider especially important and that have played a central role in the history of the problem. The first two are the algorithms that are commonly attributed to Kruskal and Prim (although the second one was originally formulated by Vojtěch Jarník in 1930), while the third is the one initially described by Borůvka. Graham and Hell succinctly explain the algorithms as follows. A partial solution is a spanning forest, consisting of a set of fragments (components, trees). Initially, each node is a fragment. In each iteration, edges are added, joining fragments, until we have a spanning tree.

Algorithm 1: Add a shortest edge that joins two different fragments.

Algorithm 2: Add a shortest edge that joins the fragment containing the root to another fragment.

Algorithm 3: For every fragment, add the shortest edge that joins it to another fragment.

For algorithm 2, the root is chosen arbitrarily at the beginning. For algorithm 3, it is assumed that all edge weights are different to ensure that no cycles can occur. As you can see, all three algorithms are based on the same fundamental fact—that the shortest edge over a cut is safe. Also, in order to implement them efficiently, you need to be able to find shortest edges, detect whether two nodes belong to the same fragment, and so forth (as explained for algorithms 1 and 2 in the main text). Still, these brief explanations can be useful as a memory aid or to get the bird’s-eye perspective on what’s going on.
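Algorithm 3 is not implemented in these notes, so here is a hedged sketch of it. It assumes a connected graph given as a list of nodes and a list of (weight, u, v) edges with distinct weights; the names and the small fragment-lookup helper are illustrative choices, not taken from Graham and Hell.

```python
def boruvka(nodes, edges):
    """Return the set of (weight, u, v) edges chosen by algorithm 3.

    Assumes distinct edge weights, as required to avoid cycles.
    """
    leader = {u: u for u in nodes}       # fragment representative per node

    def find(u):
        while leader[u] != u:
            leader[u] = leader[leader[u]]   # shorten the path as we go
            u = leader[u]
        return u

    tree = set()
    fragments = len(leader)
    while fragments > 1:
        # For every fragment, find the shortest edge leaving it.
        shortest = {}
        for edge in edges:
            _, u, v = edge
            fu, fv = find(u), find(v)
            if fu == fv:
                continue                 # both ends in the same fragment
            for f in (fu, fv):
                if f not in shortest or edge < shortest[f]:
                    shortest[f] = edge
        if not shortest:
            break                        # graph is not connected
        # Add all of those edges, joining fragments.
        for edge in shortest.values():
            _, u, v = edge
            fu, fv = find(u), find(v)
            if fu != fv:                 # may already be joined this round
                leader[fu] = fv
                tree.add(edge)
                fragments -= 1
    return tree


edges = [(1, "a", "b"), (2, "b", "c"), (3, "c", "d"), (4, "a", "d"), (5, "a", "c")]
mst = boruvka("abcd", edges)
```

In this small example a single round is enough: every fragment picks its shortest outgoing edge, and the three distinct edges chosen already form the spanning tree.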

5.Greed Works. But When?



(1)Keeping Up with the Best

This is what Kleinberg and Tardos (in Algorithm Design) call staying ahead. The idea is to show that as you build your solution, one step at a time, the greedy algorithm will always have gotten at least as far as a hypothetical optimal algorithm would have. Once you reach the finish line, you’ve shown that greed is optimal.

(2)No Worse Than Perfect

This is a technique I used in showing the greedy choice property for Huffman’s algorithm. It involves showing that you can transform a hypothetical optimal solution to the greedy one, without reducing the quality. Kleinberg and Tardos call this an exchange argument.

(3)Staying Safe

This is where we started: to make sure a greedy algorithm is correct, we must make sure each greedy step along the way is safe. One way of doing this is the two-part approach of showing (1) the greedy choice property, that is, that a greedy choice is compatible with optimality, and (2) optimal substructure, that is, that the remaining subproblem is a smaller instance that must also be solved optimally.



