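Python Algorithms: Divide and Conquer

Published 2015-05-20

This section introduces the divide-and-conquer strategy, touching on balance in tree-shaped problems and on sorting algorithms built on divide and conquer.

Spelled out in full, this section's title would be: divide the problem instance, solve subproblems recursively, combine the results, and thereby conquer the problem

In short: split the original problem into several smaller problems, solve those recursively, and combine their solutions into a solution to the original problem. The idea behind divide and conquer is probably familiar to everyone already, so I will not dwell on it. What follows are the key points excerpted from the book.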

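1. Balance is the key to tree-shaped problems

If we treat subproblems as nodes and the dependencies (or reductions) between problems as edges, we get a subproblem graph. The simplest subproblem graphs are trees, such as the recursion trees we saw earlier. The subproblems may depend on one another, but each of them can be solved independently. From what we have covered so far, once we find a suitable reduction we can solve the problem directly with a recursive algorithm. [Overlapping subproblems are handled with dynamic programming, which is covered in detail later; we ignore that case here.]

What we have covered so far is enough to understand divide and conquer: the divide-and-conquer recurrences from Section 3, strong induction from Section 4, and recursive traversal from Section 5.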

The recurrences tell you something about the performance involved, the induction gives you a tool for understanding how the algorithms work, and the recursive traversal (DFS in trees) is a raw skeleton for the algorithms.

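However, our earlier inductions always went from n-1 to n. In this section balance matters, so we want to go from n/2 to n; that is, we assume we can solve subproblems half the size of the original problem.

Suppose we have the following two schemes for the same problem. Which one is better?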

(1)T(n)=T(n-1)+T(1)+n

(2)T(n)=2T(n/2)+n

Judged by running time, the first is O(n^2) while the second is O(n lg n), so the second is better. The figure below shows the two schemes as recursion trees.
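
To make the gap between the two recurrences concrete, here is a minimal sketch that simply evaluates them. The base case T(1) = 1 and the restriction of the second recurrence to powers of two are assumptions made for illustration; the function names `t_unbalanced` and `t_balanced` are hypothetical.

```python
def t_unbalanced(n):
    """Evaluate T(n) = T(n-1) + T(1) + n, assuming T(1) = 1."""
    total = 1                        # T(1)
    for size in range(2, n + 1):     # unroll the recurrence iteratively
        total = total + 1 + size     # T(size) = T(size-1) + T(1) + size
    return total


def t_balanced(n):
    """Evaluate T(n) = 2T(n/2) + n for n a power of two, assuming T(1) = 1."""
    if n == 1:
        return 1
    return 2 * t_balanced(n // 2) + n


for n in (16, 256, 4096):
    print(n, t_unbalanced(n), t_balanced(n))
```

The first column of results grows roughly with n^2 / 2, the second with n lg n, which matches the asymptotic bounds stated above.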

2. Typical divide and conquer

Below is pseudocode for a typical divide-and-conquer algorithm. It should be easy to follow.

# Pseudocode(ish)
def divide_and_conquer(S, divide, combine):
    if len(S) == 1: return S
    L, R = divide(S)
    A = divide_and_conquer(L, divide, combine)
    B = divide_and_conquer(R, divide, combine)
    return combine(A, B)
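
As a hedged illustration of how the skeleton can be instantiated, the sketch below repeats it (so the example runs on its own) and plugs in two different `combine` functions. The helpers `halve`, `merge_sorted`, and `keep_max` are hypothetical names chosen for this example, and the input is assumed to be non-empty, since the skeleton's base case only handles a single element.

```python
from heapq import merge


def divide_and_conquer(S, divide, combine):
    if len(S) == 1:                  # base case: one element, nothing to do
        return S
    L, R = divide(S)
    A = divide_and_conquer(L, divide, combine)
    B = divide_and_conquer(R, divide, combine)
    return combine(A, B)


def halve(S):
    """Split S into two non-empty halves (requires len(S) >= 2)."""
    mid = len(S) // 2
    return S[:mid], S[mid:]


def merge_sorted(A, B):
    """Combine two sorted lists into one sorted list."""
    return list(merge(A, B))


def keep_max(A, B):
    """Combine two one-element lists by keeping the larger element."""
    return [max(A[0], B[0])]


data = [7, 5, 0, 6, 3, 4, 1, 9, 8, 2]
print(divide_and_conquer(data, halve, merge_sorted))  # sorted copy of data
print(divide_and_conquer(data, halve, keep_max))      # one-element list holding the maximum
```

With `merge_sorted` the skeleton behaves like merge sort; with `keep_max` it computes the maximum. Only the combine step changes.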

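The figure below illustrates this: the upper part is the division, the lower part the combination.

Binary search (bisection) is the most commonly used divide-and-conquer algorithm. The revision control systems (RCSs) we use every day apply exactly this strategy to find the revision in which a given change was introduced.

Python's bisect module is also built on binary search. Its bisect function returns the position at which an element should be inserted to keep the list sorted, to the right of any equal elements. bisect_left returns the insertion position to the left of any equal elements, and bisect_right behaves exactly like bisect. The insort functions follow the same design.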

from bisect import bisect, bisect_left, bisect_right
a = [0, 2, 3, 5, 6, 7, 8, 8, 9]
print(bisect(a, 5)) #4
print(bisect_left(a, 5)) #3
print(bisect_right(a, 5)) #4
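
The bisect functions return insertion points, not search results, so a membership test needs one extra comparison. A minimal sketch follows; the helper name `index_of` and the convention of returning -1 for a missing element are assumptions for this example.

```python
from bisect import bisect_left, insort


def index_of(sorted_seq, x):
    """Return the leftmost index of x in sorted_seq, or -1 if x is absent."""
    i = bisect_left(sorted_seq, x)
    if i < len(sorted_seq) and sorted_seq[i] == x:
        return i
    return -1


a = [0, 2, 3, 5, 6, 7, 8, 8, 9]
print(index_of(a, 8))   # 6 (the leftmost of the two 8s)
print(index_of(a, 4))   # -1
insort(a, 4)            # insert while keeping the list sorted
print(a)                # [0, 2, 3, 4, 5, 6, 7, 8, 8, 9]
```

Finding the position takes logarithmic time, but `insort` still has to shift elements, so the insertion itself is linear.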

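Binary search works well, but it has a precondition: the sequence must be sorted. To reach the middle element efficiently while the data keeps changing, we have the binary search tree, which was covered in detail in the data structures series. A complete binary search tree implementation is given below without further explanation.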

class Node:
    lft = None
    rgt = None
    def __init__(self, key, val):
        self.key = key
        self.val = val

def insert(node, key, val):
    if node is None: return Node(key, val)      # Empty leaf: Add node here
    if node.key == key: node.val = val          # Found key: Replace val
    elif key < node.key:                        # Less than the key?
        node.lft = insert(node.lft, key, val)   # Go left
    else:                                       # Otherwise...
        node.rgt = insert(node.rgt, key, val)   # Go right
    return node

def search(node, key):
    if node is None: raise KeyError             # Empty leaf: It's not here
    if node.key == key: return node.val         # Found key: Return val
    elif key < node.key:                        # Less than the key?
        return search(node.lft, key)            # Go left
    else:                                       # Otherwise...
        return search(node.rgt, key)            # Go right

class Tree:                                     # Simple wrapper
    root = None
    def __setitem__(self, key, val):
        self.root = insert(self.root, key, val)
    def __getitem__(self, key):
        return search(self.root, key)
    def __contains__(self, key):
        try: search(self.root, key)
        except KeyError: return False
        return True

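Comparison: bisection, binary search trees, and dictionaries

All three speed up searching, but they differ. Bisection only works on sorted arrays (for example, a sorted Python list), and sorted arrays are hard to maintain because insertion takes linear time. Binary search trees are more complex and change dynamically, but insertion and deletion are more efficient (logarithmic as long as the tree stays balanced). Dictionaries do best by comparison: insertion and deletion take constant time on average, although a hash value must be computed to locate an element.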
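
One difference the comparison leaves implicit is that a sorted structure can answer order-based queries, while a dictionary can only answer exact lookups. The following sketch shows this with the standard library only; the sample keys and the helper name `successor` are made up for illustration.

```python
from bisect import bisect_left, insort

keys = [42, 7, 19, 3, 25]

# Sorted list: each insort call is linear, each lookup is logarithmic.
sorted_keys = []
for key in keys:
    insort(sorted_keys, key)

# Dictionary: average constant-time insertion and lookup, but no ordering.
table = {key: str(key) for key in keys}


def successor(sorted_seq, x):
    """Return the smallest element >= x, or None if there is none."""
    i = bisect_left(sorted_seq, x)
    return sorted_seq[i] if i < len(sorted_seq) else None


print(sorted_keys)                 # [3, 7, 19, 25, 42]
print(19 in table)                 # True
print(successor(sorted_keys, 20))  # 25
print(successor(sorted_keys, 50))  # None
```

A binary search tree supports the same kind of ordered query without the linear insertion cost, provided it stays balanced.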

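3. Order statistics

In Introduction to Algorithms, the k-th smallest element of a sequence is defined as its k-th order statistic.

How can we find the k largest elements of a sequence in linear time? Clearly, if the numbers span a wide range we cannot use a linear-time sorting algorithm, and the best average running time of comparison-based sorts, O(n lg n), exceeds linear time. So what do we do?

[Further reading: in Python, if you need the k smallest or k largest elements, you can use nsmallest or nlargest from the heapq module. This works well when k is small; when k is large, you are better off simply calling sort.]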
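
A short example of the two heapq functions mentioned in the note, next to the sort-based alternative. The sample list is the one used later in this section.

```python
from heapq import nlargest, nsmallest

seq = [3, 4, 1, 6, 3, 7, 9, 13, 93, 0, 100, 1, 2, 2, 3, 3, 2]

print(nsmallest(3, seq))              # [0, 1, 1]
print(nlargest(3, seq))               # [100, 93, 13]

# For large k, sorting once and slicing is usually the simpler choice.
print(sorted(seq)[:3])                # [0, 1, 1]
print(sorted(seq, reverse=True)[:3])  # [100, 93, 13]
```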

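To solve this we again use divide and conquer, dividing the sequence with a partition step like the one in quicksort. That is, pick a pivot and use it to split the sequence into two parts, one no greater than the pivot and the other greater than it. Then compare the pivot's final position with k to decide which part to keep partitioning. If this is unclear, see the quicksort discussion in the earlier sorting post of the data structures series.

That idea leads to the implementation below. Note that this partition function does not partition in place.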

#A Straightforward Implementation of Partition and Select
def partition(seq):
    pi, seq = seq[0], seq[1:]                   # Pick and remove the pivot
    lo = [x for x in seq if x <= pi]            # All the small elements
    hi = [x for x in seq if x > pi]             # All the large ones
    return lo, pi, hi                           # pi is "in the right place"

def select(seq, k):
    lo, pi, hi = partition(seq)                 # [<= pi], pi, [> pi]
    m = len(lo)
    if m == k: return pi                        # We found the kth smallest
    elif m < k:                                 # Too far to the left
        return select(hi, k-m-1)                # Remember to adjust k
    else:                                       # Too far to the right
        return select(lo, k)                    # Just use original k here

seq = [3, 4, 1, 6, 3, 7, 9, 13, 93, 0, 100, 1, 2, 2, 3, 3, 2]
print(partition(seq)) #([1, 3, 0, 1, 2, 2, 3, 3, 2], 3, [4, 6, 7, 9, 13, 93, 100])
print(select([5, 3, 2, 7, 1], 3)) #5
print(select([5, 3, 2, 7, 1], 4)) #7
ans = [select(seq, k) for k in range(len(seq))]
seq.sort()
print(ans == seq) #True

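A close reading of the code shows that the pivot is always the first element. Is that a sensible choice? If the input is in random order, this amounts to picking the pivot at random, and the expected running time is indeed linear. But if every choice is bad and each partition is badly unbalanced, the running time becomes quadratic. Is there a way to pick the pivot that guarantees linear running time? There is, but it is fairly involved and rarely used in practice. If you are interested, read on.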
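
One common way to get the expected linear time regardless of input order is to pick the pivot at random explicitly. The sketch below is a variation on the select function above, not code from the book; the names `partition_random` and `select_random` are hypothetical.

```python
import random


def partition_random(seq):
    """Partition a non-empty list around a randomly chosen pivot (not in place)."""
    i = random.randrange(len(seq))            # random pivot position
    pi = seq[i]
    rest = seq[:i] + seq[i + 1:]              # everything except the pivot
    lo = [x for x in rest if x <= pi]
    hi = [x for x in rest if x > pi]
    return lo, pi, hi


def select_random(seq, k):
    """Return the k-th smallest element of seq, counting k from 0."""
    lo, pi, hi = partition_random(seq)
    m = len(lo)
    if m == k:
        return pi
    elif m < k:
        return select_random(hi, k - m - 1)   # skip lo and the pivot
    else:
        return select_random(lo, k)


# Already-sorted input is the bad case for a first-element pivot.
print(select_random(list(range(1000)), 500))  # 500
```

On already-sorted input, the first-element version peels off one element per call, so its recursion depth grows linearly with the input size. With a random pivot the expected depth is logarithmic.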

[I have not fully understood this yet. Introduction to Algorithms covers it as well, if you want to read more.]

It turns out guaranteeing that the pivot is even a small percentage into the sequence (that is, not at either end, or a constant number of steps from it) is enough for the running time to be linear. In 1973, a group of algorists (Blum, Floyd, Pratt, Rivest, and Tarjan) came up with a version of the algorithm that gives exactly this kind of guarantee.

The algorithm is a bit involved, but the core idea is simple enough: first divide the sequence into groups of five (or some other small constant). Find the median in each, using (for example) a simple sorting algorithm. So far, we’ve used only linear time. Now, find the median among these medians, using the linear selection algorithm recursively. (This will work, because the number of medians is smaller than the size of the original sequence—still a bit mind-bending.) The resulting value is a pivot that is guaranteed to be good enough to avoid the degenerate recursion—use it as a pivot in your selection.

In other words, the algorithm is used recursively in two ways: first, on the sequence of medians, to find a good pivot, and second, on the original sequence, using this pivot.

While the algorithm is important to know about for theoretical reasons (because it means selection can be done in guaranteed linear time), you’ll probably never actually use it in practice.
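
The description above can be turned into a rough sketch as follows. This is one possible reading of the idea, not the book's or the original authors' code: the group size of five follows the quoted text, while the three-way split into `lo`, `eq`, and `hi` (to cope with duplicates) and the names `median_of_medians` and `select_linear` are assumptions made for this example.

```python
def median_of_medians(seq):
    """Choose a pivot: the median of the medians of groups of five."""
    groups = [seq[i:i + 5] for i in range(0, len(seq), 5)]
    medians = [sorted(g)[len(g) // 2] for g in groups]    # each sort is O(1)
    if len(medians) <= 5:
        return sorted(medians)[len(medians) // 2]
    # First recursive use: selection on the (shorter) list of medians.
    return select_linear(medians, len(medians) // 2)


def select_linear(seq, k):
    """Return the k-th smallest element of seq, counting k from 0."""
    if len(seq) <= 5:                                     # small case: just sort
        return sorted(seq)[k]
    pivot = median_of_medians(seq)
    lo = [x for x in seq if x < pivot]
    eq = [x for x in seq if x == pivot]
    hi = [x for x in seq if x > pivot]
    # Second recursive use: selection on the original data with this pivot.
    if k < len(lo):
        return select_linear(lo, k)
    if k < len(lo) + len(eq):
        return pivot
    return select_linear(hi, k - len(lo) - len(eq))


seq = [3, 4, 1, 6, 3, 7, 9, 13, 93, 0, 100, 1, 2, 2, 3, 3, 2]
print([select_linear(seq, k) for k in range(len(seq))] == sorted(seq))  # True
```

The list comprehensions make the constant factors large, so this is only meant to show the two recursive uses the text describes.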

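4. Sorting by halves

We covered binary search above; now let us look at sorting by halves. The ideas behind quicksort and merge sort are not repeated here. If they are unclear, see the earlier sorting post in the data structures series.

With the partition function from above, the quicksort code practically writes itself.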

def quicksort(seq):
    if len(seq) <= 1: return seq                # Base case
    lo, pi, hi = partition(seq)                 # pi is in its place
    return quicksort(lo) + [pi] + quicksort(hi) # Sort lo and hi separately

seq = [7, 5, 0, 6, 3, 4, 1, 9, 8, 2]
print(quicksort(seq)) #[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
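
The quicksort above builds new lists at every level because its partition is not in place. For comparison, here is a minimal in-place sketch using the Lomuto partition scheme with the last element as pivot. This is a swapped-in technique rather than the book's code, and the names `partition_in_place` and `quicksort_in_place` are hypothetical.

```python
def partition_in_place(seq, lo, hi):
    """Lomuto partition of seq[lo..hi] around seq[hi]; returns the pivot's index."""
    pivot = seq[hi]
    i = lo                                   # next slot for a small element
    for j in range(lo, hi):
        if seq[j] <= pivot:
            seq[i], seq[j] = seq[j], seq[i]
            i += 1
    seq[i], seq[hi] = seq[hi], seq[i]        # move the pivot into its place
    return i


def quicksort_in_place(seq, lo=0, hi=None):
    """Sort seq[lo..hi] in place; sorts the whole list by default."""
    if hi is None:
        hi = len(seq) - 1
    if lo >= hi:                             # zero or one element: done
        return
    p = partition_in_place(seq, lo, hi)
    quicksort_in_place(seq, lo, p - 1)
    quicksort_in_place(seq, p + 1, hi)


seq = [7, 5, 0, 6, 3, 4, 1, 9, 8, 2]
quicksort_in_place(seq)
print(seq)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```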

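Merge sort is an even more typical divide-and-conquer sort. Note that the merging step looks for the larger of the two last elements, appends it, and reverses the result at the end. Looking for the smaller element instead would mean inserting at or removing from the front of a list, which takes linear time, whereas append and pop at the end take constant time.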

# Mergesort, repeated from Chapter 3 (with some modifications)
def mergesort(seq):
    mid = len(seq)//2                           # Midpoint for division
    lft, rgt = seq[:mid], seq[mid:]
    if len(lft) > 1: lft = mergesort(lft)       # Sort by halves
    if len(rgt) > 1: rgt = mergesort(rgt)
    res = []
    while lft and rgt:                          # Neither half is empty
        if lft[-1] >= rgt[-1]:                  # lft has greatest last value
            res.append(lft.pop())               # Append it
        else:                                   # rgt has greatest last value
            res.append(rgt.pop())               # Append it
    res.reverse()                               # Result is backward
    return (lft or rgt) + res                   # Also add the remainder
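
To see the merge step in isolation, the sketch below pulls it out into a helper of its own. The name `merge_from_back` is hypothetical, and the function copies its inputs first because `pop()` mutates the lists.

```python
def merge_from_back(lft, rgt):
    """Merge two sorted lists by repeatedly taking the larger last element."""
    lft, rgt = list(lft), list(rgt)      # work on copies; pop() mutates
    res = []
    while lft and rgt:                   # neither list is exhausted
        if lft[-1] >= rgt[-1]:
            res.append(lft.pop())        # O(1) pop from the end
        else:
            res.append(rgt.pop())
    res.reverse()                        # res was built largest-first
    return (lft or rgt) + res            # leftovers are the smallest elements


print(merge_from_back([1, 4, 9], [2, 3, 10, 11]))  # [1, 2, 3, 4, 9, 10, 11]
```

Whatever remains in `lft` or `rgt` when the loop ends is smaller than everything already taken, which is why it can simply be placed in front.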

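[Further reading: Python's built-in sorting algorithm is Timsort. It looks quite complicated, so I only skimmed it.]

[At the end of the chapter the author discusses tree balancing and mentions 2-3 trees. I am not especially interested in tree balancing and do not fully understand it, so I skip it here. Read it if you are interested.]

Exercise 6-2. Ternary search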

Binary search divides the sequence into two approximately equal parts in each recursive step. Consider ternary search, which divides the sequence into three parts. What would its asymptotic complexity be? What can you say about the number of comparisons in binary and ternary search?

In other words, the exercise asks us to analyze the running time of ternary search and compare it with binary search.

The asymptotic running time would be the same. The number of comparisons goes up, however. To see this, consider the recurrences B(n) = B(n/2) + 1 and T(n) = T(n/3) + 2 for binary and ternary search, respectively (with base cases B(1) = T(1) = 0 and B(2) = T(2) = 1). You can show (by induction) that B(n) < lg n + 1 < T(n) for sufficiently large n.
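
The two recurrences can also be evaluated directly. In the sketch below, rounding n/2 and n/3 down with integer division is an assumption (the recurrences as stated do not say how to round), and the function names are hypothetical.

```python
def binary_comparisons(n):
    """B(n) = B(n/2) + 1 with B(1) = 0 and B(2) = 1, rounding n/2 down; n >= 1."""
    if n <= 2:
        return n - 1
    return binary_comparisons(n // 2) + 1


def ternary_comparisons(n):
    """T(n) = T(n/3) + 2 with T(1) = 0 and T(2) = 1, rounding n/3 down; n >= 1."""
    if n <= 2:
        return n - 1
    return ternary_comparisons(n // 3) + 2


for n in (1000, 10**6, 10**9):
    print(n, binary_comparisons(n), ternary_comparisons(n))
```

For these sizes the ternary count is the larger one: fewer levels of recursion, but two comparisons per level.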
