[Paper Archaeology] Distributed Optimization: Communication Complexity of Convex Optimization

Posted by 木坑 on 2022-02-27

J. N. Tsitsiklis and Z.-Q. Luo, “Communication complexity of convex optimization,” Journal of Complexity, vol. 3, no. 3, pp. 231–243, Sep. 1987, doi: 10.1016/0885-064x(87)90013-6.

Problem Description

Two users each hold a convex function \(f_i\) and exchange as few binary messages as possible in order to find a minimizer of \(f_1+f_2\).

Basic Definitions

  • \(\mathscr{F}\): a set of convex functions defined on the domain \([0,1]^n\)

  • \(I(f;\epsilon)\subset[0,1]^n\): the set of \(\epsilon\)-approximate minimizers of \(f\) over the domain, i.e. the points \(x\) satisfying \(f(x) \leq f(y)+\varepsilon, \forall y \in[0,1]^{n}\)

  • \(C(f_1,f_2;\epsilon,\pi)\): the number of messages the two users must exchange under protocol \(\pi\) with accuracy \(\epsilon\) to find an element of \(I\left(f_{1}+f_{2} ; \varepsilon\right)\)

  • \(C(\mathscr{F} ; \varepsilon, \pi)\): the number of messages the protocol must exchange to find the target in the worst case

    \[C(\mathscr{F} ; \varepsilon, \pi)=\sup _{f_{1}, f_{2} \in \mathscr{F}} C\left(f_{1}, f_{2} ; \varepsilon, \pi\right) \]

  • \(C(\mathscr{F} ; \varepsilon)\): the number of messages exchanged under the best protocol, also called the \(\epsilon\)-communication complexity

    \[C(\mathscr{F} ; \varepsilon)=\inf _{\pi \in \Pi(\varepsilon)} C(\mathscr{F} ; \varepsilon, \pi) \]

  • The transmission model: the protocol exchanges \(T\) messages

    • Computation of the message sent at each step

      \[m_{i}(t)=M_{i, t}\left(f_{i}, m_{j}(0), \ldots, m_{j}(t-1)\right) \]

    • Determination of the final optimal point

      \[x=Q\left(f_{1}, m_{2}(0), \ldots, m_{2}(T-1)\right) \]
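The model above can be simulated directly. A minimal sketch, in which the message maps `M1`, `M2`, the final map `Q`, and the simultaneous-exchange schedule are illustrative assumptions rather than anything fixed by the paper:

```python
def run_protocol(f1, f2, M1, M2, Q, T):
    """Simulate a T-message protocol: at step t each user computes a binary
    message from its own function and the other user's messages up to t-1."""
    m1, m2 = [], []                      # message histories
    for t in range(T):
        new1 = M1(t, f1, tuple(m2))      # m_1(t) = M_{1,t}(f_1, m_2(0), ..., m_2(t-1))
        new2 = M2(t, f2, tuple(m1))      # m_2(t) = M_{2,t}(f_2, m_1(0), ..., m_1(t-1))
        m1.append(new1)
        m2.append(new2)
    return Q(f1, tuple(m2))              # x = Q(f_1, m_2(0), ..., m_2(T-1))
```

For instance, a trivial protocol in which user 2 transmits the binary expansion of a value and `Q` decodes it fits this interface.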

Straightforward Lower Bound

Lemma 1: If \(\mathscr{F} \subset \mathscr{G}\), then \(C(\mathscr{F} ; \varepsilon) \leq C(\mathscr{G} ; \varepsilon)\)

A simpler function class requires no more messages to optimize.

Proposition: \(C\left(\mathcal{F}_{Q} ; \varepsilon\right) = \Omega(n(\log n+\log (1 / \varepsilon)))\)

Here \(\mathcal{F}_{Q}\) denotes the set of quadratic functions of the form \(f(x)=\|x-x^\star\|^2\) with \(x^\star\in [0,1]^n\). By Lemma 1, restricting to the simplest functions yields a lower bound. Take \(f_1=0\); then the output must approximate the minimizer of \(f_2\) to within accuracy \(\epsilon^{1/2}\), since \(\|x-x^\star\|^2\leq\varepsilon\) forces \(\|x-x^\star\|\leq\varepsilon^{1/2}\). Covering \([0,1]^n\) requires at least \(\left(A n / \varepsilon^{1 / 2}\right)^{B n}\) Euclidean balls of radius \(\epsilon^{1/2}\), so the range of \(Q\) must contain at least \(\left(A n / \varepsilon^{1 / 2}\right)^{B n}\) points. But \(Q\) is determined by the \(T\) binary messages it receives, and the cardinality of a function's range cannot exceed that of its domain, so the range of \(Q\) has at most \(2^T\) points. Combining the two bounds gives \(T = \Omega(n(\log n+\log (1 / \varepsilon)))\).
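Taking logarithms in this counting argument gives the bound directly. A small sketch, with the paper's unspecified constants `A` and `B` set to 1 purely for illustration:

```python
import math

def lower_bound_messages(n, eps, A=1.0, B=1.0):
    # Covering [0,1]^n by Euclidean balls of radius eps^(1/2) needs at
    # least (A*n / eps^(1/2))^(B*n) balls, so the range of Q has at least
    # that many elements; T binary messages distinguish at most 2^T
    # outcomes, hence T >= B*n * log2(A*n / eps^(1/2)).
    return math.ceil(B * n * math.log2(A * n / math.sqrt(eps)))
```

The returned count grows like \(n(\log n + \log(1/\varepsilon))\), matching the Proposition.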

Naive Upper Bounds

The method of centers of gravity (MCG) requires the fewest gradient evaluations when minimizing a convex function. The paper extends MCG to the distributed setting to obtain an upper bound.
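In one dimension the center of gravity of an interval is its midpoint, so MCG reduces to bisection on the gradient sign. A single-user sketch (the interval and tolerance defaults are illustrative, not from the paper):

```python
def bisect_min(df, a=0.0, b=1.0, eps=1e-3):
    # Each gradient-sign query halves the interval containing the
    # minimizer, so about log2((b - a) / eps) queries suffice.
    while b - a > eps:
        m = (a + b) / 2
        if df(m) > 0:     # function increasing at m: minimizer lies to the left
            b = m
        else:             # function decreasing (or flat): minimizer lies to the right
            a = m
    return (a + b) / 2
```

The distributed difficulty is that neither user can evaluate \((f_1+f_2)'\) alone; the algorithm below works around this with one-bit messages.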

The optimal algorithm in one dimension

The core idea is to use messages to indicate which computational step to take, rather than to transmit data.

The algorithm maintains two intervals:

  • \([a,b]\): an interval containing the minimizer of \(f_1+f_2\), i.e. \(x^\star \in [a,b]\)
  • \([c,d]\): an interval tracking the derivative values \(f'_1(\frac{a+b}{2})\) and \(-f'_2(\frac{a+b}{2})\)

Using the interval \([c,d]\) as a reference, the two one-bit messages \(m_1,m_2\) are computed:

  • If \(f'_1(\frac{a+b}{2})\in [c,\frac{c+d}{2}]\) then \(m_1=0\), otherwise \(m_1=1\)
  • If \(-f'_2(\frac{a+b}{2})\in[c,\frac{c+d}{2}]\) then \(m_2=0\), otherwise \(m_2=1\)

Depending on the combination of \(m_1,m_2\), either \([a,b]\) or \([c,d]\) is halved. The halving rules follow two principles:

  1. Since \((f_1+f_2)'=f'_1+f'_2\), the sign of the derivative at the midpoint tells which half of \([a,b]\) contains the minimizer
  2. Squeezing \((f_1+f_2)'(\frac{a+b}{2})\) toward zero certifies that \(\frac{a+b}{2}\) itself is approximately the minimizer

Code:

import numpy as np
import matplotlib.pyplot as plt


def f1(x):
    return (x - 2) ** 2


def df1(x):
    return 2 * (x - 2)


def f2(x):
    return (x + 1) ** 2


def df2(x):
    return 2 * (x + 1)


# [c, d] should initially contain f1'((a+b)/2) and -f2'((a+b)/2);
# on [-1, 1], df1 ranges over [-6, -2] and -df2 over [-4, 0]
a, b, c, d = -1, 1, -7, 7
eps = 0.1

while b - a > eps and d - c > eps:
    # user 1's one-bit message: is f1'((a+b)/2) in the lower half of [c, d]?
    if df1((a + b) / 2) <= (c + d) / 2:
        m1 = 0
    else:
        m1 = 1

    # user 2's one-bit message: is -f2'((a+b)/2) in the lower half of [c, d]?
    if -df2((a + b) / 2) <= (c + d) / 2:
        m2 = 0
    else:
        m2 = 1

    if m1 == 0 and m2 == 1:    # f1'(m) <= (c+d)/2 < -f2'(m): (f1+f2)'(m) < 0, minimizer to the right
        a = (a + b) / 2
    elif m1 == 1 and m2 == 0:  # f1'(m) > (c+d)/2 >= -f2'(m): (f1+f2)'(m) > 0, minimizer to the left
        b = (a + b) / 2
    elif m1 == 1 and m2 == 1:  # both values in the upper half of [c, d]: raise c
        c = (c + d) / 2
    elif m1 == 0 and m2 == 0:  # both values in the lower half of [c, d]: lower d
        d = (c + d) / 2

    print('2 messages exchanged')
    print(a, b, c, d)

# Either interval is now short enough. If b - a <= eps the minimizer is
# localized directly; if d - c <= eps then f1'((a+b)/2) and -f2'((a+b)/2)
# nearly coincide, so (f1+f2)' is nearly zero at the midpoint. Either way
# the midpoint of [a, b] is near-optimal.
x_opt = (a + b) / 2
print(x_opt, f1(x_opt) + f2(x_opt))
print(f1(0.5) + f2(0.5))  # true minimum value, for comparison
# plot the sum for a visual check
x = np.linspace(-1, 2, 100)
y = f1(x) + f2(x)
plt.plot(x, y)
plt.show()
