Using tf.GradientTape() for differentiation in TensorFlow

Posted by hzq201435 on 2020-11-15

Custom differentiation

Derivative of a function of a single unknown

Using the approach from introductory calculus: evaluate the function slightly to the right and slightly to the left of x, take the difference, and divide by the distance between the two x coordinates (a central-difference approximation).
For example, compute the derivative of y = 3x^2 + 2x - 1:

def f(x):
    return 3. * x ** 2 + 2. * x - 1

def approximate_derivative(f, x, eps=1e-3):
    # Central-difference approximation: (f(x + eps) - f(x - eps)) / (2 * eps)
    return (f(x + eps) - f(x - eps)) / (2. * eps)

print(approximate_derivative(f, 1.))

The exact derivative is 6x + 2, which equals 8 at x = 1, so the approximation is already very close to the true value.

# output
7.999999999999119

Partial derivatives of a function of two unknowns

def g(x1, x2):
    return (x1 + 5) * (x2 ** 2)

def approximate_gradient(g, x1, x2, eps=1e-3):
    # Fix one variable and approximate the derivative with respect to the other
    dg_x1 = approximate_derivative(lambda x: g(x, x2), x1, eps)
    dg_x2 = approximate_derivative(lambda x: g(x1, x), x2, eps)
    return dg_x1, dg_x2

print(approximate_gradient(g, 2., 3.))
# output
(8.999999999993236, 41.999999999994486)
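As a quick check (this helper is my own addition, not part of the original post, and analytic_gradient is a hypothetical name), the analytic partials are ∂g/∂x1 = x2^2 and ∂g/∂x2 = 2 * x2 * (x1 + 5), which at (2, 3) give 9 and 42, matching the numerical estimates above:

# Analytic partial derivatives of g(x1, x2) = (x1 + 5) * x2^2,
# used only to verify the numerical estimates above
def analytic_gradient(x1, x2):
    dg_x1 = x2 ** 2              # d/dx1 of (x1 + 5) * x2^2
    dg_x2 = 2. * x2 * (x1 + 5)   # d/dx2 of (x1 + 5) * x2^2
    return dg_x1, dg_x2

print(analytic_gradient(2., 3.))
# output
(9.0, 42.0)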

Differentiating with tf.GradientTape()

Taking the derivative with respect to one variable at a time

Computing the two derivatives in two separate gradient calls

import tensorflow as tf

def g(x1, x2):
    return (x1 + 5) * (x2 ** 2)

x1 = tf.Variable(2.0)
x2 = tf.Variable(3.0)
# Note: persistent=True is required here; otherwise the tape can only be used once
with tf.GradientTape(persistent=True) as tape:
    z = g(x1, x2)

dz_x1 = tape.gradient(z, x1)
dz_x2 = tape.gradient(z, x2)
print(dz_x1, dz_x2)
# Because persistent=True was set, delete the tape manually so it does not keep holding resources
del tape
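For reference, here is a minimal sketch (not from the original post) of the failure mode the comment above warns about: with the default non-persistent tape, the second gradient() call raises a RuntimeError because the tape's resources are released after the first call.

# Assumes g, x1 and x2 are defined as above
with tf.GradientTape() as tape:
    z = g(x1, x2)

dz_x1 = tape.gradient(z, x1)      # first call works
try:
    dz_x2 = tape.gradient(z, x2)  # second call fails on a non-persistent tape
except RuntimeError as err:
    print(err)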

Both gradients can also be requested in a single gradient() call. Note that if the inputs are constants (tf.constant), the tape does not track them by default, so they must be watched explicitly:

x1 = tf.constant(2.0)
x2 = tf.constant(3.0)
with tf.GradientTape() as tape:
    z = g(x1, x2)

# Constants are not tracked by default, so both gradients come back as None
dz_x1x2 = tape.gradient(z, [x1, x2])
print(dz_x1x2)
# output
[None, None]

Using watch to track the constants

x1 = tf.constant(2.0)
x2 = tf.constant(3.0)
with tf.GradientTape() as tape:
    # Explicitly tell the tape to track the constants
    tape.watch(x1)
    tape.watch(x2)
    z = g(x1, x2)

dz_x1x2 = tape.gradient(z, [x1, x2])

print(dz_x1x2)
# output
[<tf.Tensor: id=192, shape=(), dtype=float32, numpy=9.0>, <tf.Tensor: id=204, shape=(), dtype=float32, numpy=42.0>]

Differentiating two objective functions with respect to one variable

x = tf.Variable(5.0)
with tf.GradientTape() as tape:
    z1 = 3 * x
    z2 = x ** 2
tape.gradient([z1, z2], x)

# output
# The result is the sum of the two functions' derivatives: d(3x)/dx + d(x^2)/dx = 3 + 2*5 = 13
<tf.Tensor: id=261, shape=(), dtype=float32, numpy=13.0>
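If the two derivatives are needed separately rather than summed, one option (my own sketch, not shown in the original post) is a persistent tape with one gradient() call per target:

# Same setup as above, but one gradient() call per objective
x = tf.Variable(5.0)
with tf.GradientTape(persistent=True) as tape:
    z1 = 3 * x
    z2 = x ** 2
print(tape.gradient(z1, x))  # 3.0
print(tape.gradient(z2, x))  # 2 * x = 10.0
del tape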

Second-order derivatives with respect to x1 and x2

x1 = tf.Variable(2.0)
x2 = tf.Variable(3.0)
# Two nested tapes are needed: the inner one for the first-order gradients,
# the outer one to differentiate those gradients again
with tf.GradientTape(persistent=True) as outer_tape:
    with tf.GradientTape(persistent=True) as inner_tape:
        z = g(x1, x2)
    inner_grads = inner_tape.gradient(z, [x1, x2])
outer_grads = [outer_tape.gradient(inner_grad, [x1, x2])
               for inner_grad in inner_grads]
print(outer_grads)
del inner_tape
del outer_tape

The output is a nested list of second derivatives (the Hessian of z): for z = (x1 + 5) * x2^2 the entries are [[d²z/dx1², d²z/dx1dx2], [d²z/dx2dx1, d²z/dx2²]] = [[0, 6], [6, 14]], with the zero reported as None.

[[None, <tf.Tensor: id=324, shape=(), dtype=float32, numpy=6.0>], [<tf.Tensor: id=378, shape=(), dtype=float32, numpy=6.0>, <tf.Tensor: id=361, shape=(), dtype=float32, numpy=14.0>]]

Using GradientTape to run gradient descent

learning_rate = 0.1
x = tf.Variable(0.0)

for _ in range(100):
    with tf.GradientTape() as tape:
        z = f(x)
    # First argument is the target (the function value), second is the variable to differentiate with respect to
    dz_dx = tape.gradient(z, x)
    # Gradient-descent update: subtract learning_rate * gradient from x in place
    x.assign_sub(learning_rate * dz_dx)
print(x)
# output
<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=-0.3333333>
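x converges to -1/3, which is exactly the analytic minimum of f(x) = 3x^2 + 2x - 1: setting f'(x) = 6x + 2 = 0 gives x = -1/3.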

Combining Keras and GradientTape for gradient descent

from tensorflow import keras

learning_rate = 0.1
x = tf.Variable(0.0)

optimizer = keras.optimizers.SGD(learning_rate=learning_rate)

for _ in range(100):
    with tf.GradientTape() as tape:
        z = f(x)
    dz_dx = tape.gradient(z, x)
    # Let the optimizer apply the update instead of calling assign_sub manually
    optimizer.apply_gradients([(dz_dx, x)])
print(x)
# output
<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=-0.3333333>
