Keras(十一)梯度带(GradientTape)的基本使用方法，与tf.keras结合使用-白红宇

Keras(十一)梯度带(GradientTape)的基本使用方法，与tf.keras结合使用

阅读量：4202 次

发布时间：2019-05-26

本文共 9117 字，大约阅读时间需要 30 分钟。

本文将介绍如下内容：

近似求导

tf.GradientTape基本使用方法

tf.GradientTape与tf.keras结合使用

一，近似求导

在近似求导中，本质是取求导参数前后的两个偏移量的点，带入函数中，近似求导。该偏移量越小，其近似求导的数值越接近真实值。

1，对一元方程的近似求导

# 对一元方程的近似求导def f(x):    return 3. * x ** 2 + 2. * x - 1def approximate_derivative(f, x, eps=1e-3):    return (f(x + eps) - f(x - eps)) / (2. * eps)print(approximate_derivative(f, 1.))# ---output------7.999999999999119

f(x)为求导函数

approximate_derivative为求导方法，其中eps为x轴上的偏移量

2，对二元方程的近似求导

# 对二元方程的近似求导def approximate_derivative(f, x, eps=1e-3):    return (f(x + eps) - f(x - eps)) / (2. * eps)def g(x1, x2):    return (x1 + 5) * (x2 ** 2)def approximate_gradient(g, x1, x2, eps=1e-3):    dg_x1 = approximate_derivative(lambda x: g(x, x2), x1, eps)    dg_x2 = approximate_derivative(lambda x: g(x1, x), x2, eps)    return dg_x1, dg_x2print(approximate_gradient(g, 2., 3.))# ---output------(8.999999999993236, 41.999999999994486)

3，总结代码

import matplotlib as mplimport matplotlib.pyplot as pltimport numpy as npimport sklearnimport pandas as pdimport osimport sysimport timeimport tensorflow as tffrom tensorflow import keras# 1,打印使用的python库的版本信息print(tf.__version__)print(sys.version_info)for module in mpl, np, pd, sklearn, tf, keras:    print(module.__name__, module.__version__)# 2,对一元方程的近似求导def f(x):    return 3. * x ** 2 + 2. * x - 1def approximate_derivative(f, x, eps=1e-3):    return (f(x + eps) - f(x - eps)) / (2. * eps)print(approximate_derivative(f, 1.))# 3,对二元方程的近似求导def g(x1, x2):    return (x1 + 5) * (x2 ** 2)def approximate_gradient(g, x1, x2, eps=1e-3):    dg_x1 = approximate_derivative(lambda x: g(x, x2), x1, eps)    dg_x2 = approximate_derivative(lambda x: g(x1, x), x2, eps)    return dg_x1, dg_x2print(approximate_gradient(g, 2., 3.))#---output-----7.999999999999119(8.999999999993236, 41.999999999994486)

二，tf.GradientTape基本使用方法

梯度带是新版本tensorflow非常常用的一个特性了，因为一旦涉及到计算梯度的问题就离不开这个新的API。

1，使用tf.GradientTape()求函数导数

1）当函数GradientTape中的persistent可持久参数为False时

注意：当函数GradientTape中的persistent可持久参数为False时，GradientTape实例化出来的对象只能调用一次gradient求导方法。

def g(x1, x2):    return (x1 + 5) * (x2 ** 2)x1 = tf.Variable(2.0)x2 = tf.Variable(3.0)with tf.GradientTape(persistent=False,watch_accessed_variables=True) as tape:    z = g(x1, x2)dz_x1 = tape.gradient(target=z, sources=x1)print(dz_x1)try:    dz_x2 = tape.gradient(target=z,sources=x2)except RuntimeError as ex:    print(ex)try:    dz_x2 = tape.gradient(z, x2)except RuntimeError as ex:    print(ex)#---output--------------tf.Tensor(9.0, shape=(), dtype=float32)GradientTape.gradient can only be called once on non-persistent tapes.

persistent: 布尔值，用来指定新创建的gradient tape是否是可持续性的。默认是False，意味着只能够调用一次gradient（）函数。

watch_accessed_variables: 布尔值，表明这个gradien tap是不是会自动追踪任何能被训练（trainable）的变量。默认是True。要是为False的话，意味着你需要手动去指定你想追踪的那些变量。

2）当函数GradientTape中的persistent可持久参数为True时

当函数GradientTape中的persistent可持久参数为True时，GradientTape实例化出来的对象可以多次调用gradient求导方法。

注意： 由于在GradientTape中做了持久化，所以结束程序之前需要删除GradientTape实例化出来的对象，释放内存。

x1 = tf.Variable(2.0)x2 = tf.Variable(3.0)with tf.GradientTape(persistent = True) as tape:    z = g(x1, x2)dz_x1 = tape.gradient(target=z,sources=x1)dz_x2 = tape.gradient(target=z,sources=x2)print(dz_x1, dz_x2)del tape# ---output-----tf.Tensor(9.0, shape=(), dtype=float32) tf.Tensor(42.0, shape=(), dtype=float32)

2，对多元方程一次依次求偏导

注意：传入参数列表，返回导数结果列表

x1 = tf.Variable(2.0)x2 = tf.Variable(3.0)with tf.GradientTape() as tape:    z = g(x1, x2)dz_x1x2 = tape.gradient(z, [x1, x2])print(dz_x1x2)#---output------[
   
    , 
    
     ]

3，对传入参数为常数的方程求导

1）直接求导结果为None

x1 = tf.constant(2.0)x2 = tf.constant(3.0)with tf.GradientTape() as tape:    z = g(x1, x2)dz_x1x2 = tape.gradient(z, [x1, x2])print(dz_x1x2)#---output-----[None, None]

2）对实例化的对象调用watch()方法追踪常量

# 对实例化的对象调用watch()方法追踪常量x1 = tf.constant(2.0)x2 = tf.constant(3.0)with tf.GradientTape() as tape:    tape.watch(x1)    tape.watch(x2)    z = g(x1, x2)dz_x1x2 = tape.gradient(z, [x1, x2])print(dz_x1x2)#---output---------[
   
    , 
    
     ]

4，对多个方程的同一参数求导

注意：此时会将每个方程的求导结果相加

x = tf.Variable(5.0)with tf.GradientTape() as tape:    z1 = 3 * x    z2 = x ** 2print(tape.gradient([z1, z2], x))#---output-------tf.Tensor(13.0, shape=(), dtype=float32)

5，对一个二元方程求其二阶导数

注意：返回的顺序依次为(x1,x1)、(x2,x1)、(x1,x2)、(x2,x2)。所以最后x1，x2的二阶导数为第一个和最后一个。

x1 = tf.Variable(2.0)x2 = tf.Variable(3.0)with tf.GradientTape(persistent=True) as outer_tape:    with tf.GradientTape(persistent=True) as inner_tape:        z = g(x1, x2)    inner_grads = inner_tape.gradient(z, [x1, x2])outer_grads = [outer_tape.gradient(inner_grad, [x1, x2]) for inner_grad in inner_grads]print(outer_grads)del inner_tapedel outer_tape# ---output-----[[None, 
   
    ], [
    
     , 
     
      ]]

6，简单实现梯度下降算法

梯度下降算法分为两步：

求其导数

向导数方向前进单位步长

1）模拟损失函数

def f(x):    return 3. * x ** 2 + 2. * x - 1# 使用matplotlib画出函数图型import numpy as npimport matplotlib.pyplot as pltdef sigmoid(x):    return 3 * x ** 2 + 2 * x - 1x = np.arange(-5., 5., 0.2)y = sigmoid(x)plt.grid(True)plt.plot(x, y)

得到图形如下：

2）实现简单的梯度下降算法

如上图可知，当函数的斜率为0时，该函数的y值最小。

x -= learning_rate * derivatives 来实现x参数按学习率的步长向导数方向滑动。

def f(x):    return 3. * x ** 2 + 2. * x - 1learning_rate = 0.1x = tf.Variable(0.0)for _ in range(1000):    with tf.GradientTape() as tape:        z = f(x)    dz_dx = tape.gradient(z, x)    x.assign_sub(learning_rate * dz_dx)print(x)

3）使用tf中的optimizers优化器实现简单的梯度下降算法

learning_rate = 0.1x = tf.Variable(0.0)# 实例化optimizer对象optimizer = keras.optimizers.SGD(lr = learning_rate)for _ in range(100):    with tf.GradientTape() as tape:        z = f(x)    # 求得x在函数中当前的导数    dz_dx = tape.gradient(z, x)    # 向optimizer优化器传入求得的导数和x值，更新x数值    optimizer.apply_gradients([(dz_dx, x)])	# 梯度在前，变量在后。print(x)

三，tf.GradientTape与tf.keras结合使用

#!/usr/bin/env python3# -*- coding: utf-8 -*-import matplotlib as mplimport matplotlib.pyplot as pltimport numpy as npimport sklearnimport pandas as pdimport osimport sysimport timeimport tensorflow as tffrom tensorflow import keras# 1,打印使用的python库的版本信息print(tf.__version__)print(sys.version_info)for module in mpl, np, pd, sklearn, tf, keras:    print(module.__name__, module.__version__)    # 2,下载并使用sklearn中的“fetch_california_housing”数据集from sklearn.datasets import fetch_california_housinghousing = fetch_california_housing()print(housing.DESCR)print(housing.data.shape)print(housing.target.shape)# 3,拆分数据集中的数据为 训练数据、验证数据、测试数据from sklearn.model_selection import train_test_splitx_train_all, x_test, y_train_all, y_test = train_test_split(housing.data, housing.target, random_state = 7)x_train, x_valid, y_train, y_valid = train_test_split(x_train_all, y_train_all, random_state = 11)print(x_train.shape, y_train.shape)print(x_valid.shape, y_valid.shape)print(x_test.shape, y_test.shape)# 4,在将数据带入到模型之前，先进行预处理-训练、验证、测试数据标准化from sklearn.preprocessing import StandardScalerscaler = StandardScaler()x_train_scaled = scaler.fit_transform(x_train)x_valid_scaled = scaler.transform(x_valid)x_test_scaled = scaler.transform(x_test)# 5,metric的使用方法# metric的计算方法与mse_loss函数类似,使用均方误差计算metric = keras.metrics.MeanSquaredError()print(metric([5.], [2.]))print(metric([0.], [1.]))print(metric.result())metric.reset_states()metric([1.], [3.])print(metric.result())# # 1. batch 遍历训练集 metric#    1.1 自动求导# 2. epoch结束 验证集 metric# 6,tf.GradientTape与tf.keras的结合使用,替换tf.fit方法epochs = 100    # 训练次数batch_size = 32    # 每次训练的每个batch数量steps_per_epoch = len(x_train_scaled) // batch_size     # 每次训练的batch数量optimizer = keras.optimizers.SGD(learning_rate=3e-3)    # 定义优化器,并传入学习率metric = keras.metrics.MeanSquaredError()               # 实例化keras的测量函数# 从测试数据集中随机取batch_size个测试数据和标准数据def random_batch(x, y, batch_size=32):    idx = np.random.randint(0, len(x), size=batch_size)    return x[idx], y[idx]# 定义模型层级结构model = keras.models.Sequential([    keras.layers.Dense(30, activation='relu',input_shape=x_train.shape[1:]),    keras.layers.Dense(1),])for epoch in range(epochs):    # 每次训练之前,初始化keras的测量函数对象    metric.reset_states()    # 在每次训练中遍历 (训练集/每个batch数量) 次,正向反向传播训练    for step in range(steps_per_epoch):        # 调用自定义的随机取值方法,选取batch的测试数据和训练数据        x_batch, y_batch = random_batch(x_train_scaled, y_train,batch_size)        # 调用tf.GradientTape()方法,对损失函数求导,求出其导数        with tf.GradientTape() as tape:            """with tape中写入所有求导公式"""            # 将数据带入模型,得到预测值            y_pred = model(x_batch)            # 对预测值的维度为二维数据,将数据下降到一维数据            y_pred = tf.squeeze(y_pred, 1)            # 求算损失函数的公式如下            loss = keras.losses.mean_squared_error(y_batch, y_pred)        # 调用keras的测量函数方法,使用均方误差测量误差大小        metric(y_batch, y_pred)        # 使用tape方法对函数的所有变量求偏导        grads = tape.gradient(loss, model.variables)        # 将求出的导数和模型所有的参数变量打包为元组        grads_and_vars = zip(grads, model.variables)        # 使用优化器更新参数变量        optimizer.apply_gradients(grads_and_vars)        # 打印每次训练每个batch的训练数据的均方误差        print("\rEpoch",epoch+1," train mse:",metric.result().numpy(),end="")    # 在每次正反向传播训练后,更新好模型参数变量后,带入验证数据验证数据的均方误差    y_valid_pred = model(x_valid_scaled)    # 对预测值的维度为二维数据,将数据下降到一维数据    y_valid_pred = tf.squeeze(y_valid_pred, 1)    # 求得所有验证数据集上的loss损失函数的均方误差    valid_loss = keras.losses.mean_squared_error(y_valid_pred, y_valid)    print("\t", "valid mse: ", valid_loss.numpy())

转载地址：http://fvili.baihongyu.com/

你可能感兴趣的文章