Deploying a machine-learning model from experimentation to production simply and quickly has long been a challenge. The task is to take a trained model and expose it as a prediction service, and in production that process needs to be reproducible, isolated, and secure. Here we use Docker-based TensorFlow Serving to accomplish exactly that. Since version 1.8, TensorFlow has supported Docker deployment for both CPU and GPU, which makes this very convenient.
Obtaining a trained model
The first step is of course to train a model, but that is not the focus of this post, so we will use an already-trained model such as ResNet. TensorFlow Serving stores its models in the SavedModel format, a language-neutral, recoverable, hermetic serialization format that enables higher-level systems and tools to produce, consume, and transform TensorFlow models. Here we simply download a pre-trained model:
$ mkdir /tmp/resnet
$ curl -s https://storage.googleapis.com/download.tensorflow.org/models/official/20181001_resnet/savedmodels/resnet_v2_fp32_savedmodel_NHWC_jpg.tar.gz | tar --strip-components=2 -C /tmp/resnet -xvz
If the model was built with another framework, such as Keras, it first needs to be exported to the SavedModel format, for example:
from keras.models import Sequential
from keras import backend as K
import tensorflow as tf

model = Sequential()
# ... model construction omitted ...

# Export the model as a SavedModel
signature = tf.saved_model.signature_def_utils.predict_signature_def(
    inputs={'input_param': model.input}, outputs={'type': model.output})
builder = tf.saved_model.builder.SavedModelBuilder('/tmp/output_model_path/1/')
builder.add_meta_graph_and_variables(
    sess=K.get_session(),
    tags=[tf.saved_model.tag_constants.SERVING],
    signature_def_map={
        tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY:
            signature
    })
builder.save()
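The snippet above uses the TensorFlow 1.x builder API. On TensorFlow 2.x, the equivalent export is a single call; the following is a minimal sketch, where the model and output path are placeholders:

import tensorflow as tf

model = tf.keras.Sequential()
# ... model construction omitted ...

# tf.saved_model.save writes a SavedModel; the trailing "1" directory
# is the numeric model version that TensorFlow Serving looks for.
tf.saved_model.save(model, '/tmp/output_model_path/1/')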
Once the download completes, the directory tree looks like this:
$ tree /tmp/resnet
/tmp/resnet
└── 1538687457
    ├── saved_model.pb
    └── variables
        ├── variables.data-00000-of-00001
        └── variables.index
Deploying the model
Deploy the model as a service with Docker:
$ docker pull tensorflow/serving
$ docker run -p 8500:8500 -p 8501:8501 --name tfserving_resnet \
--mount type=bind,source=/tmp/resnet,target=/models/resnet \
-e MODEL_NAME=resnet -t tensorflow/serving
Here, port 8500 is TensorFlow Serving's gRPC port, and port 8501 is its REST API port. The -e MODEL_NAME=resnet option tells TensorFlow Serving which model to load, in this case resnet. The command produces output like the following:
2019-03-04 02:52:26.610387: I tensorflow_serving/model_servers/server.cc:82] Building single TensorFlow model file config: model_name: resnet model_base_path: /models/resnet
2019-03-04 02:52:26.618200: I tensorflow_serving/model_servers/server_core.cc:461] Adding/updating models.
2019-03-04 02:52:26.618628: I tensorflow_serving/model_servers/server_core.cc:558] (Re-)adding model: resnet
2019-03-04 02:52:26.745813: I tensorflow_serving/core/basic_manager.cc:739] Successfully reserved resources to load servable {name: resnet version: 1538687457}
2019-03-04 02:52:26.745901: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: resnet version: 1538687457}
2019-03-04 02:52:26.745935: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: resnet version: 1538687457}
2019-03-04 02:52:26.747590: I external/org_tensorflow/tensorflow/contrib/session_bundle/bundle_shim.cc:363] Attempting to load native SavedModelBundle in bundle-shim from: /models/resnet/1538687457
2019-03-04 02:52:26.747705: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /models/resnet/1538687457
2019-03-04 02:52:26.795363: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2019-03-04 02:52:26.828614: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-03-04 02:52:26.923902: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:162] Restoring SavedModel bundle.
2019-03-04 02:52:28.098479: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:138] Running MainOp with key saved_model_main_op on SavedModel bundle.
2019-03-04 02:52:28.144510: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:259] SavedModel load for tags { serve }; Status: success. Took 1396689 microseconds.
2019-03-04 02:52:28.146646: I tensorflow_serving/servables/tensorflow/saved_model_warmup.cc:83] No warmup data file found at /models/resnet/1538687457/assets.extra/tf_serving_warmup_requests
2019-03-04 02:52:28.168063: I tensorflow_serving/core/loader_harness.cc:86] Successfully loaded servable version {name: resnet version: 1538687457}
2019-03-04 02:52:28.174902: I tensorflow_serving/model_servers/server.cc:286] Running gRPC ModelServer at 0.0.0.0:8500 ...
[warn] getaddrinfo: address family for nodename not supported
2019-03-04 02:52:28.186724: I tensorflow_serving/model_servers/server.cc:302] Exporting HTTP/REST API at:localhost:8501 ...
[evhttp_server.cc : 237] RAW: Entering the event loop ...
We can see that TensorFlow Serving uses 1538687457 as the model's version number. We can use curl to check the status of the running service; the response shows the version being served and the model's state:
$ curl http://localhost:8501/v1/models/resnet
{
  "model_version_status": [
    {
      "version": "1538687457",
      "state": "AVAILABLE",
      "status": {
        "error_code": "OK",
        "error_message": ""
      }
    }
  ]
}
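Docker deployment works for GPUs as well, as mentioned in the introduction. The following is a minimal sketch, assuming an NVIDIA GPU with the NVIDIA container runtime installed and Docker 19.03+ (the latest-gpu tag is the official GPU image; everything else mirrors the CPU command above):
$ docker pull tensorflow/serving:latest-gpu
$ docker run --gpus all -p 8500:8500 -p 8501:8501 --name tfserving_resnet_gpu \
--mount type=bind,source=/tmp/resnet,target=/models/resnet \
-e MODEL_NAME=resnet -t tensorflow/serving:latest-gpu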
Inspecting the model's inputs and outputs
We often need to inspect the exact form of a model's input and output parameters. TensorFlow provides the saved_model_cli command for this:
$ saved_model_cli show --dir /tmp/resnet/1538687457/ --all
MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['predict']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['image_bytes'] tensor_info:
        dtype: DT_STRING
        shape: (-1)
        name: input_tensor:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['classes'] tensor_info:
        dtype: DT_INT64
        shape: (-1)
        name: ArgMax:0
    outputs['probabilities'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 1001)
        name: softmax_tensor:0
  Method name is: tensorflow/serving/predict

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['image_bytes'] tensor_info:
        dtype: DT_STRING
        shape: (-1)
        name: input_tensor:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['classes'] tensor_info:
        dtype: DT_INT64
        shape: (-1)
        name: ArgMax:0
    outputs['probabilities'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 1001)
        name: softmax_tensor:0
  Method name is: tensorflow/serving/predict
Note the signature_def names and the names, types, and shapes of the inputs and outputs; these parameters are needed in the prediction requests that follow.
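The same signature information can also be retrieved from a running server through TensorFlow Serving's REST model metadata endpoint, which is handy when you don't have the SavedModel files at hand:
$ curl http://localhost:8501/v1/models/resnet/metadata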
Making predictions with the model: REST and gRPC
TensorFlow Serving exposes two request interfaces, a REST API and gRPC; the next two sections demonstrate each in turn.
REST
We first download a client script, which fetches a picture of a cat and uses it to measure the service's request latency.
$ curl -o /tmp/resnet/resnet_client.py https://raw.githubusercontent.com/tensorflow/serving/master/tensorflow_serving/example/resnet_client.py
The script uses the requests library to call the endpoint, sends the base64-encoded image as the request body, prints the predicted class, and reports the average processing time.
from __future__ import print_function

import base64
import requests

# The server URL specifies the endpoint of your server running the ResNet
# model with the name "resnet" and using the predict interface.
SERVER_URL = 'http://localhost:8501/v1/models/resnet:predict'

# The image URL is the location of the image we should send to the server
IMAGE_URL = 'https://tensorflow.org/images/blogs/serving/cat.jpg'


def main():
  # Download the image
  dl_request = requests.get(IMAGE_URL, stream=True)
  dl_request.raise_for_status()

  # Compose a JSON Predict request (send JPEG image in base64).
  jpeg_bytes = base64.b64encode(dl_request.content).decode('utf-8')
  predict_request = '{"instances" : [{"b64": "%s"}]}' % jpeg_bytes

  # Send a few requests to warm up the model.
  for _ in range(3):
    response = requests.post(SERVER_URL, data=predict_request)
    response.raise_for_status()

  # Send a few actual requests and report average latency.
  total_time = 0
  num_requests = 10
  for _ in range(num_requests):
    response = requests.post(SERVER_URL, data=predict_request)
    response.raise_for_status()
    total_time += response.elapsed.total_seconds()
    prediction = response.json()['predictions'][0]

  print('Prediction class: {}, avg latency: {} ms'.format(
      prediction['classes'], (total_time * 1000) / num_requests))


if __name__ == '__main__':
  main()
Running it produces:
$ python resnet_client.py
Prediction class: 286, avg latency: 210.12310000000002 ms
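For a quick smoke test without Python, the same REST request can be composed in the shell. This is a sketch assuming a local copy of the image at /tmp/cat.jpg and GNU coreutils base64 (-w0 disables line wrapping; on macOS, omit it):
$ JPEG_B64=$(base64 -w0 /tmp/cat.jpg)
$ curl -s -X POST http://localhost:8501/v1/models/resnet:predict \
-d '{"instances": [{"b64": "'"$JPEG_B64"'"}]}'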
gRPC
Next, let's download another client script, which sends the image to the service over gRPC and retrieves the prediction. It requires the tensorflow-serving-api package.
$ curl -o /tmp/resnet/resnet_client_grpc.py https://raw.githubusercontent.com/tensorflow/serving/master/tensorflow_serving/example/resnet_client_grpc.py
$ pip install tensorflow-serving-api
The script:
from __future__ import print_function

# This is a placeholder for a Google-internal import.

import grpc
import requests
import tensorflow as tf

from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc

# The image URL is the location of the image we should send to the server
IMAGE_URL = 'https://tensorflow.org/images/blogs/serving/cat.jpg'

tf.app.flags.DEFINE_string('server', 'localhost:8500',
                           'PredictionService host:port')
tf.app.flags.DEFINE_string('image', '', 'path to image in JPEG format')
FLAGS = tf.app.flags.FLAGS


def main(_):
  if FLAGS.image:
    with open(FLAGS.image, 'rb') as f:
      data = f.read()
  else:
    # Download the image since we weren't given one
    dl_request = requests.get(IMAGE_URL, stream=True)
    dl_request.raise_for_status()
    data = dl_request.content

  channel = grpc.insecure_channel(FLAGS.server)
  stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
  # Send request
  # See prediction_service.proto for gRPC request/response details.
  request = predict_pb2.PredictRequest()
  request.model_spec.name = 'resnet'
  request.model_spec.signature_name = 'serving_default'
  request.inputs['image_bytes'].CopyFrom(
      tf.contrib.util.make_tensor_proto(data, shape=[1]))
  result = stub.Predict(request, 10.0)  # 10 secs timeout
  print(result)


if __name__ == '__main__':
  tf.app.run()
The output shows the predicted class, the probabilities, and the model information:
$ python resnet_client_grpc.py
outputs {
  key: "classes"
  value {
    dtype: DT_INT64
    tensor_shape {
      dim {
        size: 1
      }
    }
    int64_val: 286
  }
}
outputs {
  key: "probabilities"
  value {
    dtype: DT_FLOAT
    tensor_shape {
      dim {
        size: 1
      }
      dim {
        size: 1001
      }
    }
    float_val: 2.4162832232832443e-06
    float_val: 1.9012182974620373e-06
    float_val: 2.7247710022493266e-05
    float_val: 4.426385658007348e-07
    ...(omitted)
    float_val: 1.4636580090154894e-05
    float_val: 5.812107133351674e-07
    float_val: 6.599806511076167e-05
    float_val: 0.0012952701654285192
  }
}
model_spec {
  name: "resnet"
  version {
    value: 1538687457
  }
  signature_name: "serving_default"
}
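Rather than printing the entire protobuf, the fields of the response can be read programmatically. A minimal sketch using plain protobuf field access on the result object from the script above:

# Assuming `result` is the PredictResponse returned by stub.Predict above.
predicted_class = result.outputs['classes'].int64_val[0]
probabilities = list(result.outputs['probabilities'].float_val)
# The 'classes' output is an ArgMax over 'probabilities', so indexing
# with it retrieves the top score.
print('class: {}, probability: {}'.format(
    predicted_class, probabilities[predicted_class]))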
Performance
Improving performance with an optimized TensorFlow Serving binary
TensorFlow Serving sometimes emits a log line such as:
Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
The published TensorFlow Serving Docker image aims to run on as many CPU architectures as possible, so some optimizations are left out to maximize compatibility. If you don't see this message, your binary is probably already optimized for your CPU. Depending on the operations your model performs, these optimizations can have a significant effect on serving performance. Fortunately, compiling an optimized TensorFlow Serving binary is straightforward: the project provides automated build scripts, and the process takes two steps:
# 1. Build the development image
$ docker build -t $USER/tensorflow-serving-devel -f Dockerfile.devel https://github.com/tensorflow/serving.git#:tensorflow_serving/tools/docker
# 2. Build the new optimized serving image
$ docker build -t $USER/tensorflow-serving --build-arg TF_SERVING_BUILD_IMAGE=$USER/tensorflow-serving-devel https://github.com/tensorflow/serving.git#:tensorflow_serving/tools/docker
Afterwards, restart the service with the newly built $USER/tensorflow-serving image.
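Concretely, this is the same docker run command as before with only the image name changed (the container name here is arbitrary):
$ docker run -p 8500:8500 -p 8501:8501 --name tfserving_resnet_opt \
--mount type=bind,source=/tmp/resnet,target=/models/resnet \
-e MODEL_NAME=resnet -t $USER/tensorflow-serving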
Summary
We have walked through deploying a machine-learning service with TensorFlow Serving and Docker. As we saw, TensorFlow Serving provides convenient and efficient model management, and combined with Docker it lets you stand up a machine-learning service very quickly.
References
- Serving ML Quickly with TensorFlow Serving and Docker
- Train and serve a TensorFlow model with TensorFlow Serving