TensorFlowVSTensorFlowMobileVSTensorFlowLite

zhang_sl發表於2018-10-16

TensorFlow的簡介

TensorFlow是一個機器學習框架,其整體架構設計主要分成Client,Master和Worker。解耦的架構使得它具有高度靈活性,使它可以方便地在機器叢集上部署。

TensorFlow的程式碼架構

TensorFlow整體架構如下(圖片來自官網)。
image.png

Client

Client是演算法工程師直接接觸使用的。有Python,C++,Java等不同的版本。它的主要作用是:

  • 將計算過程定義成計算圖。機器學習主要存在命令式和宣告式兩種不同的程式設計模型。指令式程式設計模型就是我們一般的程式設計方式。宣告式模型類似於RxJava那樣,先構建一個資料通道,等事件觸發時,才會真正有資料喂入,並執行。TensorFlow就是宣告式的程式設計模型。演算法工程師利用Client的API,構建一個計算圖。
  • 提供Session介面執行計算圖。
Distributed Master
  • 將計算圖切分成更小的子計算圖。
  • 將子計算圖進一步切分成更小的計算片段,使之能夠並行執行在不同的程式乃至不同的裝置上。
  • 將計算片段分發給不同的Worker。
  • 觸發Worker執行分配到的計算任務。
Worker Services
  • 呼叫TensorFlow核心,根據可用的硬體情況執行計算片段。
  • 和其他Worker進行互動,傳送和接收計算結果。
Kernel Implementations
  • 提供細粒度,獨立的計算功能(operation),例如加法,減法,字串切割。

移動端的TensorFlow

在端側直接執行模型有節省頻寬,響應及時,不受網路好壞通斷影響更加穩定,無需資料傳輸更加安全等優點。因此端側執行模型是有需求的。在移動裝置或者其他嵌入式裝置上執行TensorFlow,其關注點和雲端就有所不同。需要著重注意更低的功耗,更快的速度,更小的size。當前針對移動裝置,有TensorFlow Mobile和TensorFlow Lite兩種解決方案。TensorFlow Mobile比較早出來,比較穩定,但效能等方面沒有針對移動端作過多優化,目前已不推薦使用,預計到2019年初就會被廢棄。
根據官網的介紹,TensorFlow Mobile和TensorFlow Lite的主要區別是:

  • TensorFlow Lite是TensorFlow Mobile的進化版。在大多數情況下,TensorFlow Lite擁有跟小的二進位制大小,更少的依賴以及更好的效能。
  • TensorFlow Lite尚在開發階段,可能存在一些功能尚未補齊。不過官方承諾正在加大力度開發。
  • TensorFlow Lite支援的OP比較有限,相比之下TensorFlow Mobile更加全面。

從原始碼看區別

以上是官網的介紹,然而看這介紹依然比較模糊。TensorFlow Mobile到底精簡了啥,它支援哪些OP?TensorFlow Lite在實現上到底有何區別?為搞清這些問題,只有分析原始碼了。

TensorFlow 程式碼目錄介紹

Tensorflow/core目錄包含了TF核心模組程式碼。
public: API介面標頭檔案目錄,用於外部介面呼叫的API定義,主要是session.h 和tensor_c_api.h。
client: API介面實現檔案目錄。
platform: OS系統相關介面檔案,如file system, env等。
protobuf: 均為.proto檔案,用於資料傳輸時的結構序列化.
common_runtime: 公共執行庫,包含session, executor, threadpool, rendezvous, memory管理, 裝置分配演算法等。
distributed_runtime: 分散式執行模組,如rpc session, rpc master, rpc worker, graph manager。
framework: 包含基礎功能模組,如log, memory, tensor
graph: 計算流圖相關操作,如construct, partition, optimize, execute等
kernels: 核心Op,如matmul, conv2d, argmax, batch_norm等
lib: 公共基礎庫,如gif、gtl(google模板庫)、hash、histogram等。
ops: 基本ops運算,ops梯度運算,io相關的ops,控制流和資料流操作

Tensorflow/stream_executor目錄是平行計算框架,由google stream executor團隊開發。
Tensorflow/contrib目錄是contributor開發目錄,其中android目錄下是android版本的TensorFlow mobile。lite目錄下正是TensorFlow lite的原始碼。
Tensroflow/python目錄是python API客戶端指令碼。
Tensorflow/tensorboard目錄是視覺化分析工具,不僅可以模型視覺化,還可以監控模型引數變化。
third_party目錄是TF第三方依賴庫。
eigen3: eigen矩陣運算庫,TF基礎ops呼叫
gpus: 封裝了cuda/cudnn程式設計庫

TensorFlow Mobile精簡了啥?

TensorFlow採用bazel進行編譯,因此我們可以通過檢視編譯檔案來分析區別。

TensorFlow預設的編譯配置
===== /tensorflow/BUILD ===== 
tf_cc_shared_object(
    name = "libtensorflow.so",
    linkopts = select({
        "//tensorflow:darwin": [
            "-Wl,-exported_symbols_list",  # This line must be directly followed by the exported_symbols.lds file
            "$(location //tensorflow/c:exported_symbols.lds)",
            "-Wl,-install_name,@rpath/libtensorflow.so",
        ],
        "//tensorflow:windows": [],
        "//conditions:default": [
            "-z defs",
            "-Wl,--version-script",  #  This line must be directly followed by the version_script.lds file
            "$(location //tensorflow/c:version_script.lds)",
        ],
    }),
    visibility = ["//visibility:public"],
    deps = [
        "//tensorflow/c:c_api",
        "//tensorflow/c:c_api_experimental",
        "//tensorflow/c:exported_symbols.lds",
        "//tensorflow/c:version_script.lds",
        "//tensorflow/c/eager:c_api",
        "//tensorflow/core:tensorflow",
    ],
)

===== /tensorflow/c/BUILD ===== 
tf_cuda_library(
    name = "c_api",
    srcs = [
        "c_api.cc",
        "c_api_function.cc",
    ],
    hdrs = [
        "c_api.h",
    ],
    copts = tf_copts(),
    visibility = ["//visibility:public"],
    deps = select({
        "//tensorflow:android": [
            ":c_api_internal",
            "//tensorflow/core:android_tensorflow_lib_lite",
        ],
        "//conditions:default": [
            ":c_api_internal",
            "//tensorflow/cc/saved_model:loader",
            "//tensorflow/cc:gradients",
            "//tensorflow/cc:ops",
            "//tensorflow/cc:grad_ops",
            "//tensorflow/cc:scope_internal",
            "//tensorflow/cc:while_loop",
            "//tensorflow/core:core_cpu",
            "//tensorflow/core:core_cpu_internal",
            "//tensorflow/core:framework",
            "//tensorflow/core:op_gen_lib",
            "//tensorflow/core:protos_all_cc",
            "//tensorflow/core:lib",
            "//tensorflow/core:lib_internal",
        ],
    }) + select({
        "//tensorflow:with_xla_support": [
            "//tensorflow/compiler/tf2xla:xla_compiler",
            "//tensorflow/compiler/jit",
        ],
        "//conditions:default": [],
    }),
)


tf_cuda_library(
    name = "c_api_experimental",
    srcs = [
        "c_api_experimental.cc",
    ],
    hdrs = [
        "c_api_experimental.h",
    ],
    copts = tf_copts(),
    visibility = ["//visibility:public"],
    deps = [
        ":c_api",
        ":c_api_internal",
        "//tensorflow/c/eager:c_api",
        "//tensorflow/compiler/jit/legacy_flags:mark_for_compilation_pass_flags",
        "//tensorflow/contrib/tpu:all_ops",
        "//tensorflow/core:core_cpu",
        "//tensorflow/core:framework",
        "//tensorflow/core:lib",
        "//tensorflow/core:lib_platform",
        "//tensorflow/core:protos_all_cc",
    ],
)


===== /tensorflow/c/eager/BUILD ===== 
tf_cuda_library(
    name = "c_api",
    srcs = [
        "c_api.cc",
        "c_api_debug.cc",
        "c_api_internal.h",
    ],
    hdrs = ["c_api.h"],
    copts = tf_copts() + tfe_xla_copts(),
    visibility = ["//visibility:public"],
    deps = select({
        "//tensorflow:android": [
            "//tensorflow/core:android_tensorflow_lib_lite",
        ],
        "//conditions:default": [
            "//tensorflow/c:c_api",
            "//tensorflow/c:c_api_internal",
            "//tensorflow/core:core_cpu",
            "//tensorflow/core/common_runtime/eager:attr_builder",
            "//tensorflow/core/common_runtime/eager:context",
            "//tensorflow/core/common_runtime/eager:eager_executor",
            "//tensorflow/core/common_runtime/eager:execute",
            "//tensorflow/core/common_runtime/eager:kernel_and_device",
            "//tensorflow/core/common_runtime/eager:tensor_handle",
            "//tensorflow/core/common_runtime/eager:copy_to_device_node",
            "//tensorflow/core:core_cpu_internal",
            "//tensorflow/core:framework",
            "//tensorflow/core:framework_internal",
            "//tensorflow/core:lib",
            "//tensorflow/core:lib_internal",
            "//tensorflow/core:protos_all_cc",
        ],
    }) + select({
        "//tensorflow:with_xla_support": [
            "//tensorflow/compiler/tf2xla:xla_compiler",
            "//tensorflow/compiler/jit",
            "//tensorflow/compiler/jit:xla_device",
        ],
        "//conditions:default": [],
    }) + [
        "//tensorflow/core/common_runtime/eager:eager_operation",
        "//tensorflow/core/distributed_runtime/eager:eager_client",
        "//tensorflow/core/distributed_runtime/rpc/eager:grpc_eager_client",
        "//tensorflow/core/distributed_runtime/rpc:grpc_channel",
        "//tensorflow/core/distributed_runtime/rpc:grpc_server_lib",
        "//tensorflow/core/distributed_runtime/rpc:grpc_worker_cache",
        "//tensorflow/core/distributed_runtime/rpc:grpc_worker_service",
        "//tensorflow/core/distributed_runtime/rpc:rpc_rendezvous_mgr",
        "//tensorflow/core/distributed_runtime:remote_device",
        "//tensorflow/core/distributed_runtime:server_lib",
        "//tensorflow/core/distributed_runtime:worker_env",
        "//tensorflow/core:gpu_runtime",
    ],
)

===== /tensorflow/core/BUILD ===== 
cc_library(
    name = "tensorflow",
    visibility = ["//visibility:public"],
    deps = [
        ":tensorflow_opensource",
        "//tensorflow/core/platform/default/build_config:tensorflow_platform_specific",
    ],
)


tf_cuda_library(
    name = "tensorflow_opensource",
    copts = tf_copts(),
    visibility = ["//visibility:public"],
    deps = [
        ":all_kernels",
        ":core",
        ":direct_session",
        ":example_parser_configuration",
        ":gpu_runtime",
        ":lib",
    ],
)


cc_library(
    name = "all_kernels",
    visibility = ["//visibility:public"],
    deps = if_dynamic_kernels(
        [],
        otherwise = [":all_kernels_statically_linked"],
    ),
)


# This is a link-only library to provide a DirectSession
# implementation of the Session interface.
tf_cuda_library(
    name = "direct_session",
    copts = tf_copts(),
    linkstatic = 1,
    visibility = ["//visibility:public"],
    deps = [
        ":direct_session_internal",
    ],
    alwayslink = 1,
)

filegroup(
    name = "example_parser_configuration_testdata",
    srcs = [
        "example/testdata/parse_example_graph_def.pbtxt",
    ],
)

cc_library(
    name = "core",
    visibility = ["//visibility:public"],
    deps = [
        ":core_cpu",
        ":gpu_runtime",
        ":sycl_runtime",
    ],
)


cc_library(
    name = "lib",
    hdrs = [
        "lib/bfloat16/bfloat16.h",
        "lib/core/arena.h",
        "lib/core/bitmap.h",
        "lib/core/bits.h",
        "lib/core/casts.h",
        "lib/core/coding.h",
        "lib/core/errors.h",
        "lib/core/notification.h",
        "lib/core/raw_coding.h",
        "lib/core/status.h",
        "lib/core/stringpiece.h",
        "lib/core/threadpool.h",
        "lib/gtl/array_slice.h",
        "lib/gtl/cleanup.h",
        "lib/gtl/compactptrset.h",
        "lib/gtl/flatmap.h",
        "lib/gtl/flatset.h",
        "lib/gtl/inlined_vector.h",
        "lib/gtl/optional.h",
        "lib/gtl/priority_queue_util.h",
        "lib/hash/crc32c.h",
        "lib/hash/hash.h",
        "lib/histogram/histogram.h",
        "lib/io/buffered_inputstream.h",
        "lib/io/compression.h",
        "lib/io/inputstream_interface.h",
        "lib/io/path.h",
        "lib/io/proto_encode_helper.h",
        "lib/io/random_inputstream.h",
        "lib/io/record_reader.h",
        "lib/io/record_writer.h",
        "lib/io/table.h",
        "lib/io/table_builder.h",
        "lib/io/table_options.h",
        "lib/math/math_util.h",
        "lib/monitoring/collected_metrics.h",
        "lib/monitoring/collection_registry.h",
        "lib/monitoring/counter.h",
        "lib/monitoring/gauge.h",
        "lib/monitoring/metric_def.h",
        "lib/monitoring/sampler.h",
        "lib/random/distribution_sampler.h",
        "lib/random/philox_random.h",
        "lib/random/random_distributions.h",
        "lib/random/simple_philox.h",
        "lib/strings/numbers.h",
        "lib/strings/proto_serialization.h",
        "lib/strings/str_util.h",
        "lib/strings/strcat.h",
        "lib/strings/stringprintf.h",
        ":platform_base_hdrs",
        ":platform_env_hdrs",
        ":platform_file_system_hdrs",
        ":platform_other_hdrs",
        ":platform_port_hdrs",
        ":platform_protobuf_hdrs",
    ],
    visibility = ["//visibility:public"],
    deps = [
        ":lib_internal",
        "@com_google_absl//absl/container:inlined_vector",
        "@com_google_absl//absl/strings",
        "@com_google_absl//absl/types:optional",
    ],
)

# This includes implementations of all kernels built into TensorFlow.
cc_library(
    name = "all_kernels_statically_linked",
    visibility = ["//visibility:private"],
    deps = [
        "//tensorflow/core/kernels:array",
        "//tensorflow/core/kernels:audio",
        "//tensorflow/core/kernels:batch_kernels",
        "//tensorflow/core/kernels:bincount_op",
        "//tensorflow/core/kernels:boosted_trees_ops",
        "//tensorflow/core/kernels:candidate_sampler_ops",
        "//tensorflow/core/kernels:checkpoint_ops",
        "//tensorflow/core/kernels:collective_ops",
        "//tensorflow/core/kernels:control_flow_ops",
        "//tensorflow/core/kernels:ctc_ops",
        "//tensorflow/core/kernels:cudnn_rnn_kernels",
        "//tensorflow/core/kernels:data_flow",
        "//tensorflow/core/kernels:dataset_ops",
        "//tensorflow/core/kernels:decode_proto_op",
        "//tensorflow/core/kernels:encode_proto_op",
        "//tensorflow/core/kernels:fake_quant_ops",
        "//tensorflow/core/kernels:function_ops",
        "//tensorflow/core/kernels:functional_ops",
        "//tensorflow/core/kernels:grappler",
        "//tensorflow/core/kernels:histogram_op",
        "//tensorflow/core/kernels:image",
        "//tensorflow/core/kernels:io",
        "//tensorflow/core/kernels:linalg",
        "//tensorflow/core/kernels:list_kernels",
        "//tensorflow/core/kernels:lookup",
        "//tensorflow/core/kernels:logging",
        "//tensorflow/core/kernels:manip",
        "//tensorflow/core/kernels:math",
        "//tensorflow/core/kernels:multinomial_op",
        "//tensorflow/core/kernels:nn",
        "//tensorflow/core/kernels:parameterized_truncated_normal_op",
        "//tensorflow/core/kernels:parsing",
        "//tensorflow/core/kernels:partitioned_function_ops",
        "//tensorflow/core/kernels:random_ops",
        "//tensorflow/core/kernels:random_poisson_op",
        "//tensorflow/core/kernels:remote_fused_graph_ops",
        "//tensorflow/core/kernels:required",
        "//tensorflow/core/kernels:resource_variable_ops",
        "//tensorflow/core/kernels:rpc_op",
        "//tensorflow/core/kernels:scoped_allocator_ops",
        "//tensorflow/core/kernels:sdca_ops",
        "//tensorflow/core/kernels:searchsorted_op",
        "//tensorflow/core/kernels:set_kernels",
        "//tensorflow/core/kernels:sparse",
        "//tensorflow/core/kernels:state",
        "//tensorflow/core/kernels:stateless_random_ops",
        "//tensorflow/core/kernels:string",
        "//tensorflow/core/kernels:summary_kernels",
        "//tensorflow/core/kernels:training_ops",
        "//tensorflow/core/kernels:word2vec_kernels",
    ] + tf_additional_cloud_kernel_deps() + if_not_windows([
        "//tensorflow/core/kernels:fact_op",
        "//tensorflow/core/kernels:array_not_windows",
        "//tensorflow/core/kernels:math_not_windows",
        "//tensorflow/core/kernels:quantized_ops",
        "//tensorflow/core/kernels/neon:neon_depthwise_conv_op",
    ]) + if_mkl([
        "//tensorflow/core/kernels:mkl_concat_op",
        "//tensorflow/core/kernels:mkl_conv_op",
        "//tensorflow/core/kernels:mkl_cwise_ops_common",
        "//tensorflow/core/kernels:mkl_fused_batch_norm_op",
        "//tensorflow/core/kernels:mkl_identity_op",
        "//tensorflow/core/kernels:mkl_input_conversion_op",
        "//tensorflow/core/kernels:mkl_lrn_op",
        "//tensorflow/core/kernels:mkl_pooling_ops",
        "//tensorflow/core/kernels:mkl_relu_op",
        "//tensorflow/core/kernels:mkl_reshape_op",
        "//tensorflow/core/kernels:mkl_slice_op",
        "//tensorflow/core/kernels:mkl_softmax_op",
        "//tensorflow/core/kernels:mkl_transpose_op",
        "//tensorflow/core/kernels:mkl_tfconv_op",
        "//tensorflow/core/kernels:mkl_aggregate_ops",
    ]) + if_cuda([
        "//tensorflow/core/grappler/optimizers:gpu_swapping_kernels",
        "//tensorflow/core/grappler/optimizers:gpu_swapping_ops",
    ]),
)
TensorFlow Mobile的編譯配置
===== tensorflow/contrib/android/BUILD =====
cc_binary(
    name = "libtensorflow_inference.so",
    srcs = [],
    copts = tf_copts() + [
        "-ffunction-sections",
        "-fdata-sections",
    ],
    linkopts = if_android([
        "-landroid",
        "-latomic",
        "-ldl",
        "-llog",
        "-lm",
        "-z defs",
        "-s",
        "-Wl,--gc-sections",
        "-Wl,--version-script",  # This line must be directly followed by LINKER_SCRIPT.
        "$(location {})".format(LINKER_SCRIPT),
    ]),
    linkshared = 1,
    linkstatic = 1,
    tags = [
        "manual",
        "notap",
    ],
    deps = [
        ":android_tensorflow_inference_jni",
        "//tensorflow/core:android_tensorflow_lib",
        LINKER_SCRIPT,
    ],
)


cc_library(
    name = "android_tensorflow_inference_jni",
    srcs = if_android([":android_tensorflow_inference_jni_srcs"]),
    copts = tf_copts(),
    visibility = ["//visibility:public"],
    deps = [
        "//tensorflow/core:android_tensorflow_lib_lite",
        "//tensorflow/java/src/main/native",
    ],
    alwayslink = 1,
)


===== tensorflow/core/BUILD ===== 
cc_library(
    name = "android_tensorflow_lib",
    srcs = if_android([":android_op_registrations_and_gradients"]),
    copts = tf_copts(),
    tags = [
        "manual",
        "notap",
    ],
    visibility = ["//visibility:public"],
    deps = [
        ":android_tensorflow_lib_lite",
        ":protos_all_cc_impl",
        "//tensorflow/core/kernels:android_tensorflow_kernels",
        "//third_party/eigen3",
        "@protobuf_archive//:protobuf",
    ],
    alwayslink = 1,
)


cc_library(
    name = "android_tensorflow_lib_lite",
    srcs = if_android(["//tensorflow/core:android_srcs"]),
    copts = tf_copts(android_optimization_level_override = None),
    linkopts = ["-lz"],
    tags = [
        "manual",
        "notap",
    ],
    visibility = ["//visibility:public"],
    deps = [
        ":mobile_additional_lib_deps",
        ":protos_all_cc_impl",
        ":stats_calculator_portable",
        "//third_party/eigen3",
        "@double_conversion//:double-conversion",
        "@nsync//:nsync_cpp",
        "@protobuf_archive//:protobuf",
    ],
    alwayslink = 1,
)

alias(
    name = "android_srcs",
    actual = ":mobile_srcs",
    visibility = ["//visibility:public"],
)

filegroup(
    name = "mobile_srcs",
    srcs = [
        ":mobile_srcs_no_runtime",
        ":mobile_srcs_only_runtime",
    ],
    visibility = ["//visibility:public"],
)

# Core sources for Android builds.
filegroup(
    name = "mobile_srcs_no_runtime",
    srcs = [
        ":protos_all_proto_text_srcs",
        ":error_codes_proto_text_srcs",
        "//tensorflow/core/platform/default/build_config:android_srcs",
    ] + glob(
        [
            "client/**/*.cc",
            "framework/**/*.h",
            "framework/**/*.cc",
            "lib/**/*.h",
            "lib/**/*.cc",
            "platform/**/*.h",
            "platform/**/*.cc",
            "public/**/*.h",
            "util/**/*.h",
            "util/**/*.cc",
        ],
        exclude = [
            "**/*test.*",
            "**/*testutil*",
            "**/*testlib*",
            "**/*main.cc",
            "debug/**/*",
            "framework/op_gen_*",
            "lib/jpeg/**/*",
            "lib/png/**/*",
            "lib/gif/**/*",
            "util/events_writer.*",
            "util/stats_calculator.*",
            "util/reporter.*",
            "platform/**/cuda_libdevice_path.*",
            "platform/default/test_benchmark.*",
            "platform/cuda.h",
            "platform/google/**/*",
            "platform/hadoop/**/*",
            "platform/gif.h",
            "platform/jpeg.h",
            "platform/png.h",
            "platform/stream_executor.*",
            "platform/windows/**/*",
            "user_ops/**/*.cu.cc",
            "util/ctc/*.h",
            "util/ctc/*.cc",
            "util/tensor_bundle/*.h",
            "util/tensor_bundle/*.cc",
            "common_runtime/gpu/**/*",
            "common_runtime/eager/*",
            "common_runtime/gpu_device_factory.*",
        ],
    ),
    visibility = ["//visibility:public"],
)

filegroup(
    name = "mobile_srcs_only_runtime",
    srcs = [
        "//tensorflow/core/kernels:android_srcs",
        "//tensorflow/core/util/ctc:android_srcs",
        "//tensorflow/core/util/tensor_bundle:android_srcs",
    ] + glob(
        [
            "common_runtime/**/*.h",
            "common_runtime/**/*.cc",
            "graph/**/*.h",
            "graph/**/*.cc",
        ],
        exclude = [
            "**/*test.*",
            "**/*testutil*",
            "**/*testlib*",
            "**/*main.cc",
            "common_runtime/gpu/**/*",
            "common_runtime/eager/*",
            "common_runtime/gpu_device_factory.*",
            "graph/dot.*",
        ],
    ),
    visibility = ["//visibility:public"],
)

cc_library(
    name = "stats_calculator_portable",
    srcs = [
        "util/stat_summarizer_options.h",
        "util/stats_calculator.cc",
    ],
    hdrs = [
        "util/stats_calculator.h",
    ],
    copts = tf_copts(),
)

cc_library(
    name = "mobile_additional_lib_deps",
    deps = tf_additional_lib_deps() + [
        "@com_google_absl//absl/strings",
    ],
)


===== tensorflow/core/kernels/BUILD ===== 
cc_library(
    name = "android_tensorflow_kernels",
    srcs = select({
        "//tensorflow:android": [
            "//tensorflow/core/kernels:android_core_ops",
            "//tensorflow/core/kernels:android_extended_ops",
        ],
        "//conditions:default": [],
    }),
    copts = tf_copts(),
    linkopts = select({
        "//tensorflow:android": [
            "-ldl",
        ],
        "//conditions:default": [],
    }),
    tags = [
        "manual",
        "notap",
    ],
    visibility = ["//visibility:public"],
    deps = [
        "//tensorflow/core:android_tensorflow_lib_lite",
        "//tensorflow/core:protos_all_cc_impl",
        "//third_party/eigen3",
        "//third_party/fft2d:fft2d_headers",
        "@fft2d",
        "@gemmlowp",
        "@protobuf_archive//:protobuf",
    ],
    alwayslink = 1,
)


# Core kernels we want on Android. Only a subset of kernels to keep
# base library small.
filegroup(
    name = "android_core_ops",
    srcs = [
        "aggregate_ops.cc",
        "aggregate_ops.h",
        "aggregate_ops_cpu.h",
        "assign_op.h",
        "bias_op.cc",
        "bias_op.h",
        "bounds_check.h",
        "cast_op.cc",
        "cast_op.h",
        "cast_op_impl.h",
        "cast_op_impl_bfloat.cc",
        "cast_op_impl_bool.cc",
        "cast_op_impl_complex128.cc",
        "cast_op_impl_complex64.cc",
        "cast_op_impl_double.cc",
        "cast_op_impl_float.cc",
        "cast_op_impl_half.cc",
        "cast_op_impl_int16.cc",
        "cast_op_impl_int32.cc",
        "cast_op_impl_int64.cc",
        "cast_op_impl_int8.cc",
        "cast_op_impl_uint16.cc",
        "cast_op_impl_uint32.cc",
        "cast_op_impl_uint64.cc",
        "cast_op_impl_uint8.cc",
        "concat_lib.h",
        "concat_lib_cpu.cc",
        "concat_lib_cpu.h",
        "concat_op.cc",
        "constant_op.cc",
        "constant_op.h",
        "cwise_ops.h",
        "cwise_ops_common.cc",
        "cwise_ops_common.h",
        "cwise_ops_gradients.h",
        "dense_update_functor.cc",
        "dense_update_functor.h",
        "dense_update_ops.cc",
        "example_parsing_ops.cc",
        "fill_functor.cc",
        "fill_functor.h",
        "function_ops.cc",
        "function_ops.h",
        "gather_functor.h",
        "gather_nd_op.cc",
        "gather_nd_op.h",
        "gather_nd_op_cpu_impl.h",
        "gather_nd_op_cpu_impl_0.cc",
        "gather_nd_op_cpu_impl_1.cc",
        "gather_nd_op_cpu_impl_2.cc",
        "gather_nd_op_cpu_impl_3.cc",
        "gather_nd_op_cpu_impl_4.cc",
        "gather_nd_op_cpu_impl_5.cc",
        "gather_nd_op_cpu_impl_6.cc",
        "gather_nd_op_cpu_impl_7.cc",
        "gather_op.cc",
        "identity_n_op.cc",
        "identity_n_op.h",
        "identity_op.cc",
        "identity_op.h",
        "immutable_constant_op.cc",
        "immutable_constant_op.h",
        "matmul_op.cc",
        "matmul_op.h",
        "no_op.cc",
        "no_op.h",
        "non_max_suppression_op.cc",
        "non_max_suppression_op.h",
        "one_hot_op.cc",
        "one_hot_op.h",
        "ops_util.h",
        "pack_op.cc",
        "pooling_ops_common.h",
        "reshape_op.cc",
        "reshape_op.h",
        "reverse_sequence_op.cc",
        "reverse_sequence_op.h",
        "sendrecv_ops.cc",
        "sendrecv_ops.h",
        "sequence_ops.cc",
        "shape_ops.cc",
        "shape_ops.h",
        "slice_op.cc",
        "slice_op.h",
        "slice_op_cpu_impl.h",
        "slice_op_cpu_impl_1.cc",
        "slice_op_cpu_impl_2.cc",
        "slice_op_cpu_impl_3.cc",
        "slice_op_cpu_impl_4.cc",
        "slice_op_cpu_impl_5.cc",
        "slice_op_cpu_impl_6.cc",
        "slice_op_cpu_impl_7.cc",
        "softmax_op.cc",
        "softmax_op_functor.h",
        "split_lib.h",
        "split_lib_cpu.cc",
        "split_op.cc",
        "split_v_op.cc",
        "strided_slice_op.cc",
        "strided_slice_op.h",
        "strided_slice_op_impl.h",
        "strided_slice_op_inst_0.cc",
        "strided_slice_op_inst_1.cc",
        "strided_slice_op_inst_2.cc",
        "strided_slice_op_inst_3.cc",
        "strided_slice_op_inst_4.cc",
        "strided_slice_op_inst_5.cc",
        "strided_slice_op_inst_6.cc",
        "strided_slice_op_inst_7.cc",
        "unpack_op.cc",
        "variable_ops.cc",
        "variable_ops.h",
    ],
)

# Other kernels we may want on Android.
#
# The kernels can be consumed as a whole or in two groups for
# supporting separate compilation. Note that the split into groups
# is entirely for improving compilation time, and not for
# organizational reasons; you should not depend on any
# of those groups independently.
filegroup(
    name = "android_extended_ops",
    srcs = [
        ":android_extended_ops_group1",
        ":android_extended_ops_group2",
        ":android_quantized_ops",
    ],
    visibility = ["//visibility:public"],
)

filegroup(
    name = "android_extended_ops_headers",
    srcs = [
        "argmax_op.h",
        "avgpooling_op.h",
        "batch_matmul_op_impl.h",
        "batch_norm_op.h",
        "control_flow_ops.h",
        "conv_2d.h",
        "conv_ops.h",
        "data_format_ops.h",
        "depthtospace_op.h",
        "depthwise_conv_op.h",
        "fake_quant_ops_functor.h",
        "fused_batch_norm_op.h",
        "gemm_functors.h",
        "image_resizer_state.h",
        "initializable_lookup_table.h",
        "lookup_table_init_op.h",
        "lookup_table_op.h",
        "lookup_util.h",
        "maxpooling_op.h",
        "mfcc.h",
        "mfcc_dct.h",
        "mfcc_mel_filterbank.h",
        "mirror_pad_op.h",
        "mirror_pad_op_cpu_impl.h",
        "pad_op.h",
        "random_op.h",
        "reduction_ops.h",
        "reduction_ops_common.h",
        "relu_op.h",
        "relu_op_functor.h",
        "reshape_util.h",
        "resize_bilinear_op.h",
        "resize_nearest_neighbor_op.h",
        "reverse_op.h",
        "save_restore_tensor.h",
        "segment_reduction_ops.h",
        "softplus_op.h",
        "softsign_op.h",
        "spacetobatch_functor.h",
        "spacetodepth_op.h",
        "spectrogram.h",
        "string_util.h",
        "tensor_array.h",
        "tile_functor.h",
        "tile_ops_cpu_impl.h",
        "tile_ops_impl.h",
        "topk_op.h",
        "training_op_helpers.h",
        "training_ops.h",
        "transpose_functor.h",
        "transpose_op.h",
        "where_op.h",
        "xent_op.h",
    ],
)

filegroup(
    name = "android_extended_ops_group1",
    srcs = [
        "argmax_op.cc",
        "avgpooling_op.cc",
        "batch_matmul_op_real.cc",
        "batch_norm_op.cc",
        "bcast_ops.cc",
        "check_numerics_op.cc",
        "control_flow_ops.cc",
        "conv_2d.h",
        "conv_grad_filter_ops.cc",
        "conv_grad_input_ops.cc",
        "conv_grad_ops.cc",
        "conv_grad_ops.h",
        "conv_ops.cc",
        "conv_ops_fused.cc",
        "conv_ops_using_gemm.cc",
        "crop_and_resize_op.cc",
        "crop_and_resize_op.h",
        "cwise_op_abs.cc",
        "cwise_op_add_1.cc",
        "cwise_op_add_2.cc",
        "cwise_op_bitwise_and.cc",
        "cwise_op_bitwise_or.cc",
        "cwise_op_bitwise_xor.cc",
        "cwise_op_div.cc",
        "cwise_op_equal_to_1.cc",
        "cwise_op_equal_to_2.cc",
        "cwise_op_not_equal_to_1.cc",
        "cwise_op_not_equal_to_2.cc",
        "cwise_op_exp.cc",
        "cwise_op_floor.cc",
        "cwise_op_floor_div.cc",
        "cwise_op_floor_mod.cc",
        "cwise_op_greater.cc",
        "cwise_op_greater_equal.cc",
        "cwise_op_invert.cc",
        "cwise_op_isfinite.cc",
        "cwise_op_isnan.cc",
        "cwise_op_left_shift.cc",
        "cwise_op_less.cc",
        "cwise_op_less_equal.cc",
        "cwise_op_log.cc",
        "cwise_op_logical_and.cc",
        "cwise_op_logical_not.cc",
        "cwise_op_logical_or.cc",
        "cwise_op_maximum.cc",
        "cwise_op_minimum.cc",
        "cwise_op_mul_1.cc",
        "cwise_op_mul_2.cc",
        "cwise_op_neg.cc",
        "cwise_op_pow.cc",
        "cwise_op_reciprocal.cc",
        "cwise_op_right_shift.cc",
        "cwise_op_round.cc",
        "cwise_op_rsqrt.cc",
        "cwise_op_select.cc",
        "cwise_op_sigmoid.cc",
        "cwise_op_sign.cc",
        "cwise_op_sqrt.cc",
        "cwise_op_square.cc",
        "cwise_op_squared_difference.cc",
        "cwise_op_sub.cc",
        "cwise_op_tanh.cc",
        "cwise_op_xlogy.cc",
        "cwise_op_xdivy.cc",
        "data_format_ops.cc",
        "decode_wav_op.cc",
        "deep_conv2d.cc",
        "deep_conv2d.h",
        "depthwise_conv_op.cc",
        "dynamic_partition_op.cc",
        "encode_wav_op.cc",
        "fake_quant_ops.cc",
        "fifo_queue.cc",
        "fifo_queue_op.cc",
        "fused_batch_norm_op.cc",
        "listdiff_op.cc",
        "population_count_op.cc",
        "population_count_op.h",
        "winograd_transform.h",
        ":android_extended_ops_headers",
    ] + select({
        ":xsmm_convolutions": [
            "xsmm_conv2d.h",
            "xsmm_conv2d.cc",
        ],
        "//conditions:default": [],
    }),
)

filegroup(
    name = "android_extended_ops_group2",
    srcs = [
        "batchtospace_op.cc",
        "ctc_decoder_ops.cc",
        "decode_bmp_op.cc",
        "depthtospace_op.cc",
        "dynamic_stitch_op.cc",
        "in_topk_op.cc",
        "initializable_lookup_table.cc",
        "logging_ops.cc",
        "lookup_table_init_op.cc",
        "lookup_table_op.cc",
        "lookup_util.cc",
        "lrn_op.cc",
        "maxpooling_op.cc",
        "mfcc.cc",
        "mfcc_dct.cc",
        "mfcc_mel_filterbank.cc",
        "mfcc_op.cc",
        "mirror_pad_op.cc",
        "mirror_pad_op_cpu_impl_1.cc",
        "mirror_pad_op_cpu_impl_2.cc",
        "mirror_pad_op_cpu_impl_3.cc",
        "mirror_pad_op_cpu_impl_4.cc",
        "mirror_pad_op_cpu_impl_5.cc",
        "pad_op.cc",
        "padding_fifo_queue.cc",
        "padding_fifo_queue_op.cc",
        "queue_base.cc",
        "queue_op.cc",
        "queue_ops.cc",
        "random_op.cc",
        "reduction_ops_all.cc",
        "reduction_ops_any.cc",
        "reduction_ops_common.cc",
        "reduction_ops_max.cc",
        "reduction_ops_mean.cc",
        "reduction_ops_min.cc",
        "reduction_ops_prod.cc",
        "reduction_ops_sum.cc",
        "relu_op.cc",
        "reshape_util.cc",
        "resize_bilinear_op.cc",
        "resize_nearest_neighbor_op.cc",
        "restore_op.cc",
        "reverse_op.cc",
        "save_op.cc",
        "save_restore_tensor.cc",
        "save_restore_v2_ops.cc",
        "segment_reduction_ops.cc",
        "session_ops.cc",
        "softplus_op.cc",
        "softsign_op.cc",
        "spacetobatch_functor.cc",
        "spacetobatch_op.cc",
        "spacetodepth_op.cc",
        "sparse_fill_empty_rows_op.cc",
        "sparse_reshape_op.cc",
        "sparse_to_dense_op.cc",
        "spectrogram.cc",
        "spectrogram_op.cc",
        "stack_ops.cc",
        "string_join_op.cc",
        "string_util.cc",
        "summary_op.cc",
        "tensor_array.cc",
        "tensor_array_ops.cc",
        "tile_functor_cpu.cc",
        "tile_ops.cc",
        "tile_ops_cpu_impl_1.cc",
        "tile_ops_cpu_impl_2.cc",
        "tile_ops_cpu_impl_3.cc",
        "tile_ops_cpu_impl_4.cc",
        "tile_ops_cpu_impl_5.cc",
        "tile_ops_cpu_impl_6.cc",
        "tile_ops_cpu_impl_7.cc",
        "topk_op.cc",
        "training_op_helpers.cc",
        "training_ops.cc",
        "transpose_functor_cpu.cc",
        "transpose_op.cc",
        "unique_op.cc",
        "where_op.cc",
        "xent_op.cc",
        ":android_extended_ops_headers",
    ],
)

TensorFlow Mobile通過編譯選項,在完整的TensorFlow基礎上進行裁剪,在保留TensorFlow核心功能的同時去掉不必要的程式碼。例如分散式執行的邏輯,windows平臺的相容邏輯,利用gpu計算的邏輯等等。

TensorFlow Mobile的OP支援完整嗎?

TensorFlow Mobile並不包含所有的OP,只有一些核心必要的op,詳見上面android_core_ops和android_extended_ops。

TensorFlow Lite在實現上又有啥區別

TensorFlow Lite的原始碼在tensorflow/contrib/lite目錄下。其核心編譯邏輯如下

### tensorflow/contrib/lite/BUILD
cc_library(
    name = "framework",
    srcs = [
        "allocation.cc",
        "graph_info.cc",
        "interpreter.cc",
        "model.cc",
        "mutable_op_resolver.cc",
        "optional_debug_tools.cc",
        "stderr_reporter.cc",
    ] + select({
        "//tensorflow:android": [
            "nnapi_delegate.cc",
            "mmap_allocation.cc",
        ],
        "//tensorflow:windows": [
            "nnapi_delegate_disabled.cc",
            "mmap_allocation_disabled.cc",
        ],
        "//conditions:default": [
            "nnapi_delegate_disabled.cc",
            "mmap_allocation.cc",
        ],
    }),
    hdrs = [
        "allocation.h",
        "context.h",
        "context_util.h",
        "error_reporter.h",
        "graph_info.h",
        "interpreter.h",
        "model.h",
        "mutable_op_resolver.h",
        "nnapi_delegate.h",
        "op_resolver.h",
        "optional_debug_tools.h",
        "stderr_reporter.h",
    ],
    copts = tflite_copts(),
    linkopts = [
    ] + select({
        "//tensorflow:android": [
            "-llog",
        ],
        "//conditions:default": [
        ],
    }),
    deps = [
        ":arena_planner",
        ":graph_info",
        ":memory_planner",
        ":schema_fbs_version",
        ":simple_memory_arena",
        ":string",
        ":util",
        "//tensorflow/contrib/lite/c:c_api_internal",
        "//tensorflow/contrib/lite/core/api",
        "//tensorflow/contrib/lite/kernels:eigen_support",
        "//tensorflow/contrib/lite/kernels:gemm_support",
        "//tensorflow/contrib/lite/nnapi:nnapi_lib",
        "//tensorflow/contrib/lite/profiling:profiler",
        "//tensorflow/contrib/lite/schema:schema_fbs",
    ],
)

相比TensorFlow Mobile是對完整TensorFlow的裁減,TensorFlow Lite基本就是重新實現了。從內部實現來說,在TensorFlow核心最基本的OP,Context等資料結構,都是新的。從外在表現來說,模型檔案從PB格式改成了FlatBuffers格式,TensorFlow的size有大幅度優化,降至300K,然後提供一個converter將普通TensorFlow模型轉化成TensorFlow Lite需要的格式。因此,無論從哪方面看,TensorFlow Lite都是一個新的實現方案。

參考資料

TensorFlow Architecture
TensorFlow Mobile VS TensorFlow Lite
TensorFlow程式碼解析
TensorFlow Lite