TensorFlow vs. TensorFlow Mobile vs. TensorFlow Lite
A Brief Introduction to TensorFlow
TensorFlow is a machine learning framework whose overall architecture is divided into three roles: Client, Master, and Worker. This decoupled design gives the framework a great deal of flexibility and makes it easy to deploy across a cluster of machines.
TensorFlow's Architecture
TensorFlow's overall architecture is shown below (image from the official site).
Client
The Client is the part that algorithm engineers work with directly. It is available in Python, C++, Java, and other languages. Its main responsibilities are:
- Defining the computation as a computation graph. Machine learning frameworks generally follow one of two programming models, imperative or declarative. The imperative model is ordinary step-by-step programming. The declarative model is more like RxJava: you first build a data pipeline, and only when an event fires does data actually flow through it and get executed. TensorFlow uses the declarative model: the algorithm engineer builds a computation graph through the Client API (a minimal sketch follows this list).
- Providing the Session interface for executing the computation graph.
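As a concrete illustration of the declarative model, here is a minimal sketch using the TF 1.x Python client (the tensor names and values are arbitrary): the graph is only described at first, and nothing is computed until the Session runs it.

```python
import tensorflow as tf  # assumes the TF 1.x graph/session API

# Declarative step: describe the computation as a graph; nothing runs yet.
a = tf.placeholder(tf.float32, name="a")
b = tf.placeholder(tf.float32, name="b")
c = tf.add(a, b, name="c")

# Execution step: only when the Session runs the graph is data fed in and computed.
with tf.Session() as sess:
    print(sess.run(c, feed_dict={a: 1.0, b: 2.0}))  # prints 3.0
```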
Distributed Master
- Splits the computation graph into smaller subgraphs.
- Further splits the subgraphs into smaller pieces so they can run in parallel across different processes and even different devices (see the sketch after this list).
- Distributes the graph pieces to the different Workers.
- Triggers each Worker to execute the pieces assigned to it.
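To make the partitioning concrete, here is a minimal sketch (not from the official documentation) of a two-task cluster using the TF 1.x distributed API; the addresses, job names, and tensors are placeholders. The Client submits a single graph, the Distributed Master splits it according to the device placement, and the resulting pieces exchange intermediate tensors through automatically inserted send/recv ops.

```python
import tensorflow as tf  # assumes the TF 1.x distributed runtime

# Hypothetical cluster: a single "worker" job with two tasks.
cluster = tf.train.ClusterSpec({"worker": ["localhost:2222", "localhost:2223"]})
# Each process would start its own server for its task, e.g.:
#   tf.train.Server(cluster, job_name="worker", task_index=0).join()

# The client pins parts of the graph to different tasks; the master
# partitions the graph accordingly and inserts send/recv ops between them.
with tf.device("/job:worker/task:0"):
    a = tf.constant([[1.0, 2.0]])
with tf.device("/job:worker/task:1"):
    b = tf.constant([[3.0], [4.0]])
    c = tf.matmul(a, b)

with tf.Session("grpc://localhost:2222") as sess:
    print(sess.run(c))  # [[11.0]]
```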
Worker Services
- Invokes the TensorFlow kernels to execute graph pieces on whatever hardware is available.
- Communicates with other Workers, sending and receiving computation results.
Kernel Implementations
- Provides fine-grained, independent compute operations (ops), such as addition, subtraction, and string splitting.
TensorFlow on Mobile
Running a model directly on the device saves bandwidth, responds faster, keeps working regardless of network quality or connectivity, and is more secure because no data has to leave the device, so there is real demand for on-device inference. Running TensorFlow on mobile phones or other embedded devices has different priorities from running it in the cloud: lower power consumption, faster execution, and a smaller binary size matter most. For mobile devices there are currently two solutions, TensorFlow Mobile and TensorFlow Lite. TensorFlow Mobile appeared earlier and is more stable, but it has not been heavily optimized for mobile performance; it is no longer recommended and is expected to be deprecated around early 2019.
According to the official documentation, the main differences between TensorFlow Mobile and TensorFlow Lite are:
- TensorFlow Lite is the evolution of TensorFlow Mobile. In most cases TensorFlow Lite has a smaller binary size, fewer dependencies, and better performance.
- TensorFlow Lite is still under development, so some features are not yet complete. The team has stated that it is ramping up development.
- TensorFlow Lite supports a relatively limited set of ops, whereas TensorFlow Mobile's coverage is more complete.
Looking at the Differences in the Source Code
That is the official description, but it remains rather vague. What exactly does TensorFlow Mobile strip out, and which ops does it support? How does TensorFlow Lite actually differ in its implementation? The only way to answer these questions is to read the source code.
TensorFlow Source Directory Layout
The tensorflow/core directory contains the core modules of TF.
public: API header files for external callers, mainly session.h and tensor_c_api.h.
client: implementation files for the client API.
platform: OS-related interfaces, such as the file system and env.
protobuf: .proto files defining the structures serialized for data transfer.
common_runtime: the common runtime, including the session, executor, threadpool, rendezvous, memory management, and device placement algorithms.
distributed_runtime: distributed execution modules, such as the rpc session, rpc master, rpc worker, and graph manager.
framework: basic infrastructure modules, such as log, memory, and tensor.
graph: computation-graph operations, such as construct, partition, optimize, and execute.
kernels: the core ops, such as matmul, conv2d, argmax, and batch_norm.
lib: common base libraries, such as gif, gtl (Google template library), hash, and histogram.
ops: basic op definitions, op gradients, IO-related ops, and control-flow and data-flow ops.
The tensorflow/stream_executor directory is the parallel computation framework, developed by Google's StreamExecutor team.
The tensorflow/contrib directory holds contributor code; its android subdirectory contains the Android build of TensorFlow Mobile, and its lite subdirectory contains the TensorFlow Lite source.
The tensorflow/python directory contains the Python API client scripts.
The tensorflow/tensorboard directory contains the visualization tool, which can visualize the model and also monitor how model parameters change.
The third_party directory holds TF's third-party dependencies.
eigen3: the Eigen matrix library, called by TF's basic ops.
gpus: wraps the cuda/cudnn libraries.
What Does TensorFlow Mobile Trim Away?
TensorFlow is built with bazel, so we can compare the build files to see how the builds differ.
TensorFlow's Default Build Configuration
===== /tensorflow/BUILD =====
tf_cc_shared_object(
name = "libtensorflow.so",
linkopts = select({
"//tensorflow:darwin": [
"-Wl,-exported_symbols_list", # This line must be directly followed by the exported_symbols.lds file
"$(location //tensorflow/c:exported_symbols.lds)",
"-Wl,-install_name,@rpath/libtensorflow.so",
],
"//tensorflow:windows": [],
"//conditions:default": [
"-z defs",
"-Wl,--version-script", # This line must be directly followed by the version_script.lds file
"$(location //tensorflow/c:version_script.lds)",
],
}),
visibility = ["//visibility:public"],
deps = [
"//tensorflow/c:c_api",
"//tensorflow/c:c_api_experimental",
"//tensorflow/c:exported_symbols.lds",
"//tensorflow/c:version_script.lds",
"//tensorflow/c/eager:c_api",
"//tensorflow/core:tensorflow",
],
)
===== /tensorflow/c/BUILD =====
tf_cuda_library(
name = "c_api",
srcs = [
"c_api.cc",
"c_api_function.cc",
],
hdrs = [
"c_api.h",
],
copts = tf_copts(),
visibility = ["//visibility:public"],
deps = select({
"//tensorflow:android": [
":c_api_internal",
"//tensorflow/core:android_tensorflow_lib_lite",
],
"//conditions:default": [
":c_api_internal",
"//tensorflow/cc/saved_model:loader",
"//tensorflow/cc:gradients",
"//tensorflow/cc:ops",
"//tensorflow/cc:grad_ops",
"//tensorflow/cc:scope_internal",
"//tensorflow/cc:while_loop",
"//tensorflow/core:core_cpu",
"//tensorflow/core:core_cpu_internal",
"//tensorflow/core:framework",
"//tensorflow/core:op_gen_lib",
"//tensorflow/core:protos_all_cc",
"//tensorflow/core:lib",
"//tensorflow/core:lib_internal",
],
}) + select({
"//tensorflow:with_xla_support": [
"//tensorflow/compiler/tf2xla:xla_compiler",
"//tensorflow/compiler/jit",
],
"//conditions:default": [],
}),
)
tf_cuda_library(
name = "c_api_experimental",
srcs = [
"c_api_experimental.cc",
],
hdrs = [
"c_api_experimental.h",
],
copts = tf_copts(),
visibility = ["//visibility:public"],
deps = [
":c_api",
":c_api_internal",
"//tensorflow/c/eager:c_api",
"//tensorflow/compiler/jit/legacy_flags:mark_for_compilation_pass_flags",
"//tensorflow/contrib/tpu:all_ops",
"//tensorflow/core:core_cpu",
"//tensorflow/core:framework",
"//tensorflow/core:lib",
"//tensorflow/core:lib_platform",
"//tensorflow/core:protos_all_cc",
],
)
===== /tensorflow/c/eager/BUILD =====
tf_cuda_library(
name = "c_api",
srcs = [
"c_api.cc",
"c_api_debug.cc",
"c_api_internal.h",
],
hdrs = ["c_api.h"],
copts = tf_copts() + tfe_xla_copts(),
visibility = ["//visibility:public"],
deps = select({
"//tensorflow:android": [
"//tensorflow/core:android_tensorflow_lib_lite",
],
"//conditions:default": [
"//tensorflow/c:c_api",
"//tensorflow/c:c_api_internal",
"//tensorflow/core:core_cpu",
"//tensorflow/core/common_runtime/eager:attr_builder",
"//tensorflow/core/common_runtime/eager:context",
"//tensorflow/core/common_runtime/eager:eager_executor",
"//tensorflow/core/common_runtime/eager:execute",
"//tensorflow/core/common_runtime/eager:kernel_and_device",
"//tensorflow/core/common_runtime/eager:tensor_handle",
"//tensorflow/core/common_runtime/eager:copy_to_device_node",
"//tensorflow/core:core_cpu_internal",
"//tensorflow/core:framework",
"//tensorflow/core:framework_internal",
"//tensorflow/core:lib",
"//tensorflow/core:lib_internal",
"//tensorflow/core:protos_all_cc",
],
}) + select({
"//tensorflow:with_xla_support": [
"//tensorflow/compiler/tf2xla:xla_compiler",
"//tensorflow/compiler/jit",
"//tensorflow/compiler/jit:xla_device",
],
"//conditions:default": [],
}) + [
"//tensorflow/core/common_runtime/eager:eager_operation",
"//tensorflow/core/distributed_runtime/eager:eager_client",
"//tensorflow/core/distributed_runtime/rpc/eager:grpc_eager_client",
"//tensorflow/core/distributed_runtime/rpc:grpc_channel",
"//tensorflow/core/distributed_runtime/rpc:grpc_server_lib",
"//tensorflow/core/distributed_runtime/rpc:grpc_worker_cache",
"//tensorflow/core/distributed_runtime/rpc:grpc_worker_service",
"//tensorflow/core/distributed_runtime/rpc:rpc_rendezvous_mgr",
"//tensorflow/core/distributed_runtime:remote_device",
"//tensorflow/core/distributed_runtime:server_lib",
"//tensorflow/core/distributed_runtime:worker_env",
"//tensorflow/core:gpu_runtime",
],
)
===== /tensorflow/core/BUILD =====
cc_library(
name = "tensorflow",
visibility = ["//visibility:public"],
deps = [
":tensorflow_opensource",
"//tensorflow/core/platform/default/build_config:tensorflow_platform_specific",
],
)
tf_cuda_library(
name = "tensorflow_opensource",
copts = tf_copts(),
visibility = ["//visibility:public"],
deps = [
":all_kernels",
":core",
":direct_session",
":example_parser_configuration",
":gpu_runtime",
":lib",
],
)
cc_library(
name = "all_kernels",
visibility = ["//visibility:public"],
deps = if_dynamic_kernels(
[],
otherwise = [":all_kernels_statically_linked"],
),
)
# This is a link-only library to provide a DirectSession
# implementation of the Session interface.
tf_cuda_library(
name = "direct_session",
copts = tf_copts(),
linkstatic = 1,
visibility = ["//visibility:public"],
deps = [
":direct_session_internal",
],
alwayslink = 1,
)
filegroup(
name = "example_parser_configuration_testdata",
srcs = [
"example/testdata/parse_example_graph_def.pbtxt",
],
)
cc_library(
name = "core",
visibility = ["//visibility:public"],
deps = [
":core_cpu",
":gpu_runtime",
":sycl_runtime",
],
)
cc_library(
name = "lib",
hdrs = [
"lib/bfloat16/bfloat16.h",
"lib/core/arena.h",
"lib/core/bitmap.h",
"lib/core/bits.h",
"lib/core/casts.h",
"lib/core/coding.h",
"lib/core/errors.h",
"lib/core/notification.h",
"lib/core/raw_coding.h",
"lib/core/status.h",
"lib/core/stringpiece.h",
"lib/core/threadpool.h",
"lib/gtl/array_slice.h",
"lib/gtl/cleanup.h",
"lib/gtl/compactptrset.h",
"lib/gtl/flatmap.h",
"lib/gtl/flatset.h",
"lib/gtl/inlined_vector.h",
"lib/gtl/optional.h",
"lib/gtl/priority_queue_util.h",
"lib/hash/crc32c.h",
"lib/hash/hash.h",
"lib/histogram/histogram.h",
"lib/io/buffered_inputstream.h",
"lib/io/compression.h",
"lib/io/inputstream_interface.h",
"lib/io/path.h",
"lib/io/proto_encode_helper.h",
"lib/io/random_inputstream.h",
"lib/io/record_reader.h",
"lib/io/record_writer.h",
"lib/io/table.h",
"lib/io/table_builder.h",
"lib/io/table_options.h",
"lib/math/math_util.h",
"lib/monitoring/collected_metrics.h",
"lib/monitoring/collection_registry.h",
"lib/monitoring/counter.h",
"lib/monitoring/gauge.h",
"lib/monitoring/metric_def.h",
"lib/monitoring/sampler.h",
"lib/random/distribution_sampler.h",
"lib/random/philox_random.h",
"lib/random/random_distributions.h",
"lib/random/simple_philox.h",
"lib/strings/numbers.h",
"lib/strings/proto_serialization.h",
"lib/strings/str_util.h",
"lib/strings/strcat.h",
"lib/strings/stringprintf.h",
":platform_base_hdrs",
":platform_env_hdrs",
":platform_file_system_hdrs",
":platform_other_hdrs",
":platform_port_hdrs",
":platform_protobuf_hdrs",
],
visibility = ["//visibility:public"],
deps = [
":lib_internal",
"@com_google_absl//absl/container:inlined_vector",
"@com_google_absl//absl/strings",
"@com_google_absl//absl/types:optional",
],
)
# This includes implementations of all kernels built into TensorFlow.
cc_library(
name = "all_kernels_statically_linked",
visibility = ["//visibility:private"],
deps = [
"//tensorflow/core/kernels:array",
"//tensorflow/core/kernels:audio",
"//tensorflow/core/kernels:batch_kernels",
"//tensorflow/core/kernels:bincount_op",
"//tensorflow/core/kernels:boosted_trees_ops",
"//tensorflow/core/kernels:candidate_sampler_ops",
"//tensorflow/core/kernels:checkpoint_ops",
"//tensorflow/core/kernels:collective_ops",
"//tensorflow/core/kernels:control_flow_ops",
"//tensorflow/core/kernels:ctc_ops",
"//tensorflow/core/kernels:cudnn_rnn_kernels",
"//tensorflow/core/kernels:data_flow",
"//tensorflow/core/kernels:dataset_ops",
"//tensorflow/core/kernels:decode_proto_op",
"//tensorflow/core/kernels:encode_proto_op",
"//tensorflow/core/kernels:fake_quant_ops",
"//tensorflow/core/kernels:function_ops",
"//tensorflow/core/kernels:functional_ops",
"//tensorflow/core/kernels:grappler",
"//tensorflow/core/kernels:histogram_op",
"//tensorflow/core/kernels:image",
"//tensorflow/core/kernels:io",
"//tensorflow/core/kernels:linalg",
"//tensorflow/core/kernels:list_kernels",
"//tensorflow/core/kernels:lookup",
"//tensorflow/core/kernels:logging",
"//tensorflow/core/kernels:manip",
"//tensorflow/core/kernels:math",
"//tensorflow/core/kernels:multinomial_op",
"//tensorflow/core/kernels:nn",
"//tensorflow/core/kernels:parameterized_truncated_normal_op",
"//tensorflow/core/kernels:parsing",
"//tensorflow/core/kernels:partitioned_function_ops",
"//tensorflow/core/kernels:random_ops",
"//tensorflow/core/kernels:random_poisson_op",
"//tensorflow/core/kernels:remote_fused_graph_ops",
"//tensorflow/core/kernels:required",
"//tensorflow/core/kernels:resource_variable_ops",
"//tensorflow/core/kernels:rpc_op",
"//tensorflow/core/kernels:scoped_allocator_ops",
"//tensorflow/core/kernels:sdca_ops",
"//tensorflow/core/kernels:searchsorted_op",
"//tensorflow/core/kernels:set_kernels",
"//tensorflow/core/kernels:sparse",
"//tensorflow/core/kernels:state",
"//tensorflow/core/kernels:stateless_random_ops",
"//tensorflow/core/kernels:string",
"//tensorflow/core/kernels:summary_kernels",
"//tensorflow/core/kernels:training_ops",
"//tensorflow/core/kernels:word2vec_kernels",
] + tf_additional_cloud_kernel_deps() + if_not_windows([
"//tensorflow/core/kernels:fact_op",
"//tensorflow/core/kernels:array_not_windows",
"//tensorflow/core/kernels:math_not_windows",
"//tensorflow/core/kernels:quantized_ops",
"//tensorflow/core/kernels/neon:neon_depthwise_conv_op",
]) + if_mkl([
"//tensorflow/core/kernels:mkl_concat_op",
"//tensorflow/core/kernels:mkl_conv_op",
"//tensorflow/core/kernels:mkl_cwise_ops_common",
"//tensorflow/core/kernels:mkl_fused_batch_norm_op",
"//tensorflow/core/kernels:mkl_identity_op",
"//tensorflow/core/kernels:mkl_input_conversion_op",
"//tensorflow/core/kernels:mkl_lrn_op",
"//tensorflow/core/kernels:mkl_pooling_ops",
"//tensorflow/core/kernels:mkl_relu_op",
"//tensorflow/core/kernels:mkl_reshape_op",
"//tensorflow/core/kernels:mkl_slice_op",
"//tensorflow/core/kernels:mkl_softmax_op",
"//tensorflow/core/kernels:mkl_transpose_op",
"//tensorflow/core/kernels:mkl_tfconv_op",
"//tensorflow/core/kernels:mkl_aggregate_ops",
]) + if_cuda([
"//tensorflow/core/grappler/optimizers:gpu_swapping_kernels",
"//tensorflow/core/grappler/optimizers:gpu_swapping_ops",
]),
)
TensorFlow Mobile's Build Configuration
===== tensorflow/contrib/android/BUILD =====
cc_binary(
name = "libtensorflow_inference.so",
srcs = [],
copts = tf_copts() + [
"-ffunction-sections",
"-fdata-sections",
],
linkopts = if_android([
"-landroid",
"-latomic",
"-ldl",
"-llog",
"-lm",
"-z defs",
"-s",
"-Wl,--gc-sections",
"-Wl,--version-script", # This line must be directly followed by LINKER_SCRIPT.
"$(location {})".format(LINKER_SCRIPT),
]),
linkshared = 1,
linkstatic = 1,
tags = [
"manual",
"notap",
],
deps = [
":android_tensorflow_inference_jni",
"//tensorflow/core:android_tensorflow_lib",
LINKER_SCRIPT,
],
)
cc_library(
name = "android_tensorflow_inference_jni",
srcs = if_android([":android_tensorflow_inference_jni_srcs"]),
copts = tf_copts(),
visibility = ["//visibility:public"],
deps = [
"//tensorflow/core:android_tensorflow_lib_lite",
"//tensorflow/java/src/main/native",
],
alwayslink = 1,
)
===== tensorflow/core/BUILD =====
cc_library(
name = "android_tensorflow_lib",
srcs = if_android([":android_op_registrations_and_gradients"]),
copts = tf_copts(),
tags = [
"manual",
"notap",
],
visibility = ["//visibility:public"],
deps = [
":android_tensorflow_lib_lite",
":protos_all_cc_impl",
"//tensorflow/core/kernels:android_tensorflow_kernels",
"//third_party/eigen3",
"@protobuf_archive//:protobuf",
],
alwayslink = 1,
)
cc_library(
name = "android_tensorflow_lib_lite",
srcs = if_android(["//tensorflow/core:android_srcs"]),
copts = tf_copts(android_optimization_level_override = None),
linkopts = ["-lz"],
tags = [
"manual",
"notap",
],
visibility = ["//visibility:public"],
deps = [
":mobile_additional_lib_deps",
":protos_all_cc_impl",
":stats_calculator_portable",
"//third_party/eigen3",
"@double_conversion//:double-conversion",
"@nsync//:nsync_cpp",
"@protobuf_archive//:protobuf",
],
alwayslink = 1,
)
alias(
name = "android_srcs",
actual = ":mobile_srcs",
visibility = ["//visibility:public"],
)
filegroup(
name = "mobile_srcs",
srcs = [
":mobile_srcs_no_runtime",
":mobile_srcs_only_runtime",
],
visibility = ["//visibility:public"],
)
# Core sources for Android builds.
filegroup(
name = "mobile_srcs_no_runtime",
srcs = [
":protos_all_proto_text_srcs",
":error_codes_proto_text_srcs",
"//tensorflow/core/platform/default/build_config:android_srcs",
] + glob(
[
"client/**/*.cc",
"framework/**/*.h",
"framework/**/*.cc",
"lib/**/*.h",
"lib/**/*.cc",
"platform/**/*.h",
"platform/**/*.cc",
"public/**/*.h",
"util/**/*.h",
"util/**/*.cc",
],
exclude = [
"**/*test.*",
"**/*testutil*",
"**/*testlib*",
"**/*main.cc",
"debug/**/*",
"framework/op_gen_*",
"lib/jpeg/**/*",
"lib/png/**/*",
"lib/gif/**/*",
"util/events_writer.*",
"util/stats_calculator.*",
"util/reporter.*",
"platform/**/cuda_libdevice_path.*",
"platform/default/test_benchmark.*",
"platform/cuda.h",
"platform/google/**/*",
"platform/hadoop/**/*",
"platform/gif.h",
"platform/jpeg.h",
"platform/png.h",
"platform/stream_executor.*",
"platform/windows/**/*",
"user_ops/**/*.cu.cc",
"util/ctc/*.h",
"util/ctc/*.cc",
"util/tensor_bundle/*.h",
"util/tensor_bundle/*.cc",
"common_runtime/gpu/**/*",
"common_runtime/eager/*",
"common_runtime/gpu_device_factory.*",
],
),
visibility = ["//visibility:public"],
)
filegroup(
name = "mobile_srcs_only_runtime",
srcs = [
"//tensorflow/core/kernels:android_srcs",
"//tensorflow/core/util/ctc:android_srcs",
"//tensorflow/core/util/tensor_bundle:android_srcs",
] + glob(
[
"common_runtime/**/*.h",
"common_runtime/**/*.cc",
"graph/**/*.h",
"graph/**/*.cc",
],
exclude = [
"**/*test.*",
"**/*testutil*",
"**/*testlib*",
"**/*main.cc",
"common_runtime/gpu/**/*",
"common_runtime/eager/*",
"common_runtime/gpu_device_factory.*",
"graph/dot.*",
],
),
visibility = ["//visibility:public"],
)
cc_library(
name = "stats_calculator_portable",
srcs = [
"util/stat_summarizer_options.h",
"util/stats_calculator.cc",
],
hdrs = [
"util/stats_calculator.h",
],
copts = tf_copts(),
)
cc_library(
name = "mobile_additional_lib_deps",
deps = tf_additional_lib_deps() + [
"@com_google_absl//absl/strings",
],
)
===== tensorflow/core/kernels/BUILD =====
cc_library(
name = "android_tensorflow_kernels",
srcs = select({
"//tensorflow:android": [
"//tensorflow/core/kernels:android_core_ops",
"//tensorflow/core/kernels:android_extended_ops",
],
"//conditions:default": [],
}),
copts = tf_copts(),
linkopts = select({
"//tensorflow:android": [
"-ldl",
],
"//conditions:default": [],
}),
tags = [
"manual",
"notap",
],
visibility = ["//visibility:public"],
deps = [
"//tensorflow/core:android_tensorflow_lib_lite",
"//tensorflow/core:protos_all_cc_impl",
"//third_party/eigen3",
"//third_party/fft2d:fft2d_headers",
"@fft2d",
"@gemmlowp",
"@protobuf_archive//:protobuf",
],
alwayslink = 1,
)
# Core kernels we want on Android. Only a subset of kernels to keep
# base library small.
filegroup(
name = "android_core_ops",
srcs = [
"aggregate_ops.cc",
"aggregate_ops.h",
"aggregate_ops_cpu.h",
"assign_op.h",
"bias_op.cc",
"bias_op.h",
"bounds_check.h",
"cast_op.cc",
"cast_op.h",
"cast_op_impl.h",
"cast_op_impl_bfloat.cc",
"cast_op_impl_bool.cc",
"cast_op_impl_complex128.cc",
"cast_op_impl_complex64.cc",
"cast_op_impl_double.cc",
"cast_op_impl_float.cc",
"cast_op_impl_half.cc",
"cast_op_impl_int16.cc",
"cast_op_impl_int32.cc",
"cast_op_impl_int64.cc",
"cast_op_impl_int8.cc",
"cast_op_impl_uint16.cc",
"cast_op_impl_uint32.cc",
"cast_op_impl_uint64.cc",
"cast_op_impl_uint8.cc",
"concat_lib.h",
"concat_lib_cpu.cc",
"concat_lib_cpu.h",
"concat_op.cc",
"constant_op.cc",
"constant_op.h",
"cwise_ops.h",
"cwise_ops_common.cc",
"cwise_ops_common.h",
"cwise_ops_gradients.h",
"dense_update_functor.cc",
"dense_update_functor.h",
"dense_update_ops.cc",
"example_parsing_ops.cc",
"fill_functor.cc",
"fill_functor.h",
"function_ops.cc",
"function_ops.h",
"gather_functor.h",
"gather_nd_op.cc",
"gather_nd_op.h",
"gather_nd_op_cpu_impl.h",
"gather_nd_op_cpu_impl_0.cc",
"gather_nd_op_cpu_impl_1.cc",
"gather_nd_op_cpu_impl_2.cc",
"gather_nd_op_cpu_impl_3.cc",
"gather_nd_op_cpu_impl_4.cc",
"gather_nd_op_cpu_impl_5.cc",
"gather_nd_op_cpu_impl_6.cc",
"gather_nd_op_cpu_impl_7.cc",
"gather_op.cc",
"identity_n_op.cc",
"identity_n_op.h",
"identity_op.cc",
"identity_op.h",
"immutable_constant_op.cc",
"immutable_constant_op.h",
"matmul_op.cc",
"matmul_op.h",
"no_op.cc",
"no_op.h",
"non_max_suppression_op.cc",
"non_max_suppression_op.h",
"one_hot_op.cc",
"one_hot_op.h",
"ops_util.h",
"pack_op.cc",
"pooling_ops_common.h",
"reshape_op.cc",
"reshape_op.h",
"reverse_sequence_op.cc",
"reverse_sequence_op.h",
"sendrecv_ops.cc",
"sendrecv_ops.h",
"sequence_ops.cc",
"shape_ops.cc",
"shape_ops.h",
"slice_op.cc",
"slice_op.h",
"slice_op_cpu_impl.h",
"slice_op_cpu_impl_1.cc",
"slice_op_cpu_impl_2.cc",
"slice_op_cpu_impl_3.cc",
"slice_op_cpu_impl_4.cc",
"slice_op_cpu_impl_5.cc",
"slice_op_cpu_impl_6.cc",
"slice_op_cpu_impl_7.cc",
"softmax_op.cc",
"softmax_op_functor.h",
"split_lib.h",
"split_lib_cpu.cc",
"split_op.cc",
"split_v_op.cc",
"strided_slice_op.cc",
"strided_slice_op.h",
"strided_slice_op_impl.h",
"strided_slice_op_inst_0.cc",
"strided_slice_op_inst_1.cc",
"strided_slice_op_inst_2.cc",
"strided_slice_op_inst_3.cc",
"strided_slice_op_inst_4.cc",
"strided_slice_op_inst_5.cc",
"strided_slice_op_inst_6.cc",
"strided_slice_op_inst_7.cc",
"unpack_op.cc",
"variable_ops.cc",
"variable_ops.h",
],
)
# Other kernels we may want on Android.
#
# The kernels can be consumed as a whole or in two groups for
# supporting separate compilation. Note that the split into groups
# is entirely for improving compilation time, and not for
# organizational reasons; you should not depend on any
# of those groups independently.
filegroup(
name = "android_extended_ops",
srcs = [
":android_extended_ops_group1",
":android_extended_ops_group2",
":android_quantized_ops",
],
visibility = ["//visibility:public"],
)
filegroup(
name = "android_extended_ops_headers",
srcs = [
"argmax_op.h",
"avgpooling_op.h",
"batch_matmul_op_impl.h",
"batch_norm_op.h",
"control_flow_ops.h",
"conv_2d.h",
"conv_ops.h",
"data_format_ops.h",
"depthtospace_op.h",
"depthwise_conv_op.h",
"fake_quant_ops_functor.h",
"fused_batch_norm_op.h",
"gemm_functors.h",
"image_resizer_state.h",
"initializable_lookup_table.h",
"lookup_table_init_op.h",
"lookup_table_op.h",
"lookup_util.h",
"maxpooling_op.h",
"mfcc.h",
"mfcc_dct.h",
"mfcc_mel_filterbank.h",
"mirror_pad_op.h",
"mirror_pad_op_cpu_impl.h",
"pad_op.h",
"random_op.h",
"reduction_ops.h",
"reduction_ops_common.h",
"relu_op.h",
"relu_op_functor.h",
"reshape_util.h",
"resize_bilinear_op.h",
"resize_nearest_neighbor_op.h",
"reverse_op.h",
"save_restore_tensor.h",
"segment_reduction_ops.h",
"softplus_op.h",
"softsign_op.h",
"spacetobatch_functor.h",
"spacetodepth_op.h",
"spectrogram.h",
"string_util.h",
"tensor_array.h",
"tile_functor.h",
"tile_ops_cpu_impl.h",
"tile_ops_impl.h",
"topk_op.h",
"training_op_helpers.h",
"training_ops.h",
"transpose_functor.h",
"transpose_op.h",
"where_op.h",
"xent_op.h",
],
)
filegroup(
name = "android_extended_ops_group1",
srcs = [
"argmax_op.cc",
"avgpooling_op.cc",
"batch_matmul_op_real.cc",
"batch_norm_op.cc",
"bcast_ops.cc",
"check_numerics_op.cc",
"control_flow_ops.cc",
"conv_2d.h",
"conv_grad_filter_ops.cc",
"conv_grad_input_ops.cc",
"conv_grad_ops.cc",
"conv_grad_ops.h",
"conv_ops.cc",
"conv_ops_fused.cc",
"conv_ops_using_gemm.cc",
"crop_and_resize_op.cc",
"crop_and_resize_op.h",
"cwise_op_abs.cc",
"cwise_op_add_1.cc",
"cwise_op_add_2.cc",
"cwise_op_bitwise_and.cc",
"cwise_op_bitwise_or.cc",
"cwise_op_bitwise_xor.cc",
"cwise_op_div.cc",
"cwise_op_equal_to_1.cc",
"cwise_op_equal_to_2.cc",
"cwise_op_not_equal_to_1.cc",
"cwise_op_not_equal_to_2.cc",
"cwise_op_exp.cc",
"cwise_op_floor.cc",
"cwise_op_floor_div.cc",
"cwise_op_floor_mod.cc",
"cwise_op_greater.cc",
"cwise_op_greater_equal.cc",
"cwise_op_invert.cc",
"cwise_op_isfinite.cc",
"cwise_op_isnan.cc",
"cwise_op_left_shift.cc",
"cwise_op_less.cc",
"cwise_op_less_equal.cc",
"cwise_op_log.cc",
"cwise_op_logical_and.cc",
"cwise_op_logical_not.cc",
"cwise_op_logical_or.cc",
"cwise_op_maximum.cc",
"cwise_op_minimum.cc",
"cwise_op_mul_1.cc",
"cwise_op_mul_2.cc",
"cwise_op_neg.cc",
"cwise_op_pow.cc",
"cwise_op_reciprocal.cc",
"cwise_op_right_shift.cc",
"cwise_op_round.cc",
"cwise_op_rsqrt.cc",
"cwise_op_select.cc",
"cwise_op_sigmoid.cc",
"cwise_op_sign.cc",
"cwise_op_sqrt.cc",
"cwise_op_square.cc",
"cwise_op_squared_difference.cc",
"cwise_op_sub.cc",
"cwise_op_tanh.cc",
"cwise_op_xlogy.cc",
"cwise_op_xdivy.cc",
"data_format_ops.cc",
"decode_wav_op.cc",
"deep_conv2d.cc",
"deep_conv2d.h",
"depthwise_conv_op.cc",
"dynamic_partition_op.cc",
"encode_wav_op.cc",
"fake_quant_ops.cc",
"fifo_queue.cc",
"fifo_queue_op.cc",
"fused_batch_norm_op.cc",
"listdiff_op.cc",
"population_count_op.cc",
"population_count_op.h",
"winograd_transform.h",
":android_extended_ops_headers",
] + select({
":xsmm_convolutions": [
"xsmm_conv2d.h",
"xsmm_conv2d.cc",
],
"//conditions:default": [],
}),
)
filegroup(
name = "android_extended_ops_group2",
srcs = [
"batchtospace_op.cc",
"ctc_decoder_ops.cc",
"decode_bmp_op.cc",
"depthtospace_op.cc",
"dynamic_stitch_op.cc",
"in_topk_op.cc",
"initializable_lookup_table.cc",
"logging_ops.cc",
"lookup_table_init_op.cc",
"lookup_table_op.cc",
"lookup_util.cc",
"lrn_op.cc",
"maxpooling_op.cc",
"mfcc.cc",
"mfcc_dct.cc",
"mfcc_mel_filterbank.cc",
"mfcc_op.cc",
"mirror_pad_op.cc",
"mirror_pad_op_cpu_impl_1.cc",
"mirror_pad_op_cpu_impl_2.cc",
"mirror_pad_op_cpu_impl_3.cc",
"mirror_pad_op_cpu_impl_4.cc",
"mirror_pad_op_cpu_impl_5.cc",
"pad_op.cc",
"padding_fifo_queue.cc",
"padding_fifo_queue_op.cc",
"queue_base.cc",
"queue_op.cc",
"queue_ops.cc",
"random_op.cc",
"reduction_ops_all.cc",
"reduction_ops_any.cc",
"reduction_ops_common.cc",
"reduction_ops_max.cc",
"reduction_ops_mean.cc",
"reduction_ops_min.cc",
"reduction_ops_prod.cc",
"reduction_ops_sum.cc",
"relu_op.cc",
"reshape_util.cc",
"resize_bilinear_op.cc",
"resize_nearest_neighbor_op.cc",
"restore_op.cc",
"reverse_op.cc",
"save_op.cc",
"save_restore_tensor.cc",
"save_restore_v2_ops.cc",
"segment_reduction_ops.cc",
"session_ops.cc",
"softplus_op.cc",
"softsign_op.cc",
"spacetobatch_functor.cc",
"spacetobatch_op.cc",
"spacetodepth_op.cc",
"sparse_fill_empty_rows_op.cc",
"sparse_reshape_op.cc",
"sparse_to_dense_op.cc",
"spectrogram.cc",
"spectrogram_op.cc",
"stack_ops.cc",
"string_join_op.cc",
"string_util.cc",
"summary_op.cc",
"tensor_array.cc",
"tensor_array_ops.cc",
"tile_functor_cpu.cc",
"tile_ops.cc",
"tile_ops_cpu_impl_1.cc",
"tile_ops_cpu_impl_2.cc",
"tile_ops_cpu_impl_3.cc",
"tile_ops_cpu_impl_4.cc",
"tile_ops_cpu_impl_5.cc",
"tile_ops_cpu_impl_6.cc",
"tile_ops_cpu_impl_7.cc",
"topk_op.cc",
"training_op_helpers.cc",
"training_ops.cc",
"transpose_functor_cpu.cc",
"transpose_op.cc",
"unique_op.cc",
"where_op.cc",
"xent_op.cc",
":android_extended_ops_headers",
],
)
TensorFlow Mobile uses build options to trim down the full TensorFlow: it keeps the core functionality while dropping code that is unnecessary on mobile, such as the distributed runtime, the Windows compatibility code, and the GPU execution paths.
Does TensorFlow Mobile Support All Ops?
TensorFlow Mobile does not include every op; it builds in only the core, necessary subset listed above in android_core_ops and android_extended_ops.
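A quick way to check whether a model fits into this subset is to list the op types a frozen graph actually uses and compare them against the android_core_ops and android_extended_ops lists above. A minimal sketch, assuming a TF 1.x environment and a hypothetical model path:

```python
import tensorflow as tf  # assumes the TF 1.x Python API

GRAPH_PB = "frozen_model.pb"  # hypothetical frozen model

graph_def = tf.GraphDef()
with tf.gfile.GFile(GRAPH_PB, "rb") as f:
    graph_def.ParseFromString(f.read())

# Every distinct op type the model needs; any op without a kernel in the
# Android build will fail to resolve when TensorFlow Mobile loads the graph.
used_ops = sorted({node.op for node in graph_def.node})
print("\n".join(used_ops))
```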
How Does TensorFlow Lite's Implementation Differ?
TensorFlow Lite's source code lives in tensorflow/contrib/lite. Its core build logic is as follows:
===== tensorflow/contrib/lite/BUILD =====
cc_library(
name = "framework",
srcs = [
"allocation.cc",
"graph_info.cc",
"interpreter.cc",
"model.cc",
"mutable_op_resolver.cc",
"optional_debug_tools.cc",
"stderr_reporter.cc",
] + select({
"//tensorflow:android": [
"nnapi_delegate.cc",
"mmap_allocation.cc",
],
"//tensorflow:windows": [
"nnapi_delegate_disabled.cc",
"mmap_allocation_disabled.cc",
],
"//conditions:default": [
"nnapi_delegate_disabled.cc",
"mmap_allocation.cc",
],
}),
hdrs = [
"allocation.h",
"context.h",
"context_util.h",
"error_reporter.h",
"graph_info.h",
"interpreter.h",
"model.h",
"mutable_op_resolver.h",
"nnapi_delegate.h",
"op_resolver.h",
"optional_debug_tools.h",
"stderr_reporter.h",
],
copts = tflite_copts(),
linkopts = [
] + select({
"//tensorflow:android": [
"-llog",
],
"//conditions:default": [
],
}),
deps = [
":arena_planner",
":graph_info",
":memory_planner",
":schema_fbs_version",
":simple_memory_arena",
":string",
":util",
"//tensorflow/contrib/lite/c:c_api_internal",
"//tensorflow/contrib/lite/core/api",
"//tensorflow/contrib/lite/kernels:eigen_support",
"//tensorflow/contrib/lite/kernels:gemm_support",
"//tensorflow/contrib/lite/nnapi:nnapi_lib",
"//tensorflow/contrib/lite/profiling:profiler",
"//tensorflow/contrib/lite/schema:schema_fbs",
],
)
Whereas TensorFlow Mobile is a trimmed-down build of the full TensorFlow, TensorFlow Lite is essentially a re-implementation. Internally, even the most basic data structures of the core, such as the ops and the Context, are new. Externally, the model file format changed from protocol buffers (PB) to FlatBuffers, the binary size is drastically reduced (down to roughly 300 KB), and a converter is provided to turn an ordinary TensorFlow model into the format TensorFlow Lite requires. From every angle, TensorFlow Lite is a new implementation.
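As a rough illustration of that workflow, here is a sketch using the Python API of a contrib/lite-era TF 1.x release; the converter class name varies between releases (TocoConverter vs. TFLiteConverter), and the model path and tensor names are placeholders.

```python
import numpy as np
import tensorflow as tf  # assumes a TF 1.x release that ships contrib/lite

# 1. Convert a frozen GraphDef (PB) into the FlatBuffers-based .tflite format.
converter = tf.contrib.lite.TFLiteConverter.from_frozen_graph(
    "frozen_model.pb",         # hypothetical input model
    input_arrays=["input"],    # adjust to your graph's tensor names
    output_arrays=["output"])
with open("model.tflite", "wb") as f:
    f.write(converter.convert())

# 2. Run the converted model with the TensorFlow Lite Interpreter.
interpreter = tf.contrib.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]
interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
interpreter.invoke()
print(interpreter.get_tensor(out["index"]))
```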
References
TensorFlow Architecture
TensorFlow Mobile VS TensorFlow Lite
TensorFlow Code Analysis (TensorFlow 程式碼解析)
TensorFlow Lite