Google Protocol Buffers: Basic Usage of protobuf and an Analysis of the Generated Model

Posted by tera on 2020-08-16

Original article: https://www.cnblogs.com/tera/p/13512232.html


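This series of articles describes how the protocol buffer encoding format is used, what its characteristics are, and some techniques for working with it. It then extends and optimizes stock protobuf so that it serves our needs better.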

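1. What is protobuf

Protocol buffer is a data encoding format released by Google. It is independent of platform and language, much like XML and JSON. Its biggest difference from XML and JSON is that protobuf is not a fully self-describing encoding format, a point that is explained later in this series.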

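2. Why use protobuf

Compared with JSON or XML, protocol buffer is faster to parse and produces fewer bytes when encoded.

For comparisons of parsing speed, see other articles on the topic; speed is not the focus of this series. The reduction in byte count, however, is the main target of the extensions and optimizations that come later.

Another convenience over JSON and XML is that the developer only writes a single .proto description file, and the compiler provided by Google generates the model code for different platforms, including Java, C#, and so on, so the model classes do not have to be written by hand.

All examples in the rest of this article are in Java.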

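3. How to use protobuf

First, download the compiler provided by Google from:

https://github.com/protocolbuffers/protobuf/releases/tag/v3.12.1

Choose the zip package that matches your operating system.

After unpacking it you will find an executable named protoc, which is the compiler we need.

Next, define a description file named BasicUsage.proto. Its structure is very similar to an ordinary class definition.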

syntax = "proto3";

option java_package = "cn.tera.protobuf.model";
option java_outer_classname = "BasicUsage";

message Person {
  string name = 1;
  int32 id = 2;
  string email = 3;
}

The first line states the syntax version in use. Here we choose proto3, the latest version at the time of writing.

syntax = "proto3";

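The third and fourth lines give the Java package name and the name of the outer class of the generated code (what "outer class" means is explained with code later).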

option java_package = "cn.tera.protobuf.model";
option java_outer_classname = "BasicUsage";

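Immediately after that comes the model we define, most of which will look familiar.

Pay special attention to the "= X" that follows each field. It is not the field's value but the field's number, which is essential to encoding and decoding correctly. In my view it is the soul of protocol buffer, and it is explained in detail later.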

message Person {
  string name = 1;
  int32 id = 2;
  string email = 3;
}
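
To make the role of the field number more concrete, here is a minimal sketch that uses only the JDK and is not part of the generated code. It shows how the protobuf wire format combines a field number with a wire type into the tag that precedes each encoded field. The class WireTagSketch and its helper are hypothetical names for illustration; the formula (field_number << 3) | wire_type comes from the protobuf encoding documentation.

```java
// Hypothetical illustration class; it is not produced by protoc.
final class WireTagSketch {
    // Wire types defined by the protobuf encoding: 0 = varint, 2 = length-delimited.
    static final int WIRETYPE_VARINT = 0;
    static final int WIRETYPE_LENGTH_DELIMITED = 2;

    // A tag is the field number shifted left by 3 bits, OR-ed with the wire type.
    static int tag(int fieldNumber, int wireType) {
        return (fieldNumber << 3) | wireType;
    }

    public static void main(String[] args) {
        // Person: string name = 1; int32 id = 2; string email = 3;
        System.out.printf("name  -> 0x%02X%n", tag(1, WIRETYPE_LENGTH_DELIMITED)); // 0x0A
        System.out.printf("id    -> 0x%02X%n", tag(2, WIRETYPE_VARINT));           // 0x10
        System.out.printf("email -> 0x%02X%n", tag(3, WIRETYPE_LENGTH_DELIMITED)); // 0x1A
    }
}
```

Only the number is written to the encoded bytes, never the field name, so renaming a field leaves the encoding unchanged while changing its number does not.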

With the compiler and the .proto description file in hand, we can generate the Java model file.

Compile command

protoc -I=$SRC_DIR --java_out=$DST_DIR $SRC_DIR/BasicUsage.proto

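-I: the directory in which protoc looks for .proto files (short for --proto_path); if omitted, the current directory is used

--java_out: the directory in which the .java files are written

I prefer to keep the .proto file inside the Java project and generate the .java file directly into the source root, so that it lands in the folder of the package named by the java_package option above. That way there is no need to copy files by hand before using them.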

protoc -I=/protocol_buffer/protobuf/proto --java_out=/protocol_buffer/protobuf/src/main/java/ /protocol_buffer/protobuf/proto/BasicUsage.proto

In the project directory structure, the BasicUsage class file is the one that was generated.

That completes the preparation. Next we move on to the code.

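Add the Maven dependencies (as a general rule, the protobuf-java runtime should be at least as new as the protoc release that generated the code)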

<!-- protobuf core library -->
<dependency>
    <groupId>com.google.protobuf</groupId>
    <artifactId>protobuf-java</artifactId>
    <version>3.9.1</version>
</dependency>
<!-- protobuf JSON utilities; imported here because they are used later -->
<dependency>
    <groupId>com.google.protobuf</groupId>
    <artifactId>protobuf-java-util</artifactId>
    <version>3.9.1</version>
</dependency>

Next, we create a test method.

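import cn.tera.protobuf.model.BasicUsage;
import com.google.protobuf.InvalidProtocolBufferException;
import org.junit.jupiter.api.Test; // JUnit 5 is assumed here

/**
 * Basic usage of protobuf
 */
@Test
void basicUse() {
    // Create a Person object
    BasicUsage.Person person = BasicUsage.Person.newBuilder()
            .setId(5)
            .setName("tera")
            .setEmail("tera@google.com")
            .build();
    System.out.println("Person's name is " + person.getName());

    // Encode
    // The byte array can now be transmitted in whatever way we like
    byte[] bytes = person.toByteArray();

    // Convert the encoded bytes back into a Person object
    BasicUsage.Person clone = null;
    try {
        // Decode
        clone = BasicUsage.Person.parseFrom(bytes);
        System.out.println("The clone's name is " + clone.getName());
    } catch (InvalidProtocolBufferException e) {
        // Fail loudly instead of swallowing the error and continuing with a null clone
        throw new IllegalStateException("Failed to decode Person", e);
    }

    // The references are different
    System.out.println("==:" + (person == clone));
    // equals is overridden, so the two objects are equal
    System.out.println("equals:" + person.equals(clone));

    // Change a value in clone
    clone = clone.toBuilder().setName("clone").build();
    System.out.println("The clone's new name is " + clone.getName());
}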

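In the test method, the Person class has to be accessed as BasicUsage.Person. BasicUsage is the java_outer_classname option we specified earlier in the .proto file.

A single .proto file can define multiple classes, and different .proto files can define classes with the same name, so java_outer_classname is used to tell them apart. It can be thought of as the package name of the .proto file.

A few points to note here:

A protobuf object must be instantiated and populated through the Builder returned by newBuilder(); the final object is created by calling build().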

BasicUsage.Person person = BasicUsage.Person.newBuilder()
            .setId(5)
            .setName("tera")
            .setEmail("tera@google.com")
            .build();

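Once the object has been built, only its get methods can be called; there are no set methods. To set a value, the object must first be converted back into a Builder.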

clone = clone.toBuilder().setName("clone").build();

Encoding and decoding are done with the toByteArray() and parseFrom() methods respectively.

byte[] bytes = person.toByteArray();
...
BasicUsage.Person.parseFrom(bytes);

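That is the basic usage of protocol buffer. Apart from assigning values being somewhat cumbersome, everything else is quite convenient (to get chained calls such as .setXX().setYY() on an ordinary model class, we would have to add an extra annotation). Protocol buffer is also a good choice for objects that need a deep clone, because it avoids many of the problems of cloned references.

4. Analysis of the protocol buffer model

Now that we understand the basic usage, studying the source code is the natural next step. Proceeding gradually, let us first look at what the generated model file contains.

Open the Person class and you may be startled at how much code such a simple class needs. To avoid padding the article I will not paste all of it; interested readers can generate one and look at the whole thing, which comes to 836 lines.

The main parts are examined below.

1).BasicUsage

The main class is named BasicUsage, and all other classes are nested inside it, which is why Person has to be accessed as BasicUsage.Person.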

public final class BasicUsage {
      ...
}

2).The PersonOrBuilder interface

The PersonOrBuilder interface declares a get method for every field of Person, plus a ByteString getter for each string field.

public interface PersonOrBuilder extends
        // @@protoc_insertion_point(interface_extends:Person)
        com.google.protobuf.MessageOrBuilder {
    java.lang.String getName();
    com.google.protobuf.ByteString getNameBytes();

    int getId();

    java.lang.String getEmail();
    com.google.protobuf.ByteString getEmailBytes();
}

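3).The Person class

Person implements the PersonOrBuilder interface, which declares only get methods, and Person itself defines no set methods. This is why a Person can be read but not modified.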

public static final class Person extends
            com.google.protobuf.GeneratedMessageV3 implements
            PersonOrBuilder {
    ...
}

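The Person class has no public constructor, only 3 private ones, so outside code cannot create a Person directly.

The 3 constructors respectively accept a Builder, construct an empty object, and accept a CodedInputStream.

The Builder constructor is the one mentioned earlier, used to create a Person from a Builder.

The CodedInputStream wraps the encoded bytes, so that constructor is used to decode an object from a byte[].

Both of these constructors appear in use later in this article.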

private Person(com.google.protobuf.GeneratedMessageV3.Builder<?> builder) {
    super(builder);
}

private Person() {
    name_ = "";
    email_ = "";
}
private Person(
        com.google.protobuf.CodedInputStream input,
        com.google.protobuf.ExtensionRegistryLite extensionRegistry)
        throws com.google.protobuf.InvalidProtocolBufferException {
    ...
}

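Look at Person's getName method: name_ is declared as an Object rather than a String, so the getter has to check its type when reading the value.

The reason is that a message can be decoded from a byte[] whose contents are not under our control and may vary, so the field is allowed to hold the raw ByteString form as well as the decoded String. A ByteString is converted to a String on first access and the result is cached in name_. Later articles give some examples of this.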

@java.lang.Override
public java.lang.String getName() {
    java.lang.Object ref = name_;
    if (ref instanceof java.lang.String) {
        return (java.lang.String) ref;
    } else {
        com.google.protobuf.ByteString bs =
                (com.google.protobuf.ByteString) ref;
        java.lang.String s = bs.toStringUtf8();
        name_ = s;
        return s;
    }
}
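
The decode-on-first-access pattern in getName can be sketched with the JDK alone. The class below is a hypothetical stand-in, not protobuf code: a byte[] plays the role of ByteString, and the isDecoded helper exists only to make the caching visible.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a generated message; byte[] plays the role of ByteString.
final class LazyStringFieldSketch {
    // Holds either a String or a UTF-8 byte[], mirroring the Object-typed name_ field.
    private volatile Object name;

    LazyStringFieldSketch(byte[] utf8Name) {
        this.name = utf8Name;
    }

    String getName() {
        Object ref = name;
        if (ref instanceof String) {
            return (String) ref; // already decoded on an earlier call
        }
        String s = new String((byte[]) ref, StandardCharsets.UTF_8);
        name = s; // cache the decoded form so later calls skip the decoding
        return s;
    }

    // Illustration-only helper: reports whether the field currently holds a String.
    boolean isDecoded() {
        return name instanceof String;
    }

    public static void main(String[] args) {
        LazyStringFieldSketch p =
                new LazyStringFieldSketch("tera".getBytes(StandardCharsets.UTF_8));
        System.out.println(p.isDecoded()); // false
        System.out.println(p.getName());   // tera
        System.out.println(p.isDecoded()); // true
    }
}
```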

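Look at the equals and hashCode methods: both are overridden based on the contents of the object's fields, which is why equals returned true in the basic usage example above.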

@java.lang.Override
public boolean equals(final java.lang.Object obj) {
    if (obj == this) {
        return true;
    }
    if (!(obj instanceof cn.tera.protobuf.model.BasicUsage.Person)) {
        return super.equals(obj);
    }
    cn.tera.protobuf.model.BasicUsage.Person other = (cn.tera.protobuf.model.BasicUsage.Person) obj;

    if (!getName()
            .equals(other.getName())) return false;
    if (getId()
            != other.getId()) return false;
    if (!getEmail()
            .equals(other.getEmail())) return false;
    if (!unknownFields.equals(other.unknownFields)) return false;
    return true;
}

@java.lang.Override
public int hashCode() {
    if (memoizedHashCode != 0) {
        return memoizedHashCode;
    }
    int hash = 41;
    hash = (19 * hash) + getDescriptor().hashCode();
    hash = (37 * hash) + NAME_FIELD_NUMBER;
    hash = (53 * hash) + getName().hashCode();
    hash = (37 * hash) + ID_FIELD_NUMBER;
    hash = (53 * hash) + getId();
    hash = (37 * hash) + EMAIL_FIELD_NUMBER;
    hash = (53 * hash) + getEmail().hashCode();
    hash = (29 * hash) + unknownFields.hashCode();
    memoizedHashCode = hash;
    return hash;
}

Look at Person's toByteArray() method: it is defined in the AbstractMessageLite class, a parent class of every object generated by protobuf.

public byte[] toByteArray() {
    try {
        byte[] result = new byte[this.getSerializedSize()];
        CodedOutputStream output = CodedOutputStream.newInstance(result);
        this.writeTo(output);
        output.checkNoSpaceLeft();
        return result;
    } catch (IOException var3) {
        throw new RuntimeException(this.getSerializingExceptionMessage("byte array"), var3);
    }
}

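Now look at the writeTo method that toByteArray calls, as implemented in the Person class. This is where the data of the 3 fields is written, and a field is only written when it differs from its default value (an empty string or 0). The details of these methods are left to later articles, because they involve protobuf's encoding principles.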

@java.lang.Override
public void writeTo(com.google.protobuf.CodedOutputStream output)
        throws java.io.IOException {
    if (!getNameBytes().isEmpty()) {
        com.google.protobuf.GeneratedMessageV3.writeString(output, 1, name_);
    }
    if (id_ != 0) {
        output.writeInt32(2, id_);
    }
    if (!getEmailBytes().isEmpty()) {
        com.google.protobuf.GeneratedMessageV3.writeString(output, 3, email_);
    }
    unknownFields.writeTo(output);
}
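
As a rough illustration of what writeTo produces, the following JDK-only sketch encodes the same three fields by hand. It is a hypothetical re-implementation for this one message, assuming the documented wire format (varint tags, length-prefixed UTF-8 strings); it is not how CodedOutputStream is actually written, and it omits unknown fields.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical hand-written encoder for the Person message; not protobuf's own code.
final class PersonEncodeSketch {
    // Base-128 varint: 7 payload bits per byte, high bit set on all but the last byte.
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) (value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write((int) value);
    }

    static void writeString(ByteArrayOutputStream out, int fieldNumber, String s) {
        if (s.isEmpty()) return;                    // default value: nothing is written
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeVarint(out, (fieldNumber << 3) | 2);   // tag with wire type 2
        writeVarint(out, utf8.length);              // payload length
        out.write(utf8, 0, utf8.length);            // payload
    }

    static void writeInt32(ByteArrayOutputStream out, int fieldNumber, int v) {
        if (v == 0) return;                         // default value: nothing is written
        writeVarint(out, fieldNumber << 3);         // tag with wire type 0
        writeVarint(out, (long) v);                 // sign-extended, as int32 is on the wire
    }

    static byte[] encode(String name, int id, String email) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, 1, name);
        writeInt32(out, 2, id);
        writeString(out, 3, email);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = encode("tera", 5, "tera@google.com");
        String json = "{\"name\":\"tera\",\"id\":5,\"email\":\"tera@google.com\"}";
        System.out.println("protobuf bytes: " + bytes.length); // 25
        System.out.println("JSON chars:     " + json.length()); // 48
    }
}
```

For this example the hand-encoded message takes 25 bytes against 48 characters for an equivalent compact JSON string, which illustrates the size claim made at the start of the article for this one message only.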

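For the Person class, the last thing to look at is parseFrom. It has many overloads, but they all come down to the same thing: the data is handed to PARSER. I will not paste them all here.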

public static cn.tera.protobuf.model.BasicUsage.Person parseFrom(byte[] data)
        throws com.google.protobuf.InvalidProtocolBufferException {
    return PARSER.parseFrom(data);
}

Look at the PARSER object: this is where the Person constructor that accepts a CodedInputStream is called, matching what was said earlier.

private static final com.google.protobuf.Parser<Person>
        PARSER = new com.google.protobuf.AbstractParser<Person>() {
    @java.lang.Override
    public Person parsePartialFrom(
            com.google.protobuf.CodedInputStream input,
            com.google.protobuf.ExtensionRegistryLite extensionRegistry)
            throws com.google.protobuf.InvalidProtocolBufferException {
        return new Person(input, extensionRegistry);
    }
};
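
The body of the parsing constructor is elided above. As a rough idea of what decoding involves, here is a JDK-only sketch that reads tags and values back out of a byte[]. It is a hypothetical decoder for this one message, assuming the documented wire format; unlike the generated code, which keeps unrecognized data in unknownFields, it silently drops fields it does not know and rejects wire types other than 0 and 2.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical hand-written decoder for Person; not the generated parsing constructor.
final class PersonDecodeSketch {
    String name = "";
    int id = 0;
    String email = "";

    private final byte[] buf;
    private int pos = 0;

    private PersonDecodeSketch(byte[] data) {
        this.buf = data;
    }

    // Reads one base-128 varint starting at the current position.
    private long readVarint() {
        long result = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            byte b = buf[pos++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
        }
        throw new IllegalArgumentException("malformed varint");
    }

    static PersonDecodeSketch parseFrom(byte[] data) {
        PersonDecodeSketch p = new PersonDecodeSketch(data);
        while (p.pos < data.length) {
            int tag = (int) p.readVarint();
            int fieldNumber = tag >>> 3;   // upper bits: field number from the .proto file
            int wireType = tag & 0x7;      // lower 3 bits: how the value is laid out
            if (wireType == 0) {
                long v = p.readVarint();
                if (fieldNumber == 2) p.id = (int) v;
            } else if (wireType == 2) {
                int len = (int) p.readVarint();
                String s = new String(data, p.pos, len, StandardCharsets.UTF_8);
                p.pos += len;
                if (fieldNumber == 1) p.name = s;
                else if (fieldNumber == 3) p.email = s;
            } else {
                throw new IllegalArgumentException("wire type not handled here: " + wireType);
            }
        }
        return p;
    }

    // Encoded form of Person{name="tera", id=5, email="tera@google.com"}.
    static byte[] sample() {
        byte[] head = {0x0A, 4, 't', 'e', 'r', 'a', 0x10, 5, 0x1A, 15};
        byte[] mail = "tera@google.com".getBytes(StandardCharsets.UTF_8);
        byte[] data = new byte[head.length + mail.length];
        System.arraycopy(head, 0, data, 0, head.length);
        System.arraycopy(mail, 0, data, head.length, mail.length);
        return data;
    }

    public static void main(String[] args) {
        PersonDecodeSketch p = parseFrom(sample());
        System.out.println(p.name + " / " + p.id + " / " + p.email);
    }
}
```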

4).The Builder class

Builder is a nested class of Person. It also implements the PersonOrBuilder interface, and additionally defines set methods.

public static final class Builder extends
        com.google.protobuf.GeneratedMessageV3.Builder<Builder> implements
        // @@protoc_insertion_point(builder_implements:Person)
        cn.tera.protobuf.model.BasicUsage.PersonOrBuilder {
    ...
}

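The get methods here follow the same logic as those of Person. Note, however, that the name_ here and the name_ in Person's getName method are not the same field: they are private fields of the Builder class and the Person class respectively.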

public java.lang.String getName() {
    java.lang.Object ref = name_;
    if (!(ref instanceof java.lang.String)) {
        com.google.protobuf.ByteString bs =
                (com.google.protobuf.ByteString) ref;
        java.lang.String s = bs.toStringUtf8();
        name_ = s;
        return s;
    } else {
        return (java.lang.String) ref;
    }
}

The set method is simple: after a null check, it is a direct assignment.

public Builder setName(
        java.lang.String value) {
    if (value == null) {
        throw new NullPointerException();
    }

    name_ = value;
    onChanged();
    return this;
}

Finally, look at Builder's build method, which calls buildPartial.

@java.lang.Override
public cn.tera.protobuf.model.BasicUsage.Person build() {
    cn.tera.protobuf.model.BasicUsage.Person result = buildPartial();
    if (!result.isInitialized()) {
        throw newUninitializedMessageException(result);
    }
    return result;
}

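In buildPartial, the Person constructor that takes a Builder is called, matching what was said earlier.

After construction, the fields of the Builder are assigned to the corresponding fields of the Person, which completes the build.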

@java.lang.Override
public cn.tera.protobuf.model.BasicUsage.Person buildPartial() {
    cn.tera.protobuf.model.BasicUsage.Person result = new cn.tera.protobuf.model.BasicUsage.Person(this);
    result.name_ = name_;
    result.id_ = id_;
    result.email_ = email_;
    onBuilt();
    return result;
}
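
Stripped of everything protobuf-specific, the relationship between Person and its Builder is an immutable class with a mutable companion. The following two-field analogue is a hypothetical, hand-written sketch using only the JDK; it is much simpler than the generated code and is meant only to show the shape of newBuilder(), toBuilder() and build().

```java
// Hypothetical two-field analogue of the generated Person/Builder pair.
final class ImmutablePersonSketch {
    private final String name;
    private final int id;

    // Private: instances can only be created through the Builder.
    private ImmutablePersonSketch(Builder b) {
        this.name = b.name;
        this.id = b.id;
    }

    String getName() { return name; }
    int getId() { return id; }

    static Builder newBuilder() {
        return new Builder();
    }

    // Copies the current values into a fresh Builder; this object stays unchanged.
    Builder toBuilder() {
        return new Builder().setName(name).setId(id);
    }

    static final class Builder {
        private String name = "";
        private int id = 0;

        Builder setName(String value) {
            if (value == null) {
                throw new NullPointerException();
            }
            name = value;
            return this; // returning this is what allows chained calls
        }

        Builder setId(int value) {
            id = value;
            return this;
        }

        ImmutablePersonSketch build() {
            return new ImmutablePersonSketch(this);
        }
    }

    public static void main(String[] args) {
        ImmutablePersonSketch person = newBuilder().setName("tera").setId(5).build();
        ImmutablePersonSketch clone = person.toBuilder().setName("clone").build();
        System.out.println(person.getName()); // tera
        System.out.println(clone.getName());  // clone
    }
}
```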

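To summarize:

1. Protocol buffer requires a .proto description file, from which the compiler provided by Google generates the model file for a given language; after that the model can be used like a normal Java object.

2. Objects cannot be created directly; they must be created through a Builder.

3. Only a Builder has set methods.

4. Objects are encoded and decoded with the toByteArray() and parseFrom() methods.

5. The model file is large (at least for Java), and all of its code is generated specifically for each message. This is actually one of its major drawbacks.

A few threads were left hanging here: JSON was mentioned with the Maven dependencies, the importance of the "= X" field number was mentioned with the .proto description file, and flexibility was mentioned with the getName() method. These are explored in the next article; this one is only a first look at protocol buffer.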
