March 19, 2021

Introduction to Protocol Buffers (protobuf)

What Are Protocol Buffers (protobuf)?

"Protocol buffers are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler." https://developers.google.com/protocol-buffers

Advantages

  • Data is binary, so messages are smaller and use less bandwidth.
  • Dedicated parser for the binary format (more CPU efficient than parsing JSON).
  • Data is typed.
  • Wide variety of supported languages (Java, C#, Go, Python, JavaScript).
  • Schema based (.proto)

Disadvantages

  • Since data is sent in a binary format, you cannot read it as plain text the way you can read JSON.
  • Some languages are not supported.

Protocol Buffers (protobuf) was developed by Google and is used for almost all of its internal applications.
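To make the bandwidth advantage and the readability disadvantage above concrete, here is a minimal sketch that compares a small record encoded as JSON with the same record encoded by hand in the protobuf wire format. It assumes field number 1 for age and field number 2 for first_name (matching the Person example later in this post). The encode_varint helper is written for this illustration and is not part of any protobuf library.

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


record = {"age": 30, "first_name": "Ann"}

# JSON carries the field names and punctuation as text.
json_bytes = json.dumps(record).encode("utf-8")

# Protobuf carries a small numeric key per field instead of the name.
# key = (field_number << 3) | wire_type; 0 = varint, 2 = length-delimited
name = record["first_name"].encode("utf-8")
proto_bytes = (
    encode_varint((1 << 3) | 0) + encode_varint(record["age"])
    + encode_varint((2 << 3) | 2) + encode_varint(len(name)) + name
)

print(len(json_bytes), json_bytes)
print(len(proto_bytes), proto_bytes.hex())  # 7 081e1203416e6e
```

The protobuf bytes are much shorter, but they only make sense to a reader that knows the schema, which is the trade-off described in the lists above.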

Data Types (Scalar Value Types)

https://developers.google.com/protocol-buffers/docs/proto3#scalar

  • Integer numbers: int32, int64 (unsigned and fixed-width variants also exist, see the link above)
  • Floating-point numbers: float (32-bit), double (64-bit)
  • Boolean (true or false): bool
  • String: string (must always contain UTF-8 encoded or 7-bit ASCII text)
  • Byte Array: bytes
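The sizes in the list above can be checked with a short sketch. It uses Python's struct module as a stand-in for the 4-byte and 8-byte little-endian encodings that protobuf uses for float and double; the sample values are made up for this illustration.

```python
import struct

height = 1.73

# float is stored in 4 bytes, double in 8 bytes (little-endian IEEE 754).
as_float = struct.pack("<f", height)
as_double = struct.pack("<d", height)
print(len(as_float), len(as_double))  # 4 8

# A 32-bit float keeps fewer digits, so the value changes slightly.
round_tripped = struct.unpack("<f", as_float)[0]
print(round_tripped == height)  # False

# int32 holds values in this range; larger values need int64.
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
print(INT32_MIN, INT32_MAX)

# string fields hold UTF-8 text, so one character can take several bytes.
name = "Zo\u00eb"
print(len(name), len(name.encode("utf-8")))  # 3 4
```

If the extra precision matters (for example for coordinates), double is the safer choice; float saves 4 bytes per value.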

Schema .proto file

"Files should be named lower_snake_case.proto" https://developers.google.com/protocol-buffers/docs/style

Message Type

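In protobuf we define messages, e.g. 'message Person'. Each field in a message has:

Field Type, e.g. string

Field Name, e.g. first_name

Field Tag (field number), e.g. '= 1;'. The number identifies the field in the binary encoding, so it must be unique within the message and should not change once the message is in use.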

// The syntax for this file is proto3
syntax = "proto3";

/* Person is used to identify users
 * across our system. */
message Person {
  int32 age = 1;
  string first_name = 2;
  string last_name = 3;
  bytes picture = 4;
  bool is_profile_verified = 5;
  float height = 6;

  // array/list
  repeated string phone_numbers = 7;

  enum EyeColor {
    // the first enum value is the default and must be 0
    EYE_COLOR_UNSPECIFIED = 0;
    EYE_COLOR_GREEN = 1;
    EYE_COLOR_BROWN = 2;
    EYE_COLOR_BLUE = 3;
  }

  EyeColor eye_color = 8;
}
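To see why the field tag matters, the sketch below computes the key that precedes each field in the encoded data. It assumes the documented protobuf wire format, where the key is (field_number << 3) | wire_type written as a base-128 varint. The helper names encode_varint and field_key are made up for this illustration and are not part of any protobuf library.

```python
# Wire types from the protobuf encoding documentation.
WIRE_VARINT = 0            # int32, int64, bool, enum
WIRE_LENGTH_DELIMITED = 2  # string, bytes, embedded messages
WIRE_FIXED32 = 5           # float


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def field_key(field_number: int, wire_type: int) -> bytes:
    """Build the key written in front of a field's value."""
    return encode_varint((field_number << 3) | wire_type)


print(field_key(1, WIRE_VARINT).hex())            # age        -> 08
print(field_key(2, WIRE_LENGTH_DELIMITED).hex())  # first_name -> 12
print(field_key(6, WIRE_FIXED32).hex())           # height     -> 35

# Field numbers 1 to 15 fit in a one-byte key; 16 and up need two bytes.
print(len(field_key(15, WIRE_VARINT)), len(field_key(16, WIRE_VARINT)))  # 1 2
```

Only the number appears in the encoded data, never the field name. That is why renaming a field is safe but renumbering it is not, and why it can pay to keep the numbers 1 to 15 for the most frequently used fields.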

Style Guide

https://developers.google.com/protocol-buffers/docs/style

  • "Keep the line length to 80 characters."
  • "Use an indent of 2 spaces."
  • "Package name should be in lowercase, and should correspond to the directory hierarchy. e.g., if a file is in my/package/, then the package name should be my.package."
  • "Use CamelCase (with an initial capital) for message names"
  • "Use underscore_separated_names for field names (including oneof field and extension names) – for example, song_name."
  • "Use pluralized names for repeated fields."
  • "Use CamelCase (with an initial capital) for enum type names and CAPITALS_WITH_UNDERSCORES for value names:"
  • "If your .proto defines an RPC service, you should use CamelCase (with an initial capital) for both the service name and any RPC method names"

Default Values for Fields

All fields that are not set take a default value:

  • bool: false
  • number (int32, etc.): 0
  • string: empty string
  • bytes: empty byte array/list
  • enum: first defined value (which must be 0)
  • repeated: empty array/list
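As a rough illustration of the list above, the sketch below fills in proto3-style defaults for fields that are missing from a decoded message. The PERSON_FIELDS table and the with_defaults helper are hypothetical stand-ins for what generated protobuf code does for you; they are not a real protobuf API.

```python
# Default value per field kind, following the list above.
DEFAULTS = {
    "bool": False,
    "int32": 0,
    "float": 0.0,
    "string": "",
    "bytes": b"",
    "enum": 0,  # the first enum value, e.g. EYE_COLOR_UNSPECIFIED
}

# Hand-written description of some Person fields: name -> (kind, repeated)
PERSON_FIELDS = {
    "age": ("int32", False),
    "first_name": ("string", False),
    "picture": ("bytes", False),
    "is_profile_verified": ("bool", False),
    "height": ("float", False),
    "phone_numbers": ("string", True),
    "eye_color": ("enum", False),
}


def with_defaults(decoded: dict) -> dict:
    """Return a copy of `decoded` with every missing field set to its default."""
    result = {}
    for name, (kind, repeated) in PERSON_FIELDS.items():
        if name in decoded:
            result[name] = decoded[name]
        elif repeated:
            result[name] = []  # repeated fields default to an empty list
        else:
            result[name] = DEFAULTS[kind]
    return result


person = with_defaults({"first_name": "Ann"})
print(person["age"], person["phone_numbers"], person["eye_color"])  # 0 [] 0
```

One consequence for a plain proto3 scalar field: a reader cannot tell an age that was explicitly set to 0 from an age that was never set, so a default value should not carry special meaning.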
