Protocol Buffers (protobuf) for Python
Protocol Buffers is Google's language-neutral, platform-neutral mechanism for serializing structured data. You define message schemas in .proto files, compile them with protoc into _pb2.py modules, and use the runtime library (google.protobuf.*) to serialize, deserialize, and manipulate messages. The current version is 7.34.1 (the Python major version was bumped from 6.x to 7.x in the 7.34.0 release). Releases follow a quarterly cadence; breaking major-version bumps are targeted at Q1 of each year.
Warnings
- breaking The Python major version was bumped to 7 with the 7.34.0 release (the previous line was 6.x). Boolean values are now rejected when setting enum or int fields: the API raises a TypeError instead of implicitly converting them. The deprecated float_precision option in json_format and the float_format/double_format options in text_format were also removed.
- breaking Gencode/runtime version mismatch raises google.protobuf.runtime_version.VersionError at import time. Generated _pb2.py files embed a minimum runtime version; loading them against an older installed protobuf package fails hard. This is especially common when grpcio-tools generates code with a newer bundled protoc than the protobuf runtime you have installed.
- breaking Python 4.21.0 (2022) switched the C extension to the upb library. Sharing message objects between Python and C++ (e.g. via SWIG or pybind11) stopped working by default. Libraries like older TensorFlow that relied on this crash with AttributeError on import.
- breaking message.UnknownFields() was deprecated in 4.25 and removed in 5.26. Calling it on 5.26 or later raises AttributeError; use unknown_fields.UnknownFieldSet(message) instead.
- gotcha Accessing an undefined key in a proto map field creates that key with a zero/false/empty value (defaultdict-like behaviour). This silently mutates the message during read-only access, which can cause unexpected serialization differences and test failures.
- gotcha The package name declared in a .proto file does NOT affect generated Python module names or import paths. Python import paths are determined purely by the .proto file's location relative to the --proto_path flag. Hyphens in filenames are silently converted to underscores (foo-bar.proto → foo_bar_pb2.py).
- gotcha Do not subclass generated message classes. They use a metaclass and internal descriptor machinery that makes subclassing produce subtle bugs ('fragile base class' problems). The official docs explicitly warn against it.
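The map-field gotcha above can be reproduced without a custom schema. The following is a minimal sketch that assumes the built-in Struct well-known type, whose fields member is a map<string, Value>; the key name "missing" is purely illustrative.

```python
from google.protobuf.struct_pb2 import Struct

s = Struct()
before = s.SerializeToString()  # an empty message serializes to zero bytes

# Membership tests and .get() inspect the map without inserting anything.
assert "missing" not in s.fields
assert s.fields.get("missing") is None

# Subscript access creates the entry as a side effect of reading it.
_ = s.fields["missing"]
after = s.SerializeToString()

key_was_created = "missing" in s.fields
print("key created by read:", key_was_created)
print("serialized size before/after:", len(before), len(after))
```

Preferring `in` or `.get()` for lookups keeps read-only code paths from changing what gets serialized.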
Install
- pip install protobuf
- pip install grpcio-tools
- # Install the standalone protoc compiler (platform-specific)
  # macOS: brew install protobuf
  # Ubuntu: apt-get install -y protobuf-compiler
  # Or download from https://github.com/protocolbuffers/protobuf/releases
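Because a gencode/runtime mismatch only surfaces when a generated module is imported, it may help to check what is actually installed first. This is a rough diagnostic sketch, not an official tool: it reads google.protobuf.__version__ and, only if grpcio-tools happens to be installed, reports that distribution's version through importlib.metadata.

```python
from importlib import metadata

import google.protobuf

# Version of the protobuf runtime that "import google.protobuf" resolves to.
runtime_version = google.protobuf.__version__
runtime_major = int(runtime_version.split(".")[0])
print("protobuf runtime:", runtime_version)

# grpcio-tools is optional; report its version only when it is installed.
try:
    grpc_tools_version = metadata.version("grpcio-tools")
except metadata.PackageNotFoundError:
    grpc_tools_version = None
print("grpcio-tools:", grpc_tools_version or "not installed")
```

If the generated code was produced by a newer protoc than the runtime printed here, upgrading protobuf or regenerating with a matching compiler are the usual ways out.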
Imports
- MessageToJson / ParseDict
  from google.protobuf import json_format
  json_format.MessageToJson(msg)
  json_format.ParseDict(d, MyMessage())
- MessageToString / Parse (text format)
  from google.protobuf import text_format
  text_format.MessageToString(msg)
  text_format.Parse(text, MyMessage())
- descriptor_pool / DescriptorPool
  from google.protobuf import descriptor_pool
  pool = descriptor_pool.Default()
- Generated message class (user proto)
  # After: protoc --python_out=. my_message.proto
  from my_message_pb2 import MyMessage
  msg = MyMessage(field1='hello', field2=42)
- Well-known types (Timestamp, Duration, Any, Struct …)
  from google.protobuf.timestamp_pb2 import Timestamp
  from google.protobuf.any_pb2 import Any
- UnknownFieldSet
  from google.protobuf import unknown_fields
  unk = unknown_fields.UnknownFieldSet(msg)
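As an illustration of the UnknownFieldSet import above, the sketch below manufactures unknown fields using only well-known types: it serializes a Timestamp and parses the bytes as Empty, which declares no fields of its own. Pairing these two types is an assumption made for the demo, and it presumes a protobuf release recent enough to ship google.protobuf.unknown_fields.

```python
from google.protobuf import unknown_fields
from google.protobuf.empty_pb2 import Empty
from google.protobuf.timestamp_pb2 import Timestamp

# Timestamp uses field numbers 1 (seconds) and 2 (nanos), both varints.
payload = Timestamp(seconds=1, nanos=2).SerializeToString()

# Empty has no declared fields, so everything in the payload is unknown to it.
msg = Empty()
msg.ParseFromString(payload)

# Each entry exposes field_number, wire_type, and data.
found = [
    (f.field_number, f.wire_type, f.data)
    for f in unknown_fields.UnknownFieldSet(msg)
]
for number, wire_type, data in found:
    print(f"field {number}: wire_type={wire_type} data={data!r}")
```

The same pattern can be applied to any parsed message in place of the removed message.UnknownFields() call.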
Quickstart
# pip install protobuf
# No custom .proto needed for this example — uses the built-in Timestamp well-known type.
from google.protobuf.timestamp_pb2 import Timestamp
from google.protobuf import json_format
# --- Create and populate a message ---
ts = Timestamp()
ts.GetCurrentTime() # sets seconds + nanos to now
# --- Binary serialization round-trip ---
binary = ts.SerializeToString()
ts2 = Timestamp()
ts2.ParseFromString(binary) # returns number of bytes consumed
assert ts == ts2, "Round-trip failed"
# --- JSON serialization ---
json_str = json_format.MessageToJson(ts)
print("JSON:", json_str)
ts3 = json_format.Parse(json_str, Timestamp())
assert ts == ts3, "JSON round-trip failed"
print("All assertions passed.")
# --- Typical workflow with a custom proto ---
# 1. Write my_message.proto:
# syntax = "proto3";
# message Person { string name = 1; int32 id = 2; }
# 2. Compile:
# protoc --python_out=. my_message.proto
# 3. Use generated code:
# from my_message_pb2 import Person
# p = Person(name='Alice', id=42)
# data = p.SerializeToString()
# p2 = Person()
# p2.ParseFromString(data)
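The json_format.ParseDict and text_format helpers listed under Imports can be tried the same way, without compiling a schema. The following sketch assumes the built-in Struct well-known type as a stand-in for a custom message; the dictionary contents are illustrative only.

```python
from google.protobuf import json_format, text_format
from google.protobuf.struct_pb2 import Struct

source = {"name": "Alice", "id": 42}

# dict -> message -> dict (Struct stores every number as a double)
s = json_format.ParseDict(source, Struct())
round_tripped = json_format.MessageToDict(s)

# message -> text format -> message
text = text_format.MessageToString(s)
s2 = text_format.Parse(text, Struct())

print(text)
print("dict round-trip equal:", round_tripped == source)
print("text round-trip equal:", s == s2)
```

With a generated class such as the Person message sketched in the comments above, the same calls would take Person() in place of Struct().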