Schema Evolution in Apache Avro, Protobuf, and JSON Schema

Eleftheria Drosopoulou, June 23rd, 2025 (Last Updated: June 20th, 2025)


In modern distributed architectures—especially event-driven systems like Kafka or Pulsar—data is the contract. When systems scale and evolve, your data schemas will too. If not managed carefully, schema changes can break consumers, cause data loss, or disrupt analytics pipelines.

This post explores how schema evolution is handled across three common serialization formats: Apache Avro, Google Protobuf, and JSON Schema. We’ll walk through examples, common compatibility strategies, and tools to keep your contracts safe as they evolve.

Why Schema Evolution Matters

Changing data structures in production can break consumers that rely on an older version of the schema. A producer might send new fields or remove existing ones, while a consumer still expects the original version. A schema registry can help prevent this by enforcing compatibility rules and managing schema versions centrally.

Without evolution planning, you risk issues like:

Consumers failing due to unexpected fields.
Analytics pipelines producing inconsistent results.
Downtime during coordinated schema rollouts.

Apache Avro

Avro is widely used in data pipelines due to its compact binary format and support for embedded or externally referenced schemas. Schemas are defined in JSON and describe record structures, fields, and types.

Example schema:

{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "int"}
  ]
}

Evolving the schema:
Let’s say we want to add an optional email field:

{"name": "email", "type": ["null", "string"], "default": null}

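This is a backward-compatible change: a consumer on the new schema fills in the default (null) when it reads records that were written without email. It is also forward compatible, because a consumer still on the old schema simply ignores the extra field.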
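
To see why the default matters, the following Python sketch imitates the reader side of Avro schema resolution using plain dictionaries. It is not the Avro library: `resolve_record` and the sample record are hypothetical names used only for illustration, and the sketch handles flat records without any type checking.

```python
# Simplified imitation of Avro reader-side schema resolution (flat records only).
# Not the real Avro library: resolve_record is a hypothetical helper.

READER_SCHEMA_V2 = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "int"},
        {"name": "email", "type": ["null", "string"], "default": None},
    ],
}


def resolve_record(written: dict, reader_schema: dict) -> dict:
    """Project a decoded record onto the fields the reader declares."""
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in written:
            resolved[name] = written[name]
        elif "default" in field:
            # Field absent from the written data: use the reader's default.
            resolved[name] = field["default"]
        else:
            raise ValueError(f"no value and no default for field {name!r}")
    # Fields the reader does not declare are silently dropped.
    return resolved


old_record = {"name": "Ada", "age": 36}  # written with the original schema
print(resolve_record(old_record, READER_SCHEMA_V2))
```

If the email field had no default, the same call would raise, which is the failure a backward-compatibility check is meant to catch before deployment.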

Compatibility modes in Avro (via Confluent Schema Registry):

Mode	Description
Backward	New schema can read data written by the old one
Forward	Old schema can read data written by the new one
Full	Ensures both forward and backward compatibility
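
The three modes can be expressed as one "can this reader read that writer's data" check applied in different directions. The Python sketch below is a deliberately reduced version of that idea: it only looks at added and removed fields and ignores type changes, aliases, and nested records, so it is an illustration of the rules rather than what a registry actually runs. The function names are hypothetical.

```python
# Reduced compatibility check: only field additions and removals are considered.
# Hypothetical helpers for illustration, not a registry implementation.

def can_read(reader_fields: list, writer_fields: list) -> bool:
    """True if every reader field is present in the writer or has a default."""
    writer_names = {f["name"] for f in writer_fields}
    return all(f["name"] in writer_names or "default" in f for f in reader_fields)


def check_compatibility(new_fields: list, old_fields: list, mode: str) -> bool:
    backward = can_read(new_fields, old_fields)  # new schema reads old data
    forward = can_read(old_fields, new_fields)   # old schema reads new data
    if mode == "BACKWARD":
        return backward
    if mode == "FORWARD":
        return forward
    if mode == "FULL":
        return backward and forward
    raise ValueError(f"unknown mode: {mode!r}")


V1 = [{"name": "name", "type": "string"}, {"name": "age", "type": "int"}]
V2_WITH_DEFAULT = V1 + [{"name": "email", "type": ["null", "string"], "default": None}]
V2_NO_DEFAULT = V1 + [{"name": "email", "type": "string"}]

for label, candidate in [("with default", V2_WITH_DEFAULT), ("no default", V2_NO_DEFAULT)]:
    results = {m: check_compatibility(candidate, V1, m) for m in ("BACKWARD", "FORWARD", "FULL")}
    print(label, results)
```

Adding email with a default passes all three modes, while adding it without a default fails the backward (and therefore the full) check.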

Useful reference: Confluent Schema Compatibility Guide.

Protocol Buffers (Protobuf)

Protobuf uses .proto files to define message schemas. Each field is assigned a unique numeric tag that identifies it on the wire. Fields can be added, deprecated, or removed, but a tag number must never be reused for a different field.

Example:

syntax = "proto3";

message User {
  string name = 1;
  int32 age = 2;
  optional string email = 3;  // explicit presence; enabled by default since protoc 3.15
}

Schema evolution in Protobuf:

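New optional fields can be added without breaking old consumers.
Unknown fields are skipped when an older consumer parses the message; current proto3 runtimes (3.5 and later) also preserve them when the message is re-serialized.
Removed fields must not reuse their tag number; declaring the number as reserved makes the compiler reject any reuse.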
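
The reason unknown fields are harmless is visible in the wire format: every field is prefixed with a key that carries its tag number and wire type, so a parser can step over a field it does not recognize. The Python sketch below hand-rolls a tiny subset of that format (varints and length-delimited strings only) to show the mechanism. It is an illustration under those assumptions, not the protobuf runtime, and the helper names are hypothetical.

```python
# Minimal subset of the Protobuf wire format: wire type 0 (varint) and
# wire type 2 (length-delimited, treated here as UTF-8 strings).
# Hypothetical helpers for illustration; real code should use the protobuf runtime.

def write_varint(value: int) -> bytes:
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def read_varint(buf: bytes, pos: int):
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_field(number: int, value) -> bytes:
    if isinstance(value, int):  # non-negative integers only in this sketch
        return write_varint((number << 3) | 0) + write_varint(value)
    data = value.encode("utf-8")
    return write_varint((number << 3) | 2) + write_varint(len(data)) + data


def decode(buf: bytes, known: dict):
    """Return (decoded known fields, tag numbers that were skipped)."""
    message, skipped = {}, []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if number in known:
            message[known[number]] = value
        else:
            skipped.append(number)  # unknown tag: stepped over, parsing continues
    return message, skipped


# A newer producer writes name (1), age (2), and email (3).
new_payload = encode_field(1, "Ada") + encode_field(2, 36) + encode_field(3, "ada@example.com")

# An older consumer only knows tags 1 and 2.
OLD_FIELDS = {1: "name", 2: "age"}
print(decode(new_payload, OLD_FIELDS))
```

The same mechanism explains why tag reuse is dangerous: the decoder trusts the number, so a recycled tag would be read as the old field with the new field's bytes.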

This makes Protobuf ideal for streaming systems and gRPC-based microservices. For a deeper dive into how Protobuf is used in event-based architectures, check out Deliveroo’s engineering post on streaming schema evolution with Protobuf.

JSON Schema

JSON Schema is popular in REST APIs and lightweight pub/sub systems. It defines structure, types, required fields, and value constraints for JSON data.

Example:

{
  "type": "object",
  "properties": {
    "name": {"type": "string"},
    "age": {"type": "integer"},
    "email": {"type": ["string", "null"]}
  },
  "required": ["name", "age"]
}

Challenges with evolution:

JSON Schema lacks a native versioning model.
Optionality and required fields must be managed manually.
Consumers need logic to handle multiple schema versions.

To evolve safely, it’s common to include a version field in the payload, enabling consumers to switch logic based on version. For in-depth rules, the JSON Schema documentation is a solid place to start.
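
As a rough illustration of the version-field approach, the Python sketch below dispatches on a `version` key in the payload. The field name, the two parser functions, and the rule that unversioned payloads count as version 1 are all assumptions made for the example, not part of JSON Schema itself.

```python
import json

# Hypothetical per-version parsers; each returns the same normalized shape.

def parse_user_v1(payload: dict) -> dict:
    # Version 1 had no email field.
    return {"name": payload["name"], "age": payload["age"], "email": None}


def parse_user_v2(payload: dict) -> dict:
    # Version 2 added an optional, nullable email.
    return {"name": payload["name"], "age": payload["age"], "email": payload.get("email")}


PARSERS = {1: parse_user_v1, 2: parse_user_v2}


def parse_user(raw: str) -> dict:
    payload = json.loads(raw)
    version = payload.get("version", 1)  # assumption: missing version means v1
    try:
        parser = PARSERS[version]
    except KeyError:
        raise ValueError(f"unsupported schema version: {version!r}") from None
    return parser(payload)


print(parse_user('{"name": "Ada", "age": 36}'))
print(parse_user('{"version": 2, "name": "Ada", "age": 36, "email": "ada@example.com"}'))
```

Rejecting unknown versions explicitly, rather than guessing, keeps a consumer from silently misreading payloads from a producer that has moved ahead of it.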

Comparison Table

Feature	Avro	Protobuf	JSON Schema
Format	Binary	Binary	Text
Schema Definition	JSON	`.proto`	JSON
Evolution Support	Strong	Strong	Weak
Self-describing	Yes (optional)	No	No
Best for	Data lakes, Kafka	RPC, Streaming	REST APIs

Tools for Schema Evolution

Confluent Schema Registry: Supports Avro, Protobuf, JSON Schema with version control and compatibility checks.
https://docs.confluent.io/platform/current/schema-registry/index.html
AWS Glue Schema Registry: Works well with Protobuf and Avro in streaming jobs and analytics workflows.
https://docs.aws.amazon.com/glue/latest/dg/schema-registry.html
Karapace: Open-source alternative to Confluent’s registry.
https://github.com/aiven/karapace

Best Practices

Best Practice	Description
Use a Schema Registry	Centralizes schema versions and enforces compatibility rules automatically.
Add Fields as Optional or With Defaults	Prevents breaking older consumers when introducing new fields.
Avoid Reusing Field Identifiers	Especially important in Protobuf, where a field number must never be reassigned to a different field.
Remove Fields Carefully	Only remove fields when you’re certain no consumers depend on them.
Document Schema Versions	Maintain a changelog or version field to track schema changes over time.
Use Compatibility Modes	Enforce forward, backward, or full compatibility policies (e.g., in Avro).
Test Evolution Scenarios	Validate changes in staging with different producer and consumer versions.
Version Schemas Explicitly	Embed a `version` field in JSON payloads to guide deserialization logic.
Automate Validation in CI/CD	Integrate schema compatibility checks into your pipeline for safe deploys.
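
One way to automate the last practice is to ask the registry whether a candidate schema is compatible before merging it. The Python sketch below builds such a request for the compatibility endpoint of the Confluent Schema Registry REST API using only the standard library. The registry URL and the subject name `users-value` are placeholders, the helper names are hypothetical, and the request as written assumes an Avro schema; treat it as a starting point rather than a finished CI step.

```python
import json
import urllib.request

# Hypothetical helpers around the registry's compatibility endpoint.
# The URL and subject used below are placeholders.

def build_compatibility_request(registry_url: str, subject: str, schema: dict) -> urllib.request.Request:
    """Build a POST that tests `schema` against the latest version of `subject`."""
    # The registry expects the schema itself as a JSON-encoded string.
    body = json.dumps({"schema": json.dumps(schema)}).encode("utf-8")
    url = f"{registry_url.rstrip('/')}/compatibility/subjects/{subject}/versions/latest"
    return urllib.request.Request(
        url=url,
        data=body,
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
        method="POST",
    )


def parse_compatibility_response(response_body: bytes) -> bool:
    """Read the is_compatible flag from the registry's JSON reply."""
    return bool(json.loads(response_body).get("is_compatible", False))


candidate = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "int"},
        {"name": "email", "type": ["null", "string"], "default": None},
    ],
}

request = build_compatibility_request("http://localhost:8081", "users-value", candidate)
print(request.get_method(), request.full_url)
```

In a pipeline, the request would be sent with `urllib.request.urlopen(request)` and the job failed whenever `parse_compatibility_response` returns False for the reply body.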

Final Thoughts

Schema evolution is an unavoidable reality in growing systems. By adopting tools like schema registries and serialization formats designed with evolution in mind, teams can decouple producers from consumers and roll out changes with confidence.

Whether you pick Avro, Protobuf, or JSON Schema, the key is the same: treat your schema like code—version it, test it, and validate it before deploying.
