Comparison of Google Protobuf ByteString and byte[]

Binary data is common in networked applications: images, protobuf payloads, encryption output, and so on. When using Protocol Buffers in Java, you often face a choice: hold binary data as a plain Java byte[] or as protobuf's com.google.protobuf.ByteString. This article compares the two and shows how they differ in handling raw binary data.

1. Introduction

Protocol Buffers (protobuf) is a language-neutral, platform-neutral mechanism for serializing structured data. In Java, protobuf-generated message classes expose fields declared with the proto type bytes as ByteString. Java itself offers byte[], an array of the primitive type byte. Both represent raw binary data, but they behave very differently.

1.1 What is byte[]?

  • Mutable sequence of bytes.
  • Simple and convenient for many Java APIs and libraries that expect arrays.
  • Assignment copies only the reference, so two variables can end up pointing at the same array; an independent copy requires Arrays.copyOf() or clone().
  • No special protobuf integration: when moving data into or out of protobuf messages, you must convert to or from ByteString, which usually means a copy.
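
The difference between sharing a reference and making a real copy is easy to get wrong, so here is a minimal sketch that uses only the JDK. The class name ArrayAliasingDemo and the helper defensiveCopy are illustrative names chosen for this example, not part of any library.

```java
import java.util.Arrays;

final class ArrayAliasingDemo {

    // Returns an independent copy, so later writes to `source`
    // are not visible through the returned array.
    static byte[] defensiveCopy(byte[] source) {
        return Arrays.copyOf(source, source.length);
    }

    public static void main(String[] args) {
        byte[] original = {0x10, 0x20, 0x30};

        byte[] alias = original;               // copies the reference only
        byte[] copy = defensiveCopy(original); // copies the contents

        original[0] = 0x7F;                    // mutate through the first reference

        System.out.println("alias[0] = " + alias[0]); // 127: the alias sees the change
        System.out.println("copy[0]  = " + copy[0]);  // 16: the copy is unaffected
    }
}
```

Any code that stores a caller's byte[] without copying it is exposed to this kind of aliasing, which is the main hazard that an immutable container removes.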

1.2 What is ByteString?

  • Immutable, efficient container for binary data used by protobuf.
  • Provides convenient API: copyFrom(byte[]), toByteArray(), asReadOnlyByteBuffer(), writeTo(OutputStream), etc.
  • Designed for serialization contexts: many protobuf internals and builders accept or return ByteString directly.
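  • Can avoid extra copies in some scenarios: substring() can share the underlying data instead of duplicating it, and asReadOnlyByteBuffer() can expose the bytes without copying when the ByteString is backed by a single array. Note that the copyFrom(...) overloads, including copyFrom(ByteBuffer), always copy.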
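
As a rough illustration of the points above, the following sketch exercises a few ByteString methods. It assumes the protobuf-java library is on the classpath; the class name ByteStringBasicsDemo and the helper snapshot are hypothetical names used only here.

```java
import com.google.protobuf.ByteString;

final class ByteStringBasicsDemo {

    // copyFrom takes its own copy of the array, so later changes
    // to `source` cannot reach the returned ByteString.
    static ByteString snapshot(byte[] source) {
        return ByteString.copyFrom(source);
    }

    public static void main(String[] args) {
        byte[] raw = {0x10, 0x20, 0x30, 0x40};
        ByteString bs = snapshot(raw);

        raw[0] = 0x7F;                    // mutate the source array afterwards
        System.out.println(bs.byteAt(0)); // 16: the ByteString kept its own copy

        byte[] out = bs.toByteArray();    // a fresh array on every call
        out[1] = 0;                       // does not affect bs
        System.out.println(bs.byteAt(1)); // 32

        ByteString middle = bs.substring(1, 3);             // bytes at index 1 and 2
        ByteString joined = middle.concat(bs.substring(3)); // append the byte at index 3
        System.out.println(joined.size());                  // 3

        // equals() compares contents, not identity
        ByteString expected = ByteString.copyFrom(new byte[] {0x20, 0x30, 0x40});
        System.out.println(joined.equals(expected));        // true
    }
}
```

Because every mutation path goes through a copy (copyFrom on the way in, toByteArray on the way out), a ByteString can be handed to other code without defensive copying.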

1.3 Key Differences

Aspect               | byte[]                                                       | ByteString
---------------------|--------------------------------------------------------------|-----------------------------------------------------------------------------
Mutability           | Mutable                                                      | Immutable
Protobuf integration | Needs conversion to/from ByteString                          | Native type for bytes fields
Copy behavior        | Assignment shares same array reference                       | Immutable; safe to share
APIs                 | Low-level array ops                                          | Rich helpers (IO, buffers, equality, slicing)
Memory & performance | Potentially cheaper to modify in place; careful with safety | Often avoids accidental mutation; may avoid extra copies in protobuf paths

Note: If you’re passing binary data into or out of protobuf messages, use ByteString at the protobuf boundary. Internally in your app or when interacting with libraries expecting arrays, use byte[] — but convert explicitly and account for copies.

1.4 Use Cases

When to prefer ByteString:

  • At the protobuf API boundary, because generated messages use ByteString.
  • When you need immutability and safe sharing across threads.
  • When you want to use the provided serialization helpers (e.g., writeTo, asReadOnlyByteBuffer).

When to prefer byte[]:

  • When interfacing with libraries or code that expect arrays (crypto libraries, legacy APIs, image libraries).
  • When you need in-place mutation of bytes (modifying the contents), but be careful if those bytes are shared elsewhere.
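
To make the in-place mutation case concrete, here is a small JDK-only sketch. The class InPlaceMutationDemo and its two helpers are illustrative names invented for this example; the XOR mask stands in for any transformation that rewrites a buffer without allocating a new one.

```java
import java.util.Arrays;

final class InPlaceMutationDemo {

    // Overwrites every byte with zero so the original contents
    // no longer sit in this array.
    static void wipe(byte[] secret) {
        Arrays.fill(secret, (byte) 0);
    }

    // XORs each byte with `mask`, modifying the array in place.
    // Applying the same mask twice restores the original contents.
    static void xorInPlace(byte[] data, byte mask) {
        for (int i = 0; i < data.length; i++) {
            data[i] ^= mask;
        }
    }

    public static void main(String[] args) {
        byte[] key = {0x01, 0x02, 0x03};

        xorInPlace(key, (byte) 0x0F);
        System.out.println(Arrays.toString(key)); // [14, 13, 12]

        wipe(key);
        System.out.println(Arrays.toString(key)); // [0, 0, 0]
    }
}
```

Neither operation is possible on a ByteString, since it exposes no way to change its contents; code that needs to overwrite or wipe a buffer has to work on a byte[].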

2. Code Example

2.1 Proto definition (example.proto)

The following is a simple proto definition for sending raw bytes using Protocol Buffers:

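// example.proto
syntax = "proto3";

package example;

// Generate BinaryPayload as a top-level class in the Java package "example";
// without this option it would be nested inside an outer wrapper class.
option java_multiple_files = true;

message BinaryPayload {
  // raw bytes field
  bytes data = 1;
}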

This proto file defines a message BinaryPayload containing a single field data of type bytes, which is used to hold raw binary data such as files, images, or any arbitrary byte array.

2.2 Java usage

The following Java example demonstrates how to work with ByteString and byte[] when using the above proto:

// Java example: ByteString vs byte[]
import com.google.protobuf.ByteString;
import example.BinaryPayload;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.Arrays;

public class ByteStringExample {
    public static void main(String[] args) throws IOException {
        // 1) Start with a byte[]
        byte[] original = new byte[] { 0x10, 0x20, 0x30 };

        // 2) Convert to ByteString when building a protobuf message
        ByteString bs = ByteString.copyFrom(original); // makes an immutable copy
        BinaryPayload msg = BinaryPayload.newBuilder()
                .setData(bs)
                .build();

        // 3) Serialize message to bytes (protobuf wire format)
        byte[] serialized = msg.toByteArray();

        // 4) Deserialize and read ByteString
        BinaryPayload parsed = BinaryPayload.parseFrom(serialized);
        ByteString parsedBs = parsed.getData(); // still ByteString

        // 5) Convert ByteString back to byte[] if needed
        byte[] fromBs = parsedBs.toByteArray();

        System.out.println("original == fromBs? " + Arrays.equals(original, fromBs));

        // 6) Efficient access: read-only ByteBuffer (no copy if backed appropriately)
        ByteBuffer readOnlyBuf = parsedBs.asReadOnlyByteBuffer();
        System.out.println("Buffer remaining: " + readOnlyBuf.remaining());

        // 7) Streaming write (no need to call toByteArray() yourself)
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        parsedBs.writeTo(out); // writes the raw bytes
        byte[] streamed = out.toByteArray();
        System.out.println("streamed equals original? " + Arrays.equals(original, streamed));
    }
}

This program starts with a raw byte[], copies it into a ByteString to build a BinaryPayload message, and then serializes and parses the message. It converts the parsed ByteString back to a byte[], exposes it as a read-only ByteBuffer, and writes it to an output stream. The ByteString itself stays immutable throughout; only toByteArray() hands out a mutable copy.

2.3 Code Run and Output

The following output confirms that the original byte array is preserved after serialization and deserialization, the buffer length is accurate, and the streamed data remains identical to the original:

original == fromBs? true
Buffer remaining: 3
streamed equals original? true

3. Conclusion

byte[] and ByteString both represent binary data but suit different use cases. ByteString is the protobuf-native, immutable wrapper with helpful I/O methods and safer sharing semantics; byte[] is the low-level, mutable array type that many Java APIs expect. Use ByteString at protobuf boundaries and wherever immutability or the protobuf helpers matter. Convert to byte[] only when an API requires arrays or when you explicitly need mutability, and be mindful of the copies each conversion makes.

Yatin Batra

An experienced full-stack engineer well versed in Core Java, Spring/Spring Boot, MVC, Security, AOP, Frontend (Angular & React), and cloud technologies (such as AWS, GCP, Jenkins, Docker, K8s).