Avro and Protocol Buffer in ColdFusion

Avro

Avro is a data serialization system that provides a compact and efficient way to serialize data in various programming languages. It was developed as a part of the Apache Hadoop project and is commonly used in big data systems for data serialization, inter-process communication, and data storage.

Avro allows you to define a schema for your data in JSON format. The schema describes the data types and the structure of the data, so data can be serialized and deserialized in a language-independent manner. The schema can also be used to generate code in different programming languages for serializing and deserializing data in those languages.

Traditional data formats like CSV and JSON have their advantages and disadvantages:

CSV

Advantages:

  • Easy to parse when the data is simple
  • Comparatively easy to read
  • Easy to interpret

Disadvantages:

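  • Data types are not declared, so values can be interpreted inconsistently
  • Parsing becomes tricky when values contain delimiters, quotes, or line breaks
  • Difficult to enforce structure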

JSON

Advantages:

  • Supports structured data (arrays, nested elements, etc.)
  • Recognized web standard for data communication
  • Can be read by any language
  • Easily shareable

Disadvantages:

  • No schema is enforced
  • Size can be large because of repeated keys
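The size cost of repeated keys can be illustrated with a minimal Python sketch. The sample records and the fixed 12-byte layout are assumptions made for illustration; the packed form is not Avro's encoding, it only shows what is saved when field names and types are agreed once instead of being repeated in every record.

```python
import json
import struct

# Hypothetical sample data: 1,000 records that all share the same two keys.
records = [{"name": "user%04d" % i, "age": 20 + i % 50} for i in range(1000)]

# JSON repeats the key names "name" and "age" inside every record.
json_bytes = json.dumps(records, separators=(",", ":")).encode("utf-8")

# Schema-style packing: field order and types are agreed once, so each
# record stores only its values (8-byte name, 4-byte big-endian signed int).
RECORD_LAYOUT = struct.Struct(">8si")
packed_bytes = b"".join(
    RECORD_LAYOUT.pack(r["name"].encode("ascii"), r["age"]) for r in records
)

print("JSON bytes:  ", len(json_bytes))
print("packed bytes:", len(packed_bytes))
```

In this sketch the JSON form is more than twice the size of the packed form, and most of the difference is the repeated key names and punctuation.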

Avro

This is where Avro steps in: it minimizes the disadvantages of CSV and JSON by using a schema-based approach for data validation and transmission. The schema is written in JSON, and Avro data carries both the schema and the payload. Avro has the following advantages:

  • Data is fully typed
  • Data is compressed automatically
  • Documentation is embedded in the schema
  • Can be read across many languages
  • The schema can be safely evolved

However, there are a few disadvantages with Avro:

  • Limited adoption
  • May require an additional tool to read the data, because it is serialized and compressed rather than stored as plain text
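Safe schema evolution, listed among the advantages above, can be sketched in a few lines of Python. This is a simplified illustration, not a real Avro implementation: the two schemas, the `nickname` and `country` fields, and the `resolve` helper are hypothetical, and type promotion and aliases are ignored. It shows only the basic rule that a reader fills a field the writer did not know about from its default, and ignores writer fields it does not declare.

```python
# Older schema that the data was written with (hypothetical).
WRITER_SCHEMA = {
    "type": "record",
    "name": "userInfo",
    "fields": [
        {"name": "age", "type": "int"},
        {"name": "nickname", "type": "string"},
    ],
}

# Newer schema used by the reader: drops "nickname", adds "country" with a default.
READER_SCHEMA = {
    "type": "record",
    "name": "userInfo",
    "fields": [
        {"name": "age", "type": "int"},
        {"name": "country", "type": "string", "default": "unknown"},
    ],
}


def resolve(decoded: dict, writer: dict, reader: dict) -> dict:
    """Project a record decoded with the writer schema onto the reader schema."""
    writer_names = {field["name"] for field in writer["fields"]}
    result = {}
    for field in reader["fields"]:
        name = field["name"]
        if name in writer_names:
            result[name] = decoded[name]        # field known to both sides
        elif "default" in field:
            result[name] = field["default"]     # new field: use the reader's default
        else:
            raise ValueError("no value and no default for field %r" % name)
    return result                               # writer-only fields are dropped


old_record = {"age": 30, "nickname": "sam"}
print(resolve(old_record, WRITER_SCHEMA, READER_SCHEMA))
```

Adding a field without a default would make the reader fail on old data, which is why defaults matter when a schema changes.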

Avro schema

An Avro schema definition is written in JSON, typically as a record. A record can define multiple fields, which are organized in a JSON array. Each field specifies its name and its type. The type can be simple, like an integer, or complex, like another record.

{
    "type" : "record",
    "name" : "userInfo",
    "namespace" : "my.example",
    "fields" : [{"name" : "age", "type" : "int"}]
}
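To show how a schema like this drives serialization, here is a minimal Python sketch that encodes a `userInfo` record by hand. It relies on two rules from the Avro specification: an `int` is written as a zigzag-encoded variable-length integer, and a record is the concatenation of its fields in schema order. The helper names are hypothetical, only `int` fields are handled, and the output is the bare record body, not a complete Avro data file with its header.

```python
import json

SCHEMA_JSON = """
{
    "type" : "record",
    "name" : "userInfo",
    "namespace" : "my.example",
    "fields" : [{"name" : "age", "type" : "int"}]
}
"""


def zigzag(n: int) -> int:
    """Map a signed 32-bit int to an unsigned one so small magnitudes stay small."""
    return (n << 1) ^ (n >> 31)


def encode_varint(u: int) -> bytes:
    """Write an unsigned int 7 bits at a time, least significant group first."""
    out = bytearray()
    while True:
        low7 = u & 0x7F
        u >>= 7
        if u:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_record(schema: dict, datum: dict) -> bytes:
    """Encode the record's fields in schema order (sketch: int fields only)."""
    out = bytearray()
    for field in schema["fields"]:
        if field["type"] != "int":
            raise NotImplementedError("this sketch only handles int fields")
        value = datum[field["name"]]
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError("field %r must be an int" % field["name"])
        if not -2**31 <= value < 2**31:
            raise ValueError("field %r is outside the 32-bit range" % field["name"])
        out += encode_varint(zigzag(value))
    return bytes(out)


schema = json.loads(SCHEMA_JSON)
encoded = encode_record(schema, {"age": 30})
print(encoded.hex())  # 3c
```

The whole record fits in one byte because the field name and type live in the schema and are not repeated in the payload.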

Avro supports the following primitive data types:

  • null: no value
  • boolean: a binary value
  • int: 32-bit signed integer
  • long: 64-bit signed integer
  • float: single precision (32-bit) IEEE 754 floating-point number
  • double: double precision (64-bit) IEEE 754 floating-point number
  • bytes: sequence of 8-bit unsigned bytes
  • string: unicode character sequence

Avro supports six complex types: records, enums, arrays, maps, unions, and fixed. For more information, see the Avro docs.

ColdFusion functions for Avro

  1. serializeAvro
  2. deSerializeAvro

Protocol Buffer

Protocol Buffers, also known as protobuf, is a language-agnostic, platform-neutral, and extensible mechanism for serializing structured data. It was developed by Google as an alternative to XML and JSON for transmitting data over the wire between different services and applications.

Protocol Buffers use a simple and compact binary format that can be parsed efficiently by computers, making it ideal for high-performance, networked systems. They define a data structure using a language-agnostic schema and generate code for reading and writing that data structure in a variety of programming languages, such as Java, C++, Python, and others.

One of the advantages of using protobuf is that it provides a compact and efficient way of storing and transmitting data compared to traditional text-based formats such as XML and JSON. This can result in significant performance improvements in terms of bandwidth usage and data processing times.

Overall, a Protocol Buffer schema is a simple way to define a structured data format that can be used to serialize and deserialize data across different programming languages and platforms. A message definition contains one or more field definitions, each with a declared type and a unique assigned tag number.

Protocol Buffer schema

A Protocol Buffer schema is a file that defines the structure of the data that will be serialized and deserialized using Protocol Buffers. The schema is defined using a simple language called Protocol Buffer Language (or proto language), which is designed to be platform-agnostic and easily understandable by both humans and machines.

Here's an example of a simple Protocol Buffer schema that defines a message called "Person" with three fields: name, id, and email:

syntax = "proto3";
message Person {
  string name = 1;
  int32 id = 2;
  string email = 3;
}

In this example, the first line specifies the version of the Protocol Buffer specification that is being used (in this case, proto3). The message keyword defines a new message type called "Person". Inside the message definition, there are three fields, each with a type and a unique tag number.

The string and int32 types used here are two of the primitive (scalar) data types supported by Protocol Buffers; others include float, bool, and bytes. Custom message types can also be defined. The tag number, not the field name, identifies each field in the serialized data.
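The role of the tag numbers can be seen by encoding a Person message by hand. The Python sketch below follows the Protocol Buffers wire format: each field starts with a key computed as (tag number << 3) | wire type, where wire type 0 is a varint and wire type 2 is a length-delimited value such as a string. The helper names and the sample values are hypothetical; real applications would use generated code or a serialization function rather than this hand-rolled encoder.

```python
VARINT = 0            # wire type for int32, int64, bool, enum
LENGTH_DELIMITED = 2  # wire type for string, bytes, nested messages


def encode_varint(value: int) -> bytes:
    """Base-128 varint for a non-negative integer, low-order group first."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit set
        else:
            out.append(low7)
            return bytes(out)


def encode_key(tag_number: int, wire_type: int) -> bytes:
    """Field key: the tag number from the schema combined with the wire type."""
    return encode_varint((tag_number << 3) | wire_type)


def encode_string_field(tag_number: int, text: str) -> bytes:
    data = text.encode("utf-8")
    return encode_key(tag_number, LENGTH_DELIMITED) + encode_varint(len(data)) + data


def encode_int32_field(tag_number: int, value: int) -> bytes:
    # Negative int32 values are sign-extended to 64 bits on the wire.
    return encode_key(tag_number, VARINT) + encode_varint(value & 0xFFFFFFFFFFFFFFFF)


def encode_person(name: str, person_id: int, email: str) -> bytes:
    """Encode Person; proto3 leaves out fields that hold their default value."""
    out = b""
    if name:
        out += encode_string_field(1, name)
    if person_id:
        out += encode_int32_field(2, person_id)
    if email:
        out += encode_string_field(3, email)
    return out


encoded = encode_person("Ann", 7, "a@b.c")
print(encoded.hex())  # 0a03416e6e10071a056140622e63
```

The field names name, id, and email never appear in the output. Only the tag numbers 1, 2, and 3 do, which is why a tag number should not be changed once a message type is in use.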

Note: Protocol Buffer is not supported on Solaris.

ColdFusion functions for protobuf

  1. serializeProtoBuf
  2. deserializeProtoBuf
