The protostuff-runtime module allows your existing pojos to be serialized to different formats.

If you prefer not to have your messages code-generated from proto files, this module fits the bill.

The required modules are:

  • protostuff-api
  • protostuff-collectionschema
  • protostuff-runtime

The advantage of using proto files is that you have explicit control of fields and their corresponding numbers (which is useful for schema evolution, e.g. forward-backward compatibility).

With this module, field numbers are assigned according to the order in which the fields are declared in the pojo (top to bottom).

Note that the order is not guaranteed on some (non-sun) vms (especially dalvik).

Sun jdk6 or higher is recommended for guaranteed ordering.
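The declaration-order numbering described above can be sketched with plain reflection. This is only an illustration, not the library's implementation: the class name DeclarationOrderNumbers, the Sample pojo, and the rule that static and transient fields are skipped are assumptions made for the sketch. It also shows where the vm dependency comes from, since the Java API does not promise any particular order from getDeclaredFields().

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

final class DeclarationOrderNumbers {
    // Hypothetical pojo: "cache" is transient and "COUNT" is static.
    static final class Sample {
        static int COUNT;
        int id;
        String name;
        transient Object cache;
        long timestamp;
    }

    // Numbers the serializable fields 1..n in the order reflection reports them.
    static Map<Integer, String> numberFields(Class<?> type) {
        Map<Integer, String> numbers = new LinkedHashMap<>();
        int next = 1;
        for (Field field : type.getDeclaredFields()) {
            int mods = field.getModifiers();
            if (Modifier.isStatic(mods) || Modifier.isTransient(mods) || field.isSynthetic()) {
                continue; // assumed not to be part of the serialized form
            }
            numbers.put(next++, field.getName());
        }
        return numbers;
    }

    public static void main(String[] args) {
        // Many desktop vms report the fields in source order,
        // but the reflection API itself makes no such guarantee.
        System.out.println(numberFields(Sample.class));
    }
}
```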

As of 1.0.5, @Tag annotations can be used on fields to have explicit control of the field numbers:

// all or nothing.  
// Either you annotate all fields or you don't annotate at all (applies to the relevant class only).
// To exclude certain fields, use java's transient keyword
public final class Bar
{
    @Tag(8)
    int baz;

    // alias is available since 1.0.7
    // useful for json/xml/yaml where you can override the field names
    @Tag(value = 15, alias = "f")
    double foo;
}

// with this approach, versioning with inheritance is now fully supported.
// you simply reserve x-y (range) numbers for the fields of the parent class.
// internally it will be detected when you make mistakes tagging with the same number.

Note that if you have non-static inner classes and want to use @Tag annotations, mark that class as static instead.
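The two rules above (all-or-nothing tagging, and detection of a number used twice) can be sketched as a small validator. Everything here is hypothetical: FieldTag is a local stand-in defined so that the example compiles on its own, it is not the library's @Tag, and readTags is not a protostuff method.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.Map;

final class TagValidation {
    // Hypothetical stand-in for the library's @Tag annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface FieldTag {
        int value();
    }

    static final class Good {
        @FieldTag(8) int baz;
        @FieldTag(15) double foo;
        transient int scratch; // excluded, so it needs no tag
    }

    static final class Duplicate {
        @FieldTag(1) int a;
        @FieldTag(1) int b;
    }

    static final class Partial {
        @FieldTag(1) int a;
        int b;
    }

    // Returns number -> field name; throws if a number repeats or only some fields are tagged.
    static Map<Integer, String> readTags(Class<?> type) {
        Map<Integer, String> byNumber = new HashMap<>();
        int serializable = 0;
        for (Field field : type.getDeclaredFields()) {
            int mods = field.getModifiers();
            if (Modifier.isStatic(mods) || Modifier.isTransient(mods) || field.isSynthetic()) {
                continue;
            }
            serializable++;
            FieldTag tag = field.getAnnotation(FieldTag.class);
            if (tag == null) {
                continue;
            }
            String previous = byNumber.put(tag.value(), field.getName());
            if (previous != null) {
                throw new IllegalStateException("number " + tag.value() + " used by both "
                        + previous + " and " + field.getName());
            }
        }
        if (!byNumber.isEmpty() && byNumber.size() != serializable) {
            throw new IllegalStateException("annotate all fields of " + type.getSimpleName() + " or none");
        }
        return byNumber;
    }

    public static void main(String[] args) {
        System.out.println(readTags(Good.class));
    }
}
```

Calling readTags on Duplicate or Partial throws an IllegalStateException, which mirrors the kind of early failure the text describes.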

Without @Tag annotations, forward-backward compatibility is still supported via “append-only” schema evolution.

  • To add new fields, append the field in the declaration
  • To remove existing fields, annotate with @Deprecated
  • To exclude fields from being serialized, use the java keyword: transient

Here’s an example:

public final class Entity
{
    int id;

    String name;

    @Deprecated
    String alias;

    long timestamp;
}

Schema evolution scenario:

  • v1: 3 initial fields (id=1, name=2, alias=3)
  • v2: Added a new field (timestamp=4)
  • v3: Removed the “alias” field

With v3, the field mapping would be (id=1, name=2, timestamp=4). When a v3 reader encounters the alias field (number 3) in data written by an older version, the deserializer ignores it.

The field mapping is still intact despite schema evolution, which makes it forward-backward compatible to different versions.
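To make the scenario concrete, here is a toy model of a v3 reader consuming a message written by v2. It is a sketch under stated assumptions: a message is modelled as a map from field number to value, and AppendOnlyEvolution, V3_FIELDS and read are hypothetical names that do not come from the library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class AppendOnlyEvolution {
    // Field mapping known to a v3 reader: alias (3) has been removed.
    static final Map<Integer, String> V3_FIELDS = Map.of(1, "id", 2, "name", 4, "timestamp");

    // Keeps the entries whose field number the reader knows; unknown numbers are skipped.
    static Map<String, Object> read(Map<Integer, Object> message, Map<Integer, String> knownFields) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<Integer, Object> entry : message.entrySet()) {
            String name = knownFields.get(entry.getKey());
            if (name != null) {
                result.put(name, entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // A message written by v2, which still carried alias as field 3.
        Map<Integer, Object> v2Message = new LinkedHashMap<>();
        v2Message.put(1, 42);
        v2Message.put(2, "foo");
        v2Message.put(3, "old-alias");
        v2Message.put(4, 1000L);
        System.out.println(read(v2Message, V3_FIELDS)); // prints {id=42, name=foo, timestamp=1000}
    }
}
```

Because timestamp kept number 4 after alias was removed, the remaining fields still line up across versions.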

3 possible types of Schema

Unlike a static hand-written/code-generated schema, there are 3 possible types of schema that can be used at runtime.

Below are the types ordered according to their efficiency and performance at runtime.

Static Schema

Used when the declared field is a concrete type. Compact, since no extra metadata is included on serialization.

public enum SortOrder
{
    ASCENDING,
    DESCENDING;
}

public final class Bar
{
    Entity entity; // the example above
    List<Long> scalarList; // any scalar type
    List<byte[]> bytesList; // byte arrays are treated as scalar fields (use >= 1.0.4)
    List<Entity> entityList;
    Map<String,byte[]> bytesMapWithScalarKeys;
    Map<String,Entity> entityMapWithScalarKeys;
    Map<SortOrder,Entity> entityMapWithEnumKeys;
    Map<Entity,Date> entityMapWithPojoKeys;
    Map<Entity,Entity> entityMap;
}

DerivativeSchema

Used when the declared field is an abstract class. Less compact since the type metadata is written (field number: 127) on serialization.

public abstract class Instrument
{
    // ...
}

public final class BassGuitar extends Instrument
{
    // ...
}

public final class Piano extends Instrument
{
    // ...
}

// DerivativeSchema will be used on the fields below
public final class Baz
{
    Instrument instrument;
    List<Instrument> instrumentList;
    Map<String,Instrument> instrumentMapWithScalarKeys;
    Map<SortOrder,Instrument> instrumentMapWithEnumKeys;
    Map<BassGuitar,Instrument> instrumentMapWithPojoKeys;
    Map<Instrument,Instrument> instrumentMap;
}

IMPORTANT

If your object hierarchy involves a concrete class subclassing another concrete class (not using abstract classes), set: -Dprotostuff.runtime.morph_non_final_pojos=true

With that property set, DerivativeSchema will be used on non-final pojos (concrete types) similar to abstract classes.

For example:

class Base
{
    int id = 1;
}
class Child extends Base
{
    int status = 2;
}
class Pojo
{
    Base b = new Child();
}

// If you serialize Pojo, Child's "status" field will not be 
// serialized if the system property is not set.

// With that in mind, all pojos that aren't marked final will 
// have an overhead of extra type metadata on serialization.

// To ensure that no extra type metadata will be written, mark 
// your pojos final when you know there are no subclasses.
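The reason the "status" field is lost can be seen with plain reflection: a schema derived from the declared field type only knows about Base, while the object actually stored is a Child. The sketch below reuses the classes from the example above. DeclaredVsRuntimeType and instanceFieldNames are hypothetical helper names, and nothing here calls the library.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

final class DeclaredVsRuntimeType {
    static class Base { int id = 1; }
    static class Child extends Base { int status = 2; }
    static class Pojo { Base b = new Child(); }

    // Instance field names visible from the given type, walking up the superclasses.
    static List<String> instanceFieldNames(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Class<?> c = type; c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field field : c.getDeclaredFields()) {
                if (!Modifier.isStatic(field.getModifiers()) && !field.isSynthetic()) {
                    names.add(field.getName());
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws NoSuchFieldException {
        // What a schema built from the field declaration can see.
        Class<?> declared = Pojo.class.getDeclaredField("b").getType();
        // What is actually stored in the field at runtime.
        Class<?> runtime = new Pojo().b.getClass();
        System.out.println(declared.getSimpleName() + " " + instanceFieldNames(declared));
        System.out.println(runtime.getSimpleName() + " " + instanceFieldNames(runtime));
    }
}
```

Writing the runtime type alongside the data is what lets the deserializer pick Child again, which is the extra metadata the comments above refer to.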

ObjectSchema (dynamic)

Used when the declared field types:

  • are java.lang.Object
  • are interfaces
  • are arrays
  • are collections but did not define the generics
  • are too complex

All necessary metadata is included on serialization to be able to deserialize the message correctly.

public final class Dynamic
{
    Object entity;

    Object[] objectArray;
    int[] primitiveArray;
    Integer[] boxedArray;
    Entity[] entityArray;
    IEntity[] ientityArray;

    List noGenericsList;
    List<?> uselessGenericsList;
    List<Object> objectList;
    List<long[]> withArrayList;

    Map noGenericsMap;
    Map<?,?> uselessGenericsMap;
    Map<String,Object> withObjectMap;
    Map<?,SortOrder> dynamicKeyMap;
    Map<Entity,?> dynamicValueMap;
    Map<Integer[],int[]> withArrayMap;

    // and complex types
    List<List<String>> aListWithAList;
    Map<String,List<SortOrder>> complexMap;
    Map<Set<Entity>,Long> anotherComplexMap;
}
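As a rough summary of the three sections above, the choice for a single, non-collection field type can be sketched as a decision function. This is a simplification under stated assumptions, not the library's logic: SchemaKindSketch, Kind and classify are hypothetical names, collections and maps are deliberately left out, and the library's special handling of scalar types is ignored.

```java
import java.lang.reflect.Modifier;

final class SchemaKindSketch {
    enum Kind { STATIC, DERIVATIVE, OBJECT }

    static final class Entity { }
    abstract static class Instrument { }
    static class Base { }
    enum SortOrder { ASCENDING, DESCENDING }

    // Simplified decision for one declared, non-collection field type.
    static Kind classify(Class<?> declared, boolean morphNonFinalPojos) {
        if (declared.isPrimitive() || declared.isEnum()) {
            return Kind.STATIC; // primitives and enums carry no type metadata here
        }
        if (declared == Object.class || declared.isInterface() || declared.isArray()) {
            return Kind.OBJECT; // fully dynamic
        }
        int mods = declared.getModifiers();
        if (Modifier.isAbstract(mods)) {
            return Kind.DERIVATIVE; // the concrete subtype must be recorded
        }
        if (morphNonFinalPojos && !Modifier.isFinal(mods)) {
            return Kind.DERIVATIVE; // non-final pojo treated like an abstract class
        }
        return Kind.STATIC;
    }

    public static void main(String[] args) {
        System.out.println(classify(Entity.class, false));     // prints STATIC
        System.out.println(classify(Instrument.class, false)); // prints DERIVATIVE
        System.out.println(classify(Base.class, true));        // prints DERIVATIVE
        System.out.println(classify(Object.class, false));     // prints OBJECT
    }
}
```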

Updating fields

With the information above, be sure that you update your fields carefully.

For example, do not add/remove generics when you already have existing data because the deserialization will fail.

For scalar fields:

  • int can be updated to long (and vice versa), compatible with all supported formats
  • String can be updated to byte[] or ByteString (and vice versa), not compatible with text formats (e.g json/xml/yaml)

class Example
{
    int i;
    long l;
    Integer i2;
    Long l2;
    String s;
    byte[] b;
    ByteString bs;
}
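One way to see why int and long are interchangeable in a binary format is the protobuf-style varint encoding, which does not record whether the value was written as 32 or 64 bits. The sketch below assumes that encoding. VarintWidening and its two helpers are hypothetical names written for this illustration and are not library calls.

```java
import java.io.ByteArrayOutputStream;

final class VarintWidening {
    // Base-128 varint: 7 payload bits per byte, high bit set on every byte except the last.
    static byte[] writeVarint64(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    // Reads a single varint from the start of the array.
    static long readVarint64(byte[] bytes) {
        long result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;
            }
            shift += 7;
        }
        return result;
    }

    public static void main(String[] args) {
        int written = 300;                        // field declared as int when the data was written
        byte[] wire = writeVarint64(written);     // the int is sign-extended to 64 bits before encoding
        long readBack = readVarint64(wire);       // field later redeclared as long
        System.out.println(readBack);             // prints 300

        long big = 5_000_000_000L;                // does not fit in an int
        int narrowed = (int) readVarint64(writeVarint64(big));
        System.out.println(narrowed == big);      // prints false
    }
}
```

In this sketch, going from long back to int is only lossless while the stored values fit in 32 bits.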

Performance guidelines

As much as possible, use the concrete type when declaring a field.

For polymorphic datasets, prefer abstract classes over interfaces.

Use ExplicitIdStrategy to write the type metadata as int (ser/deser will be faster and the serialized size will be smaller).

Register your concrete classes at startup via ExplicitIdStrategy.Registry.
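The idea behind writing the type metadata as an int can be illustrated with a minimal registry that maps each concrete class to a fixed id. This is not ExplicitIdStrategy.Registry and does not use its API: TypeIdRegistry and its methods are hypothetical, and the size comparison only contrasts a class name in UTF-8 with a small id that fits in a single byte.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

final class TypeIdRegistry {
    static final class BassGuitar { }
    static final class Piano { }

    private final Map<Class<?>, Integer> idsByType = new HashMap<>();
    private final Map<Integer, Class<?>> typesById = new HashMap<>();

    // Registers a concrete class under a fixed id; each class and each id may be used once.
    TypeIdRegistry register(Class<?> type, int id) {
        if (idsByType.containsKey(type) || typesById.containsKey(id)) {
            throw new IllegalArgumentException("already registered: " + type.getName() + " / " + id);
        }
        idsByType.put(type, id);
        typesById.put(id, type);
        return this;
    }

    int idOf(Class<?> type) {
        Integer id = idsByType.get(type);
        if (id == null) {
            throw new IllegalArgumentException("not registered: " + type.getName());
        }
        return id;
    }

    Class<?> typeOf(int id) {
        Class<?> type = typesById.get(id);
        if (type == null) {
            throw new IllegalArgumentException("unknown id: " + id);
        }
        return type;
    }

    public static void main(String[] args) {
        // Done once at startup, before any message is read or written.
        TypeIdRegistry registry = new TypeIdRegistry()
                .register(BassGuitar.class, 1)
                .register(Piano.class, 2);

        int nameBytes = Piano.class.getName().getBytes(StandardCharsets.UTF_8).length;
        System.out.println("class name: " + nameBytes + " bytes, id: " + registry.idOf(Piano.class));
        System.out.println(registry.typeOf(2) == Piano.class); // prints true
    }
}
```

The ids must stay stable across releases, because data written earlier refers to the types by id only.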