Difference between revisions of "Codecs"

From Forge Community Wiki
Latest revision as of 18:45, 10 January 2024

Codecs are a serialization tool from Mojang's DataFixerUpper library. Codecs are used alongside DynamicOps to allow objects to be serialized to different formats and back, such as JSON or NBT. While the DynamicOps describes the format the object is to be serialized to, the Codec describes the manner in which the object is to be serialized; a single Codec can be used to serialize an object to any format for which a DynamicOps exists. Codecs and DynamicOps jointly form an abstraction layer over data serialization, simplifying the effort needed to serialize or deserialize data.

Using Codecs

Serialization and Deserialization

The primary use for Codecs is to serialize java objects to some serialized type, such as a JsonElement or a Tag, and to deserialize a serialized object back to its proper java type. This is accomplished with Codec#encodeStart and Codec#parse, respectively. Given a Codec<SomeJavaType> and a DynamicOps<SomeSerializedType>, we can convert instances of SomeJavaType to instances of SomeSerializedType and back.

Each of these methods takes a DynamicOps instance and an instance of the object we are serializing or deserializing, and returns a DataResult:

// let someCodec be a Codec<SomeJavaType>
// let someJavaObject be an instance of SomeJavaType
// let someTag and someJsonElement be instances of Tag and JsonElement, respectively

// serialize some java object to Tag
DataResult<Tag> encodedTag = someCodec.encodeStart(NbtOps.INSTANCE, someJavaObject);

// deserialize some Tag instance back to a proper java object
DataResult<SomeJavaType> decodedFromTag = someCodec.parse(NbtOps.INSTANCE, someTag);

// serialize some java object to a JsonElement
DataResult<JsonElement> encodedJson = someCodec.encodeStart(JsonOps.INSTANCE, someJavaObject);

// deserialize a JsonElement back to a proper java object
DataResult<SomeJavaType> decodedFromJson = someCodec.parse(JsonOps.INSTANCE, someJsonElement);

A DataResult either holds the converted instance, or it holds some error data, depending on whether the conversion was successful or not, respectively. There are several things we can do with this DataResult; DataResult#result simply returns an Optional containing the converted object if the conversion was successful, while DataResult#resultOrPartial also runs a given function if the conversion was unsuccessful (in addition to returning the Optional). #resultOrPartial is particularly useful for logging errors during datapack deserialization:

// deserialize something from json
someCodec.parse(JsonOps.INSTANCE, someJsonElement)
	.resultOrPartial(errorMessage -> doSomethingIfBadData(errorMessage))
	.ifPresent(someJavaObject -> doSomethingIfGoodData(someJavaObject));

Builtin Codecs

Primitive Codecs

The Codec class itself contains static instances of codecs for all supported primitive types, e.g. Codec.STRING is the canonical Codec<String> implementation. Primitive codecs include:

  • BOOL, which serializes to a boolean value
  • BYTE, SHORT, INT, LONG, FLOAT, and DOUBLE, which serialize to numeric values
  • STRING, which serializes to a string
  • BYTE_BUFFER, INT_STREAM, and LONG_STREAM, which serialize to lists of numbers
  • EMPTY, which represents null objects

Other Builtin Codecs

Vanilla minecraft has many builtin codecs for objects that it frequently serializes. These codecs are typically static instances in the class the codec is serializing; e.g. ResourceLocation.CODEC is the canonical Codec<ResourceLocation>, while BlockPos.CODEC is the codec used for serializing a BlockPos.

Each vanilla Registry acts as the Codec for the type of object the registry contains; e.g. Registry.BLOCK is itself a Codec<Block>. Forge Registries, however, do not currently implement Codec and cannot yet be used in this way; custom codecs must be created for forge-specific registries that are not tied to specific vanilla registries.

Of particular note here is the CompoundTag.CODEC, which can be used to e.g. serialize a CompoundTag into a json file. This has a notable limitation in that CompoundTag.CODEC *cannot* safely deserialize lists of numbers from json, due to the strong typing of ListTag and the way that the NbtOps deserializer reads numeric values.

Creating Codecs

Suppose we have the following class, and we want to deserialize json files to instances of this class:

public class ExampleCodecClass {

    private final int someInt;
    private final Item item;
    private final List<BlockPos> blockPositions;

    public ExampleCodecClass(int someInt, Item item, List<BlockPos> blockPositions) {...}

    public int getSomeInt() { return this.someInt; }
    public Item getItem() { return this.item; }
    public List<BlockPos> getBlockPositions() { return this.blockPositions; }
}

Where a json file for an instance of this class might look like:

{
	"some_int": 42,
	"item": "minecraft:gold_ingot",
	"block_positions":
	[
		[0,0,0],
		[10,20,-100]
	]
}

We can assemble a codec for this class by building a new codec out of smaller codecs. We'll need a codec for each of these fields:

  • a Codec<Integer>
  • a Codec<Item>
  • a Codec<List<BlockPos>>

And then we'll need to assemble these into a Codec<ExampleCodecClass>.

As previously mentioned, we can use Codec.INT for the integer codec, and Registry.ITEM for the Item codec. We don't have a builtin codec for list-of-blockpos, but we can use BlockPos.CODEC to create one.

Lists

The Codec#listOf instance method can be used to generate a codec for a List from an existing codec:

// BlockPos.CODEC is a Codec<BlockPos>
Codec<List<BlockPos>> blockPosListCodec = BlockPos.CODEC.listOf();

Codecs created via listOf() serialize things to listlike objects, such as [] json arrays or ListTags.

Deserializing a list in this manner produces an immutable list. If a mutable list is needed, xmap can be used to convert the list after deserializing.
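
As a minimal sketch of that conversion, the two functions handed to xmap might look like the following. The class and method names are hypothetical, the block uses only the Java standard library, and the commented-out usage line assumes DataFixerUpper and Minecraft are on the classpath:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper for illustration; not part of DataFixerUpper.
final class MutableListFunctions {

    // Deserializing direction: copy the immutable list into a fresh ArrayList.
    static <T> Function<List<T>, List<T>> toMutable() {
        return ArrayList::new;
    }

    // Serializing direction: any List can be handed to the list codec unchanged.
    static <T> Function<List<T>, List<T>> toSerializable() {
        return Function.identity();
    }

    // Assumed usage with the real library, roughly:
    // Codec<List<BlockPos>> codec = BlockPos.CODEC.listOf()
    //     .xmap(MutableListFunctions.toMutable(), MutableListFunctions.toSerializable());
}
```

Copying on deserialization keeps the codec's own immutable list untouched, at the cost of one extra allocation per decoded list.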

Records

RecordCodecBuilder is used to generate codecs that serialize instances of classes with explicitly named fields, like our example above. Codecs created via RecordCodecBuilder serialize things to maplike objects, such as {} json objects or CompoundTags.

RecordCodecBuilder can be used in several ways, but the simplest form is as follows:

public static final Codec<SomeJavaClass> CODEC = RecordCodecBuilder.create(instance -> instance.group(
		someFieldCodecA.fieldOf("field_name_a").forGetter(SomeJavaClass::getFieldA),
		someFieldCodecB.fieldOf("field_name_b").forGetter(SomeJavaClass::getFieldB),
		someFieldCodecC.fieldOf("field_name_c").forGetter(SomeJavaClass::getFieldC)
		// up to 16 fields can be declared here
	).apply(instance, SomeJavaClass::new));

Where each line in the group specifies a codec instance for the type of that field, the field name in the serialized object, and the corresponding getter function in the java class. The builder is concluded by specifying a constructor or factory for the java class whose arguments are the previously defined fields in the same order.

For example, using RecordCodecBuilder to create a codec for our example class above:

public static final Codec<ExampleCodecClass> CODEC = RecordCodecBuilder.create(instance -> instance.group(
		Codec.INT.fieldOf("some_int").forGetter(ExampleCodecClass::getSomeInt),
		Registry.ITEM.fieldOf("item").forGetter(ExampleCodecClass::getItem),
		BlockPos.CODEC.listOf().fieldOf("block_positions").forGetter(ExampleCodecClass::getBlockPositions)
	).apply(instance, ExampleCodecClass::new));

Optional and Default Values in Record Fields

When RecordCodecBuilder is used as shown above, all of the fields are *required* to be in the serialized object (the JsonObject/CompoundTag/etc), or the entire thing will fail to parse when the codec tries to deserialize it. If we wish to have optional or default values, there are several alternatives to fieldOf() we can use.

  • someCodec.optionalFieldOf("field_name") creates a field for an Optional. If the field in the json/nbt is not present or invalid, it will deserialize as an empty optional. Empty optionals will not be serialized; the field will be omitted from the json or nbt.
  • someCodec.optionalFieldOf("field_name", someDefaultValue) creates an optional field that deserializes as the given default value if the field is not present in the json/nbt. When serializing, if the field in the java object equals the default value, the value will not be serialized and the field will be omitted from the json or nbt.

When using optional fields, be wary that if the field contains bad data or otherwise fails to deserialize, the error will be silently caught, and the field will deserialize as the default value (or an empty Optional) instead!
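
The behaviour described above can be modelled in plain Java. This is only an illustrative stand-in, not the DataFixerUpper API: the class, the field name "count", and the default of 1 are all hypothetical, and a Map stands in for the json object.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of an optional field with a default value; not the DataFixerUpper API.
final class OptionalFieldModel {
    static final String FIELD = "count"; // assumed field name
    static final int DEFAULT = 1;        // assumed default value

    // Decoding: a missing field, or one holding the wrong kind of data,
    // silently falls back to the default.
    static int decode(Map<String, Object> json) {
        Object raw = json.get(FIELD);
        return raw instanceof Integer ? (Integer) raw : DEFAULT;
    }

    // Encoding: a value equal to the default is omitted from the output entirely.
    static Map<String, Object> encode(int value) {
        Map<String, Object> json = new LinkedHashMap<>();
        if (value != DEFAULT) {
            json.put(FIELD, value);
        }
        return json;
    }
}
```

Note how decode cannot tell "field absent" apart from "field present but malformed"; this is the silent fallback the warning above refers to.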

Boxing values as objects

In some situations, we may need to serialize a single value as a single-field object. We can use fieldOf to box a single value in this way without needing the entire RecordCodecBuilder process:

public static final Codec<Integer> BOXED_INT_CODEC = Codec.INT.fieldOf("value").codec();

JsonElement value = BOXED_INT_CODEC.encodeStart(JsonOps.INSTANCE, 5).result().get();

This produces the following output:

{"value":5}

Unit

The Codec.unit(defaultValue) codec creates a Codec that always deserializes a specified default value, regardless of input. When serializing, it serializes nothing.

Pair

The Codec.pair(codecA, codecB) static method takes two codecs and generates a Codec<Pair<A,B>> from them.

The only valid arguments for this method are codecs that serialize to objects with explicit fields, such as codecs created using RecordCodecBuilder or fieldOf. Codecs that serialize nothing (such as unit codecs) are also valid as they act as objects-with-no-fields.

The resulting Pair codec will serialize a single object that has all of the fields of the two original codecs. For example:

public static final Codec<Pair<Integer,String>> PAIR_CODEC = Codec.pair(
	Codec.INT.fieldOf("value").codec(),
	Codec.STRING.fieldOf("name").codec());

JsonElement encodedPair = PAIR_CODEC.encodeStart(JsonOps.INSTANCE, Pair.of(5, "cheese")).result().get();

This codec serializes the above value to:

{
	"value": 5,
	"name": "cheese"
}

Codecs that serialize to objects with undefined fields such as unboundedMap may cause strange and unpredictable behaviour when used here; these objects should be boxed via fieldOf when used in a pair codec.

Either

The Codec.either(codecA, codecB) static method takes two codecs and generates a Codec<Either<A,B>> from them.

When this codec is used to de/serialize an object, it first attempts to use the first codec; if and only if that conversion fails, it then attempts to use the second codec. If that conversion also fails, then the returned DataResult will contain the error data from the *second* codec's conversion attempt.
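
The fallback order can be sketched with a small plain-Java model. Nothing here is the DataFixerUpper API: Outcome is a hypothetical stand-in for DataResult, and the int and boolean decoders are assumptions chosen only to make the example concrete.

```java
// Hypothetical model of either-style fallback; not the DataFixerUpper API.
final class EitherFallbackModel {

    // Stand-in for a result: exactly one of value or error is non-null.
    record Outcome(Object value, String error) {
        boolean isOk() {
            return error == null;
        }
    }

    // First decoder: accepts decimal integers.
    static Outcome decodeInt(String s) {
        try {
            return new Outcome(Integer.parseInt(s), null);
        } catch (NumberFormatException e) {
            return new Outcome(null, "not an int: " + s);
        }
    }

    // Second decoder: accepts only the literals "true" and "false".
    static Outcome decodeBool(String s) {
        if (s.equals("true") || s.equals("false")) {
            return new Outcome(Boolean.parseBoolean(s), null);
        }
        return new Outcome(null, "not a boolean: " + s);
    }

    // Try the first decoder; only if it fails, try the second.
    // If both fail, the caller sees the second decoder's error.
    static Outcome decodeIntOrBool(String s) {
        Outcome first = decodeInt(s);
        return first.isOk() ? first : decodeBool(s);
    }
}
```

Because the first decoder's error is discarded, it is usually worth putting the more specific codec first so that the surviving error message is the more general one.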

Numeric Ranges

The Codec.intRange(min,max), Codec.floatRange(min,max), and Codec.doubleRange(min,max) static methods generate Codecs for Integers, Floats, or Doubles, respectively, for which only a specified inclusive range is valid, and values outside that range will fail to de/serialize.
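
The inclusive check can be illustrated with a tiny plain-Java sketch. The class and method are hypothetical and only model the validation rule; with the real library one would simply call something like Codec.intRange(0, 10).

```java
import java.util.Optional;

// Hypothetical model of an inclusive range check; not the DataFixerUpper API.
final class IntRangeModel {

    // Returns the value when min <= value <= max (both ends inclusive),
    // and an empty Optional for anything outside that range.
    static Optional<Integer> checkRange(int value, int min, int max) {
        return (value >= min && value <= max) ? Optional.of(value) : Optional.empty();
    }
}
```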

Maps

Suppose we want to serialize a HashMap or other Map type, where we could have indefinitely many key-value pairs and we don't know what the keys are ahead of time.

We can create a Codec<Map<KEY,VALUE>> using the Codec.unboundedMap static method, which takes a key codec and a value codec and creates a codec for a map type:

public static final Codec<Map<String, BlockPos>> MAP_CODEC = Codec.unboundedMap(Codec.STRING, BlockPos.CODEC);

The serialized form of maps serialized by this codec will be a JsonObject or CompoundTag, whose fields are the key-value pairs in the map; the map's keys will be used as the field names, and the map's values will be the values of those fields.

A limitation of using unboundedMap is that it only supports key codecs that serialize to Strings (including codecs for things like ResourceLocation that aren't Strings themselves but still serialize to strings). To create a codec for a Map whose keys are not fundamentally strings, the Map must be serialized as a list of key-value pairs instead of using unboundedMap.
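
One way to approach the list-of-pairs form is to convert between the Map and a list of key-value records. The sketch below shows only that conversion in plain Java; the class, record, and method names are hypothetical. With the real library, one would presumably build a codec for the pair record (for example with RecordCodecBuilder), call listOf() on it, and xmap the result using functions like these.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical conversion helpers; not part of DataFixerUpper.
final class MapAsPairList {

    // One serialized entry, e.g. {"key": ..., "value": ...} in json.
    record KeyValue<K, V>(K key, V value) {}

    // Serializing direction: flatten the map into a list of pairs.
    static <K, V> List<KeyValue<K, V>> toPairs(Map<K, V> map) {
        List<KeyValue<K, V>> pairs = new ArrayList<>();
        map.forEach((key, value) -> pairs.add(new KeyValue<>(key, value)));
        return pairs;
    }

    // Deserializing direction: rebuild the map, keeping insertion order.
    // If a key appears more than once, the last pair wins.
    static <K, V> Map<K, V> fromPairs(List<KeyValue<K, V>> pairs) {
        Map<K, V> map = new LinkedHashMap<>();
        for (KeyValue<K, V> pair : pairs) {
            map.put(pair.key(), pair.value());
        }
        return map;
    }
}
```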

Equivalent Types and xmap

Suppose we have two java classes, Amalgam and Box; any Amalgam instance can be converted to a Box, and vice-versa. Now suppose we have a Codec<Amalgam>, but we'd also like to have a Codec<Box>. Rather than creating an entirely new codec for Box from scratch, we can simply xmap our Amalgam codec instead.

The Codec#xmap instance method is used to generate a second codec for a fundamentally equivalent type to the first codec's type. The method takes two function objects as arguments: the first converts the first type to the second when deserializing, and the second converts the second type back to the first when serializing:

public static final Codec<Box> CODEC = Amalgam.CODEC.xmap(Amalgam::toBox, Box::toAmalgam);

Codecs created in this manner will serialize objects in the same format as the starting codec.

Partially Equivalent Types, flatComapMap, comapFlatMap, and flatXmap

Consider the ResourceLocation: Any ResourceLocation can be converted to a String, but not all Strings can be converted to a ResourceLocation; ResourceLocations have strict limits on their format and allowed characters.

While we *could* use xmap to convert the Codec.STRING to a codec for ResourceLocations, this would cause attempts to parse an invalid string like SHOUTY:MOD:Invalid$Characters to throw a runtime exception, when we really should be returning a failed DataResult to the parser instead -- which indeed is what the vanilla ResourceLocation codec does.

Codecs have three additional instance methods for creating equivalent codecs for when we can *potentially* convert one type to another, but are not guaranteed to be able to do so. These take conversion function arguments that return DataResults, allowing validation to be performed during serialization and deserialization.

Codec Conversion Methods

Can A always be converted to B? | Can B always be converted to A? | Method of codecA used to create codecB
yes | yes | codecA.xmap
yes | no  | codecA.flatComapMap
no  | yes | codecA.comapFlatMap
no  | no  | codecA.flatXmap
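
To make the "no / yes" row concrete, here is a plain-Java sketch of the kind of function pair one would pass to comapFlatMap for an id type. It is a model only: Outcome is a hypothetical stand-in for DataResult, and the "namespace:path" pattern is a simplified assumption rather than the exact vanilla ResourceLocation rules.

```java
import java.util.regex.Pattern;

// Hypothetical validation model; not the DataFixerUpper or Minecraft API.
final class IdValidationModel {

    // Assumed, simplified format: lowercase "namespace:path".
    private static final Pattern ID = Pattern.compile("[a-z0-9_.-]+:[a-z0-9_./-]+");

    // Stand-in for a result: exactly one of value or error is non-null.
    record Outcome(String value, String error) {
        boolean isOk() {
            return error == null;
        }
    }

    // String -> id can fail, so report an error instead of throwing.
    static Outcome parseId(String raw) {
        return ID.matcher(raw).matches()
                ? new Outcome(raw, null)
                : new Outcome(null, "Not a valid id: " + raw);
    }

    // id -> String always succeeds, so a plain function is enough.
    static String idToString(String id) {
        return id;
    }
}
```

Returning an error value lets the parser report bad data through the usual result handling instead of crashing with a runtime exception.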

Registry Dispatch

Registry Dispatch Codecs allow us to define a registry of codecs and delegate to a specific codec to deserialize a particular json based on a type field in that json. Dispatch codecs are used extensively when deserializing worldgen data.

To create a dispatch codec for a Thing class, the following steps can be performed:

  1. Create a Thing abstract class and ThingType interface. The ThingType interface should have a method that supplies a Codec<Thing>, while Thing subclasses must define a method that supplies a ThingType.
  2. Create a map or registry of ThingTypes, and register a ThingType for each sub-codec we want to have.
  3. Create a Codec<ThingType>, or have the ThingType registry implement Codec.
  4. Create our Codec<Thing> master codec by invoking Codec#dispatch on our ThingType codec. This method's arguments are:
    1. A field name for the ID of the sub-codec (the example json below is using "type")
    2. A function to retrieve a ThingType from a Thing
    3. A function to retrieve a Codec<Thing> from a ThingType

We can then use our Codec<Thing> to create Thing fields in other codecs whose serialized format depends on the specific sub-codec used by a Thing instance.

As an example of this, consider the ExampleCodecClass earlier. Suppose we make this class extend Thing and register our codec for it to a codec dispatch registry with the id "ourmod:exampleclass". If we were to define an instance of this class in a Thing field in some json, it would look like this:

"some_thing":
{
	"type": "ourmod:exampleclass",
	"some_int": 42,
	"item": "minecraft:gold_ingot",
	"block_positions":
	[
		[0,0,0],
		[10,20,-100]
	]
}

Other ThingTypes we register would have different fields in this json object, but would still be valid for the "some_thing" field.
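
The dispatch idea itself can be modelled without the library. In this plain-Java sketch, a Map stands in for the json object and for the ThingType registry; every name (DispatchModel, IntThing, NameThing, the two ids) is hypothetical and chosen only for illustration.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical model of type-field dispatch; not the DataFixerUpper API.
final class DispatchModel {

    interface Thing {}

    record IntThing(int n) implements Thing {}

    record NameThing(String name) implements Thing {}

    // Registry of sub-decoders, keyed by the value of the "type" field.
    static final Map<String, Function<Map<String, Object>, Thing>> TYPES = Map.of(
            "ourmod:int_thing", json -> new IntThing((Integer) json.get("n")),
            "ourmod:name_thing", json -> new NameThing((String) json.get("name")));

    // Read the "type" field, look up the matching sub-decoder, and delegate to it.
    // An unknown or missing type yields an empty Optional.
    static Optional<Thing> decode(Map<String, Object> json) {
        String type = String.valueOf(json.get("type"));
        return Optional.ofNullable(TYPES.get(type)).map(decoder -> decoder.apply(json));
    }
}
```

Each sub-decoder reads its own fields from the same object that holds "type", which mirrors the flat json layout shown above.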

Several examples of vanilla classes that use dispatch codecs:

  • RuleTest and RuleTestType
  • BlockPlacer and BlockPlacerType
  • ConfiguredDecorator and FeatureDecorator

Registering MapCodecCodecs for Dispatch Subcodecs

When registering a subcodec to any dispatch codec registry, the registered subcodec should be an instance of MapCodecCodec, or the subcodec will be nested in its own object when serialized.

For example, suppose we register a single-field subcodec, where we use fieldOf-and-xmap to convert an int to our int-holding Thing:

public record Thing(int n){}
public static Codec<Thing> CODEC = Codec.INT.fieldOf("n").codec().xmap(Thing::new, Thing::n);

This results in this json when serialized:

"some_thing":
{
	"type": "ourmod:thing",
	"value": {
		"n": 5
	}
}

This occurs because xmap does not produce a MapCodecCodec, and if this nested object is not desired, then our registered subcodec must be a MapCodecCodec.

However, fieldOf() produces a MapCodec, which has a codec() method, which does produce a MapCodecCodec. We can rearrange our codec builder, which then produces a cleaner json:

public static Codec<Thing> CODEC = Codec.INT // Primitive codec
	.fieldOf("n") // MapCodec
	.xmap(Thing::new, Thing::n) // MapCodec
	.codec(); // MapCodecCodec! That's what we want.

"some_thing":
{
	"type": "ourmod:thing",
	"n": 5
}

RecordCodecBuilder also produces MapCodecCodecs.

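External Links

  • Codecs in Mojang's official public DataFixerUpper repository: https://github.com/Mojang/DataFixerUpper/blob/master/src/main/java/com/mojang/serialization/Codec.java
  • Unofficial Codec Javadocs: https://kvverti.github.io/Documented-DataFixerUpper/snapshot/com/mojang/serialization/Codec.html#flatXmap-java.util.function.Function-java.util.function.Function-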