Toolchain

Latest revision as of 23:48, 19 August 2021


The Forge toolchain is designed to clean up, merge, deobfuscate, rename, patch and provide the Minecraft source code for the usage of modders, researchers, or just those curious about how the game works.

This serves as an explanation as to how the development environment sets up your code, and aims to help you troubleshoot in case something goes wrong.

Overall process

Whenever a Gradle refresh is triggered, a few things occur:

ForgeGradle downloads the jar and MCPConfig zip and triggers the extractSrg/createSrgToMcp task.
After that, it processes the jar - applies access transformers from forge/userdev, SAS from forge, and decompiles.
Finally, it patches and finalizes the code, ready for modder consumption.

This is triggered whenever the setup task is run.

Needed knowledge

Obfuscation

Obfuscation is the process of renaming all of the fields, methods, and classes of the compiled code into unreadable, machine-generated names (such as aaa, bC), and removing package structures (this makes everything package-local and makes the obfuscated code somewhat smaller). This is commonly used by companies to prevent external entities from easily decompiling their released binaries/executables and retrieving their source code/intellectual property.

Additionally, due to the way the Local Variable Table (LVT) of Java bytecode is stored, every function-local variable name is turned to ☃ (that's right, a snowman) in the compiled files. This makes immediate recompilation of the decompiled game impossible, as Java compilers require that local variables in the same scope have unique names.

Minecraft is a commercial game, which means the source code is unavailable to all except the developers at Mojang. To prevent piracy and the copying of their intellectual property, Mojang applies this obfuscation process to the game before they release it. They use a tool called ProGuard, which is both an optimizer^[1] and an obfuscator.

Problematic Naming

There are a few big problems with using these obfuscated names directly for modding.

It is incredibly difficult to create mods using these obfuscated names. It requires immense patience to reverse-engineer the meanings behind each and every name, and to keep relating those names to what was already reverse-engineered. That said, tools do exist to make this process easier, such as IntelliJ IDEA plugins that provide naming hints automatically.
Because the obfuscation process takes place after compilation (the obfuscator operates on the compiled classes), the obfuscated names are not handled by the compiler. Thus, obfuscated classes may contain member^[2] names that are invalid in the Java source language, but valid in compiled bytecode (like ☃ discussed earlier); this means that the decompiled source of the game is not immediately recompilable.
These obfuscated names are automatically generated by the obfuscator for each independent release. This means that the obfuscated names may change significantly between any two versions, making it harder for mod developers to update mods between releases.

SRGification

Some background on what SRG is and how it works:

SRG stands for Searge's Retro Guard; RetroGuard being an early attempt to reverse the ProGuard obfuscation, and Searge being co-author of the Mod Coder Pack or MCP^[3], who created this process. Each obfuscated class, method, and field is assigned a unique number by the backend, via a sequential counter. This unique number is called the SRG ID of that class/method/field (henceforth called member).

The SRG name of the member is then derived from its SRG ID and its type (method {given the prefix m_}, field {given the prefix f_}, or parameter {given the prefix p_}).^[4] Including the SRG ID in the name guarantees that the SRG names of all members are unique, and is the reason the ID is generated.
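As a rough illustration of that derivation, the sketch below builds an SRG name from a member kind and an SRG ID. The function and table names are hypothetical and not part of any Forge tool; the prefixes and the trailing underscore follow the naming shown on this page (for example m_91087_).

```python
# Prefixes as described in the text; the names SRG_PREFIXES and srg_name
# are illustrative only and do not exist in any Forge tool.
SRG_PREFIXES = {"method": "m_", "field": "f_", "parameter": "p_"}


def srg_name(kind: str, srg_id: int) -> str:
    """Build an SRG name such as m_91087_ from a member kind and its SRG ID."""
    return f"{SRG_PREFIXES[kind]}{srg_id}_"


print(srg_name("method", 91087))  # m_91087_
```

Because the ID is part of the name, two members can only share an SRG name if they share an ID.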

The actual conversion of obf names to SRG names is done by a tool called Vignette. More information on how it works can be found on that page.

The Setup

The process can be broken up into 3 steps: MCPConfig, patch and provide. The MCPConfig step is, understandably, the biggest and most prone to failure. An explanation of MCPConfig itself, how it works, what it's for (but NOT how to use it) can be found here. For the purpose of this guide, you need only know that its goal is to get the game decompiled, and into a state where it can immediately be recompiled. This means it needs to fix and patch the source code before passing it on to Forge.

In this way, MCPConfig can be thought of as the vanilla side of the setup. It does not modify the game.

The following steps are all executed in order of appearance.

Downloading and parsing MCPConfig

The first thing that ForgeGradle does upon initializing a first-time setup is start the SetupMCP task.

This task then downloads the MCPConfig zip for the version you're setting up. Once it is acquired, it parses the steps contained within the config.json. It does this by interpreting the file with the following rules:

Every key, except for libraries, is interpreted as the name of a step.
Steps are executed in order.
The version value is interpreted as a maven coordinate of a file to download.
If there is a repo value, it is used instead of the maven repositories defined in the buildscript, to retrieve the version.
The args array is parsed, and {values} like this are interpreted as inputs, which can be substituted accordingly.
Once the step is fully parsed, it is executed:
- java <jvmArgs> -jar <version> <args>
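The rules above can be sketched roughly as follows. This is a minimal illustration, not ForgeGradle's actual code: the function names, the dictionary layout of a step, and the file names are assumptions. It also assumes the JVM arguments are placed before -jar, since that is where the java launcher reads them.

```python
import re


def substitute(arg: str, values: dict[str, str]) -> str:
    """Replace each {name} token with values[name]; unknown tokens are left untouched."""
    return re.sub(r"\{(\w+)\}", lambda m: values.get(m.group(1), m.group(0)), arg)


def build_command(step: dict, jar_path: str, values: dict[str, str]) -> list[str]:
    """Assemble the java invocation for one parsed step (hypothetical helper)."""
    jvm_args = step.get("jvmargs", [])
    args = [substitute(a, values) for a in step.get("args", [])]
    # JVM options go before -jar; everything after the jar path is passed to the tool.
    return ["java", *jvm_args, "-jar", jar_path, *args]


# A trimmed-down version of the merge step, with made-up input and output paths.
merge_step = {
    "version": "net.minecraftforge:mergetool:1.1.3:fatjar",
    "args": ["--client", "{client}", "--server", "{server}", "--output", "{output}"],
}
command = build_command(
    merge_step,
    "mergetool-1.1.3-fatjar.jar",
    {"client": "client.jar", "server": "server.jar", "output": "joined.jar"},
)
print(" ".join(command))
```

Returning the command as a list rather than a single string avoids quoting problems if a path contains spaces.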

A config.json can be found here which defines the following steps:

fernflower
- version: net.minecraftforge:forgeflower:1.5.498.12
- args: -din=1 -rbr=1 -dgs=1 -asc=1 -rsy=1 -iec=1 -jvn=1 -isl=0 -iib=1 -log=TRACE -cfg {libraries} {input} {output}
- jvmargs: -Xmx4G
merge
- version: net.minecraftforge:mergetool:1.1.3:fatjar
- args: --client {client} --server {server} --ann {version} --output {output} --inject false
rename
- version: net.minecraftforge.lex:vignette:0.2.0.10
- args: --jar-in {input} --jar-out {output} --mapping-format tsrg2 --mappings {mappings} --fernflower-meta --cfg {libraries} --create-inits --fix-param-annotations
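Each version value above is a Maven coordinate. As a sketch of how such a coordinate maps onto a file in a Maven repository, the hypothetical helper below converts group:artifact:version[:classifier] into a repository-relative path using the standard Maven layout. It assumes the artifact is a jar and is not taken from ForgeGradle.

```python
def maven_path(coordinate: str) -> str:
    """Convert group:artifact:version[:classifier] into a repository-relative jar path."""
    parts = coordinate.split(":")
    if len(parts) not in (3, 4):
        raise ValueError(f"unsupported coordinate: {coordinate}")
    group, artifact, version = parts[:3]
    # The optional fourth part is the classifier, appended after the version.
    classifier = f"-{parts[3]}" if len(parts) == 4 else ""
    return f"{group.replace('.', '/')}/{artifact}/{version}/{artifact}-{version}{classifier}.jar"


print(maven_path("net.minecraftforge:mergetool:1.1.3:fatjar"))
# net/minecraftforge/mergetool/1.1.3/mergetool-1.1.3-fatjar.jar
```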

More information about each of these tools, including what each of these arguments does, can be found at the link provided. A brief description of each follows.

ForgeFlower

The decompiler used by ForgeGradle is a custom fork of JetBrains' FernFlower, called ForgeFlower, which searches the jar for files, cleans up the bytecode, and then converts it into a reasonable best-guess interpretation of the source code.

It also produces the following side effects:

Removes synthetic parameters from constructors
- In bytecode, inner classes have the outer class as their first constructor parameter, but Java source code does not.
Adds constructors for inner classes
- These are sometimes removed by ProGuard, as they are not required in the bytecode if the parent has a default constructor.
Hides bridge methods
Decompiles generic signatures
Encodes non-ASCII characters in string and character literals as Unicode escaped characters
Hides synthetic class members
Prevents simple lambdas from being inlined
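To make the Unicode-escaping side effect concrete, here is a small sketch of what encoding non-ASCII characters as escapes means. It is not ForgeFlower's implementation; the function name is made up, and the choice of uppercase hex digits is an assumption. Characters outside the Basic Multilingual Plane are written as two UTF-16 code units, which is how Java source represents them.

```python
def escape_non_ascii(text: str) -> str:
    """Encode every non-ASCII character as one or more \\uXXXX escapes (UTF-16 code units)."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
        else:
            # Split the character into 16-bit code units and escape each one.
            units = ch.encode("utf-16-be")
            for i in range(0, len(units), 2):
                out.append("\\u%04X" % int.from_bytes(units[i:i + 2], "big"))
    return "".join(out)


print(escape_non_ascii("snowman: \u2603"))  # snowman: \u2603
```

Escaping keeps the decompiled source independent of the file encoding used when it is later recompiled.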

As you can see from the repository, a lot of work has gone into tuning it for Minecraft's needs, but it is still a long way from perfect. This is why the patches are needed.

If you look at the patches for 1.17.1, they are mostly incredibly simple changes: adding generics and making types more strict.

This is all stuff that should be done by the decompiler, and PRs are always welcome at the ForgeFlower repository for changes and fixes that would reduce the amount of MCPConfig patches required to get the game to compile.

For now, it is a necessity.

Mergetool

The game is split into two distributions: server and client. Since the server is just a subset of the client, the client contains the server-only classes as well.

To get around this, we have a tool called Mergetool, which can search for the differences between the two jars (down to the function level) and merge them into one large jar file (referred to as joined). It can also annotate those files as necessary.

It is a simple program, but it works.
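The idea of merging and annotating can be sketched as a set operation. The example below is a simplification under stated assumptions: it treats each jar as a set of member names, and the member names and side labels are invented for illustration. Mergetool itself works on class files and is not shown here.

```python
def merge_members(client: set[str], server: set[str]) -> dict[str, str]:
    """Return every member from either side, tagged with the side(s) it appears on."""
    joined = {}
    for member in sorted(client | server):
        if member in client and member in server:
            joined[member] = "both"
        elif member in client:
            joined[member] = "client-only"
        else:
            joined[member] = "server-only"
    return joined


# Hypothetical member names, purely for illustration.
client_members = {"Game.tick()", "Game.render()"}
server_members = {"Game.tick()", "Game.startConsole()"}
for name, side in merge_members(client_members, server_members).items():
    print(f"{name}: {side}")
```

The side label plays the role of the annotation: it records which distribution a member originally came from once everything lives in one joined jar.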

Vignette

Vignette is where SRG starts to come into play. It serves as our deobfuscator.

This process is done with the help of a deobfuscation map, a file generated by the original obfuscator (in this case, ProGuard) that contains a map of the obfuscated names to original, non-obfuscated names. This is commonly used to deobfuscate stack traces produced by an obfuscated program, for debugging purposes.^[5]
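As a minimal sketch of reading such a map, the example below extracts only the class-level entries from text in ProGuard's mapping layout, where class lines are unindented and have the form "original -> obfuscated:" and member lines are indented. The sample excerpt and the function name are hypothetical, and member entries are deliberately ignored.

```python
def parse_class_mappings(mapping_text: str) -> dict[str, str]:
    """Map obfuscated class names to original names, ignoring member lines."""
    classes = {}
    for line in mapping_text.splitlines():
        # Member lines are indented and comments start with '#'; skip both.
        if not line or line[0].isspace() or line.startswith("#"):
            continue
        original, _, obfuscated = line.partition(" -> ")
        classes[obfuscated.rstrip(":")] = original
    return classes


# Hypothetical excerpt, not taken from a real Minecraft mapping file.
SAMPLE = """\
# hypothetical excerpt
com.example.Game -> a:
    int tickCount -> a
    void tick() -> b
com.example.World -> b:
"""
print(parse_class_mappings(SAMPLE))
```

With such a dictionary in hand, an obfuscated class name in a stack trace can be looked up and replaced by its original name.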

We have three sets of deobfuscation maps available to us: the obf->SRG mappings distributed with the MCPConfig system, the Yarn intermediary system, or the official mappings.^[6]

To rectify this problem, Forge has its own process to create deobfuscation mappings for the game, using the official mappings. This process is split into two separate parts: the SRG renaming, and the official mapping. During the SetupMCP task, only the SRG renaming is performed.

Vignette itself operates on the compiled jar and takes in a .tsrg file, like the one contained in the MCPConfig zip. It works on bytecode because it is easier to rename LVT entries (from the snowman) to readable names there: every code path that references a specific entry can be found, so all accesses are renamed at once. This is impossible in source code, where every local variable name is identical and string matching cannot tell them apart.

It also:

Handles adding annotations for parameters that have synthetic data
- In bytecode, annotations such as Nonnull are attached to the parameters, not to the function that contains them.
Adds synthetic (invisible) constructors for classes without them

LVT entries (local variables) are initially renamed to lvt_<index>_<version>.^[7]
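A sketch of how such names make locals unique again is shown below. It assumes that index is the local variable slot and that version counts how many variables have used that slot so far; that reading of version is an assumption made for illustration, and the function is not part of Vignette.

```python
from collections import defaultdict


def name_locals(slot_indices: list[int]) -> list[str]:
    """Assign lvt_<index>_<version> names; version counts uses of the same slot (assumed)."""
    seen = defaultdict(int)
    names = []
    for index in slot_indices:
        seen[index] += 1
        names.append(f"lvt_{index}_{seen[index]}")
    return names


# Four locals that were all named the same by the obfuscator; slot 2 is reused once.
print(name_locals([1, 2, 2, 3]))  # ['lvt_1_1', 'lvt_2_1', 'lvt_2_2', 'lvt_3_1']
```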

First, the class names are renamed from their obfuscated target to the specified mapping output. Then, iterating the members of the class, it renames fields, methods, parameters, and inner classes.

A recap from earlier:

For classes -> C_<ID>_ (but it is transparently remapped without the C_ being written to disk)
For functions/methods -> m_<ID>_
For fields -> f_<ID>_
For function/method parameters -> p_<ID>_^[8]

For example, m_91087_ refers to a function with SRG ID 91087.^[9]
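Going the other way, an SRG name can be split back into its kind and ID. The sketch below handles only the simple m_, f_ and p_ forms listed above; the pattern and function names are illustrative, not part of any Forge tool.

```python
import re
from typing import Optional, Tuple

# Matches only the simple forms listed above: m_<ID>_, f_<ID>_ and p_<ID>_.
SRG_PATTERN = re.compile(r"(?P<prefix>[mfp])_(?P<id>\d+)_")
KINDS = {"m": "method", "f": "field", "p": "parameter"}


def parse_srg_name(name: str) -> Optional[Tuple[str, int]]:
    """Return (kind, SRG ID) for an SRG name, or None if the name is not in SRG form."""
    match = SRG_PATTERN.fullmatch(name)
    if match is None:
        return None
    return KINDS[match.group("prefix")], int(match.group("id"))


print(parse_srg_name("m_91087_"))  # ('method', 91087)
```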

At this point, we have a completely remapped jar.

Patches

Once we have the source code ready to go, the final step in the setup is to apply patches. This is done using DiffPatch.

The Forge Side

When Forge is first loaded by Gradle, it prepares the files for setting up the workspace.

The initial setup is done through the gradlew setup task. It runs the SetupMCP tasks and then starts applying the Forge system to the processed vanilla code. After the MCPConfig/SetupMCP tasks are finished, ForgeGradle prints "MCP Environment Setup is complete" and continues. The text "pausing after requested step" is also printed twice during this task, so that some necessary data can be extracted.

First, it applies ATs. After that, patches. Finally, the official mappings, or some other mappings from a provider like Parchment, are applied.

All in all, compared to the MCPConfig setup, this is a string of extremely basic tasks - mostly just one-line commands.

Applying Access Transformers

Access Transformers are a way of changing the visibility and finality of classes and class members. A full explanation of how it works, what the specification is, and what exactly they're used for, can be found at that page.

They are applied by passing the AT Config (nowadays called accesstransformer.cfg) into AccessTransformers:

AccessTransformers.jar --inJar {input} --outJar {output} --logFile accesstransform.log --atFile {at.cfg}

The AT cfg for Forge can be found here.
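For orientation, the sketch below parses a single line of an AT config. It assumes the commonly documented layout of an access modifier (optionally suffixed with +f or -f to add or remove final), a class name, an optional member, and an optional # comment. The parser and the sample entry are hypothetical and do not reflect how AccessTransformers itself reads the file.

```python
def parse_at_line(line: str):
    """Parse one AT line into (access, final_change, class_name, member), or None if empty."""
    content = line.split("#", 1)[0].strip()  # drop any trailing comment
    if not content:
        return None
    tokens = content.split()
    if len(tokens) < 2:
        raise ValueError(f"expected a modifier and a class name: {line!r}")
    modifier, class_name = tokens[0], tokens[1]
    member = tokens[2] if len(tokens) > 2 else None
    final_change = None
    if modifier.endswith("+f"):
        access, final_change = modifier[:-2], "add"
    elif modifier.endswith("-f"):
        access, final_change = modifier[:-2], "remove"
    else:
        access = modifier
    return access, final_change, class_name, member


# Hypothetical entry: make a field public and strip its final flag.
print(parse_at_line("public-f net.minecraft.example.Thing f_12345_ # hypothetical entry"))
```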

Applying Patches

The patches used for Forge itself are different from those used by MCPConfig, which means there are two separate patching stages performed.

As opposed to the minimal MCPConfig patching, with the goal to make the code recompilable, the Forge patching is done to apply the API and mod loader to the code.

It does this in a very similar way, using the BinaryPatcher utility at install time or during a CI build, and DiffPatch in all other situations.

Applying Mappings

This step isn't strictly necessary, and it can be omitted. However, working with exclusively SRG names is confusing for most people, so we have an extra step to apply human names to the SRG.

These names can come from any source, such as the Yarn naming or the Mojang obfuscation-map naming mentioned earlier. ForgeGradle does not care, as long as there is a valid SRG->names map.

How ForgeGradle retrieves these mappings is covered in the appropriate article.

The process of renaming itself is a simple regex substitution, performed by ForgeGradle itself. This is made possible by the assured uniqueness of SRG names.
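A regex substitution of this kind can be sketched as follows. This is an illustration rather than ForgeGradle's code: the pattern covers only the m_, f_ and p_ forms described on this page, and the source line and mapping entries are made up.

```python
import re

# Matches whole SRG tokens such as m_91087_; \b prevents matching inside longer identifiers.
SRG_TOKEN = re.compile(r"\b[mfp]_\d+_\b")


def apply_mappings(source: str, names: dict[str, str]) -> str:
    """Replace every SRG name that has a mapping; unmapped names are left as they are."""
    return SRG_TOKEN.sub(lambda m: names.get(m.group(0), m.group(0)), source)


# Hypothetical source line and SRG->names entries.
line = "Game game = Game.m_91087_(); int ticks = game.f_90000_;"
mapping = {"m_91087_": "getInstance"}
print(apply_mappings(line, mapping))
# Game game = Game.getInstance(); int ticks = game.f_90000_;
```

No knowledge of the surrounding class is needed, because a given SRG name can only refer to one member.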

Post-processing

At this point, the files are ready to go. We have processed the Minecraft jar such that it can be recompiled, we have applied the appropriate access transformers and the Forge patches, and optionally renamed every applicable SRG name using the chosen set of mappings.

There is not much left to do but package the code into a jar file and place it into the Gradle cache (so that this process does not occur every single time the project is opened). ForgeGradle calculates the name of this jar based on many factors; this is covered in the naming article.

Some Additional Info

You will never see C_XXX_ for classes, because the class names are picked beforehand using the official mapping class names.
If you look into the JAR, you won't see any packages for the obfuscated classes, but the deobfuscated classes do have packages. This is because the same process that names the classes also decides what package they belong to.