Changes

9,734 bytes added, 05:29, 11 January 2021
Major overhaul. Still have yet to get into the Forge side of things.
{{Under construction}}
 
The Forge toolchain is designed to clean up, merge, deobfuscate, rename, patch, and provide the Minecraft source code for use by modders, researchers, and anyone curious about how the game works.
This page explains how the development environment sets up the game code for you, and aims to help you troubleshoot if something goes wrong.
 
== Overall process ==
When setting up the environment for the first time, a Gradle refresh triggers three things:
# ForgeGradle downloads the MCPConfig zip for the Minecraft version you are setting up, and triggers the SetupMCP task.
# After that, it processes the jar: it applies access transformers, runs MCPCleanup, and performs other fixes.
# Finally, it patches and finalises the code, ready for modder consumption.

== Needed knowledge ==
=== Obfuscation ===
'''Obfuscation''' is the process of renaming all of the fields, methods, and classes of the compiled code into unreadable, machine-generated names (such as <code>aaa</code>, <code>bC</code>), and removing package structures (which makes everything package-local and makes the obfuscated code somewhat smaller). This is commonly used by companies to prevent external entities from easily decompiling their released binaries/executables and retrieving their source code/intellectual property, though it also has size advantages.

Additionally, due to the way the Local Variable Table (LVT) of Java bytecode is stored in the released game, every method-local variable name is turned into ☃ (that's right, a snowman) in the compiled files. This makes immediate recompilation of the decompiled game impossible: ☃ is not a valid Java identifier, and Java requires local variables in the same scope to have distinct names.
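
The two rules above can be sketched as a small Java check. This is a hypothetical helper, not part of the toolchain: it treats all locals as sharing one scope, and the <code>lvt_</code> names in the example are only illustrative.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.lang.model.SourceVersion;

public class LvtNameCheck {

    // True only if every name is a legal Java identifier (and not a keyword)
    // and no name repeats. Simplification: all locals are treated as if they
    // share a single scope, as they would in one flat method body.
    static boolean canRecompile(List<String> localNames) {
        Set<String> seen = new HashSet<>();
        for (String name : localNames) {
            if (!SourceVersion.isIdentifier(name) || SourceVersion.isKeyword(name)) {
                return false; // the snowman is not a valid Java identifier
            }
            if (!seen.add(name)) {
                return false; // duplicate local variable name in the same scope
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(canRecompile(List.of("\u2603", "\u2603")));    // false
        System.out.println(canRecompile(List.of("lvt_1_0", "lvt_2_0"))); // true
    }
}
```

Renaming every LVT entry to a unique, valid name is enough to pass this check, which is the job MCInjector takes on later in the setup.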
    
Minecraft is a commercial game, which means its source code is unavailable to everyone except Mojang's developers. To prevent piracy and the copying of their intellectual property, Mojang applies this obfuscation process to the game before releasing it. They use a tool called [https://www.guardsquare.com/en/products/proguard ProGuard], which is both an optimizer<ref>An optimizer is a program that removes redundant/unused instructions and compacts the code to be faster and smaller.</ref> and an obfuscator.
 
=== Problematic Naming ===
 
There are a few big problems with using these obfuscated names directly for modding.
 
# It is incredibly difficult to create mods using these obfuscated names. It requires immense patience to reverse-engineer the meaning behind each and every name, and to keep relating those names to what has already been reverse-engineered. That said, tools do exist to make this process easier, such as IntelliJ IDEA plugins that provide naming hints automatically.
# Because the obfuscation process takes place after compilation (the obfuscator operates on the compiled classes), the obfuscated names are never checked by the compiler. Thus, obfuscated classes may contain member<ref>'''member''' refers to class fields and methods.</ref> names that are invalid in the Java source language but valid in compiled bytecode (like the ☃ discussed earlier); this means that the decompiled source of the game is not immediately recompilable.
# These obfuscated names are automatically generated by the obfuscator for each independent release. This means that the obfuscated names may change significantly between any two versions, making it harder for mod developers to update mods between releases.
=== SRGification ===
The solution to these problems is a process known as '''deobfuscation''', where these obfuscated names are transformed into more readable names.  
Some background on what SRG is and how it works:

'''SRG''' stands for '''S'''ea'''RG'''e, co-author of the Mod Coder Pack, who created this process.
Each obfuscated class, method, and field is assigned a unique number by the [[Toolchain:MCPConfig|backend]], via a sequential counter. This unique number is called the '''SRG ID''' of that class/method/field (henceforth called a member).

The SRG name of the member is then derived from its SRG ID, its type (functions are given the prefix <code>func_</code>, fields <code>field_</code>, and parameters <code>p_</code>, or <code>p_i</code> if they belong to a constructor), and (optionally) the obfuscated name of the member at the time it was given its SRG name<ref>The SRG name for a given member is only created once, when it first appears in the code. Therefore, the obfuscated-name suffix of an SRG name may differ from the member's current obfuscated name.</ref>. Including the SRG ID in the name guarantees that every member's SRG name is unique, and is the reason the ID is generated.

The actual conversion of obf names to SRG names is done by a tool called [[Toolchain:SpecialSource|SpecialSource]]. More information on how it works can be found on that page.
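
To make the naming rule concrete, here is a minimal Java sketch of a sequential SRG namer. It is hypothetical, not the MCPConfig backend: the member keys are made-up stable identifiers, and the starting ID 71410 is chosen only so that the output matches the <code>func_71410_x</code> example used on this page.

```java
import java.util.HashMap;
import java.util.Map;

public class SrgNamer {
    enum Kind { FUNC, FIELD }

    private int nextId;
    // Keyed by a stable identity for the member (hypothetical), so each
    // member gets its SRG name exactly once and keeps it afterwards.
    private final Map<String, String> assigned = new HashMap<>();

    SrgNamer(int firstId) {
        this.nextId = firstId;
    }

    // Returns the member's SRG name, creating it on first sight from the next
    // sequential ID, the kind prefix, and the obfuscated name at that moment.
    String nameFor(String memberKey, Kind kind, String currentObfName) {
        return assigned.computeIfAbsent(memberKey, key -> {
            String prefix = (kind == Kind.FUNC) ? "func_" : "field_";
            return prefix + (nextId++) + "_" + currentObfName;
        });
    }

    public static void main(String[] args) {
        SrgNamer namer = new SrgNamer(71410);
        // First sighting: the obfuscated name "x" is baked into the SRG name.
        System.out.println(namer.nameFor("Minecraft#getInstance", Kind.FUNC, "x")); // func_71410_x
        // A later release renames it to "B", but the existing SRG name is reused.
        System.out.println(namer.nameFor("Minecraft#getInstance", Kind.FUNC, "B")); // func_71410_x
        // A new member takes the next ID from the same counter.
        System.out.println(namer.nameFor("SomeClass#someField", Kind.FIELD, "a")); // field_71411_a
    }
}
```

The second call shows why the obfuscated-name suffix can go stale: it records the name at the time of first assignment, not the current one.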
 
== The Setup ==
The process can be broken up into three steps: MCPConfig, patch, and provide.
The MCPConfig step is, understandably, the biggest and the most prone to failure.
An explanation of MCPConfig itself, how it works, and what it's for (but NOT how to use it) can be found [[Toolchain:MCPConfig|here]]. For the purposes of this guide, you need only know that its goal is to get the game decompiled and into a state where it can immediately be recompiled. Due to certain flaws in the rest of the toolchain, this means it needs to fix and patch the source code before passing it on to Forge.

In this way, MCPConfig can be thought of as the vanilla side of the setup. It does not modify the game.
 
=== Downloading and parsing MCPConfig ===
The first thing that ForgeGradle does when initialising a first-time setup is start the [https://github.com/MinecraftForge/ForgeGradle/blob/e2ed49546abced95650635f81be071441ec60995/src/mcp/java/net/minecraftforge/gradle/mcp/task/SetupMCPTask.java#L91 SetupMCP] task.
This task then seeks to [https://github.com/MinecraftForge/ForgeGradle/blob/e2ed49546abced95650635f81be071441ec60995/src/mcp/java/net/minecraftforge/gradle/mcp/task/DownloadMCPConfigTask.java#L78 download the MCPConfig zip] for the version you're setting up.
Once it is acquired, it [https://github.com/MinecraftForge/ForgeGradle/blob/e2ed49546abced95650635f81be071441ec60995/src/mcp/java/net/minecraftforge/gradle/mcp/util/MCPRuntime.java#L74 parses the steps contained within the config.json]. It does this by interpreting the file with the following rules:
* Every key, except for <code>libraries</code>, is interpreted as the name of a step.
* Steps are executed in order.
* The <code>version</code> value is interpreted as the Maven coordinate of a file to download.
* If there is a <code>repo</code> value, it is used instead of the Maven repositories defined in the buildscript to retrieve the version.
* The <code>args</code> array is parsed, and values in braces, such as <code>{input}</code>, are interpreted as inputs, which are substituted accordingly.
* Once the step is fully parsed, it is executed:
** <code>java &lt;jvmArgs&gt; -jar &lt;version&gt; &lt;args&gt;</code>

An example config.json can be found [https://github.com/MinecraftForge/MCPConfig/blob/master/versions/release/1.16.4/config.json here].
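
The substitution and execution rules can be sketched in Java. This is a minimal illustration under our own assumptions, not ForgeGradle's MCPRuntime code: the jar file name and input paths in the example are hypothetical. Note that JVM options such as <code>-Xmx4G</code> have to appear before <code>-jar</code>; anything after the jar path is handed to the tool's <code>main</code> method instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StepCommand {
    // Matches placeholders such as {input} or {output} inside an argument.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\w+)\\}");

    // Replaces every {name} in the argument with its value from `inputs`.
    static String substitute(String arg, Map<String, String> inputs) {
        Matcher m = PLACEHOLDER.matcher(arg);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = inputs.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("No value for {" + m.group(1) + "}");
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Builds the command line: java <jvmArgs> -jar <tool jar> <substituted args>.
    static List<String> buildCommand(String toolJar, List<String> jvmArgs,
                                     List<String> args, Map<String, String> inputs) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.addAll(jvmArgs);
        cmd.add("-jar");
        cmd.add(toolJar);
        for (String arg : args) {
            cmd.add(substitute(arg, inputs));
        }
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = buildCommand(
                "mergetool-1.1.1-fatjar.jar",
                List.of(),
                List.of("--client", "{client}", "--server", "{server}", "--output", "{output}"),
                Map.of("client", "client.jar", "server", "server.jar", "output", "joined.jar"));
        System.out.println(String.join(" ", cmd));
        // java -jar mergetool-1.1.1-fatjar.jar --client client.jar --server server.jar --output joined.jar
    }
}
```

Arguments without braces, such as <code>--level=INFO</code>, pass through unchanged, and a placeholder with no known value fails loudly rather than producing a broken command.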
 
It defines the steps:
* [[Toolchain:MCinjector|mcinjector]]
** version: <code>de.oceanlabs.mcp:mcinjector:3.8.0:fatjar</code>
** args: <code>--in {input} --out {output} --log {log} --level=INFO --lvt=LVT --exc {exceptions} --acc {access} --ctr {constructors}</code>
* [[Toolchain:ForgeFlower|fernflower]]
** version: <code>net.minecraftforge:forgeflower:1.5.478.16</code>
** args: <code>-din=1 -rbr=1 -dgs=1 -asc=1 -rsy=1 -iec=1 -jvn=1 -isl=0 -iib=1 -log=TRACE -cfg {libraries} {input} {output}</code>
** jvmargs: <code>-Xmx4G</code>
* [[Toolchain:Mergetool|merge]]
** version: <code>net.minecraftforge:mergetool:1.1.1:fatjar</code>
** args: <code>--client {client} --server {server} --ann {version} --output {output} --inject false</code>
* [[Toolchain:SpecialSource|rename]]
** version: <code>net.md-5:SpecialSource:1.8.3:shaded</code>
** args: <code>--in-jar {input} --out-jar {output} --srg-in {mappings}</code>
** repo: https://repo1.maven.org/maven2/

More information about each of these tools, including what each of their arguments does, can be found on the linked pages. A brief description of each follows.
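
The <code>version</code> values above are Maven coordinates (<code>group:artifact:version[:classifier]</code>). As a rough illustration of what "a file to download" means, this Java sketch converts a coordinate into its path in the standard Maven repository layout. It assumes a <code>.jar</code> extension and ignores any other packaging.

```java
public class MavenPath {
    // Converts "group:artifact:version[:classifier]" into the relative path
    // used by the standard Maven repository layout (.jar extension assumed).
    static String toPath(String coordinate) {
        String[] parts = coordinate.split(":");
        if (parts.length < 3 || parts.length > 4) {
            throw new IllegalArgumentException("Unexpected coordinate: " + coordinate);
        }
        String group = parts[0].replace('.', '/');
        String artifact = parts[1];
        String version = parts[2];
        String classifier = (parts.length == 4) ? "-" + parts[3] : "";
        return group + "/" + artifact + "/" + version + "/"
                + artifact + "-" + version + classifier + ".jar";
    }

    public static void main(String[] args) {
        String repo = "https://repo1.maven.org/maven2/";
        System.out.println(toPath("de.oceanlabs.mcp:mcinjector:3.8.0:fatjar"));
        // de/oceanlabs/mcp/mcinjector/3.8.0/mcinjector-3.8.0-fatjar.jar
        System.out.println(repo + toPath("net.md-5:SpecialSource:1.8.3:shaded"));
        // https://repo1.maven.org/maven2/net/md-5/SpecialSource/1.8.3/SpecialSource-1.8.3-shaded.jar
    }
}
```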
 
=== MCInjector ===
[https://github.com/ModCoderPack/MCInjector MCInjector] is the tool we use to apply various fixes to the code while it is still in bytecode form; in other words, it works on compiled code, not source code. It does this because, in bytecode, it is easier to rename LVT entries (from the snowman) to readable names: every instruction that references a given entry can be found, so all of its accesses are renamed at once. This is impossible in source code, where every name is identical and string matching cannot tell the variables apart.

It:
* Removes synthetic parameters from constructors
** In bytecode, the constructors of (non-static) inner classes take the outer class instance as their first parameter, but in Java source code they do not.
* Handles adding annotations for parameters that have synthetic data
** In bytecode, these <code>@Nonnull</code> (or similar) annotations are attached to the parameters, not to the function that contains them.
* Adds constructors for inner classes
** ProGuard sometimes removes these, as they are not required in the bytecode if the parent has a default constructor.
* Adds synthetic (invisible) constructors for classes without them

It also applies fixes for things like EXC:
* Renaming parameters inside constructors, such that subclasses retain the ID of their parent.
* Fixing access for select functions, where it is required for proper recompilation.
* Applying proper typing to constructor parameters.
* Applying proper external (LWJGL) exception data to functions and classes.

Note that it does NOT rename to SRG. Local variables in the LVT are instead renamed to <code>lvt_&lt;index&gt;_&lt;version&gt;</code>.<ref>The index is the position at which the entry was found in the table, counting from 0: if it is the 4th entry in the table, it is index 3.</ref>
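
The hidden constructor parameter of inner classes is easy to observe with plain reflection. In this minimal sketch (the class names are made up for the example), the inner class's source-level constructor takes no arguments, but <code>getParameterTypes()</code> reports the compiled constructor, which also takes the enclosing instance.

```java
import java.lang.reflect.Constructor;

public class SyntheticParamDemo {
    // A non-static inner class whose source-level constructor takes no parameters.
    class Inner {
        Inner() {}
    }

    public static void main(String[] args) {
        Constructor<?> ctor = Inner.class.getDeclaredConstructors()[0];
        // The compiled constructor takes the enclosing instance as an implicit
        // first parameter, even though the source declares none.
        System.out.println(ctor.getParameterCount());                    // 1
        System.out.println(ctor.getParameterTypes()[0].getSimpleName()); // SyntheticParamDemo
    }
}
```

A decompiler that copied this parameter list straight into source would produce a constructor that does not match the original code.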
 
=== ForgeFlower ===
After MCInjector has cleaned the code up to a state where it no longer conflicts with itself, it can be passed to the decompiler.

The decompiler used by ForgeGradle is a custom fork of [https://github.com/JetBrains/intellij-community/tree/master/plugins/java-decompiler/engine JetBrains' FernFlower], called [https://github.com/MinecraftForge/ForgeFlower ForgeFlower].

It reads the class files in the jar and converts their bytecode into a reasonable best-guess Java source interpretation.
As the repository shows, a lot of work has gone into tuning it for Minecraft's needs, but it is still far from perfect. This is why the patches are needed.

If you [https://github.com/MinecraftForge/MCPConfig/blob/master/versions/release/1.16.4/patches/ look at the patches] for 1.16.4, you will see that they are mostly very simple changes, such as adding generics or making types stricter.

Ideally, all of this would be done by the decompiler, and PRs are always welcome at the ForgeFlower repository for changes and fixes that would reduce the number of MCPConfig patches required to get the game to compile.

For now, the patches are a necessity.
 
=== Mergetool ===
The game is split into two distributions: client and server.

Because the server contains no rendering code, and the client contains none of the dedicated-server-specific code (like the server's UI), the two jars differ in what they contain, and some code can only run on one side or the other.

To get around this, we have a tool called Mergetool, which searches for the differences between the two jars (down to the function level) and merges them into one large jar file, referred to as the joined jar.

It is a simple program, but it works.
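
The core idea of a merge can be sketched as a set comparison. This hypothetical Java example is not how Mergetool is implemented: it only records, for each member, whether it was found in the client jar, the server jar, or both, and the member names are invented.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class MergeSketch {
    enum Side { BOTH, CLIENT, SERVER }

    // Given the members found in each jar, builds the "joined" view and
    // records which side(s) each member came from.
    static Map<String, Side> merge(Set<String> clientMembers, Set<String> serverMembers) {
        Map<String, Side> joined = new TreeMap<>();
        for (String member : clientMembers) {
            joined.put(member, serverMembers.contains(member) ? Side.BOTH : Side.CLIENT);
        }
        for (String member : serverMembers) {
            joined.putIfAbsent(member, Side.SERVER); // only adds server-only members
        }
        return joined;
    }

    public static void main(String[] args) {
        // Hypothetical member names, written as Class.method.
        Set<String> client = Set.of("World.tick", "Renderer.draw");
        Set<String> server = Set.of("World.tick", "ServerGui.show");
        System.out.println(merge(client, server));
        // {Renderer.draw=CLIENT, ServerGui.show=SERVER, World.tick=BOTH}
    }
}
```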
 
=== SpecialSource ===
SpecialSource is where SRG starts to come into play. It serves as our deobfuscator.

Deobfuscation is normally done with the help of a '''deobfuscation map''', a file generated by the original obfuscator (in this case, ProGuard) that maps the obfuscated names to the original, non-obfuscated names. Such maps are commonly used for deobfuscating stack traces output by an obfuscated program, for debugging purposes.<ref name="retrace"/>

We have three sets of deobfuscation maps available to us: the obf->SRG mappings distributed with the MCPConfig system, the Yarn intermediary system, and the official mappings.<ref name="mojmappings"/>

Forge has its own process to create deobfuscation mappings for the game, using community-sourced human-readable names. This process is split into two separate parts: the '''SRG renaming''' and the '''MCP mapping'''. During the SetupMCP task, only the SRG renaming is performed.

SpecialSource operates on a compiled jar (note the <code>--in-jar</code> and <code>--out-jar</code> arguments of the rename step), and takes in a .tsrg file, like the one [https://github.com/MinecraftForge/MCPConfig/blob/master/versions/release/1.16.4/joined.tsrg contained in the MCPConfig zip].
It first renames classes, straight into their MCP names.
Then, iterating over the members of each class, it renames fields, methods, parameters and inner classes.
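
As an illustration of the lookup SpecialSource performs, here is a simplified Java reader for a .tsrg-style file. It assumes the simple layout of the linked file as we understand it: a class line of obfuscated and target name, followed by tab-indented field lines (obfuscated name, SRG name) and method lines (obfuscated name, obfuscated descriptor, SRG name). It does not remap descriptors or handle any other line types, and every entry in the example is hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TsrgSketch {
    final Map<String, String> classes = new HashMap<>();
    // Field key: "owner name"; method key: "owner name descriptor".
    final Map<String, String> members = new HashMap<>();

    static TsrgSketch parse(List<String> lines) {
        TsrgSketch map = new TsrgSketch();
        String owner = null;
        for (String line : lines) {
            if (line.trim().isEmpty()) continue;
            String[] p = line.trim().split(" ");
            if (!line.startsWith("\t")) {          // class line: obf deobf
                if (p.length != 2) throw new IllegalArgumentException("Bad class line: " + line);
                owner = p[0];
                map.classes.put(p[0], p[1]);
            } else if (owner == null) {
                throw new IllegalArgumentException("Member before any class: " + line);
            } else if (p.length == 2) {            // field: obf srg
                map.members.put(owner + " " + p[0], p[1]);
            } else if (p.length == 3) {            // method: obf descriptor srg
                map.members.put(owner + " " + p[0] + " " + p[1], p[2]);
            } else {
                throw new IllegalArgumentException("Unexpected line: " + line);
            }
        }
        return map;
    }

    public static void main(String[] args) {
        // Hypothetical entries for illustration only.
        TsrgSketch map = parse(List.of(
                "abc net/minecraft/client/Minecraft",
                "\tB ()Labc; func_71410_x",
                "\tq field_12345_q"));
        System.out.println(map.classes.get("abc"));           // net/minecraft/client/Minecraft
        System.out.println(map.members.get("abc B ()Labc;")); // func_71410_x
    }
}
```

Methods are keyed by their descriptor as well as their name because obfuscated overloads can share a name.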

A recap from earlier:
* For classes -> <code>c_###_</code> (although this is immediately replaced by the class's MCP name, so the <code>c_</code> name is never written to disk)
* For functions/methods -> <code>func_&lt;ID&gt;_&lt;obf-name&gt;</code>
* For fields -> <code>field_&lt;ID&gt;_&lt;obf-name&gt;</code>
* For function/method parameters -> <code>p_###_#_</code> for normal methods and <code>p_i###_#_</code> for constructors; the second number is the index<ref>The index is the parameter's slot in the method's local variable table. It starts at 0 for static methods and at 1 for instance methods and constructors (where slot 0 holds <code>this</code>), and <code>double</code> and <code>long</code> parameters take up two slots rather than one.</ref> of the parameter.
For example, <code>func_71410_x</code> refers to a function with SRG ID 71410 and original obfuscated name of <code>x</code>.<ref>For version 1.16.2, <code>func_71410_x</code> refers to <code>Minecraft.getInstance()</code>, with real obfuscated name of <code>B</code>, but when the getInstance() function was first discovered in code, it was called x.</ref>
At this point, combined with the MCInjector step from earlier, we have a completely SRGed-up source jar.
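
The parameter naming rule can be sketched in Java from a JVM method descriptor. This assumes the index is the JVM local-variable slot (starting at 0 for static methods and at 1 for instance methods and constructors, with <code>long</code> and <code>double</code> taking two slots); the SRG IDs in the example are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class SrgParams {
    // Builds SRG parameter names (p_<ID>_<index>_ or p_i<ID>_<index>_) from a
    // JVM method descriptor such as "(IJLjava/lang/String;)V".
    static List<String> paramNames(int srgId, String descriptor,
                                   boolean isStatic, boolean isConstructor) {
        String prefix = (isConstructor ? "p_i" : "p_") + srgId + "_";
        int slot = isStatic ? 0 : 1; // slot 0 holds `this` for instance methods
        List<String> names = new ArrayList<>();
        int i = descriptor.indexOf('(') + 1;
        int end = descriptor.indexOf(')');
        while (i < end) {
            boolean isArray = false;
            while (descriptor.charAt(i) == '[') { // skip array dimensions
                isArray = true;
                i++;
            }
            char type = descriptor.charAt(i);
            if (type == 'L') {
                i = descriptor.indexOf(';', i); // skip over the class name
            }
            i++; // step past the type character (or the ';')
            names.add(prefix + slot + "_");
            // long and double take two slots; arrays are references and take one
            slot += (!isArray && (type == 'J' || type == 'D')) ? 2 : 1;
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(paramNames(12345, "(IJLjava/lang/String;)V", false, false));
        // [p_12345_1_, p_12345_2_, p_12345_4_]
        System.out.println(paramNames(45678, "(D[I)V", false, true));
        // [p_i45678_1_, p_i45678_3_]
        System.out.println(paramNames(23456, "(FF)F", true, false));
        // [p_23456_0_, p_23456_1_]
    }
}
```

The gap after a <code>long</code> or <code>double</code> parameter (index 2 followed by 4 above) is expected, because those types occupy two slots.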
=== Patches ===
Once the source code is ready to go, the final step in the setup is to apply patches.
This is straightforward, using diffs and [https://github.com/CadixDev/gitpatcher gitpatcher].
=== Clean-up ===
Because of the way the obfuscator (ProGuard, in this case) and ForgeFlower mangle the code, we need to take some steps to clean it up before we can proceed.
Assorted cleanup fixes are performed by the [https://github.com/MinecraftForge/MCPCleanup/ MCPCleanup] utility. These include:
* removing trailing whitespace at the end of lines
* removing extra newlines at the start and end of files
* removing extra newlines between lines of code (every run of consecutive newlines is replaced with a single newline)
* removing comments (<code>// hello</code>, <code>/* hello */</code>)
** The purpose of this step is unclear; it seems to be a preventative measure to ensure that Java changes do not interfere with patch alignment in the future. Comments from the decompiler are still sometimes present in the source code.
* removing imports of classes from the same package as the class being processed
* removing comments that match the pattern <code>GL_[^*]+</code>
* replacing [[Toolchain:Magic Constants|magic constants]] with their code substitutions
* replacing <code>Character.valueOf(&lt;character&gt;)</code> with <code>&lt;character&gt;</code>
* replacing OpenGL integer constants with their code representation
* converting Unicode character constants back into integer representation
* formatting the code with JAStyle
It also adjusts abstract functions in some way: after running the program on a jar, the parameters of a default abstract function get renamed, but it is unclear exactly which part of the source code does this.
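
A few of the simpler cleanup rules listed above can be approximated with regular expressions. The following is a simplified Java sketch, not MCPCleanup's actual code: it only covers trailing whitespace, <code>Character.valueOf</code> unwrapping, and newlines at the start and end of a file (keeping a single final newline is our own choice).

```java
import java.util.regex.Pattern;

public class CleanupSketch {
    // Spaces or tabs at the end of any line.
    private static final Pattern TRAILING_WS = Pattern.compile("[ \\t]+$", Pattern.MULTILINE);
    // Character.valueOf('x') or Character.valueOf('\n'); group 1 is the char literal.
    private static final Pattern CHAR_VALUE_OF =
            Pattern.compile("Character\\.valueOf\\(('(?:\\\\.|[^'\\\\])')\\)");
    // Newlines at the very start and very end of the file.
    private static final Pattern LEADING_NEWLINES = Pattern.compile("^\\n+");
    private static final Pattern TRAILING_NEWLINES = Pattern.compile("\\n+$");

    static String clean(String source) {
        String s = TRAILING_WS.matcher(source).replaceAll("");
        s = CHAR_VALUE_OF.matcher(s).replaceAll("$1");
        s = LEADING_NEWLINES.matcher(s).replaceAll("");
        s = TRAILING_NEWLINES.matcher(s).replaceAll("\n"); // keep exactly one final newline
        return s;
    }

    public static void main(String[] args) {
        String input = "\n\nint x = 1;   \nchar c = Character.valueOf('a');\n\n\n";
        System.out.print(clean(input));
        // int x = 1;
        // char c = 'a';
    }
}
```

Rules like magic-constant substitution and JAStyle formatting need real parsing and are left out of this sketch.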
      
== Mappings ==
 
# If you look into the source, you'll see that parameters for lambdas don't have mapped names.
#* This is because of a complication in how lambdas are compiled and decompiled; it is a more advanced topic, involving how the compiler compiles lambdas, that we will not explain here.
      
== See also ==
 