Difference between revisions of "Toolchain"
m (changed ref to right position) |
(major update on deobfucsation information (WIP)) |
||
Line 1: | Line 1: | ||
− | + | {{DISPLAYTITLE:Deobfuscation Process}} | |
− | + | The toolchain of Forge automatically deobfuscates the base game code, from their obfuscated original names to deobfuscated and human-readable names. This process is done to allow mod developers to write their code based on readable and understandable class/method/field names, which greatly simplifies the modding process. | |
− | |||
− | + | == Obfuscation == | |
− | + | '''Obfuscation''' is the process of renaming all of the fields, methods, and classes of the source code into unreadable, machine-generated names (such as <code>aaa</code>, <code>bC</code>), and removing package structures. This is commonly used by companies to prevent external entities from decompiling their released binaries/executables and retrieving their source code/intellectual property. | |
− | |||
− | + | Minecraft is a commercial game, which means the source code is unavailable to all except the developers of Mojang. To prevent piracy and the copying of their intellectual property, Mojang applies this obfuscation process to the game before they release it. They use a tool called [https://www.guardsquare.com/en/products/proguard ProGuard], which is both an optimizer<ref>An optimizer is a program that removes redundant/unused instructions and compacts the code to be faster and smaller.</ref> and an obfuscator. | |
+ | == Problematic Naming == | ||
+ | There are a few big problems with using these obfuscated names directly for modding. | ||
+ | |||
+ | # It is incredibly difficult to create mods using these obfuscated names. It requires immense patience to reverse-engineer the meanings behind each and every name, and to keep relating those names to what was already reverse-engineered. | ||
+ | # Because the obfuscation process takes place after compilation (the obfuscator operates on the compiled classes), the obfuscated names are not handled by the compiler. Thus, obfuscated classes may contain member<ref>'''member''' refers to class fields and methods.</ref> names that are invalid in the Java source language, but valid in compiled bytecode; this means that the decompiled source of the game may not be recompilable. | ||
+ | # These obfuscated names are automatically generated by the obfuscator for each independent release. This means that the obfuscated names may change signficantly between any two versions, making it harder for mod developers to update mods between releases. | ||
== Deobfuscation == | == Deobfuscation == | ||
− | First the obfuscated names | + | The solution to these problems is a process known as '''deobfuscation''', where these obfuscated names are transformed into more readable names. |
+ | |||
+ | Normally, this process is done with the help of a '''deobfusation map''', a file generated by the obfuscator that contains a map of the obfuscated names to the original, non-obfuscated names in the source code. This is commonly used on debofuscating stack traces outputted by an obfuscated program, for debugging purposes.<ref name="retrace"/> However, we don't have this deobfuscation map file, as this is only accessible to the developers of Minecraft<ref name="mojmappings"/>. | ||
+ | |||
+ | To rectify this problem, Forge has it's own process to create deobfuscation mappings for the game, using community-sourced human-readable names. This process is split into two separate parts: the '''SRG renaming''', and the '''MCP mapping'''. | ||
+ | |||
+ | === SRG Renaming === | ||
+ | First, the obfuscated names are renamed into '''SRG names'''.<ref>'''SRG''' stands for '''S'''ea'''RG'''e, co-author of the Mod Creator Pack, who created this process.</ref> Each obfuscated class, method, and field is assigned a unique number by the deobfuscator, through a sequential counter. This unique number is called the '''SRG ID''' of that class/method/field (henceforth called member). | ||
+ | |||
+ | The SRG name of the member is then derived from its SRG ID, its type (class, function, field, paramter), and (optionally) the obfuscated name of the object at the time it was given its SRG name<ref>The obfuscated name included with the SRG name should not be taken as the object's real obfuscated name, as it is only true for the version for which the object was first deobfuscated.</ref>. This inclusion of the SRG ID into the name guarantees that the SRG name for all members are unique. | ||
+ | |||
+ | * For classes -> <code>c_###_</code><ref name="classes_srg">Due to how the deobfuscation process is implemented in Forge, all classes are given their MCP names before being released to the public. Mod developers will never see the SRG names of classes outside of a deobfuscation workspace.</ref> | ||
+ | * For functions/methods -> <code>func_###_</code> | ||
+ | * For fields -> <code>field_###_</code> | ||
+ | * For function/method parameters -> <code>p_###_#_</code> for normal methods, <code>p_i###_#</code> for constructors; the second number is the index<ref>The index of a parameter is determined by the preceeding parameters, where <code>double</code> and <code>long</code> increments it by 2, and all other primitives and reference types increments it by 1. </ref> of the parameter. | ||
+ | |||
+ | For example, <code>func_71410_x</code> refers to a function with SRG ID 71410 and original obfuscated name of <code>x</code>.<ref>For version 1.16.2, <code>func_71410_x</code> refers to <code>Minecraft.getInstance</code>, with real obfuscated name of <code>B</code>.</ref> | ||
− | + | == MCP Mapping == | |
− | + | WIP | |
− | |||
− | |||
− | |||
− | |||
− | |||
== Decompilation == | == Decompilation == | ||
Line 53: | Line 68: | ||
TBD | TBD | ||
− | <references/> | + | == See also == |
+ | * [https://en.wikipedia.org/wiki/Obfuscation_(software) Obfuscation] | ||
+ | |||
+ | <references> | ||
+ | <ref name="retrace">For ProGuard users (such as Mojang), this is done using [https://www.guardsquare.com/en/products/proguard/manual/retrace ReTrace].</ref> | ||
+ | <ref name="mojmappings">Mojang recently released their deobfuscation mappings for Minecraft (colloquially named <code>mojmappings</code>), but the licensing for its uses is a bit ambiguous. [https://cpw.github.io/MinecraftMappingData See this post by cpw] for more information.</ref> | ||
+ | </references> |
Revision as of 06:16, 28 October 2020
The Forge toolchain automatically deobfuscates the base game code, mapping its original obfuscated names to human-readable ones. This lets mod developers write their code against readable and understandable class/method/field names, which greatly simplifies the modding process.
Obfuscation
Obfuscation is the process of renaming all of the fields, methods, and classes of the source code into unreadable, machine-generated names (such as aaa, bC), and removing package structures. This is commonly used by companies to prevent external entities from decompiling their released binaries/executables and retrieving their source code/intellectual property.
Minecraft is a commercial game, which means the source code is unavailable to everyone except the developers at Mojang. To prevent piracy and the copying of their intellectual property, Mojang applies this obfuscation process to the game before releasing it. They use a tool called ProGuard, which is both an optimizer[1] and an obfuscator.
Problematic Naming
There are a few big problems with using these obfuscated names directly for modding.
- It is incredibly difficult to create mods using these obfuscated names. It requires immense patience to reverse-engineer the meanings behind each and every name, and to keep relating those names to what was already reverse-engineered.
- Because the obfuscation process takes place after compilation (the obfuscator operates on the compiled classes), the obfuscated names are not handled by the compiler. Thus, obfuscated classes may contain member[2] names that are invalid in the Java source language, but valid in compiled bytecode; this means that the decompiled source of the game may not be recompilable.
- These obfuscated names are automatically generated by the obfuscator for each independent release. This means that the obfuscated names may change significantly between any two versions, making it harder for mod developers to update mods between releases.
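The second problem in the list above can be checked mechanically. The following is a minimal sketch, not part of the Forge toolchain, that uses the standard javax.lang.model.SourceVersion class to test whether a name could be written as an identifier in Java source. The class name NameValidity and the sample names do and if are hypothetical illustrations, not names taken from the game.

```java
import javax.lang.model.SourceVersion;

class NameValidity {
    /**
     * Returns true if the name could be written as an identifier in Java source.
     * Bytecode has looser rules, so a name rejected here may still be legal
     * inside a compiled class file.
     */
    static boolean isValidInSource(String name) {
        // isIdentifier accepts keywords too, so keywords are excluded separately.
        return SourceVersion.isIdentifier(name) && !SourceVersion.isKeyword(name);
    }

    public static void main(String[] args) {
        // Hypothetical names an obfuscator might generate.
        String[] candidates = {"aaa", "bC", "do", "if"};
        for (String candidate : candidates) {
            System.out.println(candidate + " valid in source: " + isValidInSource(candidate));
        }
    }
}
```

A decompiler that emits a member named after a keyword produces source that the compiler rejects, which is why the decompiled game may not be recompilable without renaming.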
Deobfuscation
The solution to these problems is a process known as deobfuscation, where these obfuscated names are transformed into more readable names.
Normally, this process is done with the help of a deobfuscation map, a file generated by the obfuscator that maps the obfuscated names to the original, non-obfuscated names in the source code. This is commonly used for deobfuscating stack traces output by an obfuscated program, for debugging purposes.[3] However, we don't have this deobfuscation map file, as it is only accessible to the developers of Minecraft[4].
To rectify this problem, Forge has its own process to create deobfuscation mappings for the game, using community-sourced human-readable names. This process is split into two separate parts: the SRG renaming, and the MCP mapping.
SRG Renaming
First, the obfuscated names are renamed into SRG names.[5] Each obfuscated class, method, and field is assigned a unique number by the deobfuscator, through a sequential counter. This unique number is called the SRG ID of that class/method/field (henceforth called member).
The SRG name of the member is then derived from its SRG ID, its type (class, function, field, parameter), and (optionally) the obfuscated name of the object at the time it was given its SRG name[6]. Including the SRG ID in the name guarantees that the SRG name of every member is unique.
- For classes -> c_###_ [7]
- For functions/methods -> func_###_
- For fields -> field_###_
- For function/method parameters -> p_###_#_ for normal methods, p_i###_#_ for constructors; the second number is the index[8] of the parameter.
For example, func_71410_x refers to a function with SRG ID 71410 and original obfuscated name of x.[9]
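The naming patterns above are regular enough to be taken apart with a regular expression. The following is a rough sketch under stated assumptions: the class SrgName, its fields, and the kind labels are hypothetical and do not come from Forge, and parameter names are assumed to end in a trailing underscore (as in p_XXX_X_ and p_iXXX_X_).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SrgName {
    // Prefix, SRG ID, then an optional obfuscated-name suffix.
    private static final Pattern MEMBER = Pattern.compile("(c|func|field)_(\\d+)_(\\w*)");
    // Optional "i" for constructors, SRG ID of the parent method, parameter index.
    private static final Pattern PARAM = Pattern.compile("p_(i?)(\\d+)_(\\d+)_");

    final String kind;   // "class", "function", "field", or a parameter kind
    final int id;        // the SRG ID embedded in the name
    final String detail; // obfuscated suffix for members, index for parameters

    private SrgName(String kind, int id, String detail) {
        this.kind = kind;
        this.id = id;
        this.detail = detail;
    }

    /** Parses an SRG name, or returns null if the text does not look like one. */
    static SrgName parse(String name) {
        Matcher member = MEMBER.matcher(name);
        if (member.matches()) {
            String prefix = member.group(1);
            String kind = prefix.equals("c") ? "class"
                    : prefix.equals("func") ? "function" : "field";
            return new SrgName(kind, Integer.parseInt(member.group(2)), member.group(3));
        }
        Matcher param = PARAM.matcher(name);
        if (param.matches()) {
            String kind = param.group(1).isEmpty() ? "method parameter" : "constructor parameter";
            return new SrgName(kind, Integer.parseInt(param.group(2)), param.group(3));
        }
        return null;
    }

    public static void main(String[] args) {
        SrgName parsed = parse("func_71410_x");
        System.out.println(parsed.kind + " id=" + parsed.id + " detail=" + parsed.detail);
    }
}
```

Because the SRG ID is part of the name, the parsed id alone is enough to identify a member, regardless of which obfuscated suffix it carries.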
MCP Mapping
WIP
Decompilation
More general info
FernFlower/ForgeFlower
Forge uses ForgeFlower, which is a fork of FernFlower. FernFlower is a decompiler that takes the compiled Minecraft JAR and turns it into semi-readable source code.
Why semi-readable?
Because the decompiler has no sense of formatting or indentation.
Mappings
We don't strictly need this step, but coding with the SRG names is hard and not much fun. So the SRG names are replaced with more user-friendly, community-sourced names.
Since SRG names are uniquely identifiable thanks to their unique IDs, the toolchain literally does a search-and-replace over all of the source code text to apply the SRG -> MCP mapping. (MCP stands for ModCoderPack (or previously, Minecraft Coder Pack), which was the toolchain that did all of this before the advent of ForgeGradle.)
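The search-and-replace idea can be sketched in a few lines. This is an illustration only, not the ForgeGradle implementation: the class SrgToMcp and the name field_1_a are hypothetical, while the func_71410_x -> getInstance pairing is the 1.16.2 example given in the notes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SrgToMcp {
    // Matches function, field, and parameter SRG names anywhere in the text.
    private static final Pattern SRG =
            Pattern.compile("\\b(?:func|field)_\\d+_\\w*|\\bp_i?\\d+_\\d+_");

    /** Replaces every SRG name that has a mapping; unmapped names are left as-is. */
    static String remap(String source, Map<String, String> mappings) {
        Matcher matcher = SRG.matcher(source);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String srg = matcher.group();
            String mcp = mappings.getOrDefault(srg, srg);
            matcher.appendReplacement(out, Matcher.quoteReplacement(mcp));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> mappings = new HashMap<>();
        mappings.put("func_71410_x", "getInstance");
        String source = "Minecraft mc = Minecraft.func_71410_x(); int a = mc.field_1_a;";
        System.out.println(remap(source, mappings));
    }
}
```

A plain textual replacement is safe here only because each SRG name is unique; with ordinary names, the same text could refer to unrelated members.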
Some additional info
- You will never see c_XXX_ for classes, because the class names are picked beforehand
  - this is still crowdsourced, but before a new version is ready for Forge, all classes are given an MCP name
  - this MCP name stays constant throughout the game version it was picked for; it can change between versions, but it usually won't for already-named classes (except for misspellings, typos, and misnames)
- If you look into the JAR, you won't see any packages for the obfuscated classes, but the deobfuscated classes do have packages; this is because the same process that names the classes also decides what package they belong to
- parameters have special names
  - there are two types of parameter names: p_XXX_X_ and p_iXXX_X_
    - the one with the i means that it's a parameter for a constructor
    - the first set of numbers is the SRG ID of the parent method, and the second number denotes the index of the parameter
    - the index of the parameter is a bit more involved, but this will not be explained here
- If you look into the source, you'll see that parameters for lambdas don't have mapped names
  - This is because of a complication in how lambdas are compiled and decompiled; it's a more advanced topic which we will not explain here.
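The parameter index mentioned in the list above follows the rule given in the notes: double and long advance the index by 2, and every other type advances it by 1. The following is a hedged sketch of that rule only. The class ParamIndex is hypothetical, and the starting index is passed in explicitly because the source does not state where counting begins; starting at 1 for instance methods (reserving 0 for this) and at 0 for static methods is an assumption based on JVM local variable slots.

```java
import java.util.ArrayList;
import java.util.List;

class ParamIndex {
    /**
     * Computes the index of each parameter from the types of the parameters
     * before it. The start value is an assumption supplied by the caller.
     */
    static List<Integer> indices(int start, String... paramTypes) {
        List<Integer> result = new ArrayList<>();
        int index = start;
        for (String type : paramTypes) {
            result.add(index);
            // double and long take two slots; everything else takes one.
            boolean wide = type.equals("double") || type.equals("long");
            index += wide ? 2 : 1;
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical instance method taking (int, double, String).
        System.out.println(indices(1, "int", "double", "String"));
    }
}
```

This explains why the indices in parameter names are not always consecutive: a double or long parameter leaves a gap before the next one.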
Reobfuscation
TBD
See also
- ↑ An optimizer is a program that removes redundant/unused instructions and compacts the code to be faster and smaller.
- ↑ member refers to class fields and methods.
- ↑ For ProGuard users (such as Mojang), this is done using ReTrace.
- ↑ Mojang recently released their deobfuscation mappings for Minecraft (colloquially named mojmappings), but the licensing for their use is a bit ambiguous. See this post by cpw for more information.
- ↑ SRG stands for SeaRGe, co-author of the Mod Coder Pack, who created this process.
- ↑ The obfuscated name included with the SRG name should not be taken as the object's real obfuscated name, as it is only true for the version for which the object was first deobfuscated.
- ↑ Due to how the deobfuscation process is implemented in Forge, all classes are given their MCP names before being released to the public. Mod developers will never see the SRG names of classes outside of a deobfuscation workspace.
- ↑ The index of a parameter is determined by the preceding parameters, where double and long increment it by 2, and all other primitives and reference types increment it by 1.
- ↑ For version 1.16.2, func_71410_x refers to Minecraft.getInstance, with real obfuscated name of B.