Programmer's guide to XCOM:EU 2012 modding
This is a high-level view of what is generally involved in modding XCOM:EU 2012, aimed at someone who already has basic competency in programming. It is taken from the Long War Discussion thread on the Nexus XCOM Modding forum. While the topic is the Long War mod, it applies to mods in general. The discussion began shortly after the "Enemy Within" expansion was released on 12 Nov 2013, when most of the changes Long War would need were still unknown.
Programs and Tools
At this point the biggest amount of work is simply updating for the (minor) "Enemy Unknown" patch. Essentially all of the code-level changes (code as in what UE Explorer outputs) should stay the same. However, jump offsets and function/variable references have changed, so the existing before/after hex replacements no longer work.
The Miscellaneous files section for Long War contains the Long War 2 hex and localization changes file. It covers basically every hex change made for Long War. Currently it's for 2.12 beta4, which in terms of hex/localization changes is identical to 2.12 beta5 and the 2.12 release.
Most (but not all) of the changes name the class.function being changed. In XCOM:EU with patch 5, that function should contain very similar code, but with different function/variable references. The challenge is finding the new "before" hex, and then generating new "after" hex with the correct references.
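Since the code itself is expected to be nearly identical and mainly the 4-byte references move, one way to locate the new "before" hex is to search the patched UPK for the old hex with its references masked out. Below is a minimal Python (3.9+) sketch of such a search. find_with_wildcards is a hypothetical helper, not a feature of UE Explorer or any existing tool, and it assumes the UPK contents have already been read into memory as bytes.

```python
import re


def find_with_wildcards(data: bytes, pattern: str) -> list[int]:
    """Return every offset in `data` where `pattern` matches.

    `pattern` is space-separated hex bytes; "??" matches any single byte,
    so 4-byte object references can be left unspecified.
    """
    parts = []
    for tok in pattern.split():
        if tok == "??":
            parts.append(b".")  # any byte (DOTALL lets '.' match 0x0A too)
        else:
            parts.append(re.escape(bytes([int(tok, 16)])))
    # A zero-width lookahead lets overlapping matches be reported as well.
    regex = re.compile(b"(?=" + b"".join(parts) + b")", re.DOTALL)
    return [m.start() for m in regex.finditer(data)]


if __name__ == "__main__":
    # Start of the patch 4 UpdateHeader statement with its references masked,
    # searched for in the start of the Enemy Within version of that statement.
    pattern = "0F 35 ?? ?? ?? ?? ?? ?? ?? ?? 00 00 35"
    ew = bytes.fromhex("0F 35 0C F9 FF FF B9 F9 FF FF 00 00 35 49 1A")
    print(find_with_wildcards(ew, pattern))  # prints [0]
```

More than one hit means the masked pattern is too short to be unique, so more surrounding bytes should be added before trusting the match.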
The file has about 3,000 lines and is roughly 780 KB, so there are a fair number of changes.
From what I gather, using UE Explorer you can decompile the .UPK files (which are apparently like .DLL files used by the XCOM EXE) into readable code, modify the function parameters (as you can't physically change the functions themselves) and see how that affects the binary afterwards. Then you have to match these references so you can get the code to perform "new functionality"?
I suppose in a way .UPK files are like .DLL files in terms of extensibility at runtime. In another way, however, they are also like Java class files.
.UPK stands for Unreal Package. It contains compiled hexcode designed to be interpreted by the Unreal Engine, which is really a virtual machine optimized for displaying and animating 3D environments (with physics, shooting and all that). UPKs aren't quite as machine-independent as something like Java, because 3D shooters are much more processing-intensive, so a 'cooking' process applies some low-level optimizations to the UPK hex for a particular hardware platform (such as little- vs big-endian byte order). So the code gets cooked slightly differently for PC vs Xbox vs PS3, for example.
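As a small illustration of the byte-order part of cooking: the same signed 32-bit value is stored with its bytes in opposite order on a little-endian platform (the PC) and on big-endian consoles such as the Xbox 360 and PS3. The snippet uses Python's standard struct module. The value -1571 is simply what the PC bytes DD F9 FF FF (from the hex example later in this article) decode to; nothing here reflects how the cooker is actually implemented, and the helper names are hypothetical.

```python
import struct


def encode_ref(value: int, big_endian: bool) -> bytes:
    """Pack a signed 32-bit value in the requested byte order."""
    return struct.pack(">i" if big_endian else "<i", value)


def decode_ref(raw: bytes, big_endian: bool) -> int:
    """Unpack 4 bytes as a signed 32-bit value in the requested byte order."""
    return struct.unpack(">i" if big_endian else "<i", raw)[0]


if __name__ == "__main__":
    value = decode_ref(bytes.fromhex("DD F9 FF FF"), big_endian=False)
    print(value)                                                # prints -1571
    print(encode_ref(value, big_endian=True).hex(" ").upper())  # prints FF FF F9 DD
```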
Part of the code is also "native" code: an executable built to run specifically on the hardware/OS of the target platform. The Unreal Engine is designed to make function calls back and forth between native and Unreal code, and the possible callbacks are determined at compile/link time. So the game launches with the executable, but the 'thread of execution' can pass into the UPK code quite easily.
.UPKs also contain a lot of art assets. Most of the UPK data for XCOM is art assets: 3D meshes, textures, animations, and sound data, all packed/cooked into UPKs. Map files are also UPKs. Unreal Engine 3 also includes a movie player (which plays .BIK files) and a Flash/ActionScript interpreter, which runs embedded ActionScript content that is projected onto a 3D surface in-game.
So far we've done extremely limited things with the executable (pretty much just unlocking the developer console, disabling the SHA checks in EU, and force-loading config data from files). Most of our changes have been to the UnrealScript code, although we've made some progress in modding ActionScript as well (allowing limited UI changes).
Access is done using UE Explorer (see Modding Tools - XCOM:EU 2012) to decompile the hex code. Fortunately the Unreal namelist includes a text field, which allows the original variable/function names to be reconstructed on decompile (otherwise the code would be nearly impossible to understand). Using the decompiler we can understand what the UnrealScript portion of the game is doing.
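To make the role of the name data concrete: in Unreal Engine 3 packages, a 4-byte object reference in the bytecode is conventionally an index into the package's export table (positive values) or import table (negative values), and each table entry ultimately points at a name. The toy tables below are hypothetical stand-ins (a real UPK stores name-table indices in them, not strings), so treat this as a sketch of the lookup idea rather than a UPK reader.

```python
from typing import Optional

# Hypothetical toy tables; a real UPK stores name-table indices here, not strings.
EXPORTS = ["XGSoldierUI", "UpdateHeader", "m_kHeader"]
IMPORTS = ["Core", "Object", "StrValue"]


def resolve_ref(ref: int) -> Optional[str]:
    """Turn a signed 4-byte object reference into a readable name.

    Positive values are 1-based export indices, negative values are
    1-based import indices (-1 is the first import), and 0 means no object.
    """
    if ref > 0:
        return EXPORTS[ref - 1]
    if ref < 0:
        return IMPORTS[-ref - 1]
    return None


if __name__ == "__main__":
    for ref in (3, -3, 0):
        print(ref, resolve_ref(ref))
```

Without the name data, a decompiler could only print the raw numbers, which is why the reconstructed names matter so much.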
Unfortunately there is no compiler. Creating one would pose some tricky issues, in particular getting access to the namelist references (which in theory could be read from the UPK). Currently my workflow is to:
- Decompile the UPK and find the section I'm interested in;
- Decipher the decompiled code functionality;
- Decide on changes desired;
- Write high-level, C++-style target code;
- Manually convert each line of target code into hex;
- Replace original hex with new hex;
- Decompile new hex to verify correctness;
- Iterate steps 4-7 (from writing the target code through verifying the decompiled hex) until the new code runs as desired.
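The "replace original hex with new hex" step is the one most prone to silent mistakes. Here is a minimal sketch of a guarded find-and-replace, assuming the relevant UPK data is already loaded into memory as bytes (file I/O and any decompression are left out). apply_hex_patch is a hypothetical helper, not part of any existing modding tool.

```python
def apply_hex_patch(data: bytes, before_hex: str, after_hex: str) -> bytes:
    """Return a copy of `data` with the `before` bytes replaced by the `after` bytes.

    Refuses to patch unless both sequences have the same length (the function's
    file size must not change) and `before` occurs exactly once (counted
    without overlaps), so an ambiguous or stale "before" hex is caught early.
    """
    before = bytes.fromhex(before_hex)
    after = bytes.fromhex(after_hex)
    if len(before) != len(after):
        raise ValueError(f"size mismatch: before={len(before)} bytes, after={len(after)} bytes")
    count = data.count(before)
    if count != 1:
        raise ValueError(f"'before' hex found {count} times, expected exactly once")
    return data.replace(before, after, 1)


if __name__ == "__main__":
    # Hypothetical bytes standing in for part of an unpacked UPK.
    package = bytes.fromhex("00 0F 26 16 00")
    patched = apply_hex_patch(package, "0F 26", "0F 25")
    print(patched.hex(" ").upper())  # prints 00 0F 25 16 00
```

A failed check is exactly the situation described at the top of this article: a stale "before" hex that no longer appears once a patch has moved the references.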
It's definitely rather clunky and slower than just writing code and running a compiler. Basically the (virtual) machine code is written by hand, and there is no IDE, which makes debugging more tedious. But it's the best we have so far.
To give you a better idea of what these changes entail, here's some example hex code.
This is from the function XComStrategyGame.upk / XGSoldierUI.UpdateHeader. The modlet is the one that adds the soldier's XP and Move stats to the display in the barracks.
In Enemy Unknown Patch 4 one change is the following line/hex:
m_kHeader.txtSpeed.StrValue = (m_kHeader.txtSpeed.StrValue $ "+") $ string(aModifiers)
0F 35 DD F9 FF FF 6C FA FF FF 00 00 35 47 15 00 00 4E 15 00 00 00 01 01 80 3F 00 00 70 70 35 DD F9 FF FF 6C FA FF FF 00 00 35 47 15 00 00 4E 15 00 00 00 01 01 80 3F 00 00 1F 2B 00 16 38 53 1A 24 03 00 FA 3F 00 00 16
In Enemy Within (in the XEW folder) the change is instead:
m_kHeader.txtSpeed.StrValue = (m_kHeader.txtSpeed.StrValue $ "+") $ string(aModifiers)
0F 35 0C F9 FF FF B9 F9 FF FF 00 00 35 49 1A 00 00 50 1A 00 00 00 01 01 7F 4C 00 00 70 70 35 0C F9 FF FF B9 F9 FF FF 00 00 35 49 1A 00 00 50 1A 00 00 00 01 01 7F 4C 00 00 1F 2B 00 16 38 53 1A 2C 03 00 02 4D 00 00 16
Hex values such as 0F (assignment), 35 (struct member access), 19 (context expressions), and 1B (virtual function calls) remain unchanged.
However, all of the 4-byte references for things such as aModifiers, m_kHeader, txtSpeed, etc. are different.
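Because the tokens stay put and only the references move, the EU patch 4 and Enemy Within versions of the line above can be lined up reference by reference. The Python (3.9+) sketch below reads the 4-byte little-endian references at fixed offsets, builds an old-to-new map, and rewrites the references. The offset list was worked out by hand from the two hex lines above, and build_ref_map and relink are hypothetical helpers. In practice the map would be learned from a matched pair of "before" hex and then applied to the old "after" hex.

```python
import struct

# The UpdateHeader statement from above: EU patch 4 and Enemy Within versions.
EU_P4 = bytes.fromhex(
    "0F 35 DD F9 FF FF 6C FA FF FF 00 00 35 47 15 00 00 4E 15 00 00 00 01 01 80 3F 00 00 "
    "70 70 35 DD F9 FF FF 6C FA FF FF 00 00 35 47 15 00 00 4E 15 00 00 00 01 01 80 3F 00 00 "
    "1F 2B 00 16 38 53 1A 24 03 00 FA 3F 00 00 16"
)
EW = bytes.fromhex(
    "0F 35 0C F9 FF FF B9 F9 FF FF 00 00 35 49 1A 00 00 50 1A 00 00 00 01 01 7F 4C 00 00 "
    "70 70 35 0C F9 FF FF B9 F9 FF FF 00 00 35 49 1A 00 00 50 1A 00 00 00 01 01 7F 4C 00 00 "
    "1F 2B 00 16 38 53 1A 2C 03 00 02 4D 00 00 16"
)

# 0-based offsets of the 4-byte references in this statement (worked out by hand).
REF_OFFSETS = [2, 6, 13, 17, 24, 31, 35, 42, 46, 53, 67]


def read_ref(code: bytes, offset: int) -> int:
    """Decode the little-endian signed 32-bit reference at `offset`."""
    return struct.unpack_from("<i", code, offset)[0]


def build_ref_map(old: bytes, new: bytes, offsets: list[int]) -> dict[int, int]:
    """Pair up references found at the same offsets in two versions of the same code."""
    ref_map: dict[int, int] = {}
    for off in offsets:
        old_ref, new_ref = read_ref(old, off), read_ref(new, off)
        if ref_map.setdefault(old_ref, new_ref) != new_ref:
            raise ValueError(f"reference {old_ref} maps to two different values")
    return ref_map


def relink(code: bytes, offsets: list[int], ref_map: dict[int, int]) -> bytes:
    """Return a copy of `code` with each reference at `offsets` replaced via `ref_map`."""
    out = bytearray(code)
    for off in offsets:
        struct.pack_into("<i", out, off, ref_map[read_ref(code, off)])
    return bytes(out)


if __name__ == "__main__":
    ref_map = build_ref_map(EU_P4, EW, REF_OFFSETS)
    for old_ref, new_ref in ref_map.items():
        print(f"{old_ref:>6} -> {new_ref}")
    relinked = relink(EU_P4, REF_OFFSETS, ref_map)
    print([i for i in range(len(EW)) if relinked[i] != EW[i]])  # prints [64]
```

The one byte relinking leaves different is at offset 64 (24 in EU, 2C in EW). It is not part of a reference (in EW it is the 2C single-byte integer token described below), which is why such bytes still have to be checked by hand in the token view.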
Is there a resource listing these values, as in your explanation (0F for assignment, 35 for structs, and so on), or is it basically trial and error?
There is definitely a list of which hex values are used for what: Hex values XCOM Modding. Quite a bit of other good material has been collected there as well. You should also reference these wiki articles:
Just keep in mind that a particular byte value might be used in different ways depending on its position/context.
So 0F might be an assignment operator when it starts a new statement, but 2C 0F would represent the integer 15. 2C is the token that declares an integer constant stored in the single byte that follows it.
Or 07 0F 00, which would represent a conditional jump to offset 0x000F.
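A tiny decoder makes this position dependence explicit: the byte 0F is read as an assignment only where a token is expected, and as operand data after 2C or 07. describe is a hypothetical Python (3.9+) helper that handles just these three cases, using the meanings given above; a real decompiler like UE Explorer covers the full token set and also decodes the condition expression that follows a jump.

```python
def describe(code: bytes, pos: int) -> tuple[str, int]:
    """Describe the token starting at `pos` and return (description, next position).

    Only the three cases discussed above are handled; anything else is
    reported as unknown and skipped one byte at a time.
    """
    op = code[pos]
    if op == 0x0F:
        return "assignment", pos + 1
    if op == 0x2C:
        # One following byte holds the integer value.
        return f"integer constant {code[pos + 1]}", pos + 2
    if op == 0x07:
        # Two following bytes hold the jump target, low byte first.
        # (The condition expression that follows is not decoded here.)
        target = int.from_bytes(code[pos + 1:pos + 3], "little")
        return f"conditional jump to 0x{target:04X}", pos + 3
    return f"unknown token {op:02X}", pos + 1


if __name__ == "__main__":
    for hex_str in ("0F", "2C 0F", "07 0F 00"):
        print(hex_str, "->", describe(bytes.fromhex(hex_str), 0)[0])
```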
Fortunately it's not all that bad, as UE Explorer's token view breaks the hex down line by line, showing which bytes correspond to each line of code.
I am going to assume we can't change the signature of a function, because the EXE expects a particular return type and set of parameters. So "rewriting" a function actually just means repurposing the statements within its body, correct? Or is it possible to rewrite entire functions?
We have re-written the entire contents of a function before.
Sometimes we are totally changing how the function works. Other times we find unused functions (stuff like debug functions, or functions for features that were cut from the released game).
Things that can't be done:
- Increasing the number of hex bytes inside a function
- Changing the number or type of parameters
- Changing the return value type
- Adding new local variables. Technically this can't be done, but it is possible to re-use local variables from other functions. Care has to be taken with this, however.
Also, it's not possible to add a new function to an existing class, or to add new classes to a UPK.
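Since a function's hex can't grow, replacement code has to fit in the original number of bytes, and, assuming shorter code is padded out rather than shrinking the function, the leftover space has to be filled. Here is a minimal sketch of that bookkeeping, assuming 0B can serve as a one-byte no-op filler token (verify this against the hex values reference before relying on it). fit_to_size is a hypothetical helper.

```python
FILLER = 0x0B  # assumption: a one-byte no-op token usable as padding


def fit_to_size(new_code: bytes, original_size: int) -> bytes:
    """Pad `new_code` with FILLER bytes so it is exactly `original_size` bytes long.

    Raises ValueError if the new code is longer than the original, since
    the number of hex bytes in a function cannot be increased.
    """
    excess = len(new_code) - original_size
    if excess > 0:
        raise ValueError(f"new code is {excess} bytes too long")
    return new_code + bytes([FILLER]) * (-excess)


if __name__ == "__main__":
    padded = fit_to_size(bytes.fromhex("0F 00 FA 3F 00 00 26"), 10)
    print(padded.hex(" ").upper())  # prints 0F 00 FA 3F 00 00 26 0B 0B 0B
```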
Ok, so the size of the function must remain the same (thus take up the same virtual space in memory), the signature of the function must stay the same (inputs/outputs), and it's pretty hard to create new local variables within a function?
Well, the really interesting part is that the memory size of the function does not have to remain the same; only its file size does.
This is another interesting 'feature' of hex editing Unreal code: the file size of the hex and its memory size generally differ, because some constructions occupy additional space once loaded. For example, a local variable reference (00 ## ## ## ##) is 5 bytes in the file but 9 bytes in memory. Memory size always increases in 4-byte word chunks.
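The local-variable example suggests a simple accounting rule: each 4-byte reference takes 4 extra bytes once loaded, so memory size = file size + 4 × (number of such references). This rule is an assumption extrapolated from that single example, not a complete description of the loader, but it shows how the memory size drifts when references are added or removed while the file size stays fixed. Both functions are hypothetical helpers.

```python
def estimate_memory_size(file_size: int, num_refs: int) -> int:
    """Estimate loaded size, assuming each 4-byte reference grows by 4 bytes in memory."""
    return file_size + 4 * num_refs


def updated_memory_size(old_memory_size: int, old_num_refs: int, new_num_refs: int) -> int:
    """New memory size when the file size is unchanged but the reference count changes."""
    return old_memory_size + 4 * (new_num_refs - old_num_refs)


if __name__ == "__main__":
    print(estimate_memory_size(5, 1))        # local variable reference: prints 9
    print(updated_memory_size(120, 10, 12))  # two extra references: prints 128
```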
I think this is because the master table that holds all of the function references locates them by file position. At run time, the loader then maintains pointers to the various functions after they are loaded.
Each function has a 48-byte header consisting of twelve 4-byte words. The 11th word in the header is the function's memory size. If the function's memory size has changed (without the file size changing, of course), this value must be updated to the new value, or the game will crash to desktop (CTD) immediately on launch.
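Updating that header word is a small byte-level operation. The sketch below assumes the header is available as a 48-byte buffer and that its words are little-endian unsigned 32-bit integers, as on PC; the function names are hypothetical. The 11th word sits at 0-based word index 10, i.e. byte offset 40.

```python
import struct

HEADER_SIZE = 48               # 12 words of 4 bytes each
MEMORY_SIZE_OFFSET = 10 * 4    # the 11th word (0-based word index 10)


def get_memory_size(header: bytes) -> int:
    """Read the function's memory size from its 48-byte header."""
    if len(header) != HEADER_SIZE:
        raise ValueError(f"expected a {HEADER_SIZE}-byte header, got {len(header)}")
    return struct.unpack_from("<I", header, MEMORY_SIZE_OFFSET)[0]


def set_memory_size(header: bytearray, new_size: int) -> None:
    """Overwrite the memory size word in place; no other header word is touched."""
    if len(header) != HEADER_SIZE:
        raise ValueError(f"expected a {HEADER_SIZE}-byte header, got {len(header)}")
    struct.pack_into("<I", header, MEMORY_SIZE_OFFSET, new_size)


if __name__ == "__main__":
    header = bytearray(HEADER_SIZE)  # placeholder; a real header comes from the UPK
    set_memory_size(header, 0x80)
    print(get_memory_size(header), header[40:44].hex(" "))  # prints 128 80 00 00 00
```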
All of this is core to how Unreal works, and so is unchanged for the EW expansion.
Referred to by this article:
- Long War Discussion
- Hex values XCOM Modding
Articles that refer to this article:
- None as yet