UPK File Format - XCOM:EU 2012

From Nexus Mods Wiki

Overview

This is a review of the Unreal Package File Format used in XCOM:EU 2012. It's derived from Eliot van Uytfanghe's website (see #References) and investigations in the Nexus Forums 'Mod Talk' thread UPK File Format.

wghost81 (primary forum investigator into the UPKFF for XCOM) has created a PDF version of this information from various sources, which is still in development. The latest version is available as UPK_Format.PDF and goes into greater detail. Her primary source was Unreal Package File Format, which is recommended reading for all interested parties.

Programs and Tools

Details

Unreal Engine 3 uses a single format for all of its own files. Every Unreal file starts with a signature (0x9E2A83C1), which is used to avoid file extension conflicts.
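
As an illustration of the signature check, here is a minimal Python sketch. It assumes the signature is stored in little-endian byte order like the other int fields described further down; the function name has_upk_signature is made up for this example.

```python
import struct

UPK_SIGNATURE = 0x9E2A83C1  # value quoted in the text above


def has_upk_signature(data: bytes) -> bool:
    """Return True if the first four bytes hold the Unreal package signature."""
    if len(data) < 4:
        return False
    # "<I" reads a little-endian unsigned 32-bit value; unsigned is used here
    # only so the result compares directly against the hex constant.
    (signature,) = struct.unpack_from("<I", data, 0)
    return signature == UPK_SIGNATURE
```

With this byte order, a valid file begins with the bytes C1 83 2A 9E.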

  • The UPK file consists of:
    • package header
    • name table
    • export table
    • import table
    • actual objects data
Technically, the tables (lists) are part of the header, because HeaderSize includes them.

The Unreal package file format is complex, and really only useful to those who want to read and write Unreal files. This is because:

  • Unreal files can contain a wide range of objects (there are hundreds of unique classes defined in Unreal), versus perhaps 20-30 for Quake.
  • Unreal objects can contain complex inter-dependencies: for example, a level package can refer to a texture package, which can refer to a class package, which can contain a class that refers to a sound in a sound package. So, each package file needs to keep track of not only the objects it contains, but also the external objects that it refers to.
  • Unreal objects are expandable by users, so we needed to create a generalized system for serializing (a.k.a. saving and loading) these complex user-defined objects.
  • Unreal objects are scoped within hierarchical packages (much like DOS and Windows files are scoped within a system of hierarchical directories). Therefore, the entire tree structure of the object scope hierarchy must be described within the file.

Data types stored in package files consist of:

  • The "int" type is signed, 4 bytes long, and is stored in Intel byte order (i.e. little endian).
  • The "float" type is signed, IEEE standard, 4 bytes long, and stored in Intel byte order.
  • The "string" type is stored as a sequence of characters (one nonzero byte each) terminated by a zero byte.
  • The "FCompactIndex" type corresponds to a 32-bit signed value but is stored in a compacted format. (See the Epic Games Unreal Packages Description for details.)
  • The "FGuid" type corresponds to a set of four consecutive ints defining a globally unique identifier.
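
The basic types above can be read with Python's struct module. This is only a sketch: the helper names are hypothetical, each helper returns the value together with the offset of the next field, and the latin-1 decoding of strings is an assumption. FCompactIndex is left out because its encoding is not described here.

```python
import struct


def read_int(data, offset):
    """Signed 4-byte little-endian int."""
    (value,) = struct.unpack_from("<i", data, offset)
    return value, offset + 4


def read_float(data, offset):
    """4-byte IEEE float, little-endian."""
    (value,) = struct.unpack_from("<f", data, offset)
    return value, offset + 4


def read_string(data, offset):
    """Zero-terminated string; the terminator is consumed but not returned."""
    end = data.index(b"\x00", offset)
    # latin-1 maps every byte to one character (an assumption for this sketch).
    return data[offset:end].decode("latin-1"), end + 1


def read_guid(data, offset):
    """FGuid: four consecutive ints."""
    return struct.unpack_from("<4i", data, offset), offset + 16
```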

Object and Name Flags

Each object and name stored in a package file can contain any combination of the following bitflags:

  • RF_LoadForClient (0x00010000): Must be loaded for game client.
  • RF_LoadForServer (0x00020000): Must be loaded for game server.
  • RF_LoadForEdit (0x00040000): Must be loaded for editor.
  • RF_Public (0x00000004): Object may be imported by other package files.
  • RF_Standalone (0x00080000): Keep object around (don't garbage collect) for editor even if unreferenced.
  • RF_Intrinsic (0x04000000): Class or name is defined in C++ code and must be bound at load-time.
  • RF_SourceModified (0x00000020): The external data source corresponding to this object has been modified.
  • RF_Transactional (0x00000001): Object must be tracked in the editor by the Undo/Redo tracking system.
  • RF_HasStack (0x02000000): This object has an execution stack allocated and is ready to execute UnrealScript code.

Any other bitflags stored in package files are irrelevant, should be set to zero when saving a package file, and should be ignored when loading a package file.
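
A small Python sketch of how these bitflags might be decoded and masked. The values are copied from the list above; the function names are hypothetical, and which of the two flag words in a table entry (H or L) carries these bits is not stated here, so the sketch simply works on one 32-bit value.

```python
OBJECT_FLAGS = {
    0x00000001: "RF_Transactional",
    0x00000004: "RF_Public",
    0x00000020: "RF_SourceModified",
    0x00010000: "RF_LoadForClient",
    0x00020000: "RF_LoadForServer",
    0x00040000: "RF_LoadForEdit",
    0x00080000: "RF_Standalone",
    0x02000000: "RF_HasStack",
    0x04000000: "RF_Intrinsic",
}


def decode_object_flags(flags):
    """List the names of all known flags set in `flags`."""
    return [name for bit, name in OBJECT_FLAGS.items() if flags & bit]


def keep_known_flags(flags):
    """Clear every bit that is not in the list above, as advised for saving."""
    mask = 0
    for bit in OBJECT_FLAGS:
        mask |= bit
    return flags & mask
```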

Header

NOTE: All references here to offsets are file offsets, which are absolute; calculated from the start (byte 'zero') of the file.

  • Package Header structure
    • int Signature
    • int Version/LicenseeVersion (Low order WORD is the file version, high order WORD is the LicenseeVersion)
    • int HeaderSize
    • int FolderNameLength
    • string FolderName
    • int PackageFlags : This holds the packageflags such as AllowDownload and ServerSideOnly, etc.
    • int NameCount  : Number of names stored in the name table. Always >= 0.
    • int NameOffset  : Offset into the file of the name table, in bytes. 0 designates the first byte of the file, of course.
    • int ExportCount  : Number of exported objects in the export table. Always >= 0.
    • int ExportOffset : Offset into the file of the export table.
    • int ImportCount  : Number of imported objects in the import table. Always >= 0.
    • int ImportOffset : Offset into the file of the import table.
    • int DependsOffset: seems to point to some object inside import table.
    • int Unknown1 (seems to be equal to HeaderSize)
    • int Unknown2
    • int Unknown3
    • int Unknown4
    • FGuid GUID  : Package Globally Unique IDentifier
    • int GenerationsCount
    • Generation x GenerationsCount Generations : GenerationsCount consecutive Generation structures (see below)
    • int EngineVersion
    • int CookerVersion
    • int CompressionFlags
    • variable size unknown data chunk


Package Header sub-structures

  • Generation is a structure of three ints:
    • int ExportCount
    • int NameCount
    • int NetObjectCount
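
The header layout above could be read roughly as follows. This Python sketch rests on several assumptions that the list does not spell out: FolderNameLength is taken to be a positive byte count that includes the terminating zero, the name is decoded as latin-1, and parsing stops before the unknown variable-size chunk. The names parse_header, take and ParsedBytes are invented for the example.

```python
import struct

INT_FIELDS = (
    "NameCount", "NameOffset", "ExportCount", "ExportOffset",
    "ImportCount", "ImportOffset", "DependsOffset",
    "Unknown1", "Unknown2", "Unknown3", "Unknown4",
)


def parse_header(data):
    pos = 0

    def take(fmt):
        # Unpack at the current position and advance past the bytes read.
        nonlocal pos
        values = struct.unpack_from(fmt, data, pos)
        pos += struct.calcsize(fmt)
        return values[0] if len(values) == 1 else values

    header = {"Signature": take("<I")}
    # Low-order WORD is the file version, high-order WORD the licensee version.
    header["Version"], header["LicenseeVersion"] = take("<HH")
    header["HeaderSize"] = take("<i")
    folder_len = take("<i")
    if folder_len < 0:
        raise ValueError("negative FolderNameLength is not handled in this sketch")
    header["FolderName"] = data[pos:pos + folder_len].rstrip(b"\x00").decode("latin-1")
    pos += folder_len
    header["PackageFlags"] = take("<I")
    for name in INT_FIELDS:
        header[name] = take("<i")
    header["GUID"] = take("<4i")
    generations_count = take("<i")
    # Each generation is (ExportCount, NameCount, NetObjectCount).
    header["Generations"] = [take("<3i") for _ in range(generations_count)]
    header["EngineVersion"] = take("<i")
    header["CookerVersion"] = take("<i")
    header["CompressionFlags"] = take("<i")
    header["ParsedBytes"] = pos  # the unknown data chunk starts here
    return header
```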

Tables

The following tables reside in the file at the offset specified by the header. If a table contains zero entries (as specified in the header), the offset value is meaningless and should be ignored.

  • Name table: Begins at NameOffset and contains a list of NameCount human-readable Unreal names (which correspond to the UnrealScript "name" datatype and the C++ "FName" data type). Each entry is variable-size and has the following format:
    • int NameLength
    • string NameString: String representation of the name, up to NAME_SIZE characters (currently 64, may increase with future versions).
    • int NameFlagsH: Name Flags
    • int NameFlagsL: Name Flags - Internal flags describing the name. (See "Object and Name Flags" above.)
  • Export table: Begins at ExportOffset and contains a list of ExportCount entries: objects contained in (a.k.a. "exported by") this file (similar to a Windows DLL's export table). Each entry is variable-size and has the following format:
    • int ObjTypeRef: Points to the class object describing the class of this object.
    • int ParentClassRef
    • int OwnerRef: Points to the owner object of this object.
    • int NameTableIdx: This object's name.
    • int NameCount: Zero for unique object names, object count for non-unique object names. If non-zero, the string "_N" is appended to the object name, where N is (NameCount-1).
    • int Field6
    • int ObjectFlagsH: Object Flags
    • int ObjectFlagsL: Object Flags - Internal flags describing the object. (See "Object and Name Flags" above.)
    • int ObjectFileSize: Size (in bytes) of the object's serialized data stored in this file.
    • int DataOffset: Offset into this file of the start of the object's serialized data.
    • int Field11
    • int NumAdditionalFields
    • int Field13
    • int Field14
    • int Field15
    • int Field16
    • int Field17
    • int x NumAdditionalFields UnknownFields
  • Import table: Begins at ImportOffset and contains a list of ImportCount entries: objects in other packages which this package refers to (similar to a Windows DLL's import table). Each entry is fixed-size and has the following format:
    • int PackageIDIdx: The name of the package which this object's class object resides in (Name Table index)
    • int Unknown1
    • int ObjTypeIdx: The name of this object's class (Name Table index)
    • int Unknown2
    • int OwnerRef: Points to the owner object of this object.
    • int NameTableIdx: The name of this object.
    • int Unknown3
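
The three table layouts above can be sketched in Python as follows. The parser names are hypothetical. The sketch assumes that NameLength is a byte count including the terminating zero and that names decode as latin-1. Import entries are 7 ints (28 bytes) each; export entries are 17 ints followed by NumAdditionalFields extra ints.

```python
import struct

IMPORT_FIELDS = ("PackageIDIdx", "Unknown1", "ObjTypeIdx", "Unknown2",
                 "OwnerRef", "NameTableIdx", "Unknown3")
EXPORT_FIELDS = ("ObjTypeRef", "ParentClassRef", "OwnerRef", "NameTableIdx",
                 "NameCount", "Field6", "ObjectFlagsH", "ObjectFlagsL",
                 "ObjectFileSize", "DataOffset", "Field11",
                 "NumAdditionalFields", "Field13", "Field14", "Field15",
                 "Field16", "Field17")


def parse_names(data, offset, count):
    """Return a list of (name, flags_h, flags_l) tuples."""
    names = []
    for _ in range(count):
        (length,) = struct.unpack_from("<i", data, offset)
        offset += 4
        text = data[offset:offset + length].rstrip(b"\x00").decode("latin-1")
        offset += length
        flags_h, flags_l = struct.unpack_from("<2i", data, offset)
        offset += 8
        names.append((text, flags_h, flags_l))
    return names


def parse_imports(data, offset, count):
    """Fixed-size entries, so each one can be addressed directly."""
    return [dict(zip(IMPORT_FIELDS, struct.unpack_from("<7i", data, offset + 28 * i)))
            for i in range(count)]


def parse_exports(data, offset, count):
    """Variable-size entries: the extra ints follow the 17 fixed fields."""
    exports = []
    for _ in range(count):
        entry = dict(zip(EXPORT_FIELDS, struct.unpack_from("<17i", data, offset)))
        offset += 17 * 4
        extra = entry["NumAdditionalFields"]
        entry["UnknownFields"] = struct.unpack_from("<%di" % extra, data, offset)
        offset += 4 * extra
        exports.append(entry)
    return exports
```

Given a parsed header, these would be called with the matching offset and count, for example the NameOffset and NameCount values for the name table.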

List/Table indexing

  • Name list is an array of data with a starting index of zero, found to begin at NameOffset (from the Header) and continuing for NameCount entries.
  • Object lists (AKA Export/Import Lists) are arrays of data with starting index of 1, as a zero-reference indicates a null-object. They are found at their respective Offset (from the Header) and continue for their respective Count entries.
  • Object references are positive values for Export Table and negative values for Import Table. Zero reference indicates a null-object.
  • Export and import lists occupy the same array space. Export starts at +1 and counts up, while Import starts at -1 and counts down. If they started at 0 there would be a collision where the 0 value reference could ambiguously be from either list.
  • When unpacked into memory arrays, each object table (Export or Import) should be prepended with a null-object at index zero, with the table's entries appended after it. This way, object indexes from the corresponding table (negated, in the case of the Import Table) can be used directly to access the referenced object.
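
The indexing rules above can be sketched in Python like this. The function names are hypothetical, and None stands in for the null-object.

```python
def build_lookup(entries):
    """Prepend a null-object so that table entry 1 sits at list index 1."""
    return [None] + list(entries)


def resolve_reference(ref, export_lookup, import_lookup):
    """Map an object reference to its table entry.

    Positive values index the Export Table, negative values index the
    Import Table (after negation), and zero is the null-object.
    """
    if ref > 0:
        return export_lookup[ref]
    if ref < 0:
        return import_lookup[-ref]
    return None
```

For example, a reference of -2 selects the second entry of the Import Table, while +1 selects the first entry of the Export Table.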

Objects data format

The object data format depends on the object type. For example, a Function has a 0x30-byte header; the last 8 bytes of this header are the memory size and the file size, respectively. These are followed by FileSize bytes of actual code, ending with:

53 — end of function
00 00 00 — zeros
XX XX XX XX — function flags
YY YY YY YY — function name (index to namelist table)
00 00 00 00 — zeros
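
A rough Python sketch of reading these fields from a Function's serialized data. It assumes that `data` is exactly the object's bytes (DataOffset through DataOffset + ObjectFileSize from the Export Table) and that the object ends immediately after the final four zero bytes, so the trailer is located by counting back from the end. The function name and dictionary keys are made up for this example.

```python
import struct


def parse_function_object(data):
    # The last 8 bytes of the 0x30-byte header: memory size, then file size.
    memory_size, file_size = struct.unpack_from("<2i", data, 0x28)
    # Trailer, counted from the end: 4 zero bytes, 4-byte name index,
    # 4-byte flags, 3 zero bytes, and before those the 0x53 end token.
    flags, name_index, _zeros = struct.unpack_from("<Iii", data, len(data) - 12)
    return {
        "MemorySize": memory_size,
        "FileSize": file_size,
        "EndToken": data[len(data) - 16],  # expected to be 0x53
        "FunctionFlags": flags,
        "NameIndex": name_index,
        "FlagsOffset": len(data) - 12,     # position of the flags within data
    }
```

FlagsOffset is returned so that a patching tool could locate the 4 flag bytes inside the object.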

According to UE Explorer, the 0x00000400 flag is the Native function flag. (See section Function Flags, below.) So, theoretically, if we switch this flag off, we will be able to rewrite native functions. Of course, there may be thousands of bytecodes elsewhere that still assume it's a native function. But in certain cases this has been shown to permit rewriting a native function into a simulated native function. (See PatcherGUI mod listing in UPK File Format thread post 34.) There is also a "Defined" flag in UE Explorer:

  • 0x00000002 Defined

No reliable information on this flag has been found, but it seems to be set if the function is not empty, i.e. "defined". So, it seems, if we want to re-define a native function as scripted, we need to remove the Native flag and add the Defined flag. At present it is not known what effect this has upon the scope of the original native function, particularly in cases where it is called by another native function.
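
In terms of bit operations, the flag change described above amounts to the following Python sketch. The constant names are chosen for this example; only the two values come from the text.

```python
FUNC_DEFINED = 0x00000002
FUNC_NATIVE = 0x00000400


def native_to_scripted(flags):
    """Clear the Native bit and set the Defined bit, leaving other bits alone."""
    return (flags & ~FUNC_NATIVE) | FUNC_DEFINED
```

For example, a flags value of 0x00020401 becomes 0x00020003. Whether the game then treats the function as scripted is subject to the caveats above.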

The bytecode of a function contains what might be referred to as absolute and relative jump offsets. An absolute jump offset gives a jump target's position relative to the first byte following the function header. A relative jump offset gives a jump target's position relative to the current position. Both are measured in memory bytes, not file bytes. As it turns out, memory sizes aren't that hard to compute once you know where all of the references are, and how many there are: every reference adds 4 bytes to the memory size compared with the file size. (This applies only to Import/Export table references, not to virtual function references.)
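
The arithmetic can be sketched in Python as below. This is one reading of the rule above, namely that each Import/Export reference takes 4 more bytes in memory than in the file. Both function names, and the idea of passing in the file positions of the references, are assumptions made for the example.

```python
from bisect import bisect_left


def memory_size(file_size, reference_count):
    """Memory size of a code block that contains `reference_count` references."""
    return file_size + 4 * reference_count


def file_to_memory_offset(file_offset, reference_positions):
    """Convert an offset in the serialized code into a memory offset.

    `reference_positions` holds the file offsets (relative to the first byte
    after the function header) at which object references start.
    """
    # Count the references that start before the given position.
    refs_before = bisect_left(sorted(reference_positions), file_offset)
    return file_offset + 4 * refs_before
```

An absolute jump target would then be written as the memory offset of the target token, and a relative one as the difference between two memory offsets.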

Function Flags

Function flags are related to Unreal Function Specifiers.

Flags are, by definition, bitmasks of bit flags, each of which stores a binary (yes/no) value. Flags are combined using the bitwise OR operation. (See the article What are bit flags? for examples of how these are applied.)

Function flags to specifiers mapping:

  • 0x00002000 Static
  • 0x00000020 Singular
  • 0x00000400 Native
  • 0x00004000 NoExport
  • 0x00000200 Exec
  • 0x00000008 Latent
  • 0x00000004 Iterator
  • 0x00000100 Simulated
  • 0x00200000 Server
  • 0x01000000 Client
  • 0x00000080 Reliable
  •  ??? Unreliable
  • 0x00020000 Public
  • 0x00040000 Private
  • 0x00080000 Protected
  • 0x00001000 Operator
  • 0x00000010 PreOperator
  •  ??? PostOperator
  • 0x00000800 Event
  • 0x00008000 Const
  • 0x00000001 Final
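
The mapping above lends itself to a small lookup table. The Python sketch below combines specifiers with bitwise OR and splits a flags value back into names. The two entries marked ??? are omitted because their values are unknown; the Defined flag from the previous section is included. The function names are hypothetical.

```python
FUNCTION_FLAGS = {
    "Final": 0x00000001, "Defined": 0x00000002, "Iterator": 0x00000004,
    "Latent": 0x00000008, "PreOperator": 0x00000010, "Singular": 0x00000020,
    "Reliable": 0x00000080, "Simulated": 0x00000100, "Exec": 0x00000200,
    "Native": 0x00000400, "Event": 0x00000800, "Operator": 0x00001000,
    "Static": 0x00002000, "NoExport": 0x00004000, "Const": 0x00008000,
    "Public": 0x00020000, "Private": 0x00040000, "Protected": 0x00080000,
    "Server": 0x00200000, "Client": 0x01000000,
}


def combine_function_flags(names):
    """OR together the bits for the given specifier names."""
    flags = 0
    for name in names:
        flags |= FUNCTION_FLAGS[name]
    return flags


def split_function_flags(flags):
    """Return (known specifier names, leftover bits not in the table)."""
    names = [name for name, bit in FUNCTION_FLAGS.items() if flags & bit]
    return names, flags & ~combine_function_flags(FUNCTION_FLAGS)
```

Any leftover bits are returned unchanged, so flags that are not in the list are not lost when a value is decoded and re-encoded.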

Separate Content

Be careful when referring to information from other (non-XCOM) sources, as they do not always correspond with what can be actually seen in XCOM UPKs.

See also the following UE Library articles on Eliot van Uytfanghe's website.

References

Referred to by this article:



That refer to this article: