Sprite decompression (Generation I)

This article is incomplete. Please feel free to add any missing information about the subject. It is missing: details of important steps.

In Pokémon Red, Blue, and Yellow, all front and back sprites of Pokémon and Trainers are stored in a compressed form in the ROM, and are decompressed only when used^[1]. The compression format first uses one of several different forms of differential encoding to increase the number of zero bits in the data, then encodes the data using a run-length encoding (RLE) on zero bits. The decompression routine makes use of part of SRAM bank 0 as a buffer, meaning that it needs to unlock the SRAM beforehand^[2], but in Red/Blue it never locks the SRAM afterwards^[3], making it easier for glitches to corrupt SRAM bank 0 (and to a lesser extent, the SRAM in general). Furthermore, the decompression routine does not check the size of the sprite adequately, so when decompressing sprites of glitch Pokémon or Trainers, the routine itself can easily overflow the buffer and cause various memory corruptions, most famously in the Hall of Fame (which is stored in SRAM bank 0 after the buffer), but also in the WRAM if the sprite is large enough.

Overview

Buffer usage

The sprite decompression routine makes use of a region at the beginning of SRAM bank 0 as a buffer, which is further divided into three buffers^[4], each of size 7×7×8 = 392 bytes (0x188 bytes in hexadecimal), meaning that each is able to hold (7×8)×(7×8) bits, which is equal to the number of pixels in the largest valid sprite (with 7×7 tiles) in the game. Since Game Boy sprites are in 4 colors (i.e. 2 bits of data per pixel), it takes two such buffers to hold the complete decompressed data for a sprite, and a third buffer is used to facilitate manipulation of the decompressed data. Furthermore, there is an unused 256-byte space after the buffers and before the Hall of Fame data^[5], which means that some glitch sprites with size only slightly larger than 7×7 won't corrupt the Hall of Fame. The memory addresses of these buffers are shown below:

`sSpriteBuffer0`	`sSpriteBuffer1`	`sSpriteBuffer2`	Unused space
$A000 – $A187	$A188 – $A30F	$A310 – $A497	$A498 – $A597

Steps

The decompression routine works on a stream of input bytes (which is for the most part regarded as a stream of bits), and consists of the following steps:

Clear sSpriteBuffer1 and sSpriteBuffer2, filling them with zeros.

This step is important because the run-length decoding procedure writes bits to the buffers in a peculiar order (the data is stored in vertical columns of 2 pixels each, but each byte represents a horizontal row of 8 pixels, so only 2 bits out of 8 in a byte is written each time) with bitwise OR, which assumes that those buffers start with zeros.

Determine the size of the sprite in pixels from the first byte of the input stream.
- Notably, this is different from the byte indicating the sprite size in the base stats data structure (the latter byte is loaded into the RAM in wMonHSpriteDim), which is used only for centering the decompressed front sprite, as mentioned below.
Use a run-length decoding procedure to decode input data into two bit planes, each with the same number of bits as the number of pixels in the sprite.
- The two bit planes are stored in sSpriteBuffer1 and sSpriteBuffer2 respectively. Which buffer is used first depends on a bit in the input stream.
- For each bit plane, the decoding algorithm runs as an endless loop, consuming bits from the input stream, and writing bits to the buffer by calling the functions WriteSpriteBitsToBuffer and MoveToNextBufferPosition^[6]. The latter function is responsible for checking the current position against the sprite size, and when the appropriate number of bits are written to the buffer, it forcibly terminates the endless loop by removing the return address from the stack, and instead jumping to either the beginning of this step (to decode a second bit plane), or the beginning of the next step.
"Unpack" the two bit planes with a differential decoding procedure, which depends on a "unpacking mode" read from the input stream to the RAM incidentally in the previous step.
- There are three different unpacking modes, but whether sSpriteBuffer1 or sSpriteBuffer2 is used first in the previous step will also affect the unpacking procedure, making 6 different ways of unpacking in total.

Immediately after decompression, the sprite is post-processed into an appropriate format, and loaded into the VRAM. This is not technically part of the decompression routine, but is so closely related that in practice it is an important part of the process:

Move the bit planes from sSpriteBuffer1 and sSpriteBuffer2 to sSpriteBuffer0 and sSpriteBuffer1 respectively, and in the process resizing them to 7×7 in a way depending on the type of the sprite:

For front sprites, this involves putting them in the bottom center of the available 7×7 space^[7]. All valid front sprites are either 5×5, 6×6, or 7×7 (6×6 sprites are right aligned), which is also reflected in the aforementioned byte loaded into wMonHSpriteDim. The amount of empty space to add is computed with a formula^[8], but the computation is prone to arithmetic overflow for invalid sprite sizes, which is responsible for the characteristic backwards "L" shape of the front sprite of MissingNo. and some other glitch Pokémon.
For back sprites, this involves scaling them up by 2 (which causes back sprites to have a lower resolution than front sprites)^[9]. All valid back sprites are 4×4, and when scaled up the last row and the last column of tiles in the result are discarded, resulting in a 7×7 size. (All those sizes are hardcoded in the function doing the scaling.)

Merge the 7×7 bit planes in sSpriteBuffer0 and sSpriteBuffer1 into a single chunk by interlacing the bytes, resulting in a format recognized by the Game Boy, and load the result into the VRAM.^[10]
- The result of this step is stored in the buffers sSpriteBuffer1 and sSpriteBuffer2 concatenated. The function works backwards from the ends of the buffers, so that the output pointer only "catches up with" with the second input pointer at the end of the process, and will not overwrite input bytes prematurely.

Sprite size

As mentioned above, the size of each valid Pokémon sprite is stored in two ways:

As the first byte of the compressed data stream, with the high nybble indicating the width and the low nybble indicating the height. For example, for a 5×5 sprite, the value of this byte will be 0x55.
For front sprites, there is a byte indicating the sprite size in the base stats data structure, with the high nybble indicating the height and the low nybble indicating the width (this is the opposite way around compared to the other byte, although for valid Pokémon sprites this does not actually matter since they all have equal width and height). Back sprites are always assumed to be 4×4 before scaling.

For glitch Pokémon, those two sizes often mismatch. The size indicated first byte of the compressed data stream is often called the actual sprite size, and the size indicated by the byte in the base stats data structure is often called the base stats sprite size. The front and back sprites for a glitch Pokémon each has its own actual sprite size, but there is only one base stats sprite size for the front sprite. Pokémon from the same family always share the same base stats sprite size, but their sprites may be different since those sprite data may be in different banks (even though they share the same two-byte pointer), in which case each sprite will have its own actual size.

The actual sprite size determines whether the sprite causes memory corruption, and how much. For an actual sprite size of w×h, the sprite decompression routine will always read data from the input stream until it has output (w×8)×(h×8) bits, i.e., w×h×8 bytes, for each of the two bit planes. Each bit plane is written to a contiguous region in the memory, starting from sSpriteBuffer1 and sSpriteBuffer2 respectively, meaning that they will overlap each other if w×h > 49. Then the bit planes are unpacked by operating on the same memory regions. Both decoding and unpacking are handled with double loops, and both w and h must be in the range of 0–15 since they are nybbles, so the only relevant possibility of arithmetic overflow is that when (w×8) and/or (h×8) are 0, they are usually treated as if they were 256, or equivalently that when w and/or h are 0, they are usually treated as if they were 32.

However, there is a single exception to this special treatment of 0: The delta decoding procedure, which is one of the two major subroutines of the unpacking procedure, actually uses arithmetic (i.e. adding the h×8 to the pointer) to move the pointer to the next column^[11], instead of increasing the pointer for each row in the sprite. This means that, when h is 0, this procedure would be unable to move to the next column (since it keeps adding 0 to the pointer), and instead keep decoding bytes corresponding to the first column of the sprite buffer. This special behavior is particularly important because the other major subroutine of the unpacking procedure, XOR unpacking, does not happen for unpacking mode 0, whereas delta decoding is guaranteed to happen at least once.

Since the distance between sSpriteBuffer2 and the start of the Hall of Fame data is 392+256 = 648, Hall of Fame corruption will happen when w×h×8 > 648, i.e., w×h > 81. Meanwhile, WRAM corruption will only happen when w×h > 926, which is only possible for a sprite with actual size 0×0 (e.g., Yellow MissingNo.), where both w and h are treated as 32. If WRAM corruption does happen, it will be able to affect WRAM addresses up to $C30F, which contain variables used by the sound engine as well as some sprite data^[12]. Corruption of those data is the causes of sound and/or sound bank corruption and the walking characters effect respectively.

A subtle point is that, even though the actual size of a sprite determines the memory region it may affect, its actual corruption effect will depend on both the detailed sprite data and the state of that memory region before corruption. In particular, some 0×0 sprites may not be able to touch certain WRAM addresses (e.g. the sound bank) due to a combination of several factors:

The run-length decoding procedure writes bits to the memory with bitwise OR, so if the bits it is supposed to write to a byte happen to be all zeros, it would not affect that byte. This is made more likely by the fact that the run-length decoding procedure is designed so that it is easier (i.e. more likely on random inputs) to produce zeros than ones.
The delta decoding procedure, as mentioned above, only affects the first column (i.e. first 256 bytes) of each sprite buffer when the sprite height is 0, meaning that it would not touch the WRAM in this case.
The XOR unpacking procedure may not happen at all if the unpacking mode (determined by the sprite data) happens to be 0.

Run-length decoding

The rest of the input stream is consumed as individual bits, in big endian order (i.e. the most significant bit of each byte comes first in the stream)^[13]. The overall layout of the data is:

A single bit, indicating which sprite buffer to use first: 0 means that sSpriteBuffer1 should be used first, and 1 means sSpriteBuffer2 should be used first^[14]. (As mentioned above, this will affect the differential decoding procedure later.)
A data chunk, which is the run-length encoding of a bit plane.
Either 1 or 2 bits, indicating the unpacking mode to use later: "0" for mode 0, "10" for mode 1, and "11" for mode 2^[15].
Another data chunk, encoding a second bit plane.

Chunk format

The run-length encoding works on pairs of bits from each bit plane. There are 4 possible bit pairs — 00, 01, 10, and 11 — and the differential encoding is designed to make 00 appear far more often than the other bit pairs, and mostly in long "runs". Therefore, each sequence of 00 pairs is encoded as a single 00 pair followed by a special encoding indicating the number of 00 pairs in the sequence, while the other three bit pairs are just encoded as themselves. As a small optimization, since the data is more likely than not to start with a run of 00 pairs, the first single bit of each chunk is special: A single 0 indicates that what follows should be interpreted as an encoded number of 00 pairs, while a single 1 indicates that the bit plane starts with non-00 bit pairs.^[16]

The special encoding for the number of 00 pairs is designed to be able to represent both small numbers and large numbers relatively efficiently. Each encoded number consists of two parts: The first part is any number (0 or more) of "1" bits followed by a "0" bit, and the second part is any bit string of the same length as the first part. The second part is interpreted as a binary number, with an offset added so that each positive integer is uniquely representable^[17]. For example, the bit string "00" encodes 1, "01" encodes 2, "1000" encodes 3, "1001" encodes 4, etc.

There is no terminator within the chunk itself; a chunk terminates when the decoder has written enough bits to fill the bit plane (the size of which being the same as the size of the sprite in pixels).

Bit order

The output of the decoder is written to the current sprite buffer in a peculiar order, presumably because the Game Boy hardware requires all graphics data to be in the format of 8×8 tiles. Specifically, this means:

Each byte contains data for 8 contiguous pixels in the same row of pixels.
Those bytes are best stored in column-major order (i.e. bytes from the same 8-pixel column should be contiguous in memory). This ensures that the 8 bytes of data for the same tile are stored together.

On the other hand, the compression algorithm works best on unbroken rows or columns of pixels. Therefore, the compromise solution used in Generation I Pokémon games is:

Each bit pair corresponds to two horizontally adjacent pixels (so that it falls in the same byte and is easier to write).
The bit pairs are processed in vertical 2-pixel columns that span the height of the entire sprite. This means that as long as the decoder is processing the same 2-pixel column, after writing the same bit pair, the output pointer (wSpriteOutputPtr) only needs to be increased by 1. However, when moving from one 2-pixel column to the next, the decoder needs to adjust an "offset" variable (wSpriteOutputBitOffset) that controls which two bits of a byte should be written to, and either move the output pointer back to the start of the column, or move forward normally every time four 2-pixel columns (sharing the same bytes) have been processed.

Within each byte, all bits are written in big endian order, which corresponds to a left-to-right pixel order (although there are subroutines to flip the bit order in case the entire sprite needs to be flipped). This means both that within each non-00 bit pair, the bit read first is written in a higher bit^[18], and that wSpriteOutputBitOffset counts down from 3 to 0^[19], where a higher value means that the decoder writes to a higher bit pair within the byte^[20].

All write operations are done with the bitwise OR operation. This works fine for valid sprites because both sprite buffers are filled with zeros beforehand. For glitch sprites that cause memory corruptions, this means that they don't cleanly overwrite the memory regions corrupted, but instead allow the original values in those memory region to play a part in the corruption during the following differential decoding step.

Differential decoding

Centering front sprites

YouTube videos


YouTube video by Retro Game Mechanics Explained


YouTube video by Retro Game Mechanics Explained

Attribution

Retro Game Mechanics Explained uploaded multiple YouTube videos (see above) that analyzed sprite decompression in detail, which formed one of the earliest publicly available documentations on this topic.
Kagamiin discovered that the delta decoding procedure (but not the XOR unpacking procedure) uses pointer arithmetic when moving to the next column, which does not treat a sprite height of 0 as 256, resulting in a "special" behavior for sprites with height 0.

External links

Retro Game Mechanics Explained's tool to compress and decompress sprites online (supports PNGs)
"Sprite import tools for RBY" can be downloaded within Evie's Google Sites backup here. It is based on the previous work of the GB Dev (for .2bpp encoding) and Skeetendo (for sprite compression) communities.

References

↑ The main function for decompressing Pokémon sprites
↑ The piece of code to unlock the SRAM and switch to bank 0 for sprite decompression in Red/Blue
↑ Yellow does correctly lock the SRAM after decompressing a sprite.
↑ The definition of the buffers in the SRAM
↑ The definition of the unused 256-byte space
↑ This pair of functions are called in two different places, but always in close succession.
↑ The function to copy a sprite buffer to another one (which is zeroed beforehand), and align it according to the given offset in hSpriteOffset
↑ The procedure to compute hSpriteOffset from wMonHSpriteDim
↑ The procedure to scale up back sprites by 2
↑ The function to both interlace the sprite buffers and load the result into the VRAM
↑ The lines of code to move to the next column in the delta decoding procedure
↑ The range of WRAM addresses that can be affected by sprite corruption, up to and including the entire 4-byte structure wShadowOAMSprite03
↑ The function to read the next input bit (note that the rlca instruction rotates the input byte left before taking its least significant bit)
↑ The lines of code to read and store this bit from the input stream
↑ The lines of code to read and store the unpacking mode (before decoding the second chunk)
↑ An alternative interpretation is that the chunk consists of two kinds of alternating "packets": RLE packets that include only the encoded number of 00 pairs, and data packets that include any number of non-00 bit pairs and are terminated by 00. The first bit of each chunk then just indicates whether the chuck starts with a RLE packet or a data packet.
↑ The offset is always 2ⁿ-1 where n is the length of each part, or alternatively L+1 where L is the first part interpreted as a binary number, but the implementation in Generation I Pokémon games uses a lookup table to store the offset values up to n=16.
↑ The lines of code to read a bit pair (the bit read first is stored in register c, shifted left once, then combined with the bit read second with bitwise OR)
↑ The variable wSpriteOutputBitOffset is initialized to 3, decreased for each 2-pixel column, and reset to 3 if it was already 0.
↑ The lines of code handling each possible value of wSpriteOutputBitOffset

[1] The main function for decompressing Pokémon sprites

[2] The piece of code to unlock the SRAM and switch to bank 0 for sprite decompression in Red/Blue

[3] Yellow does correctly lock the SRAM after decompressing a sprite.

[4] The definition of the buffers in the SRAM

[5] The definition of the unused 256-byte space

[6] This pair of functions are called in two different places, but always in close succession.

[7] The function to copy a sprite buffer to another one (which is zeroed beforehand), and align it according to the given offset in hSpriteOffset

[8] The procedure to compute hSpriteOffset from wMonHSpriteDim

[9] The procedure to scale up back sprites by 2

[10] The function to both interlace the sprite buffers and load the result into the VRAM

[11] The lines of code to move to the next column in the delta decoding procedure

[12] The range of WRAM addresses that can be affected by sprite corruption, up to and including the entire 4-byte structure wShadowOAMSprite03

[13] The function to read the next input bit (note that the rlca instruction rotates the input byte left before taking its least significant bit)

[14] The lines of code to read and store this bit from the input stream

[15] The lines of code to read and store the unpacking mode (before decoding the second chunk)

[16] An alternative interpretation is that the chunk consists of two kinds of alternating "packets": RLE packets that include only the encoded number of 00 pairs, and data packets that include any number of non-00 bit pairs and are terminated by 00. The first bit of each chunk then just indicates whether the chuck starts with a RLE packet or a data packet.

[17] The offset is always 2ⁿ-1 where n is the length of each part, or alternatively L+1 where L is the first part interpreted as a binary number, but the implementation in Generation I Pokémon games uses a lookup table to store the offset values up to n=16.

[18] The lines of code to read a bit pair (the bit read first is stored in register c, shifted left once, then combined with the bit read second with bitwise OR)

[19] The variable wSpriteOutputBitOffset is initialized to 3, decreased for each 2-pixel column, and reset to 3 if it was already 0.

[20] The lines of code handling each possible value of wSpriteOutputBitOffset

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

[14]

[15]

[16]

[17]

[18]

[19]

[20]

Sprite decompression (Generation I)

Contents

Overview

Buffer usage

Steps

Sprite size

Run-length decoding

Chunk format

Bit order

Differential decoding

Centering front sprites

YouTube videos

Attribution

See also

External links

References

Navigation menu

Sprite decompression (Generation I)

Overview

Buffer usage

Steps

Sprite size

Run-length decoding

Chunk format

Bit order

Differential decoding

Centering front sprites

YouTube videos

Attribution

See also

External links

References

Navigation menu

Search