GB Programming

From Glitch City Wiki
 
Welcome to this tutorial! The purpose of this page is to allow everyone to program anything on a Game Boy. This may sound complex, but the Game Boy is a very well-documented system, and programming for it can be learned quite easily.

Our goal will first be to create [[arbitrary code execution]] programs, but later sections of this tutorial will give you tools for more general-purpose programming, so you'll be able to create your own games.

Let's get started!

==A new world==
This part is a collection of terms, concepts and notations that are '''vital''' for the rest of this tutorial. Do NOT skip anything here unless the text says you can. I mean it.

If you don't understand something later on, read this part again, and chances are, you'll understand it.

===Numeric systems===
However, decimal is the base humans like to count in. But computers don't. Instead, they prefer '''binary'''. Binary is '''base 2''', that is, instead of working with powers of 10, we work with powers of 2. Also, only 2 symbols are allowed, 0 and 1. Each binary digit is called a '''bit'''.

This paragraph is some trivia about why computers use binary instead of decimal. You can skip it if you want. So, why binary? Because we need computers to be efficient, so we need to store information using electricity. The easiest thing to check is "Is power running?". The answer is either 0 (it isn't) or 1 (it is). And there you have it! Why do computers count in binary? To keep them at reasonable prices!

To differentiate decimal numbers from binary numbers, binary numbers will be prefixed with a % symbol. So, 10 is decimal, and %10 is binary. Got it? Okay.

Here is an example:
* A group of 32 bits is called a '''double word''' or '''dword'''.
* A group of 64 bits is called a '''quadruple word''' or '''qword'''.
We will mostly be working with bytes, sometimes with words and rarely with nibbles. It is very rare to work with other structures, so you may forget about them if you want.


Now, let's talk about '''hexadecimal'''. It is '''base 16''', so we will be working with 16 symbols: 0 1 2 3 4 5 6 7 8 9 A B C D E F. Again, we will prefix hex numbers with a $ to differentiate them.
</pre>

Why use hexadecimal? Well, writing binary numbers is quite tedious. Take both examples together: we have 149 = %10010101 = $95. Now, consider the digits individually:
<pre>
$9 = %1001
$5 = %0101
</pre>
This way, we have a more readable way of writing numbers that can be converted to binary in a snap.
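If you want to double-check conversions like these, here is a small sketch in Python (not Game Boy code: it just runs on your computer). The helper name `to_bin_by_nibbles` is made up for this example; it converts hex to binary one digit (one nibble) at a time, the same way as described above.

```python
# Map each hex digit to its 4-bit binary pattern (one nibble).
NIBBLE_BITS = {f"{n:X}": f"{n:04b}" for n in range(16)}

def to_bin_by_nibbles(hex_digits: str) -> str:
    """Convert a hex string such as "95" to binary, one digit at a time."""
    return "".join(NIBBLE_BITS[d] for d in hex_digits.upper())

value = 149
hex_digits = f"{value:02X}"             # "95"
binary = to_bin_by_nibbles(hex_digits)  # "1001" + "0101"
print(f"{value} = %{binary} = ${hex_digits}")  # 149 = %10010101 = $95

# Cross-check against Python's own base conversion.
assert int(binary, 2) == int(hex_digits, 16) == value
```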

For the rest of this tutorial, we will mostly be using hexadecimal, but always remember the binary lying underneath!


==A dip into technical information==
===Registers===
Registers are small storage locations inside the CPU itself. They are what you will be working with, alongside memory. But we'll see memory later.

There are 8 different registers, which can be paired up. These are A, B, C, D, E, F, H and L. These are NOT hex digits, so beware!

Any of these registers can hold an '''unsigned 8-bit value'''. That means:
* A is the '''Accumulator'''. It is the register you have to use for arithmetic operations, and most of the time, for memory access.
* B is usually an 8-bit counter.
* C is also used as an 8-bit counter, but also for port access. We'll see that much later.
* D, E, H and L have no special attribute as 8-bit registers. However, when paired, they do.
* F holds the CPU's '''Flags'''. It is very special, as you cannot use it as a general-purpose register. You can't even directly access it! We'll see how to use it later.
* HL is quite the equivalent of A, but is 16-bit. Its name comes from the fact that it stores the '''High''' and '''Low''' bytes of a memory address.
* BC is mostly used as a '''Byte Counter'''. It can also be used together with A to access memory.
|Store the value of ''source'' into ''destination''.
|}
Did I mention that instruction and register names are not case-sensitive?

However, you can't use LD any way you like; there are restrictions:
|DE
|HL
|[BC]
|[DE]
|[HL]
|[imm16]
|-
|A
|Yes
|-
|[BC]
|Yes
|No
|No
|-
|[DE]
|Yes
|No
|No
|-
|[HL]
|Yes
|Yes
|No
|-
|[imm16]
|Yes
|No
|Store the value of register B into register D.
|-
|''ld [$8325], a''
|Store the value of register A into memory address $8325.
|}
Trying to do something like ''ld a, $100'' isn't possible. Like, physically impossible. You'll see why much, much later.

Note that ''ld a, -1'' is valid, but the "-1" actually wraps around: storing -1 will really store 255 in an 8-bit register, and 65535 in a 16-bit register. Why? Coming soon.

Notice that F and AF don't appear anywhere in this chart. Actually, only a few instructions use them.

===Negative numbers===
Time to confess: I've lied to you. Actually, 8-bit and 16-bit registers can hold negative numbers.

I told you that "individual registers can hold unsigned 8-bit values, and pairs unsigned 16-bit values". However, that isn't the whole truth: these values can also be treated as signed. How does that work?

What we will do is cut our number range in half, and say that one half is made of negative numbers. But how do we tell positive and negative numbers apart? We say that the most significant bit (MSB) no longer stands for 2^7 (or 2^15 in a 16-bit value); instead, it gives away the sign of the number (0 = positive, 1 = negative). So, instead of having values in ranges 0 to 255 and 0 to 65535, we will have values in ranges -128 to 127 and -32768 to 32767. Neat!

How to multiply by -1? Easy! You can either:
* Calculate zero minus your number (just like in real life). However, you should consider 0 the same as 256 (in 8-bit mode) or 65536 (in 16-bit mode).
* Flip the state of every bit, then add one.
<pre>
Add 1 %10000000
</pre>
So, uh, -(-128) = -128? Oops. You cannot negate -128 with only 8 bits, because +128 doesn't fit in the range -128 to 127. Try with 16 bits, and you'll see it works! However, with 16 bits, you can't negate -32768 for the same reason.
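Here is a small Python sketch of the "flip every bit, then add one" method, to check results like the one above. It is a model running on your computer, not Game Boy code; `negate` and `as_signed` are names made up for this example, and `bits` is the register width (8 or 16).

```python
def negate(value: int, bits: int) -> int:
    """Two's complement negation: flip every bit, then add one (wrapping)."""
    mask = (1 << bits) - 1
    return ((value ^ mask) + 1) & mask

def as_signed(value: int, bits: int) -> int:
    """Read a raw bit pattern as a signed number (MSB set means negative)."""
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

# -5 in 8 bits is %11111011 ($FB).
print(f"{negate(5, 8):08b}")              # 11111011
# Negating -128 ($80) in 8 bits gives $80 back, i.e. -128 again.
print(as_signed(negate(0x80, 8), 8))      # -128
# With 16 bits, -128 ($FF80) negates properly to +128.
print(as_signed(negate(0xFF80, 16), 16))  # 128
```

The "zero minus your number" method gives the same bits: in Python, `(0 - value) & 0xFF` matches `negate(value, 8)` for every byte.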

Now, let's see how the CPU handles the difference between unsigned and signed values. Surprise, it doesn't! Why? Because performing the same operation on signed or unsigned values gives the same bits as the result!
<pre>
unsigned signed
%1 00000000 = 256 = 0 (Discard ninth bit)
</pre>
And you just saw why I told you to consider 0 the same as 256: in 8 bits, they look the same! 256 is 8 zero bits preceded by a 1... but that 1 is discarded.
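To see that the CPU really doesn't need to care, here is a Python model of 8-bit addition (the names `add8` and `as_signed8` are made up for this sketch). The same bit pattern comes out whether you read the inputs as signed or unsigned; only your interpretation of it changes.

```python
def add8(a: int, b: int) -> int:
    """8-bit addition: keep the eight rightmost bits, discard the ninth."""
    return (a + b) & 0xFF

def as_signed8(value: int) -> int:
    """Interpret an 8-bit pattern as signed (-128 to 127)."""
    return value - 256 if value >= 128 else value

# Unsigned view: 255 + 1 = 256, which wraps to 0.
# Signed view: the same bits $FF mean -1, and -1 + 1 = 0.
result = add8(0xFF, 0x01)
print(result, as_signed8(result))  # 0 0

# $F0 + $20. Unsigned: 240 + 32 = 272, which wraps to 16. Signed: -16 + 32 = 16.
result = add8(0xF0, 0x20)
print(result, as_signed8(result))  # 16 16
```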


===Memory===
Finally! So what is "memory", you ask? It's a stream of bytes. A stream of numbers. And guess what? Everything is a stream of numbers. From numbers to this cute kitten video, everything is numbers. Including programs. That's why [[arbitrary code execution]] happens. Remember this sentence: ''Data is whatever you define it to be.''

Ever wondered why you could open a JPEG in Word? Now you know.

How is every byte told apart from its neighbors? Well, each one gets a 16-bit unsigned integer called its "address" (thus ranging from $0000 to $FFFF). To access a byte, use its address, just like you use your friend's e-mail address to reach them.

So, how does running a program work? A special register, the program counter (PC), holds the address of the next byte to run. The processor fetches the byte at that address, increments PC (raises its value by one), and processes the byte as an opcode; when done, everything is repeated. Instructions can be one to three bytes long (an opcode, possibly followed by operand bytes), so this fetch step may repeat for a single instruction.
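To make this cycle concrete, here is a toy model in Python. It is nowhere near an emulator: it only understands four real Game Boy opcodes ($00 = nop, $3C = inc a, $3E = ld a, imm8 and $76 = halt), and `run`, `memory` and `pc` are names made up for the sketch. Notice how ''ld a, imm8'' makes the loop fetch a second byte, its operand.

```python
def run(memory: bytes, pc: int = 0) -> int:
    """Run a tiny program and return the final value of A."""
    a = 0
    while True:
        opcode = memory[pc]       # fetch the byte at PC...
        pc = (pc + 1) & 0xFFFF    # ...and move PC to the next byte
        if opcode == 0x00:        # nop: do nothing
            pass
        elif opcode == 0x3C:      # inc a (1 byte)
            a = (a + 1) & 0xFF
        elif opcode == 0x3E:      # ld a, imm8 (2 bytes): fetch the operand too
            a = memory[pc]
            pc = (pc + 1) & 0xFFFF
        elif opcode == 0x76:      # halt: stop our toy loop (a real CPU waits for an interrupt)
            return a
        else:
            raise NotImplementedError(f"opcode ${opcode:02X} not modelled")

# ld a, $41 / inc a / nop / halt
program = bytes([0x3E, 0x41, 0x3C, 0x00, 0x76])
print(f"A = ${run(program):02X}")  # A = $42
```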

So now, how do we access memory? With brackets! To access memory address $CD38, you just have to use [$CD38]. Yay!

To access the memory location pointed to by HL, just use... [hl]! It's the same with BC and DE (but only together with A).

So, to retrieve the value at memory address $6511 into register A: ''ld a, [$6511]''

And to store the value of register C into the memory pointed to by HL: ''ld [hl], c''

Remember to refer to the chart above for the legal LD combinations.

Obviously, ''ld [$6511], a'' will overwrite the previous value stored there. But a 16-bit store, such as ''ld [$6511], sp'', writes a word, that is, two bytes! So not only will [$6511] be overwritten, but [$6512] too! (The Z80 also has ''ld [$6511], hl'', but the Game Boy's CPU doesn't.) Always be very careful about the memory you're touching. Otherwise, stuff like the [[ZZAZZ glitch]] happens.

For those wondering, ''ld a, [$6511]'' leaves [$6511] untouched.

==Flags==
Remember that "special" F register? Well, each of its bits is called a "flag", and holds information about (usually) the result of the last operation. A flag is said to be "set" if it equals 1, and "reset" otherwise.

Here are the 8 bits of F:
{| class="wikitable"
|7||6||5||4||3||2||1||0
|-
|Z||N||H||C||-||-||-||-
|}
All "-" are unused bits. They always read as 0.

===Z : Zero===
===H : Half-Carry===
Works like the Carry flag, but for a carry out of the least-significant ''nibble''. It is only read by the DAA instruction, so... forget about it until then.

===N : Add/Subtract===
* SCF sets it, and
* CCF inverts it.


==Manipulating data==
===Instructions get!===
Let's get these:
{| class="wikitable"
!Syntax
!Effect
!Z
!C
|-
|INC <nowiki>{reg8 | reg16 | [hl]}</nowiki>
|Adds one to the operand ("increments" it)
|Affected, except for reg16
|Not affected
|-
|DEC <nowiki>{reg8 | reg16 | [hl]}</nowiki>
|Subtracts one from the operand ("decrements" it)
|Affected, except for reg16
|Not affected
|-
|ADD A, <nowiki>{reg8 | imm8 | [hl]}</nowiki>
|Adds the operand to the accumulator
|Affected
|Affected
|-
|ADD HL, reg16
|Adds the operand to HL
|Not affected
|Affected
|-
|SUB <nowiki>{reg8 | imm8 | [hl]}</nowiki>
|Subtracts the operand from the accumulator. The syntax SUB A, <nowiki>{...}</nowiki> is also valid but less common.
|Affected
|Affected
|-
|SBC A, <nowiki>{reg8 | imm8 | [hl]}</nowiki>
|Subtracts the operand plus the carry flag from the accumulator
|Affected
|Affected
|}
If you want to get information about any instruction, go [http://tutorials.eeems.ca/ASMin28Days/ref/z80is.html there]. Beware: that page is a Z80 reference, and the Game Boy's CPU lacks some of the instructions listed there (such as ''sbc hl, bc'').

Q: Hey, but where's MULT?

A: Nowhere :D To multiply, you must write your own routines! However, here's a nice lil' trick: to do A <- A*2, simply ''add a, a''! To do A <- A*3, do ''ld b, a'', ''add a, a'', ''add a, b'' (you can swap B with any other register, of course). I'll leave A <- A*4, A*5, A*6 and A*7 to you as an exercise.
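If you want to convince yourself that the A*3 trick works (and that the result wraps around past 255 like any 8-bit value), here is a Python model of those three instructions. `times3` is a name made up for this sketch, and it models registers as plain integers masked to 8 bits.

```python
def times3(a: int) -> int:
    """Model of: ld b, a / add a, a / add a, b (8-bit registers wrap at 256)."""
    b = a               # ld b, a
    a = (a + a) & 0xFF  # add a, a  -> A*2
    a = (a + b) & 0xFF  # add a, b  -> A*2 + A = A*3
    return a

print(times3(20))   # 60
print(times3(100))  # 44, because 300 doesn't fit in 8 bits: 300 - 256 = 44
```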

For the rest of the tutorial, you'll see some text prefixed by a ";". These are comments, and are NOT part of the code. This line: "ld [hl], a ; Store the mon's ID" will be interpreted as "ld [hl], a". Everything following a ";" on a line is ignored.

Also, the Game Boy's CPU has four very specific instructions:
{| class="wikitable"
|ld [hli], a
|Equivalent to ''ld [hl], a'' then ''inc hl''.
|-
|ld [hld], a
|Equivalent to ''ld [hl], a'' then ''dec hl''.
|-
|ld a, [hli]
|Equivalent to ''ld a, [hl]'' then ''inc hl''.
|-
|ld a, [hld]
|Equivalent to ''ld a, [hl]'' then ''dec hl''.
|}
These are often used to operate on chunks of memory.
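For example, a hypothetical routine that fills memory with the value of A could run ''ld [hli], a'' in a loop. The Python model below mimics what that does; `fill` and its parameters are made up for this sketch, and memory is modelled as a 64 KiB `bytearray` (one byte per address from $0000 to $FFFF).

```python
def fill(memory: bytearray, hl: int, a: int, count: int) -> int:
    """Model `count` executions of ld [hli], a. Returns the final HL."""
    for _ in range(count):
        memory[hl] = a          # ld [hl], a
        hl = (hl + 1) & 0xFFFF  # inc hl (wraps from $FFFF to $0000)
    return hl

memory = bytearray(0x10000)     # the Game Boy's 16-bit address space
hl = fill(memory, 0xC000, 0xAA, 4)
print(memory[0xC000:0xC005].hex(" "), f"HL=${hl:04X}")  # aa aa aa aa 00 HL=$C004
```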


===Overflow===
<pre>
add a, 119
</pre>
What value will A hold? The naive guess would be 322, or %101000010. However, this is 9 bits long, and won't fit in A. What happens is that the eight rightmost bits are kept in the register, and the rest is discarded. Because a non-zero bit is discarded, the C flag is set.

Thus, the result is A equals 66 = %01000010, and the C flag is set.
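Here is the same computation as a Python model. The name `add_a` is made up for this sketch, and it only models the C flag, not Z, N or H.

```python
def add_a(a, operand):
    """Model of `add a, operand`: returns the new A and the C flag."""
    total = a + operand
    return total & 0xFF, total > 0xFF  # keep 8 bits; C is set if the ninth bit was 1

a, carry = add_a(203, 119)
print(a, f"%{a:08b}", carry)  # 66 %01000010 True
```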


===Register pairs and RAM===
Let's say you run ''ld hl, $D361''. $D361 is put into HL, but since HL is registers H and L paired up, what happens to them?

Because two hex digits make one byte, $D3, as well as $61, is a byte. Since $D3 and H are leftmost in both cases, ''ld hl, $D361'' is actually a shorter form of ''ld h, $D3'' then ''ld l, $61''.

Now, let's say we want to save HL into memory, at [$2315] and [$2316]. Applying the same logic would mean H's value would be stored at [$2315], and L's at [$2316]. However, you just lost THE GAME: because the Game Boy's CPU, like the Z80, is a "little-endian" processor, L's value (the "little end") is stored first, at [$2315]. So [$2315] is $61, and [$2316] is $D3. (Unlike the Z80, the Game Boy's CPU has no ''ld [imm16], hl'' instruction, but every 16-bit value in memory uses this byte order; even the instruction ''ld hl, $D361'' itself is stored as the bytes $21, $61, $D3.)

Stop here, and go over this until it becomes natural to you, because this "little-endianness" is very tricky for beginners. It is ''very'' important when working with memory.
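If you want to experiment with byte order, Python can do it for you. In this sketch (memory modelled as a `bytearray`, variable names made up for the example), `int.to_bytes` with `"little"` lays out HL the way the Game Boy's CPU does. It also shows the instruction ''ld hl, $D361'' as bytes: its opcode, $21, followed by the little-endian operand.

```python
hl = 0xD361
memory = bytearray(0x10000)

# Store HL little-endian: low byte (L) at the lower address, high byte (H) next.
memory[0x2315:0x2317] = hl.to_bytes(2, "little")
print(f"[$2315] = ${memory[0x2315]:02X}, [$2316] = ${memory[0x2316]:02X}")
# [$2315] = $61, [$2316] = $D3

# Reading it back must use the same byte order.
assert int.from_bytes(memory[0x2315:0x2317], "little") == hl

# ld hl, $D361 is encoded as the opcode $21, then $61, then $D3.
encoded = bytes([0x21]) + hl.to_bytes(2, "little")
print(encoded.hex(" "))  # 21 61 d3
```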

Here is an exercise: what values will [$C000] to [$C00F] contain after this code is run?

Initial values:
{| class="wikitable"
|$C000||$C001||$C002||$C003||$C004||$C005||$C006||$C007||$C008||$C009||$C00A||$C00B||$C00C||$C00D||$C00E||$C00F
|-
|$00||$03||$4F||$C0||$DE||$57||$2A||$00||$FF||$01||$23||$34||$56||$78||$9A||$BC
|}
<pre>
ld hl, $C303
ld a, [$C001]
ld b, 3
add a, b
dec h ; The Game Boy's CPU has no "sbc hl, bc", so we subtract $0300 from HL
dec h ; by decrementing H three times
dec h
ld [hl], a
inc hl
ld b, [hl]
sub a, b
inc [hl]
inc hl
ld [hl], b
ld bc, 9
add hl, bc
ld [hl], a
ld a, l ; There is no "ld [imm16], hl" either,
ld [$C00B], a ; so we store L, then H, one byte at a time
ld a, h
ld [$C00C], a
</pre>


==Stacks==
===A stack? Can you eat that?===
No, you can't. It's a data structure that has the cool property of not being fixed-length. How does it work? Just like a stack of plates. Imagine you're washing some plates in the back of a restaurant. Next to you is a pile of plates you need to wash. Waiters come and place ("push") plates on top of the stack, and when you finish washing a plate, you take ("pop") the topmost one. This way of working is called LIFO (Last In, First Out).

In our case, we will do it by saving the top of the stack as a memory address. This value is called the ''stack pointer''. Here is an example, with the stack growing to the right:
{| class="wikitable"
|$00||$03||$4F||$C0||$7C||$2A||??||??||??||??||??||??||??||??||??||??
|}


===Coding a stack===
Let's say we have our stack pointer saved at memory address $C000 (because it is a 16-bit value, it also uses memory address $C001!!).

To push register DE:
<pre>
ld hl, $C000 ; Retrieve stack pointer (uses A as scratch: there is no ld hl, [imm16])
ld a, [hli] ; Low byte first (little-endian!)...
ld h, [hl] ; ...then the high byte
ld l, a
ld [hl], e ; Push the low-order byte
inc hl ; Move stack pointer
ld [hl], d ; Repeat
inc hl
ld a, l ; Save stack pointer, low byte first
ld [$C000], a
ld a, h
ld [$C001], a
</pre>
To pop into register DE:
<pre>
ld hl, $C000 ; Retrieve stack pointer, one byte at a time
ld a, [hli]
ld h, [hl]
ld l, a
dec hl ; Move stack pointer
ld d, [hl] ; Pop the high-order byte
dec hl ; Repeat
ld e, [hl]
ld a, l ; Save stack pointer, low byte first
ld [$C000], a
ld a, h
ld [$C001], a
</pre>
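The two routines above can be modelled in Python to check that they really behave LIFO. This is a sketch: `push_de`, `pop_de`, `read16` and `write16` are names made up for it, memory is a `bytearray`, and we assume the stack data starts at $C002, right after the saved pointer.

```python
SP_ADDR = 0xC000  # where the stack pointer itself is saved (as in the text)

def read16(memory, addr):
    return memory[addr] | (memory[addr + 1] << 8)  # little-endian

def write16(memory, addr, value):
    memory[addr] = value & 0xFF         # low byte first
    memory[addr + 1] = (value >> 8) & 0xFF

def push_de(memory, de):
    hl = read16(memory, SP_ADDR)        # retrieve stack pointer
    memory[hl] = de & 0xFF              # push the low-order byte (E)
    memory[hl + 1] = de >> 8            # then the high-order byte (D)
    write16(memory, SP_ADDR, hl + 2)    # save stack pointer

def pop_de(memory):
    hl = read16(memory, SP_ADDR) - 2    # move stack pointer back two bytes
    de = memory[hl] | (memory[hl + 1] << 8)
    write16(memory, SP_ADDR, hl)        # save stack pointer
    return de

memory = bytearray(0x10000)
write16(memory, SP_ADDR, 0xC002)        # assumption: the stack starts at $C002
push_de(memory, 0x1234)
push_de(memory, 0xABCD)
print(hex(pop_de(memory)), hex(pop_de(memory)))  # 0xabcd 0x1234 (LIFO)
```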


===Good news===
Okay, coding a stack is cool, but... isn't there a faster way of doing it? Of course! The Game Boy's CPU has a stack of its own! Meet
{| class="wikitable"
|PUSH reg16
|}
where reg16 is any 16-bit register pair: BC, DE, HL, and even AF.

Also meet SP, which makes all of this possible. SP is the '''hardware Stack Pointer'''. You can INC and DEC it and load values into it (for example ''ld sp, imm16'' or ''ld sp, hl''), but reading it back is limited to instructions like ''ld [imm16], sp'' and ''ld hl, sp+offset8''. Here are equivalents of ''push hl'' and ''pop hl'' (assuming we could use [sp], 'cause we can't :3)
{| class="wikitable"
|PUSH HL
|<pre>
dec sp
ld [sp], h
dec sp
ld [sp], l
</pre>
|-
|POP HL
|<pre>
ld l, [sp]
inc sp
ld h, [sp]
inc sp
</pre>
|}
Note that the stack grows '''downwards''' (i.e., PUSH decreases SP, and POP increases it). Also, POP doesn't alter memory.
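Here is a Python model of the hardware stack; the class name `HardwareStack` and the starting SP of $FFFE are assumptions made for this sketch. It follows the ''push hl'' / ''pop hl'' equivalents above: SP goes down on a push, up on a pop, and popping leaves memory as it was.

```python
class HardwareStack:
    """Model of the Game Boy's SP-based stack, which grows downwards."""

    def __init__(self, sp):
        self.memory = bytearray(0x10000)
        self.sp = sp

    def push(self, value):
        self.sp = (self.sp - 1) & 0xFFFF
        self.memory[self.sp] = value >> 8    # high byte (e.g. H) first...
        self.sp = (self.sp - 1) & 0xFFFF
        self.memory[self.sp] = value & 0xFF  # ...then low byte (e.g. L) below it

    def pop(self):
        low = self.memory[self.sp]           # memory is only read, not cleared
        self.sp = (self.sp + 1) & 0xFFFF
        high = self.memory[self.sp]
        self.sp = (self.sp + 1) & 0xFFFF
        return (high << 8) | low

stack = HardwareStack(sp=0xFFFE)  # assumption: an example starting SP
stack.push(0xD361)                # push hl, with HL = $D361
print(f"SP=${stack.sp:04X}", f"[SP]=${stack.memory[stack.sp]:02X}")  # SP=$FFFC [SP]=$61
print(f"${stack.pop():04X}", f"SP=${stack.sp:04X}")                   # $D361 SP=$FFFE
```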

You cannot PUSH / POP 8-bit registers. Instead, to save register B, you '''must''' ''push bc''. You don't have to push and pop the same register pair! For example:
<pre>
push af
ld a, [$C000]
pop de
</pre>
is completely valid. (Note that after the POP, E's value is equal to F's when PUSHing, and D's is equal to A's, so this is how you can directly read the F register.)

Be careful with the stack, even more so when you're not coding your own game: everyone uses the stack, even the CPU! (We're about to see how.) The best practice is to leave the stack identical before and after your code. Otherwise, expect some crashes, yay!


==Control structures==
===Rollin' around===
...at the speed of sound! (I'M SO SORRY)


Up until now, we've seen only programs that begin somewhere and that are ran top to bottom, in that order. However, this never happens in a more complex context. So, let's see how to manipulate code flow !
Up until now, we've seen only programs that begin somewhere and that are ran top to bottom, in that order. However, this never happens in a more complex context. So, let's see how to manipulate code flow!


We have two instructions that allow execution to jump somewhere else in memory:
{| class="wikitable"
|JP label16
|Has execution jumping to label16
|-
|JR offset8
|Has execution jumping over offset8 bytes
|}
What's the difference?


First, JP can go '''anywhere'''. JP tells the CPU "jump to this memory address". JR is much more limited, as it can only reach a signed 8-bit range (128 bytes backwards, or 127 bytes forwards, counted from the instruction right after the JR).
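Second, JR is one byte smaller than JP. When writing code that must be as short as possible, this is super awesome.

Third, JR is also faster on the Game Boy: it takes 12 clock cycles when it jumps (8 when its condition fails), whereas JP takes 16 (12 when its condition fails).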
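Whether a JR can reach its target is a bit of arithmetic. The C sketch below assumes, as on the Game Boy's CPU, that JR is 2 bytes long and that its signed offset is added to the address of the byte following it; `jr_target` and `jr_can_reach` are hypothetical helper names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Where a 2-byte "jr offset8" located at jr_addr lands.
   The offset counts from jr_addr + 2, the next instruction. */
uint16_t jr_target(uint16_t jr_addr, int8_t offset) {
    return (uint16_t)(jr_addr + 2 + offset);
}

/* True if a JR located at jr_addr can reach target.
   Wrap-around at $FFFF is ignored in this sketch. */
bool jr_can_reach(uint16_t jr_addr, uint16_t target) {
    int distance = (int)target - ((int)jr_addr + 2);
    return distance >= -128 && distance <= 127;
}
```

For instance, an offset of -2 lands on the JR itself, which is the classic "loop forever" instruction.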


Labels are actually memory addresses; they mark the targets of jumps. Here is an example of label usage:
<pre>
loop: ; This is a label! This defines label "loop".
sub a, $0A
jr c, finished
; Do stuff with a...
jr loop ; This jumps to the "sub a, $0A" right after the "loop:" line.
finished:
inc a
; Do some more stuff, we don't care anymore.
</pre>


You may wonder what this "jr c, finished" is. And this brings us to the next part!


===Conditionals===
JP and JR can be executed unconditionally, meaning the jump will always occur. This can be useful, but sometimes we don't want that. And that's where flags come in handy, because we can make jumps depend on the status of the flags.
{| class="wikitable"
|JP condition, label
Four instructions can use conditionals: CALL, RET (these are coming in soon), JP and JR. On the Game Boy, all four accept the same four conditions: Z, NZ, C and NC.


So, there you see how you can create conditionals in z80 assembly : by shifting the flow of code.


I want you to understand the following: even though you now have more control over the flow of code, try to stay organized; intricate jumps can be very hard to follow. Also, jumps consume space and time, so try to jump as little as you can.


One last thing, on the optimization side: if you're going for space - and that's often the case with ACE - you should use jr. On the Game Boy, jr is never slower than jp either, so only use jp when jr cannot reach the target. Never use a chain of jr spaced by $7F bytes instead. It's '''pointless'''.

===A special jp===
There is one special case of jp, though!
{| class="wikitable"
|JP HL
|Has execution jumping to the address pointed to by hl.<br/>Does not accept any conditionals.
|}
Example :
<pre>
ld hl, $2457
jp hl
</pre>
will jump to $2457. Some might argue that "jp $2457" is better, as it'd save 1 byte and preserve the hl register.

However, "jp hl" is used for dynamic jumps: "jp hl" may jump to a different location every time it is run. When doing "static" (ie. always the same) jumps, it '''is''' better to use jp $xxyy. "jp hl" is mostly used with function pointer tables - we'll see that later.

===Comparing stuff===
Here is an instruction that is heavily used with conditional jumps :
{| class="wikitable"
|CP <nowiki>{reg8 | imm8 | [hl]}</nowiki>
|Does the same as SUB <nowiki>{...}</nowiki>, but leaves A untouched: only the flags are updated.
|}
cp is heavily used with conditionals, since it compares the accumulator's value with another. Here is a nifty table :
{| class="wikitable"
|Comparison
|Unsigned equivalent
|Signed equivalent
|-
|A == ''number''
!colspan="2"|Z is set (A - ''number'' == 0)
|-
|A != ''number''
!colspan="2"|Z is not set (A - ''number'' != 0)
|-
|A < ''number''
|C is set (A - ''number'' generated a borrow)
|rowspan="2"|No direct test: the Game Boy has no sign or overflow flag. One trick is to XOR both values with $80, then use the unsigned test.
|-
|A >= ''number''
|C is reset (A - ''number'' generated no borrow)
|}
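Here is a small C model of what cp leaves in the Z and C flags. Since the Game Boy's F register only holds Z, N, H and C, there is no sign flag to read for signed comparisons; `signed_less` sketches one common workaround, flipping bit 7 of both operands (XOR $80) so that signed order becomes unsigned order. The names are hypothetical, and N and H are not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* The two flags a "cp n" leaves behind that conditional jumps use. */
typedef struct { bool z; bool c; } CpFlags;

CpFlags cp(uint8_t a, uint8_t n) {
    CpFlags f;
    f.z = (uint8_t)(a - n) == 0; /* result is zero <=> a == n */
    f.c = a < n;                 /* a borrow happens <=> a < n (unsigned) */
    return f;
}

/* Signed "a < n" using only the carry test: XOR $80 maps
   -128..127 onto 0..255 while keeping the same order. */
bool signed_less(uint8_t a, uint8_t n) {
    return cp(a ^ 0x80, n ^ 0x80).c;
}
```

For example, `signed_less(0xFF, 0x01)` is true (-1 < 1), even though $FF is greater than $01 as an unsigned value.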
Example :
<pre>
ld a, [hl]
cp $63
jr z, placeItems
inc hl
inc hl
jr someplace
placeItems:
ld b, [hl]
</pre>
If [hl] equals $63, execution jumps to placeItems.

Otherwise, execution continues, increments hl twice, then jumps to "someplace".

===Chaining conditionals===
!!WARNING!! The code I'll be writing in assembly can be written in other ways. If you think you'd have done it another way, try it. Count the instructions in your code, and if you used fewer than I did, you did well!

Don't assume my way is the only one. It is a good idea to try to find other ways to do what I propose. It's a good exercise!

Okay, so let's try the following C code, where a stands for the a register:
<pre>
if(a == $2A) {
// Success stuff
} else {
// Failure stuff
}
// Rest of the code
</pre>
In assembly, that's easy!
<pre>
cp $2A
jr nz, failure
; Success stuff
jr afterConditional
failure:
; Failure stuff
afterConditional:
; Rest of the code
</pre>
Think of another way... like, having the failure stuff first.
<pre>
cp $2A
jr z, success
; Failure stuff
jr afterConditional
success:
; Success stuff
afterConditional:
; Rest of the code
</pre>
Got it? Good!

Okay. Let's get it one level higher.
<pre>
if(b == $C0 && c == $DE) { // && means AND. But it's a logical AND - we'll see another AND later.
// Success stuff
} else {
// Failure stuff
}
</pre>
Um... let's try doing multiple jumps.
<pre>
ld a, $C0
cp b
jr nz, failure
ld a, c
cp $DE
jr nz, failure
; Success stuff
jr afterCond
failure:
; Failure stuff
afterCond:
</pre>
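Phew! Not exactly the same as the C code (cp only compares with A, so b and c have to go through A, and the first comparison is written the other way around), but it behaves the same.

To do an AND, simply treat the whole thing as a failure if any of the conditions fails.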

Okay. Let's get it a lil' bit different.
<pre>
if(b == $C0 || c == $DE) { // || means OR. But it's a logical OR - we'll see another OR later.
// Success stuff
} else {
// Failure stuff
}
</pre>
Um... let's try doing multiple jumps.
<pre>
ld a, $C0
cp b
jr z, success
ld a, c
cp $DE
jr z, success
; Failure stuff
jr afterCond
success:
; Success stuff
afterCond:
</pre>
Okay! Now, to do an OR, we simply run each comparison. If any succeeds, we jump straight to the success code. Otherwise, we fall through to the failure code.

I'll leave the following code as an exercise :
<pre>
if((h == $C0 && l == $DE) || a == $2A) {
// Success stuff
} else {
// Failure stuff
}
</pre>

===On to loopings===
Some of you might have thought "Hey, ISSOtm. Up until now, you've been going forwards all the time. Even your jumps were skipping over instructions - but forever forwards. What if we went '''backwards'''?"

Well, kudos to you! This is the basis of looping structures.

Here is the simplest loop in assembly:
<pre>
loop:
; Do stuff
jr cond, loop
</pre>
Ta-daah! Here is a more explicit (less generic) loop:
<pre>
ld b, $06
countingLoop:
; Do stuff (assume it preserves b)
dec b ; Sets Z if b == 0 after the DEC.
jr nz, countingLoop ; go back if b is non-zero
; Do some MOAR stuff
</pre>
This should run the "Do stuff" part exactly six times. If that part modifies b, the loop will run a different number of times (possibly forever).

I'll leave to you as an exercise what would happen if the "ld b, $06" was replaced by a "ld b, $00"...

Now you got how to create loops. Neato. Let's see how to create... routines. Or procedures. Or functions. Whatever you call them.

===Routines / Functions / Procedures / Whatever===
I've told you about CALL and RET in the "Conditionals" section. Now let's see what they do.

They are a more advanced way to do jumps. Basically, you "call" a piece of code, which does its stuff, then "returns" control of the CPU to your code. Bam, CALL and RET explained.

{| class="wikitable"
|CALL label16
|Calls code starting at label16
|-
|CALL cond, label16
|Same, but only if the condition (Z, NZ, C or NC) holds, like JP and JR
|-
|RET
|Returns to the previous "caller".
|-
|RET cond
|Do I need to explain? (Same as RET, but only if the condition holds.)
|}

But wait, there is more! And it is '''VITAL'''. The way call works is very simple:
# Get the address of the instruction right after the call.
# Push it onto the stack.
# Jump to the address given as the operand (label16).

And ret works in a way that is compatible with call :
# Pop a number from the stack.
# Jump to that address.
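The CALL and RET steps above can be sketched in C. This assumes a hypothetical `Cpu` struct with a flat 64 KiB memory, and that `call label16` is 3 bytes long (one opcode byte plus a 16-bit address), so the return address is the CALL's own address plus 3.

```c
#include <stdint.h>

typedef struct {
    uint8_t mem[0x10000]; /* the whole $0000-$FFFF address space */
    uint16_t sp;          /* stack pointer */
    uint16_t pc;          /* address of the instruction being executed */
} Cpu;

/* Same byte order as PUSH: high byte at the higher address. */
static void push_word(Cpu *cpu, uint16_t value) {
    cpu->sp--;
    cpu->mem[cpu->sp] = (uint8_t)(value >> 8);
    cpu->sp--;
    cpu->mem[cpu->sp] = (uint8_t)(value & 0xFF);
}

static uint16_t pop_word(Cpu *cpu) {
    uint16_t low = cpu->mem[cpu->sp];
    cpu->sp++;
    uint16_t high = cpu->mem[cpu->sp];
    cpu->sp++;
    return (uint16_t)((high << 8) | low);
}

/* call label16: push the address right after the 3-byte CALL, then jump. */
void do_call(Cpu *cpu, uint16_t target) {
    push_word(cpu, (uint16_t)(cpu->pc + 3));
    cpu->pc = target;
}

/* ret: pop whatever 16-bit value is on top of the stack and jump to it. */
void do_ret(Cpu *cpu) {
    cpu->pc = pop_word(cpu);
}
```

Note that `do_ret` blindly trusts the top of the stack: if the routine pushed something and never popped it, RET "returns" to that value instead of to the caller.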

Now, you need to be super-duper careful with the stack, since it is used by the call-ret system. Basically, make sure that between label16 and the next ret, you have done the same number of PUSHes and POPs:
<pre>
call routine
; Ton of code
routine:
push hl
push bc
; Code
pop bc
; Codez
pop hl
; Codeeeee
ret
</pre>
is fine
<pre>
call baaad
; Ton of code
baaad:
push hl
push bc
; Code
pop bc
; Codeeeey
ret
</pre>
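is bad: hl is never popped, so the ret pops the saved value of hl and jumps there instead of returning to the caller.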

Unless you're ABSOLUTELY CERTAIN about what you're doing, leave the stack the SAME as it was when leaving your routine's code. It is absolutely necessary to avoid screwing up everything.


<hr>
==Solutions to the exercises==
===Instructions get!===
B can be swapped with any other register (except A)
{| class="wikitable"
</pre>
|}



===Register pair and RAM===
ld hl, $C303 ; Now H = $C3 and L = $03


ld a, [$C001] ; A = $03


ld b, 3 ; B = $03

sbc hl, bc ; HL = HL - (BC + C flag) = $C303 - ($0300 + $00) = $C003


ld [hl], a ; [HL] = [$C003] <- A = $06


inc hl ; HL = $C004


ld b, [hl] ; B = [HL] = [$C004] = $DE


sub a, b ; A = A - B = $06 - $DE = $06 + (-$DE) = $06 + ($21 + $01) = $28, C flag = 1 (a borrow occurred, since $06 < $DE)
<br/>
Notice here that doing ''sub a, b'' actually increased A's value!


inc [hl] ; [HL] = [$C004] = $DF


inc hl ; HL = $C005


ld [hl], b ; [HL] = [$C005] = B = $DE


ld bc, 9 ; B = $00, C = $09

add hl, bc ; HL = HL + BC = $C005 + $0009 = $C00E


ld [hl], a ; [HL] = [$C00E] = A = $28


ld [$C00B], hl ; [$C00B] = L = $0E, and [$C00C] = H = $C0


Initial values :
|}


===Chaining conditionals===
You think you gotcha ? I'll be explaining the code, don't'cha worry :)
<pre>
cp $2A ; it's better to do the a comparison first, since the other comparisons will overwrite a.
jr z, success ; if we don't jump, we will do the ANDed comparison.
ld a, h
cp $C0
jr nz, failure ; first AND operand...
ld a, l
cp $DE
jr nz, failure ; ...then the second. The order doesn't matter here.
success:
; Success stuff
jr afterConditional
failure:
; Failure stuff
afterConditional:
</pre>
Didja get it? If not, try to match each part of the assembly code with the part of the C code it implements. Once you see that, you'll understand the assembly code. I admit it's tough at first glance. I've been through this, don't'cha worry.


===On to loopings===
What would happen ? Well, imagine you ran the "Do stuff" part once. B is zero - we didn't touch it - and we reached a "dec b". Now, remember what the instruction does :
# We decrement B (B = 0 - 1 = 255, remember overflow ?)
# If B is zero, we set the Z flag. Otherwise we reset it. (Z is reset, since B != 0)
... so the loop starts again with B = 255.


tl;dr: the loop runs 256 times!

"Hey ISSOtm, what if I wanted my loop to run zero times instead of 256 in that case?" Simple!
<pre>
ld a, b ; ld does NOT modify any flag, so we need the cp below.
cp $0 ; there is a more efficient way of doing this, but you don't know it yet.
jr z, afterLoop ; if b - 0 == 0, we skip the loop completely.
countingLoop:
; Do stuff (assume it preserves b)
dec b ; Sets Z if b == 0 after the DEC.
jr nz, countingLoop ; go back if b is non-zero
afterLoop:
; Do some MOAR stuff
</pre>
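To double-check the counting, here is a C model of both loops, assuming the "Do stuff" part never touches b. The `do ... while` mirrors the fact that the body runs before the first `dec b`, and the `uint8_t` wraps from 0 to 255 just like B. The function names are hypothetical.

```c
#include <stdint.h>

/* How many times the body of
       ld b, start / countingLoop: ... / dec b / jr nz, countingLoop
   runs, assuming the body leaves b alone. */
unsigned count_dec_b_loop(uint8_t start) {
    uint8_t b = start;
    unsigned runs = 0;
    do {
        runs++;          /* "Do stuff" */
        b--;             /* dec b: 0 wraps around to 255 */
    } while (b != 0);    /* jr nz, countingLoop */
    return runs;
}

/* Same loop with the cp / jr z guard: skipped entirely when b == 0. */
unsigned count_guarded_loop(uint8_t start) {
    return start == 0 ? 0 : count_dec_b_loop(start);
}
```

So `count_dec_b_loop(6)` gives 6, `count_dec_b_loop(0)` gives 256, and the guarded version gives 0 for a starting value of 0.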

==Credits & Resources==
Tutorial written (mostly) by ISSOtm for Glitch City Laboratories.

Thanks to Torchickens and RaltsEye for correcting typos, formatting 'n stuff.


Includes bits of [http://tutorials.eeems.ca/ASMin28Days/welcome.html this tutorial] for the ASM part.


Heavily using [http://gbdev.gg8.se/wiki/articles/Pan_Docs the Pan Docs] for GameBoy-specific stuff.

The [http://marc.rawer.de/Gameboy/Docs/GBCPU_Instr.html GCISheet] is useful for understanding CPU instructions and can be combined with [https://iimarckus.org/etc/asmopcodes.txt IIMarckus's opcode to instruction page] or the copy on [[The Big HEX List]].

[https://tcrf.net/Help:Contents/Finding_Content/Debugger_guide/BGB Torchickens has started a tutorial for BGB emulator's debugger at The Cutting Room Floor]

[[Category:Arbitrary code execution]]
[[Category:Reference documents]]

Latest revision as of 22:42, 30 April 2023

Welcome to this tutorial! The purpose of this page is to allow everyone to program anything on a Game Boy. This may sound complex, but the Game Boy is a very well-documented system, and programming for it can be learned quite easily.

Our goal will first be to create arbitrary code execution programs, but later sections of this tutorial will give you tools for more general-purpose programming, so you'll be able to create your own games.

Let's get started!

A new world

In this part is a collection of different terms, concepts and notations that are vital for the rest of this tutorial. Do NOT skip anything here unless the text specifies you can. I mean it.

If you don't understand something later on, read that part again, and chances are, you'll understand it.

Numeric systems

This section deals with numbers and how to format them. Notions explained here are essential and will be used a ton of times. You've been warned.

This section is essential to understanding everything following it, so read it carefully. However, if you already know everything about binary and hexadecimal, you can skip this. But, if you lack some vocabulary at some point, come back here.

Let's take a number, such as 1337. It is written in base 10, or in decimal if you will.

It is interpreted the following way :

1337 = 1000 + 300 + 30 + 7
     = 1 * 1000 + 3 * 100 + 3 * 10 + 7 * 1
     = 1 * 10^3 + 3 * 10^2 + 3 * 10^1 + 7 * 10^0

Reading the numbers before the powers of ten gives 1, 3, 3, and 7, and we find 1337 again.

However, decimal is the base humans like to count in. But computers don't. Instead, they prefer binary. Binary is base 2, that is, instead of working with powers of 10, we work with powers of 2. Also, only 2 symbols are allowed, 0 and 1. Each of them is called a bit.

This paragraph is some trivia about why computers use binary instead of decimal. You can skip it if you want. So, why binary? Because we need computers to be efficient. So we need to store information using electricity. The easiest variable to manipulate is "Is power running?". The answer is either 0 (it doesn't) or 1 (it does). And there you got it! Why do computers count in binary? To keep them at reasonable prices!

To differentiate decimal numbers from binary numbers, binary numbers will be prepended with a % symbol. So, 10 is decimal, and %10 is binary. Got it? Okay.

Here is an example :

%10010101 = 1 * 2^7 + 0 * 2^6 + 0 * 2^5 + 1 * 2^4 + 0 * 2^3 + 1 * 2^2 + 0 * 2^1 + 1 * 2^0
          = 1 * 128 + 1 * 16 + 1 * 4 + 1 * 1
          = 128 + 16 + 4 + 1
          = 149

To identify each bit, we will give them a number. The rightmost bit will be given number 0, and each bit to the left is given a number one greater.

We will call the rightmost bit (bit 0) the least significant bit (or LSB for short), since changing it won't alter the value by much. Similarly, the leftmost bit will be called the most significant bit (or MSB).
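The binary expansion and the bit numbering above can be mirrored in a few lines of C. This is a sketch with hypothetical helper names: `binary_value` reads a string of 0s and 1s from the MSB down, doubling the running value at each step (which adds up the same powers of 2 as the expansion), and `bit_of` extracts bit number n, counting from 0 at the LSB.

```c
#include <stdint.h>

/* Value of a string of binary digits such as "10010101".
   Assumes the string only contains '0' and '1'. */
unsigned binary_value(const char *bits) {
    unsigned value = 0;
    for (; *bits != '\0'; bits++) {
        value = value * 2 + (unsigned)(*bits - '0');
    }
    return value;
}

/* Bit number n of a byte (bit 0 = LSB, bit 7 = MSB). */
unsigned bit_of(uint8_t byte, unsigned n) {
    return (byte >> n) & 1u;
}
```

`binary_value("10010101")` gives 149, and `bit_of(149, 7)` gives 1, the MSB of %10010101.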

Lastly, for a few reasons, we will give names to groups of consecutive bits.

  • A group of 4 bits is called a nibble.
  • A group of 8 bits is called a byte.
  • A group of 16 bits is called a word.
  • A group of 32 bits is called a double word or dword.
  • A group of 64 bits is called a quadruple word or qword.

We will mostly be working with bytes, sometimes with words and rarely with nibbles. It is very rare to work with other structures, so you may forget them if you will.

Now, let's talk about hexadecimal. It is base 16, so we will be working with 16 symbols : 0 1 2 3 4 5 6 7 8 9 A B C D E F. Again, we will prepend hex numbers with a $ to differentiate them.

So, $A = 10, $B = 11, and so on to $F = 15. An example :

$95 = 9 * 16^1 + 5 * 16^0
    = 144 + 5
    = 149

Why use hexadecimal? Well, writing binary numbers is quite tedious. Take both examples together: we have 149 = %10010101 = $95. Now, consider the digits individually:

$9 = %1001
$5 = %0101

Hexadecimal works as a "condensation" of binary, as every nibble can be written using a hex symbol instead of four bits.

This way, we have a more readable way of writing numbers that can be converted to binary in a snap.
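In C, splitting a byte into its two nibbles, and therefore into its two hex digits, takes only a shift and a mask. A sketch with hypothetical helper names:

```c
#include <stdint.h>

/* The high nibble is the byte shifted right by 4 bits,
   the low nibble is the byte masked with %00001111. */
uint8_t high_nibble(uint8_t byte) { return (uint8_t)(byte >> 4); }
uint8_t low_nibble(uint8_t byte)  { return (uint8_t)(byte & 0x0F); }

/* Hex digit character for a nibble value 0-15. */
char hex_digit(uint8_t nibble) { return "0123456789ABCDEF"[nibble & 0x0F]; }
```

For 149, `high_nibble` gives 9 and `low_nibble` gives 5, which are the two digits of $95.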

For the rest of this tutorial, we will mostly be using hexadecimal, but always remember the binary lying down below!

A dip into technical information

Registers

Registers are small storage locations inside the CPU itself. That is what you will be working with, alongside memory. But we'll see memory later.

There are 8 different registers, which can be paired up. These are A, B, C, D, E, F, H and L. These are NOT hex digits, so beware!

Any of these registers can hold an unsigned 8-bit value. That means :

  • A register can only hold an integer value.
  • This value is always positive.
  • This value is within range 0 - 255 (both included).

The register pairs are the following : AF, BC, DE and HL. Any of them can hold an unsigned 16-bit value. That means :

  • A pair can only hold an integer value.
  • This value is always positive.
  • This value is within range 0 - 65535 (both included).

All of these registers are general-purpose... well, most of the time. Sometimes, there are restrictions, and sometimes, it is better to use some register over another.

Here are each register / pair's special attributes :

  • A is the Accumulator. It is the register you have to use to perform arithmetic operations, and most of the time, memory access.
  • B is usually an 8-bit counter.
  • C is also used as an 8-bit counter, but also for port access. We'll see that way later.
  • D, E, H and L have no special attribute as 8-bit. However, when paired, they do.
  • F holds the CPU's Flags. It is very special, as you cannot use it as a general-purpose register. You can't even directly access it! We'll see how to use it later.
  • HL is quite the equivalent of A, but is 16-bit. Its name comes from the fact that it stores the High and Low bytes of a memory address.
  • BC is mostly used as a Byte Counter. It can also be used together with A to access memory.
  • DE is mostly used to hold the memory address of a DEstination.

To store values to a register, we will use the LD (LoaD) instruction.

LD destination, source Store the value of source into destination.

Did I mention that nothing is case-sensitive?

However, you can't do LD as you wish, there are restrictions :

Source (rows) \ Destination (columns) A B C D E H L BC DE HL [BC] [DE] [HL] [imm16]
A Yes Yes Yes Yes Yes Yes Yes No No No Yes Yes Yes Yes
B Yes Yes Yes Yes Yes Yes Yes No No No No No Yes No
C Yes Yes Yes Yes Yes Yes Yes No No No No No Yes No
D Yes Yes Yes Yes Yes Yes Yes No No No No No Yes No
E Yes Yes Yes Yes Yes Yes Yes No No No No No Yes No
H Yes Yes Yes Yes Yes Yes Yes No No No No No Yes No
L Yes Yes Yes Yes Yes Yes Yes No No No No No Yes No
BC No No No No No No No No No No No No No Yes
DE No No No No No No No No No No No No No Yes
HL No No No No No No No No No No No No No Yes
[BC] Yes No No No No No No No No No No No No No
[DE] Yes No No No No No No No No No No No No No
[HL] Yes Yes Yes Yes Yes Yes Yes No No No No No No No
[imm16] Yes No No No No No No Yes Yes Yes No No No No
imm8 Yes Yes Yes Yes Yes Yes Yes No No No No No Yes No
imm16 Yes No No No No No No Yes Yes Yes No No No No

imm8 means an 8-bit value, and imm16 a 16-bit value. (imm means immediate)

For those wondering what the square brackets mean, that's coming in shortly.

Examples :

ld a, 25 Store value 25 into register A.
ld d, b Store value of register B into register D.
ld [$8325], a Store the value of register A into memory address $8325.

Let's make something clear immediately : ld de, hl isn't possible at all. Instead, do ld d, h then ld e, l (the order of the operations doesn't matter). Same with the other register pairs.

Trying to do something like ld a, $100 isn't possible. Like, physically impossible. You'll see why much, much later.

Note that ld a, -1 is valid, but actually, the "-1" wraps. Storing -1 will truly store 255 in an 8-bit register, and 65535 in a 16-bit register. Why? Coming soon.

Notice that F and AF aren't usable anywhere. Actually, only a few instructions use them.

Negative numbers

Time to confess: I've lied to you. Actually, 8-bit and 16-bit registers can hold negative numbers.

I've told you, "individual registers can hold unsigned 8-bit values, and pairs unsigned 16-bit values". However, that isn't the whole truth: these values can also be signed. How does that work?

What we will be doing is cutting our number range in half, and saying that one half is made of negative numbers. But how do we distinguish positive and negative numbers? Well, we say that the MSB no longer stands for +2^7 but for -2^7 (-2^15 with 16 bits), so it gives away the sign of the number (0 = positive, 1 = negative). So, instead of having values in ranges 0 to 255 and 0 to 65535, we will have values in ranges -128 to 127 and -32768 to 32767. Neat!

How to multiply by -1? Easy! You can either :

  • Calculate zero minus your number (just like in real life). However, you should consider 0 the same as 256 (in 8-bit mode) or 65536 (in 16-bit mode).
  • Flip the state of every bit, then add one.

This doesn't work in one case : -128 (%10000000)

-128   %10000000
Invert %01111111
Add 1  %10000000

So, uh, -(-128) = -128? Oops. You cannot negate -128 with only 8 bits. Try with 16 bits, and you'll see it works! However, with 16 bits, you can't negate -32768 for the same reason.
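Both negation recipes can be checked on every 8-bit value with a short C sketch (hypothetical helper names). Casting back to `uint8_t` discards everything above bit 7, just like an 8-bit register, which is why 0 behaves like 256.

```c
#include <stdint.h>

/* Recipe 1: zero minus the number; 0 acts as 256 once truncated to 8 bits. */
uint8_t negate_by_subtraction(uint8_t x) { return (uint8_t)(0 - x); }

/* Recipe 2: flip every bit, then add one. */
uint8_t negate_by_inversion(uint8_t x) { return (uint8_t)(~x + 1); }
```

Both functions agree on all 256 inputs; 50 becomes 206 (that is, -50), and $80 (-128) comes back as $80, the corner case above.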

Now, let's see how the CPU handles the difference between unsigned and signed values. Surprise, it doesn't! Why? Because performing the same operation on signed or unsigned values gives the same bits!

               unsigned        signed
  %00110010          50            50
+ %11001110       + 206         + -50
%1 00000000       = 256         =   0   (Disqualify ninth bit)

And you just saw why I told you to consider 0 the same as 256: in 8 bits, they are identical! 256 is %1 00000000, a 1 followed by 8 zero bits, and that ninth bit is discarded.
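The same point in C: the addition only produces a bit pattern, and whether you read it as unsigned or signed is up to you. `add8` and `as_signed` are hypothetical helpers; `as_signed` applies the rule above, where patterns from $80 upwards stand for their value minus 256.

```c
#include <stdint.h>

/* 8-bit addition as the CPU does it: the sum is truncated to 8 bits. */
uint8_t add8(uint8_t a, uint8_t b) { return (uint8_t)(a + b); }

/* Read an 8-bit pattern as a signed (two's complement) value. */
int as_signed(uint8_t x) { return x >= 0x80 ? (int)x - 256 : (int)x; }
```

`add8(50, 206)` gives 0, whether you read 206 as 206 or as `as_signed(206) == -50`.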

Memory

Finally! So what is "memory", you ask? It's a stream of bytes. A stream of numbers. And guess what? Everything is a stream of numbers. From numbers to this cute kitten video, everything is numbers. Including programs. That's why arbitrary code execution happens. Remember this sentence : Data is whatever you define it to be.

Ever wondered why you could open a JPEG in Word? Now you know.

How is every byte differentiated from its neighbors? Well, each one gets a 16-bit unsigned integer called its "address" (thus ranging from $0000 to $FFFF). To access a byte, use its address, just like you use your friend's e-mail address to reach them.

So, how does running a program work? The processor fetches the byte located at the address held by a special register, the program counter (PC), increments that register (its value is raised by one), and processes the byte as an opcode; when done, everything is repeated. Instructions can be one to three bytes long, so this cycle may repeat several times for a single instruction.

So now, how to access memory? With brackets! To access memory address $CD38, you just have to use [$CD38]. Yay!

To access the memory location pointed to by HL, just do... [hl]! It's the same with BC and DE.

So, to retrieve the value at memory address $6511 into register A : ld a, [$6511]

And to store the value of register C into the memory pointed to by HL : ld [hl], c

Remember to refer to the chart above for the legal LD combinations.

Obviously, ld [$6511], a will overwrite the previous value stored here. But ld [$6511], hl will store a 16-bit value, which is a word long, that is two bytes long! So, not only will [$6511] be overwritten, but [$6512] too! Always be very careful about the memory you're touching. Otherwise, stuff like the ZZAZZ glitch happens.

For those wondering, ld a, [$6511] leaves [$6511] untouched.

Flags

Remember that "special" F register? Well, each of its bits is called a "flag", and holds information about (usually) the accumulator. A flag is dubbed "set" if it equals 1, and "reset" otherwise.

Here are the 8 flags :

7 6 5 4 3 2 1 0
Z N H C - - - -

All "-" are unused flags. They are set to 0 at all times.

Z : Zero

This flag is set if the last operation's result was 0. It is reset otherwise.

H : Half-Carry

Works like the Carry flag, but referring to the least-significant nibble. It is only used with the DAA instruction, so... forget it until then.

N : Add/Subtract

If the last operation performed an addition, this is reset. Otherwise it is set. Again, forget it until we talk about DAA.

C : Carry

The Carry flag detects whether the last operation's result was too large to fit in the target register (e.g. ld a, $F2 then add a, $24 will set this flag).

There are two instructions dedicated to the carry flag:

  • SCF sets it, and
  • CCF inverts it.

Manipulating data

Instructions get!

Let's get these :

Syntax Effect Z C
INC {reg8 | reg16 | [hl]} Adds one to the operand ("increments" it) Affected, except for reg16 Not affected
DEC {reg8 | reg16 | [hl]} Subtracts one from the operand ("decrements" it) Affected, except for reg16 Not affected
ADD A, {reg8 | imm8 | [hl]} Adds the operand to the accumulator Affected Affected
ADD HL, reg16 Adds the operand to HL Not affected Affected
SUB {reg8 | imm8 | [hl]} Subtracts the operand from the accumulator. The syntax SUB A, {...} is also valid but less common. Affected Affected

If you want to get information about any instruction, go there.

Q : Hey, but where's MULT?

A : Nowhere :D To multiply, you must write your own routines! However, a nice lil' trick : to do A <- A*2, simply add a, a! To do A <- A*3, do ld b, a, add a, a, add a, b (you can swap B with any other register, of course). I'll leave you A <- A*4, A*5, A*6 and A*7 as an exercise.

For the rest of the tutorial, you'll see some text prefixed by a ";". These are comments, and are NOT part of the code. This line : "ld [hl], a ; Store the mon's ID" will be interpreted as "ld [hl], a". Everything following a ";" is ignored.

Also, the Game Boy's CPU has four very specific instructions:

ld [hli], a Equivalent to ld [hl], a then inc hl.
ld [hld], a Equivalent to ld [hl], a then dec hl.
ld a, [hli] Equivalent to ld a, [hl] then inc hl.
ld a, [hld] Equivalent to ld a, [hl] then dec hl.

These are often used to operate on chunks of memory.
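As an illustration of how ld [hli], a walks through a chunk of memory, here is a C model of a fill loop. It is a sketch with a hypothetical `fill_hli` helper and a flat `mem` array standing for the address space; `hl` is a `uint16_t`, so it wraps from $FFFF to $0000 like the register would.

```c
#include <stdint.h>

/* Fill count bytes starting at hl with the value a, one
   "ld [hli], a" per byte. Returns the final value of hl. */
uint16_t fill_hli(uint8_t *mem, uint16_t hl, uint8_t a, unsigned count) {
    while (count--) {
        mem[hl] = a; /* ld [hl], a ... */
        hl++;        /* ... then inc hl */
    }
    return hl;       /* just past the filled block */
}
```

Filling 4 bytes from $C000 writes $C000 to $C003 and leaves hl at $C004, ready for whatever comes next.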

Overflow

Let's try

ld a, 203
add a, 119

What value will A hold? The naive guess would be 322, or %101000010. However, this is 9 bits large, and won't fit in A. The way it works is that the eight rightmost bits are kept in the register, the rest being discarded. Because a non-zero value is discarded, the C flag is set.

Thus, the result is A equals 66 = %01000010, and the C flag is set.
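Here is a C sketch of how add a, n produces both the truncated result and the C flag (hypothetical names; only Z and C are modelled, N and H are ignored):

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint8_t a; bool z; bool c; } AddResult;

AddResult add_a(uint8_t a, uint8_t n) {
    unsigned sum = (unsigned)a + n; /* up to 9 bits */
    AddResult r;
    r.a = (uint8_t)sum;             /* keep the eight rightmost bits */
    r.c = sum > 0xFF;               /* a ninth bit was discarded */
    r.z = r.a == 0;
    return r;
}
```

`add_a(203, 119)` gives A = 66 with C set, matching the example, and `add_a(0xF2, 0x24)` gives $16 with C set, matching the Carry flag example from the Flags section.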

Register pairs and RAM

Let's say you run a ld hl, $D361. $D361 is put into HL, but since it is registers H and L paired up, what happens to them?

Because two hex digits mean one byte, $D3, as well as $61, is a byte. Since $D3 and H are leftmost in both cases, ld hl, $D361 is actually a shorter form of ld h, $D3 then ld l, $61.

Let's say the following instruction is ld [$2315], hl. Applying the same logic would mean H's value would be stored at [$2315], and L's would be at [$2316]. However, you just lost THE GAME; because the Game Boy's CPU, like the z80, is a "little-endian" processor, L's value (the "little end") is stored first, at [$2315]. So [$2315] is $61, and [$2316] is $D3.

Stop here, and remember this until it becomes natural to you. Because this "little-endian"ness is very tricky for beginners. It is very important when working with memory.
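Since little-endian order is easy to get wrong, here is the byte order spelled out in C. `mem` stands for the 64 KiB address space and both helper names are hypothetical; storing $D361 at $2315 puts $61 at $2315 and $D3 at $2316, as described above.

```c
#include <stdint.h>

/* Store a 16-bit value little-endian: low byte first. */
void store_word_le(uint8_t *mem, uint16_t addr, uint16_t value) {
    mem[addr] = (uint8_t)(value & 0xFF);               /* L, the "little end" */
    mem[(uint16_t)(addr + 1)] = (uint8_t)(value >> 8); /* H */
}

/* Reading it back puts the byte at addr + 1 in the high half. */
uint16_t load_word_le(const uint8_t *mem, uint16_t addr) {
    return (uint16_t)(mem[addr] | (mem[(uint16_t)(addr + 1)] << 8));
}
```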

Here is an exercise: what values will [$C000] to [$C00F] contain after this code is run?

Initial values :

$C000 $C001 $C002 $C003 $C004 $C005 $C006 $C007 $C008 $C009 $C00A $C00B $C00C $C00D $C00E $C00F
$00 $03 $4F $C0 $DE $57 $2A $00 $FF $01 $23 $34 $56 $78 $9A $BC

Code :

ld hl, $C303
ld a, [$C001]
ld b, 3
add a, b
ld c, 0
sbc hl, bc
ld [hl], a
inc hl
ld b, [hl]
sub a, b
inc [hl]
inc hl
ld [hl], b
ld bc, 9
add hl, bc
ld [hl], a
ld [$C00B], hl

Stacks

A stack? Can you eat that?

No you can't. It's a data structure that has the cool property of not being fixed-length. How does it work? Just like a stack of plates. Imagine you're washing some plates in the back of a restaurant. Next to you is a pile of plates you need to wash. Waiters come and place ("push") plates on top of the stack and when finished washing a plate, you take ("pop") the topmost one. This way of working is called LIFO (Last In First Out).

In our case, we will do it by saving the top of the stack as a memory address. This value is called the stack pointer. Here is an example, with the stack growing to the right :

$C000 $C001 $C002 $C003 $C004 $C005 $C006 $C007 $C008 $C009 $C00A $C00B $C00C $C00D $C00E $C00F
stack pointer: $C005 (the first free slot)
$00 $03 $4F $C0 $DE ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??

Now, we push the value $2A :

$C000 $C001 $C002 $C003 $C004 $C005 $C006 $C007 $C008 $C009 $C00A $C00B $C00C $C00D $C00E $C00F
stack pointer: $C006
$00 $03 $4F $C0 $DE $2A ?? ?? ?? ?? ?? ?? ?? ?? ?? ??

And, if we pop two values :

$C000 $C001 $C002 $C003 $C004 $C005 $C006 $C007 $C008 $C009 $C00A $C00B $C00C $C00D $C00E $C00F
stack pointer: $C004
$00 $03 $4F $C0 $DE $2A ?? ?? ?? ?? ?? ?? ?? ?? ?? ??

We can now push $7C :

$C000 $C001 $C002 $C003 $C004 $C005 $C006 $C007 $C008 $C009 $C00A $C00B $C00C $C00D $C00E $C00F
stack pointer: $C005
$00 $03 $4F $C0 $7C $2A ?? ?? ?? ?? ?? ?? ?? ?? ?? ??

Coding a stack

Let's say we have our stack pointer saved at memory address $C000 (because it is a 16-bit value, it also uses memory address $C001!!).

To push register DE :

ld hl, [$C000] ; Retrieve stack pointer
ld [hl], e ; Push the low-order byte
inc hl ; Move stack pointer
ld [hl], d ; Repeat
inc hl
ld [$C000], hl ; Save stack pointer

To pop into register DE :

ld hl, [$C000] ; Retrieve stack pointer
dec hl ; Move stack pointer
ld d, [hl] ; Pop the high-order byte
dec hl ; Repeat
ld e, [hl]
ld [$C000], hl ; Save stack pointer
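The hand-made stack above can be modelled in C to check the byte order. This sketch assumes, as in the text, that the stack pointer itself is stored little-endian at $C000-$C001 and that the stack grows upwards; `mem` stands for the address space and the helper names are hypothetical.

```c
#include <stdint.h>

#define SP_ADDR 0xC000 /* where the text keeps its stack pointer */

static uint16_t read_sp(const uint8_t *mem) {
    return (uint16_t)(mem[SP_ADDR] | (mem[SP_ADDR + 1] << 8));
}

static void write_sp(uint8_t *mem, uint16_t sp) {
    mem[SP_ADDR] = (uint8_t)(sp & 0xFF);
    mem[SP_ADDR + 1] = (uint8_t)(sp >> 8);
}

/* Push DE: low-order byte (E) first, the pointer moves up. */
void soft_push_de(uint8_t *mem, uint8_t d, uint8_t e) {
    uint16_t sp = read_sp(mem);
    mem[sp++] = e;
    mem[sp++] = d;
    write_sp(mem, sp);
}

/* Pop DE: the high-order byte (D) was pushed last, so it comes out first. */
void soft_pop_de(uint8_t *mem, uint8_t *d, uint8_t *e) {
    uint16_t sp = read_sp(mem);
    *d = mem[--sp];
    *e = mem[--sp];
    write_sp(mem, sp);
}
```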

Good news

Okay, coding a stack is cool, but... isn't there a faster way of doing it? Of course! Because the Game Boy's CPU has a stack by itself! Meet

PUSH reg16 Stores reg16 to the stack.
POP reg16 Retrieves reg16 from the stack.
INC SP Increments SP.
DEC SP Decrements SP.
ADD SP, imm8 Add imm8 (signed 8-bit value) to SP.
LDHL SP, imm8 Add imm8 (signed 8-bit value) to SP, and save the result in HL.

where reg16 is BC, DE, HL or AF (AF can be used here, but SP can't).

Also meet SP, which makes all of this possible. SP is the hardware Stack Pointer. You can INC and DEC it, set it with ld sp, imm16 or ld sp, hl, and store it to memory with ld [imm16], sp, but you can't copy it into another register with a plain ld (LDHL is the closest thing). Here are equivalents of push hl and pop hl (assuming we could use [sp], 'cause we can't :3):

PUSH HL
dec sp
ld [sp], h
dec sp
ld [sp], l
POP HL
ld l, [sp]
inc sp
ld h, [sp]
inc sp

Note that the stack grows downwards (i.e., PUSH decreases SP, and POP increases it). Also, POP doesn't alter memory.
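Here is a C sketch of what push and pop do with the hardware SP, following the push hl / pop hl equivalents above. The starting SP value of $E000 and the helper names are assumptions for the demo; the point is that SP moves down on a push, the high byte ends up at the higher address, and popping only reads memory.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t mem[0x10000];   /* stands in for the whole address space */
static uint16_t sp = 0xE000;   /* hypothetical starting value for this demo */

/* push reg16: "dec sp / ld [sp], high / dec sp / ld [sp], low" */
static void push16(uint16_t value) {
    mem[--sp] = (uint8_t)(value >> 8);   /* high byte at the higher address */
    mem[--sp] = (uint8_t)value;          /* low byte at the lower address */
}

/* pop reg16: "ld low, [sp] / inc sp / ld high, [sp] / inc sp" */
static uint16_t pop16(void) {
    uint8_t low = mem[sp++];
    uint8_t high = mem[sp++];
    return (uint16_t)((high << 8) | low);  /* memory itself is left untouched */
}
```

After push16(0x1234), SP is $DFFE, $DFFF holds $12 and $DFFE holds $34; pop16() returns $1234 and puts SP back at $E000, but both bytes are still in memory.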

You cannot PUSH / POP 8-bit registers. Instead, to save register B, you must push bc. You don't have to push and pop the same register! For example:

push af
ld a, [$C000]
pop de

is completely valid. (Note that after the POP, E holds the value F had at the PUSH, and D holds A's old value. This trick is the only way to read the F register directly.)

Be careful with the stack, especially when you're not coding your own game: everyone uses the stack, even the CPU! (We're about to see how.) The best practice is to leave the stack identical before and after your code. Otherwise, expect some crashes, yay!

Control structures

Rollin' around

...at the speed of sound! (I'M SO SORRY)

Up until now, we've only seen programs that begin somewhere and run from top to bottom, in that order. However, real programs almost never work that way. So, let's see how to manipulate code flow!

We have two instructions that make execution jump somewhere else in memory:

JP label16 Jumps to label16
JR offset8 Jumps offset8 bytes away (a signed value, counted from the end of the JR instruction)

What's the difference?

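First, JP can go anywhere: it tells the CPU "jump to this memory address". JR is much more limited, as it can only reach a signed 8-bit range (128 bytes backwards, or 127 bytes forwards, counted from the byte right after the JR).

Second, JR is one byte smaller than JP (2 bytes instead of 3). When writing code that must be as short as possible, this is super awesome.

Third, JR is also faster on the Game Boy: it takes 12 cycles when it jumps (8 when a conditional JR doesn't), whereas JP takes 16 cycles (12 when a conditional JP doesn't). Z80 documentation lists different timings, where JP is the faster one, but they don't apply to the Game Boy.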
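The assembler normally computes JR's offset from a label for you, but a small C sketch shows where the -128..127 limit comes from. It assumes the offset is counted from the address right after the 2-byte JR instruction; `jr_reachable` and `jr_offset_byte` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Can a JR located at jr_addr reach target? The offset is counted
   from the byte right after the 2-byte JR instruction. */
static bool jr_reachable(uint16_t jr_addr, uint16_t target) {
    int32_t offset = (int32_t)target - ((int32_t)jr_addr + 2);
    return offset >= -128 && offset <= 127;
}

/* The second byte of the JR: the offset as a signed 8-bit value. */
static uint8_t jr_offset_byte(uint16_t jr_addr, uint16_t target) {
    assert(jr_reachable(jr_addr, target));   /* otherwise, use jp */
    return (uint8_t)(target - (jr_addr + 2));
}
```

For example, a JR at $0150 that jumps to itself needs an offset of -2, stored as the byte $FE.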

Labels are names given to memory addresses; they mark the targets of jumps. Here is an example of label usage:

loop: ; This is a label! This defines label "loop".
sub a, $0A
jr c, finished
; Do stuff with a...
jr loop ; This jumps to the "sub a, $0A" right after the "loop:" line.
finished:
inc a
; Do some more stuff, we don't care anymore.

You may wonder what this "jr c, finished" is. And this brings us to the next part!

Conditionals

JP and JR can be used unconditionally, meaning the jump always occurs. This can be useful, but sometimes we don't want that. And that's where flags come in handy, because we can make a jump depend on the status of the flags:

JP condition, label
JR condition, label

Here are the conditions available on the Game Boy:

Z If the Z flag is set NZ If the Z flag is reset
C If the C flag is set NC If the C flag is reset

(The Z80 has four more, PE, PO, M and P, which test its P/V and S flags. The Game Boy's CPU has neither flag, so those conditions don't exist here.)

Four instructions can use conditionals: CALL, RET (these are coming soon), JP and JR. On the Game Boy, all four accept the same four conditions.
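To see what a condition actually tests, here is a C sketch that checks a condition against the F register. It assumes the Game Boy's F layout (Z in bit 7, N in bit 6, H in bit 5, C in bit 4); the enum and function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag bits in the Game Boy's F register. */
enum { FLAG_Z = 0x80, FLAG_N = 0x40, FLAG_H = 0x20, FLAG_C = 0x10 };

enum condition { COND_NZ, COND_Z, COND_NC, COND_C };

/* True if a conditional JP/JR/CALL/RET with this condition would be taken. */
static bool condition_met(enum condition cond, uint8_t f) {
    switch (cond) {
    case COND_NZ: return (f & FLAG_Z) == 0;
    case COND_Z:  return (f & FLAG_Z) != 0;
    case COND_NC: return (f & FLAG_C) == 0;
    case COND_C:  return (f & FLAG_C) != 0;
    }
    assert(!"unknown condition");
    return false;
}
```

Only Z and C matter here; the N and H flags never decide a jump.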

So, there you see how you can create conditionals in assembly: by changing the flow of code.

I want you to understand the following: even though you have more control over the flow of code, try to stay organized; tangled jumps can be extremely hard to understand. Also, jumps consume space and time, so jump as little as you can.

Last thing, on the optimization side: on the Game Boy, jr is both smaller and faster than jp, so use jr whenever it can reach the target (and with ACE, where space is often tight, every byte counts). Use jp when jr cannot reach the target, and never use a chain of jr spaced by $7F bytes to get there. It's pointless.

A special jp

There is one special case of jp, though!

JP HL Jumps to the address contained in hl (not to the address stored at [hl]).
Does not accept any conditionals.

Example:

ld hl, $2457
jp hl

will jump to $2457. Some might argue that "jp $2457" is better, as it'd save 1 byte and preserve the hl register.

However, "jp hl" is used for dynamic jumps: "jp hl" may jump to a different location every time it is run. For "static" (i.e. always the same) jumps, it is better to use jp $xxyy. "jp hl" is mostly used with function pointer tables - we'll see that later.
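C has no direct equivalent of jp hl, but calling through a function pointer is the closest analogue: the target is picked at run time instead of being fixed in the code. This sketch of a small function pointer table is only an illustration; the handler names are hypothetical, and unlike jp hl, a C call returns to its caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "routines"; each returns a value so we can tell them apart. */
static int on_up(void)     { return 1; }
static int on_down(void)   { return 2; }
static int on_select(void) { return 3; }

/* The table: like a list of routine addresses stored in ROM. */
static int (*const handlers[])(void) = { on_up, on_down, on_select };

/* Like loading an entry into hl and doing "jp hl": the target depends on index. */
static int dispatch(size_t index) {
    assert(index < sizeof handlers / sizeof handlers[0]);
    return handlers[index]();
}
```

dispatch(2) ends up in on_select, even though no line of code names on_select directly.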

Comparing stuff

Here is an instruction that is heavily used with conditional jumps:

CP {reg8 | imm8} Does the same as SUB {...}, but leaves A untouched: only the flags are updated.

In other words, cp compares the accumulator's value with another one. Here is a nifty table of what the flags tell you after cp number:

Comparison Unsigned result
A == number Z is set (A - number == 0)
A != number Z is reset (A - number != 0)
A < number C is set (A - number generated a borrow)
A >= number C is reset (A - number generated no borrow)

Signed comparisons are trickier: on the Z80, A < number (signed) when the S and P/V (overflow) flags differ, but the Game Boy's CPU has neither flag, so no single condition covers them.
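Here is a C sketch of the flags cp computes, plus one common workaround for signed comparisons (not from the original text), needed because the Game Boy's CPU has no S or P/V flag: flipping bit 7 of both operands (xor $80) maps -128..127 onto 0..255 in the same order, so an unsigned compare then gives the signed answer. The flag bit positions assume the Game Boy's F layout; function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { FLAG_Z = 0x80, FLAG_N = 0x40, FLAG_H = 0x20, FLAG_C = 0x10 };

/* cp n: compute A - n like sub, but keep only the flags; A is not modified. */
static uint8_t cp_flags(uint8_t a, uint8_t n) {
    uint8_t f = FLAG_N;                        /* cp always sets N (it's a subtraction) */
    if ((uint8_t)(a - n) == 0) f |= FLAG_Z;    /* A == n */
    if ((a & 0x0F) < (n & 0x0F)) f |= FLAG_H; /* borrow out of the low nibble */
    if (a < n) f |= FLAG_C;                    /* borrow: A < n, unsigned */
    return f;
}

/* Signed A < n: flip bit 7 of both operands, then compare unsigned. */
static bool signed_less(uint8_t a, uint8_t n) {
    return (cp_flags(a ^ 0x80, n ^ 0x80) & FLAG_C) != 0;
}
```

For example, $FF vs $01: unsigned, 255 >= 1 so C is reset; signed, -1 < 1, and signed_less($FF, $01) is true.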

Example:

ld a, [hl]
cp $63
jr z, placeItems
inc hl
inc hl
jr someplace
placeItems:
ld b, [hl]

If [hl] equals $63, execution jumps to placeItems.

Otherwise, execution continues, increments hl twice, then jumps to "someplace".

Chaining conditionals

!!WARNING!! The code I'll be writing in assembly can be written in other ways. If you think you'd have done it another way, try it. Count the instructions in your code, and if you used fewer than I did, then you did well!

Don't assume my way is the only one. It is a good idea to try to find other ways to do the stuff I propose! It's a good exercise!

Okay, so let's translate the following C code, assuming the variable a is the A register:

if(a == $2A) {
    // Success stuff
} else {
    // Failure stuff
}
// Rest of the code

In assembly, that's easy!

cp $2A
jr nz, failure
    ; Success stuff
jr afterConditional
failure:
    ; Failure stuff
afterConditional:
; Rest of the code

Think of another way... like, having the failure stuff first.

cp $2A
jr z, success
    ; Failure stuff
jr afterConditional
success:
    ; Success stuff
afterConditional:
; Rest of the code

Got it? Good!

Okay. Let's take it one level higher.

if(b == $C0 && c == $DE) { // && means AND. But it's a logical AND - we'll see another AND later.
    // Success stuff
} else {
    // Failure stuff
}

Um... let's try doing multiple jumps.

ld a, $C0
cp b
jr nz, failure
ld a, c
cp $DE
jr nz, failure
    ; Success stuff
jr afterCond
failure:
    ; Failure stuff
afterCond:

Phew! Not exactly the same, but it's still going fine.

To do an AND, simply treat the whole test as a failure if any of the conditions fails.
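To check that the chained jumps really implement &&, here is a C sketch that mirrors the assembly with goto, one jump per comparison; the function name is hypothetical. Comparing it with the plain C expression for every value of b and c is a quick way to convince yourself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the assembly: any failed comparison jumps straight to failure. */
static bool and_chain(uint8_t b, uint8_t c) {
    if (b != 0xC0) goto failure;   /* ld a, $C0 / cp b / jr nz, failure */
    if (c != 0xDE) goto failure;   /* ld a, c / cp $DE / jr nz, failure */
    return true;                   /* ; Success stuff */
failure:
    return false;                  /* ; Failure stuff */
}
```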

Okay. Let's make it a lil' bit different.

if(b == $C0 || c == $DE) { // || means OR. But it's a logical OR - we'll see another OR later.
    // Success stuff
} else {
    // Failure stuff
}

Um... let's try doing multiple jumps.

ld a, $C0
cp b
jr z, success
ld a, c
cp $DE
jr z, success
    ; Failure stuff
jr afterCond
success:
    ; Success stuff
afterCond:

Okay! Now, to do an OR, we simply run each comparison. If any succeeds, we jump straight to SUCCESS. Otherwise, we FAIL.
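The same goto-based mirroring works for ||: here any successful comparison jumps straight to success. Again a hypothetical function name, sketched to match the assembly line by line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the assembly: any successful comparison jumps straight to success. */
static bool or_chain(uint8_t b, uint8_t c) {
    if (b == 0xC0) goto success;   /* ld a, $C0 / cp b / jr z, success */
    if (c == 0xDE) goto success;   /* ld a, c / cp $DE / jr z, success */
    return false;                  /* ; Failure stuff */
success:
    return true;                   /* ; Success stuff */
}
```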

I'll leave the following code as an exercise:

if((h == $C0 && l == $DE) || a == $2A) {
    // Success stuff
} else {
    // Failure stuff
}

On to loopings

Some of you might have thought "Hey, ISSOtm. Up until now, you've been going forwards all the time. Even your jumps were skipping over instructions - but always forwards. What if we went backwards?"

Well, kudos to you! This is the basis of looping structures.

Here is the simplest loop in assembly:

loop:
; Do stuff
jr cond, loop

Ta-daah! Here is a more explicit (less generic) loop:

ld b, $06
countingLoop:
; Do stuff (assume it preserves b)
dec b ; Sets Z if b == 0 after the DEC.
jr nz, countingLoop ; go back if b is non-zero
; Do some MOAR stuff

This runs the "Do stuff" part exactly six times. If that part modifies b... it will behave differently.
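This C sketch mirrors the counting loop, assuming b is an 8-bit value that wraps around like the register; `count_iterations` is a hypothetical helper that just counts how many times the "Do stuff" part would run.

```c
#include <assert.h>
#include <stdint.h>

/* How many times does "Do stuff" run for a given starting value of b? */
static unsigned count_iterations(uint8_t b) {
    unsigned runs = 0;
    do {
        runs++;                 /* ; Do stuff (assume it preserves b) */
        b = (uint8_t)(b - 1);   /* dec b: wraps around like an 8-bit register */
    } while (b != 0);           /* jr nz, countingLoop */
    return runs;
}
```

count_iterations(6) returns 6. You can also use it to check your answer to the exercise below.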

I'll leave to you as an exercise what would happen if the "ld b, $06" was replaced by a "ld b, $00"...

Now you know how to create loops. Neato. Let's see how to create... routines. Or procedures. Or functions. Whatever you call them.

Routines / Functions / Procedures / Whatever

I've told you about CALL and RET in the "Conditionals" section. Now let's see what they do.

They are a more advanced way to do jumps. Basically, you "call" a piece of code, which does its stuff, then "returns" control of the CPU to your code. Bam, CALL and RET explained.

CALL label16 Calls code starting at label16
CALL cond, label16 Same, but only if the condition is met (Z, NZ, C or NC, like JP and JR)
RET Returns to the previous "caller".
RET cond Same, but only if the condition is met. Do I need to explain more?

But wait, there's more! And it is VITAL. The way call works is very simple:

  1. Get the address of the instruction right after the call.
  2. Push it onto the stack.
  3. Jump to the address given to the call (label16).

And ret works in a way that is compatible with call:

  1. Pop a number from the stack.
  2. Jump to that address.

Now, you need to be super-duper careful with the stack, since the call/ret system uses it. Basically, make sure that between label16 and the next ret, you have done the same number of PUSHes and POPs:

call routine
; Ton of code
routine:
push hl
push bc
; Code
pop bc
; Codez
pop hl
; Codeeeee
ret

is fine: two PUSHes, two POPs, so ret finds the return address on top of the stack.

call baaad
; Ton of code
baaad:
push hl
push bc
; Code
pop bc
; Codeeeey
ret

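is bad: with one POP missing, ret pops the value HL had when it was pushed and jumps there, instead of returning to the caller.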
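To see why the second example goes wrong, here is a C sketch simulating call, ret and the stack for both routines. It assumes a starting SP of $E000, a CALL at the hypothetical address $0150 (CALL is 3 bytes, so the return address is $0153), and made-up helper names.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t mem[0x10000];   /* stands in for the whole address space */
static uint16_t sp = 0xE000;   /* hypothetical starting value */

static void push16(uint16_t v) {
    mem[--sp] = (uint8_t)(v >> 8);   /* high byte */
    mem[--sp] = (uint8_t)v;          /* low byte */
}

static uint16_t pop16(void) {
    uint8_t low = mem[sp++];
    uint8_t high = mem[sp++];
    return (uint16_t)((high << 8) | low);
}

/* call: push the address of the instruction after the 3-byte CALL, then jump. */
static void call(uint16_t call_addr) { push16((uint16_t)(call_addr + 3)); }

/* ret: pop an address and jump there; here we just return that address. */
static uint16_t ret(void) { return pop16(); }

/* routine: push hl / push bc / ... / pop bc / ... / pop hl / ret */
static uint16_t routine(uint16_t hl, uint16_t bc) {
    push16(hl);
    push16(bc);
    (void)pop16();   /* pop bc */
    (void)pop16();   /* pop hl */
    return ret();    /* the caller's return address is on top again */
}

/* baaad: push hl / push bc / ... / pop bc / ret (hl is never popped) */
static uint16_t baaad(uint16_t hl, uint16_t bc) {
    push16(hl);
    push16(bc);
    (void)pop16();   /* pop bc */
    return ret();    /* pops HL's old value instead of the return address! */
}
```

After call(0x0150), routine returns to $0153 and leaves SP where it started; baaad "returns" to whatever value HL held, and the real return address is left behind on the stack.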

Unless you're ABSOLUTELY CERTAIN about what you're doing, leave the stack the SAME as it was when leaving your routine's code. It is absolutely necessary to avoid screwing up everything.



Solutions to the exercises

Instructions get!

B can be swapped with any other register (except A)

A <- A*4
add a, a ; *2
add a, a ; *4

A <- A*5
ld b, a
add a, a ; *2
add a, a ; *4
add a, b ; *5

A <- A*6
add a, a ; *2
ld b, a
add a, a ; *4
add a, b ; *6

or

ld b, a
add a, a ; *2
add a, b ; *3
add a, a ; *6

A <- A*7
ld b, a
add a, a ; *2
add a, a ; *4
add a, a ; *8
sub a, b ; *7
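These multiplication sequences can be checked in C. This sketch mirrors each instruction with 8-bit wrap-around, so every result should equal (A * n) mod 256; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Each "add a, a" doubles A; everything wraps at 8 bits like the register. */
static uint8_t times4(uint8_t a) {
    a = (uint8_t)(a + a);      /* add a, a ; *2 */
    return (uint8_t)(a + a);   /* add a, a ; *4 */
}

static uint8_t times5(uint8_t a) {
    uint8_t b = a;             /* ld b, a */
    a = (uint8_t)(a + a);      /* *2 */
    a = (uint8_t)(a + a);      /* *4 */
    return (uint8_t)(a + b);   /* add a, b ; *5 */
}

static uint8_t times6(uint8_t a) {
    a = (uint8_t)(a + a);      /* *2 */
    uint8_t b = a;             /* ld b, a */
    a = (uint8_t)(a + a);      /* *4 */
    return (uint8_t)(a + b);   /* add a, b ; *6 */
}

static uint8_t times7(uint8_t a) {
    uint8_t b = a;             /* ld b, a */
    a = (uint8_t)(a + a);      /* *2 */
    a = (uint8_t)(a + a);      /* *4 */
    a = (uint8_t)(a + a);      /* *8 */
    return (uint8_t)(a - b);   /* sub a, b ; *7 */
}
```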

Register pair and RAM

We will examine each instruction individually to find out what happens.

ld hl, $C303 ; Now H = $C3 and L = $03

ld a, [$C001] ; A = $03

ld b, 3 ; B = $03

add a, b ; A = A + B = $03 + $03 = $06, C flag = 0

ld c, 0 ; C = $00

sbc hl, bc ; HL = HL - (BC + C flag) = $C303 - ($0300 + $00) = $C003

ld [hl], a ; [HL] = [$C003] <- A = $06

inc hl ; HL = $C004

ld b, [hl] ; B = [HL] = [$C004] = $DE

sub a, b ; A = A - B = $06 - $DE = $06 + (-$DE) = $06 + ($21 + $01) = $28, C flag = 1 (a borrow occurred, since $06 < $DE)
Notice here that doing sub a, b actually increased A's value!

inc [hl] ; [HL] = [$C004] = $DF

inc hl ; HL = $C005

ld [hl], b ; [HL] = B = $DE

ld bc, 9 ; B = $00, C = $09

add hl, bc ; HL = HL + BC = $C005 + $0009 = $C00E

ld [hl], a ; [HL] = [$C00E] = A = $28

ld [$C00B], hl ; [$C00B] = L = $0E, and [$C00C] = H = $C0

Initial values:

$C000 $C001 $C002 $C003 $C004 $C005 $C006 $C007 $C008 $C009 $C00A $C00B $C00C $C00D $C00E $C00F
$00 $03 $4F $C0 $DE $57 $2A $00 $FF $01 $23 $34 $56 $78 $9A $BC

Final values:

$C000 $C001 $C002 $C003 $C004 $C005 $C006 $C007 $C008 $C009 $C00A $C00B $C00C $C00D $C00E $C00F
$00 $03 $4F $06 $DF $DE $2A $00 $FF $01 $23 $0E $C0 $78 $28 $BC
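The sub a, b step in this walkthrough is the surprising one: the result wraps around, and the C flag records the borrow. Here is a minimal C sketch of 8-bit subtraction under that rule (C is set exactly when A < B, unsigned); `sub8` and `struct sub_result` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct sub_result {
    uint8_t value;   /* A - B, wrapped modulo 256 */
    bool carry;      /* C flag: set when a borrow was needed */
};

static struct sub_result sub8(uint8_t a, uint8_t b) {
    struct sub_result r = { (uint8_t)(a - b), a < b };
    return r;
}
```

sub8(0x06, 0xDE) gives a value of $28 with the carry set, while sub8(0x06, 0x03) gives $03 with the carry reset.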

Chaining conditionals

You think you gotcha? I'll be explaining the code, don't'cha worry :)

cp $2A ; do the a comparison first, since the other comparisons need to overwrite a.
jr z, success ; if we don't jump, we do the ANDed comparison.
ld a, h
cp $C0
jr nz, failure ; first AND operand...
ld a, l
cp $DE
jr nz, failure ; ...then the second. The order doesn't matter here.
success:
    ; Success stuff
jr afterConditional
failure:
    ; Failure stuff
afterConditional:

Didja get it? If not, try to figure out which part of the code matches which part of the C code. Once you see that, you'll understand the assembly code. I admit it's tough at first glance. I've been through this, don't'cha worry.

On to loopings

What would happen? Well, imagine you ran the "Do stuff" part once. B is zero - we didn't touch it - and we reached a "dec b". Now, remember what the instruction does:

  1. We decrement B (B = 0 - 1 = 255, remember overflow?)
  2. If B is zero, we set the Z flag. Otherwise we reset it. (Z is reset, since B != 0)

... so the loop starts again with B = 255.

tl;dr: the loop runs 256 times!

"Hey ISSOtm, what if I wanted my loop to run zero times instead of 256 in that case?" Simple!

ld a, b ; this does NOT modify Z!!
cp $0 ; there is a more efficient way of doing this, but you don't know it yet.
jr z, afterLoop ; if b - 0 == 0, we skip the loop completely.
countingLoop:
; Do stuff (assume it preserves b)
dec b ; Sets Z if b == 0 after the DEC.
jr nz, countingLoop ; go back if b is non-zero
afterLoop:
; Do some MOAR stuff
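Here is the guarded version in C, assuming the same 8-bit wrap-around for b; `count_guarded` is a hypothetical helper counting how many times "Do stuff" runs. With the check in front, a starting value of 0 skips the loop entirely instead of running it 256 times.

```c
#include <assert.h>
#include <stdint.h>

static unsigned count_guarded(uint8_t b) {
    unsigned runs = 0;
    if (b == 0)                  /* ld a, b / cp $0 / jr z, afterLoop */
        return runs;
    do {
        runs++;                  /* ; Do stuff (assume it preserves b) */
        b = (uint8_t)(b - 1);    /* dec b */
    } while (b != 0);            /* jr nz, countingLoop */
    return runs;                 /* afterLoop: */
}
```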

Credits & Resources

Tutorial written (mostly) by ISSOtm for Glitch City Laboratories.

Thanks to Torchickens and RaltsEye for correcting typos, formatting 'n stuff.

Includes bits of this tutorial for the ASM part.

Heavily uses the Pan Docs for Game Boy-specific stuff.

The GCISheet is useful for understanding CPU instructions and can be combined with IIMarckus's opcode to instruction page or the copy on The Big HEX List.

Torchickens has started a tutorial for the BGB emulator's debugger at The Cutting Room Floor.