Recently, I have begun work on a GBA emulator, which is keeping me busy. The emulator I worked on before this (and still do) is a NES emulator, so this is a huge step up for me, but where is the fun without a challenge?
So far, I haven't even finished decoding the instruction sets, so don't expect much yet; the ARM7TDMI is a pretty complex CPU. I will outline the method I am using to decode the ARM7TDMI's instruction sets, in case it's useful to other GBA emulator authors who, like me, spent ages staring blankly at the ARM7 opcode format.
Keep in mind that it is very difficult to explain exactly how I decode each instruction without a lot of text, but I will explain it as best I can with some abstraction.
The ARM7TDMI effectively has two instruction sets. The ARM instruction set uses full 32-bit instructions. The THUMB instruction set uses only 16-bit instructions, which improves code density (the same program takes up less memory), but it drops some instructions and limits the immediate values that can be used. You can switch between the two instruction sets on the fly using the BX instruction.
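According to the ARM7TDMI documentation, BX picks the new state from bit 0 of the target address: 1 means THUMB and 0 means ARM. Below is a minimal sketch of how an emulator might model that. The `Cpu` struct, `cpu_bx` and `cpu_fetch_size` are hypothetical names. The sketch also leaves out the pipeline refill that real hardware performs after a branch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, minimal CPU state: just the PC and the T (THUMB) flag. */
typedef struct {
    uint32_t pc;
    bool thumb; /* true = THUMB (16-bit opcodes), false = ARM (32-bit opcodes) */
} Cpu;

/* BX Rn: bit 0 of the target address selects the new state.
   This sketch then clears the low bit (THUMB) or the low two bits (ARM)
   so the PC is aligned to the new instruction size. */
void cpu_bx(Cpu *cpu, uint32_t rn_value)
{
    cpu->thumb = (rn_value & 1u) != 0;
    cpu->pc = cpu->thumb ? (rn_value & ~1u) : (rn_value & ~3u);
}

/* The size of the next opcode fetch depends on the current state. */
unsigned cpu_fetch_size(const Cpu *cpu)
{
    return cpu->thumb ? 2u : 4u;
}
```

For example, `cpu_bx(&cpu, 0x08000101)` switches to THUMB state with the PC at `0x08000100`. From then on, instructions are fetched two bytes at a time.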
THUMB instruction set
As the THUMB instruction set is the easiest to decode, I will tackle this first.
Binary opcode format of the THUMB instruction set.
Each instruction is made up of 2 bytes (a halfword), and each instruction has its own format that the CPU decodes. As you can see, some formats, like format 1, have an opcode field that specifies the particular instruction. For example, the "Move shifted register" format has a 2-bit opcode field (bits 11-12). For more detail on the specifics of the instructions and their fields, take a look at the manual here, or check out the NO$GBA specs here.
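To make the field layout concrete, here is a sketch in C that pulls apart a format 1 ("Move shifted register") instruction. The bit positions follow the format 1 layout: bits 15-13 are 000, Op is in bits 12-11, Offset5 in bits 10-6, Rs in bits 5-3 and Rd in bits 2-0. The `ThumbFormat1` struct and `decode_format1` function are hypothetical names used only for illustration.

```c
#include <stdint.h>

typedef uint16_t u16;

/* Fields of THUMB format 1, "Move shifted register":
   bits 15-13 = 000, 12-11 = Op, 10-6 = Offset5, 5-3 = Rs, 2-0 = Rd */
typedef struct {
    unsigned op;      /* 0 = LSL, 1 = LSR, 2 = ASR (3 belongs to format 2) */
    unsigned offset5; /* 5-bit immediate shift amount */
    unsigned rs;      /* source register, r0-r7 */
    unsigned rd;      /* destination register, r0-r7 */
} ThumbFormat1;

ThumbFormat1 decode_format1(u16 instruction)
{
    ThumbFormat1 f;
    f.op      = (instruction >> 11) & 0x3;
    f.offset5 = (instruction >> 6) & 0x1F;
    f.rs      = (instruction >> 3) & 0x7;
    f.rd      = instruction & 0x7;
    return f;
}
```

For example, `0x0111` decodes to Op 0 (LSL), Offset5 4, Rs 2 and Rd 1, which is `LSL r1, r2, #4`.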
I decode an instruction by taking its upper 8 bits and checking their value. For example, if the upper 8 bits were between 0 and 7, I would know that this must be an "LSL Rd, Rs, Offset5". Why? Whenever the upper 8 bits are between 0 and 7, the top three bits are 000, which identifies the "Move shifted register" format, and the opcode field is 00, which means LSL. The remaining bits are just operands. Here is a more visual example.
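The same idea works in reverse: encode an instruction, then check which upper byte it lands on. This sketch uses a hypothetical `encode_format1` helper, which is also handy for writing decoder tests. It builds `LSL r1, r2, #4` and shows that the upper 8 bits fall in the 0 to 7 range.

```c
#include <stdint.h>

typedef uint16_t u16;

/* Build a format 1 ("Move shifted register") instruction from its fields.
   Hypothetical helper for illustration and testing. */
u16 encode_format1(unsigned op, unsigned offset5, unsigned rs, unsigned rd)
{
    return (u16)(((op & 0x3) << 11) | ((offset5 & 0x1F) << 6) |
                 ((rs & 0x7) << 3) | (rd & 0x7));
}

/* The decoder only looks at the upper 8 bits of the instruction. */
unsigned upper_byte(u16 instruction)
{
    return instruction >> 8;
}
```

`encode_format1(0, 4, 2, 1)` gives `0x0111`. Its upper byte is `0x01`, which is in the 0 to 7 range, so it selects LSL.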
Let's say our upper 8 bits look like the ones in the picture
00000101
If we check the instruction format, we can see that the "Move shifted register" format allows this value. The bottom three bits of our 8-bit value are the top three bits of the Offset5 field (instruction bits 8, 9 and 10). They can be anything, because the shift amount is arbitrary, so all of the following binary values represent the same instruction ("LSL Rd, Rs, Offset5"):
00000000 (0)
00000001 (1)
00000010 (2)
00000011 (3)
00000100 (4)
00000101 (5)
00000110 (6)
00000111 (7)
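To convince yourself that the bottom three bits really are just the top of Offset5, a brute-force check like the following sketch can help. It encodes every LSL variant, covering all 32 shift amounts and every register pair. For each one, it confirms that the upper byte is between 0 and 7 and that its low three bits equal `Offset5 >> 2`. The function names are hypothetical.

```c
#include <stdint.h>

typedef uint16_t u16;

/* Upper byte of "LSL Rd, Rs, Offset5" (Op = 00). Its low three bits are
   instruction bits 10-8, i.e. the top three bits of Offset5. */
unsigned lsl_upper_byte(unsigned offset5, unsigned rs, unsigned rd)
{
    u16 instruction = (u16)(((offset5 & 0x1F) << 6) | ((rs & 7) << 3) | (rd & 7));
    return instruction >> 8;
}

/* Returns 1 if every LSL encoding lands on an upper byte in 0x00-0x07
   whose low three bits equal Offset5 >> 2, otherwise 0. */
int check_lsl_upper_bytes(void)
{
    for (unsigned off = 0; off < 32; off++)
        for (unsigned rs = 0; rs < 8; rs++)
            for (unsigned rd = 0; rd < 8; rd++) {
                unsigned upper = lsl_upper_byte(off, rs, rd);
                if (upper > 0x07 || (upper & 7) != (off >> 2))
                    return 0;
            }
    return 1;
}
```

For instance, any shift amount from 20 to 23 (`10100` to `10111`) produces the upper byte `00000101` used in the example above.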
If we were to get a value of 8 (00001000), the opcode field would have increased by one, so the instruction would now be "LSR Rd, Rs, Offset5". Now we do the same again, because the bottom three bits could be anything. These binary values all represent the same instruction:
00001000 (8)
00001001 (9)
00001010 (A)
00001011 (B)
00001100 (C)
00001101 (D)
00001110 (E)
00001111 (F)
Now, to decode the rest of the instruction set, you just follow the same pattern.
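Following this pattern, each format can be described as a mask and a value on the upper 8 bits. For example, every byte of the form `00000xxx` is LSL, `00001xxx` is LSR and `00010xxx` is ASR (Op = 10). Instead of typing out a table by hand, one option is to fill the 256-entry table from such patterns at startup. This is only a sketch: `ThumbPattern`, `build_thumb_table`, `Thumb_ASR_Rd_Rs_Offset` and `Thumb_Undefined` are hypothetical names, and the handlers are empty stubs.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t u16;
typedef void (*ThumbHandler)(u16 instruction);

/* Hypothetical stub handlers; real ones would update the CPU state. */
void Thumb_LSL_Rd_Rs_Offset(u16 instruction) { (void)instruction; }
void Thumb_LSR_Rd_Rs_Offset(u16 instruction) { (void)instruction; }
void Thumb_ASR_Rd_Rs_Offset(u16 instruction) { (void)instruction; }
void Thumb_Undefined(u16 instruction) { (void)instruction; }

/* An upper byte b matches a pattern when (b & mask) == value. */
typedef struct {
    unsigned mask, value;
    ThumbHandler handler;
} ThumbPattern;

static const ThumbPattern patterns[] = {
    { 0xF8, 0x00, Thumb_LSL_Rd_Rs_Offset }, /* 00000xxx */
    { 0xF8, 0x08, Thumb_LSR_Rd_Rs_Offset }, /* 00001xxx */
    { 0xF8, 0x10, Thumb_ASR_Rd_Rs_Offset }, /* 00010xxx */
    /* remaining formats would be added here */
};

ThumbHandler CPU_Thumb_Instruction[0x100];

void build_thumb_table(void)
{
    for (unsigned b = 0; b < 0x100; b++) {
        CPU_Thumb_Instruction[b] = Thumb_Undefined; /* default for unmatched bytes */
        for (size_t i = 0; i < sizeof patterns / sizeof patterns[0]; i++) {
            if ((b & patterns[i].mask) == patterns[i].value) {
                CPU_Thumb_Instruction[b] = patterns[i].handler; /* first match wins */
                break;
            }
        }
    }
}
```

Patterns are checked in order and the first match wins. When you add formats whose patterns overlap, list the more specific ones first.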
Here is a code example of how you might decode the first couple of instructions:
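<textarea>// u16 is a 16-bit unsigned type (e.g. typedef uint16_t u16;)
// and fetch() returns the next 16-bit THUMB instruction.
u16 instruction = fetch();

// The upper 8 bits of the instruction select the handler.
switch (instruction >> 8)
{
case 0:
case 1:
case 2:
case 3:
case 4:
case 5:
case 6:
case 7:
{
    // 00000xxx: Move shifted register, Op = 00
    // LSL Rd, Rs, Offset5
} break;
case 8:
case 9:
case 0xA:
case 0xB:
case 0xC:
case 0xD:
case 0xE:
case 0xF:
{
    // 00001xxx: Move shifted register, Op = 01
    // LSR Rd, Rs, Offset5
} break;
}

// Or you may choose to use an array of function pointers,
// indexed by the same upper 8 bits.
void Thumb_LSL_Rd_Rs_Offset(u16 instruction);
void Thumb_LSR_Rd_Rs_Offset(u16 instruction);

void (*CPU_Thumb_Instruction[0x100]) (u16 instruction) =
{
    Thumb_LSL_Rd_Rs_Offset, // 00h
    Thumb_LSL_Rd_Rs_Offset, // 01h
    Thumb_LSL_Rd_Rs_Offset, // 02h
    Thumb_LSL_Rd_Rs_Offset, // 03h
    Thumb_LSL_Rd_Rs_Offset, // 04h
    Thumb_LSL_Rd_Rs_Offset, // 05h
    Thumb_LSL_Rd_Rs_Offset, // 06h
    Thumb_LSL_Rd_Rs_Offset, // 07h
    /*-------------------*/
    Thumb_LSR_Rd_Rs_Offset, // 08h
    Thumb_LSR_Rd_Rs_Offset, // 09h
    Thumb_LSR_Rd_Rs_Offset, // 0Ah
    Thumb_LSR_Rd_Rs_Offset, // 0Bh
    Thumb_LSR_Rd_Rs_Offset, // 0Ch
    Thumb_LSR_Rd_Rs_Offset, // 0Dh
    Thumb_LSR_Rd_Rs_Offset, // 0Eh
    Thumb_LSR_Rd_Rs_Offset, // 0Fh
    // 10h-FFh are not listed here, so they stay NULL until filled in
};

// Decode and execute:
CPU_Thumb_Instruction[instruction >> 8](instruction);
</textarea>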
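If you want something that actually runs, here is a sketch that fills the function pointer table for 00h-0Fh and implements both handlers. It assumes a minimal CPU state, held in hypothetical globals, that contains only r0-r7 and the N, Z and C flags. It follows the ARM7TDMI rules that LSL #0 leaves the carry flag unchanged and that an LSR offset of 0 encodes LSR #32. `init_thumb_table` and `execute_thumb` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint16_t u16;
typedef uint32_t u32;

/* Hypothetical, minimal CPU state: only r0-r7 and the flags format 1 updates. */
static u32 r[8];
static bool flag_n, flag_z, flag_c;

static void set_nz(u32 result)
{
    flag_n = (result >> 31) & 1;
    flag_z = result == 0;
}

/* LSL Rd, Rs, #Offset5 */
void Thumb_LSL_Rd_Rs_Offset(u16 instruction)
{
    unsigned offset = (instruction >> 6) & 0x1F;
    u32 value = r[(instruction >> 3) & 7];
    if (offset != 0)                          /* LSL #0 leaves C unchanged */
        flag_c = (value >> (32 - offset)) & 1; /* last bit shifted out */
    value <<= offset;
    r[instruction & 7] = value;
    set_nz(value);
}

/* LSR Rd, Rs, #Offset5 (an offset of 0 encodes LSR #32) */
void Thumb_LSR_Rd_Rs_Offset(u16 instruction)
{
    unsigned offset = (instruction >> 6) & 0x1F;
    u32 value = r[(instruction >> 3) & 7];
    if (offset == 0) {          /* LSR #32: result 0, C = old bit 31 */
        flag_c = value >> 31;
        value = 0;
    } else {
        flag_c = (value >> (offset - 1)) & 1; /* last bit shifted out */
        value >>= offset;
    }
    r[instruction & 7] = value;
    set_nz(value);
}

void (*CPU_Thumb_Instruction[0x100])(u16 instruction);

void init_thumb_table(void)
{
    for (unsigned i = 0x00; i <= 0x07; i++)
        CPU_Thumb_Instruction[i] = Thumb_LSL_Rd_Rs_Offset;
    for (unsigned i = 0x08; i <= 0x0F; i++)
        CPU_Thumb_Instruction[i] = Thumb_LSR_Rd_Rs_Offset;
}

/* Decode and execute one instruction using its upper 8 bits. */
void execute_thumb(u16 instruction)
{
    void (*handler)(u16) = CPU_Thumb_Instruction[instruction >> 8];
    if (handler)                /* entries 10h-FFh are still NULL here */
        handler(instruction);
}
```

For example, with `r2 = 0x80000001`, executing `0x0051` (`LSL r1, r2, #1`) leaves `r1 = 0x00000002` with the carry flag set, because bit 31 was shifted out.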
I will explain how to decode the ARM instruction set at a later date, as it is considerably more complex, and requires much more thought and care.
Peace