So far, I haven't even finished decoding the instruction sets, so don't expect much yet; the ARM7TDMI is a pretty complex CPU. I will outline the method that I am using to decode the instruction sets of the ARM7TDMI, in case it's useful to any other GBA emulator authors who were stuck like I was, blankly staring at the ARM7 opcode format for ages.
Keep in mind that it is very difficult to explain the exact way that I decode each instruction without a lot of text, but I will try to explain as best I can with some abstraction.
The ARM7TDMI effectively has two instruction sets: the ARM instruction set, which uses full 32-bit instructions, and the THUMB instruction set, which uses only 16-bit instructions. THUMB improves code density (the same code takes up less space), but it removes some instructions and limits the immediate values that can be used. You can switch between the two instruction sets on the fly using the BX instruction.
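As a rough sketch of what that switch looks like inside an emulator: BX branches to the address held in a register and copies bit 0 of that address into the T flag of the CPSR (bit 5), which selects THUMB when set and ARM when clear. The `Cpu` struct and `cpu_bx` function are hypothetical names used only for this illustration, and forcing the alignment of the target address is an assumption about how an emulator might handle it.

```c
#include <stdint.h>

#define CPSR_T_BIT (1u << 5)  /* T flag: set = THUMB state, clear = ARM state */

/* Hypothetical minimal CPU state, for illustration only. */
typedef struct {
    uint32_t r[16];  /* r[15] is the program counter */
    uint32_t cpsr;
} Cpu;

/* BX Rn: branch to the address in Rn and pick the instruction set from bit 0. */
void cpu_bx(Cpu *cpu, unsigned rn)
{
    uint32_t target = cpu->r[rn];
    if (target & 1u) {
        cpu->cpsr |= CPSR_T_BIT;    /* enter THUMB state */
        cpu->r[15] = target & ~1u;  /* assumption: force halfword alignment */
    } else {
        cpu->cpsr &= ~CPSR_T_BIT;   /* enter ARM state */
        cpu->r[15] = target & ~3u;  /* assumption: force word alignment */
    }
}
```

The fetch loop would then check the T flag before each fetch to decide whether to read 16 or 32 bits.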
THUMB instruction set
As the THUMB instruction set is the easier of the two to decode, I will tackle this first.
Binary opcode format of the THUMB instruction set.
Each instruction is made up of 2 bytes (one halfword), and each instruction has its own format that is decoded by the CPU. As you can see, some instruction formats like format 1 have an opcode field which specifies the particular instruction; for example, the "Move shifted register" format has a 2-bit opcode field. If you want more detail on the specifics of the instructions and the fields, then take a look at the manual here, or check out NO$GBA specs here.
Now, the way I decode an instruction is by taking the upper 8 bits of the instruction and checking their value. For example, if I took the upper 8 bits and the value was anywhere from 0 to 7, I would know that this must be "LSL Rd, Rs, Offset5". Why? Because for any value from 0 to 7, the top three bits match the fixed bits of the "Move shifted register" format and the opcode field is equal to 0. Here is a more visual example.
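To make the idea of fields concrete, here is a small sketch that pulls apart a "Move shifted register" (format 1) halfword. The bit positions follow the format layout (opcode in bits 12-11, Offset5 in bits 10-6, Rs in bits 5-3, Rd in bits 2-0); the struct and function names are made up for this example.

```c
#include <stdint.h>

/* Fields of THUMB format 1, "Move shifted register". Names are illustrative. */
typedef struct {
    unsigned op;       /* bits 12-11: 0 = LSL, 1 = LSR, 2 = ASR */
    unsigned offset5;  /* bits 10-6: shift amount */
    unsigned rs;       /* bits 5-3: source register */
    unsigned rd;       /* bits 2-0: destination register */
} ThumbFormat1;

ThumbFormat1 thumb_decode_format1(uint16_t instruction)
{
    ThumbFormat1 f;
    f.op      = (instruction >> 11) & 0x3;
    f.offset5 = (instruction >> 6) & 0x1F;
    f.rs      = (instruction >> 3) & 0x7;
    f.rd      = instruction & 0x7;
    return f;
}
```

For instance, the halfword 0x00D1 comes apart as op 0, Offset5 3, Rs 2, Rd 1, which is "LSL R1, R2, #3".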
Let's say our upper 8 bits look like the ones in the picture
If we check the instruction format, we can see that the "Move shifted register" format allows for this value: the bottom three bits of the 8-bit value we are using are the top three bits of the Offset5 field (bits 8, 9 and 10), which are arbitrary and could therefore be anything. So every binary value from 00000000 to 00000111 (0 to 7) represents the same instruction ("LSL Rd, Rs, Offset5").
If we were to get a value of 8 (00001000), this would mean that the opcode field had increased by one, so the instruction would now be "LSR Rd, Rs, Offset5". We then do the same again, as the bottom three bits could be anything: the binary values 00001000 to 00001111 (8 to 15) are all the same instruction.
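A quick way to see why the values group in blocks of eight: shifting the upper byte right by three throws away the Offset5 bits and leaves only the fixed bits and the opcode field. The helper below is a hypothetical name for this sketch, and it only makes sense for upper bytes whose top three bits are 000 (0x00 to 0x1F); within that range an opcode of 3 belongs to a different format.

```c
#include <stdint.h>

/* Returns the 2-bit opcode field encoded in an upper byte in 0x00-0x1F. */
unsigned thumb_format1_opcode(uint8_t upper8)
{
    return (upper8 >> 3) & 0x3;  /* drop the three Offset5 bits */
}
```

Every upper byte from 0 to 7 gives opcode 0 (LSL), and every one from 8 to 15 gives opcode 1 (LSR).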
Now to decode the rest of the instruction set, you just follow this logical pattern.
Here is a coded example of how you might decode the first couple of instructions...
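u16 instruction = fetch();
switch (instruction >> 8)
{
    case 0x00: case 0x01: case 0x02: case 0x03: case 0x04: case 0x05: case 0x06: case 0x07: // LSL Rd, Rs, Offset5
        Thumb_LSL_Rd_Rs_Offset(instruction); break;
    case 0x08: case 0x09: case 0x0A: case 0x0B: case 0x0C: case 0x0D: case 0x0E: case 0x0F: // LSR Rd, Rs, Offset5
        Thumb_LSR_Rd_Rs_Offset(instruction); break;
}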
Or you may choose to use an array of function pointers, indexed by the same upper 8 bits:
void (*CPU_Thumb_Instruction[0x100]) (u16 instruction) =
{
    Thumb_LSL_Rd_Rs_Offset, // 00h
    Thumb_LSL_Rd_Rs_Offset, // 01h
    Thumb_LSL_Rd_Rs_Offset, // 02h
    Thumb_LSL_Rd_Rs_Offset, // 03h
    Thumb_LSL_Rd_Rs_Offset, // 04h
    Thumb_LSL_Rd_Rs_Offset, // 05h
    Thumb_LSL_Rd_Rs_Offset, // 06h
    Thumb_LSL_Rd_Rs_Offset, // 07h
    Thumb_LSR_Rd_Rs_Offset, // 08h
    Thumb_LSR_Rd_Rs_Offset, // 09h
    Thumb_LSR_Rd_Rs_Offset, // 0Ah
    Thumb_LSR_Rd_Rs_Offset, // 0Bh
    Thumb_LSR_Rd_Rs_Offset, // 0Ch
    Thumb_LSR_Rd_Rs_Offset, // 0Dh
    Thumb_LSR_Rd_Rs_Offset, // 0Eh
    Thumb_LSR_Rd_Rs_Offset, // 0Fh
    // ... the remaining entries (10h to FFh) follow the same pattern
};
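If typing out all 256 entries by hand gets tedious, one alternative is to fill the table at startup with a few loops. This is only a sketch of that idea, assuming `u16` is a 16-bit unsigned typedef; the handler bodies are placeholder stubs, and `thumb_table`, `thumb_table_init`, `thumb_execute`, `Thumb_Undefined` and `last_handler` are names invented for the example.

```c
#include <stdint.h>

typedef uint16_t u16;  /* assumption: u16 is a 16-bit unsigned integer */
typedef void (*ThumbHandler)(u16 instruction);

static int last_handler;  /* illustration only: records which stub ran */

/* Placeholder stubs; real handlers would execute the instruction. */
static void Thumb_LSL_Rd_Rs_Offset(u16 instruction) { (void)instruction; last_handler = 1; }
static void Thumb_LSR_Rd_Rs_Offset(u16 instruction) { (void)instruction; last_handler = 2; }
static void Thumb_Undefined(u16 instruction)        { (void)instruction; last_handler = -1; }

static ThumbHandler thumb_table[0x100];

/* Fill the table once at startup, one loop per group of upper-byte values. */
void thumb_table_init(void)
{
    for (int i = 0x00; i < 0x100; i++) thumb_table[i] = Thumb_Undefined;
    for (int i = 0x00; i <= 0x07; i++) thumb_table[i] = Thumb_LSL_Rd_Rs_Offset;
    for (int i = 0x08; i <= 0x0F; i++) thumb_table[i] = Thumb_LSR_Rd_Rs_Offset;
}

/* Dispatch on the upper 8 bits of the instruction. */
void thumb_execute(u16 instruction)
{
    thumb_table[instruction >> 8](instruction);
}
```

Defaulting every slot to an "undefined" handler first means an instruction you haven't implemented yet calls a known stub instead of a null pointer.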
I will explain how to decode the ARM instruction set at a later date, as it is considerably more complex, and requires much more thought and care.