Unpacking Themida #1

I recently got a binary from a friend who wanted me to help him hack some game or other.
Now, the game aspect interests me less, but once I started working on this I got drawn in by the challenge of unpacking the binary itself, so I thought I'd try to document my progress here.

I must start by saying that I have very little experience with packers. When I started this off (yesterday), I had no idea it was packed with Themida - or in fact that it was packed at all. Once I opened it I could immediately see that it was packed, but I only know it's Themida from some debugger output that happened to include the word 'Themida'. I also have no previous experience with Themida, and I have no idea which version or features of Themida this uses. For all I know it's something else entirely that just happens to call itself Themida.

My aim here is purely the challenge: seeing if I can understand how the code executes and how the packing is performed, and learning how to get past all those bastardly annoying little tricks those Themida people have designed to thwart attempts at working out what the hell's going on. LEARNING is what it's all about after all!

The more I get into this the more I realise that this is in fact one seriously complex OGRE.
That's because ogres have layers, as do onions, as does this crazy thing.
Many many many layers - and MAN can they make one cry.

I'll try to peel it all back layer by layer on this blog; let's see how far we get.

Ok, let us begin.

Cracking open our target.exe in IDA we immediately see a warning:

Oh yes, we'd like to see the original imports section so we do as it kindly suggests.

Ok so we've got it open. We can see that the IDA auto-analyzer has identified 3 functions:

Let's start with 'start' @ 0x15f9000.

It seems that IDA has struggled with the analysis a bit. The stack pointer (SP) is clearly off in sub_15f900b.
What's obviously happened is that IDA has thought that this piece of code is two separate functions when in fact it is a single function (one can see this from the function prologue @ 0x15f9000 and epilogue @ 0x15f9045).

This is easily fixed. We just ask IDA to forget the function and create a new one, specifying where we want the function's end to be.

Aaaah that's much better. Now we can go ahead and clean it up a bit in Hex-Rays.

int __usercall start(_BYTE *debug_ins_location)
{
  int res; // [sp+0h] [bp-4h]@0

  if ( *debug_ins_location == 0xCC )
  {
    *debug_ins_location = 0;
    sub_15f9046((int *)(debug_ins_location - 0x2A200A),
  }
  return res;
}

It's quite easy to see what's going on here.
The process checks that the trap to debugger instruction (INT 3 / 0xcc) is still present. If it is, it patches it out to 00. I guess this check is to make sure that some poor fool doesn't just patch out this instruction in the hope of getting their debugger to work (the anti-debugging comes later). Pretty easy to circumvent this anyway.
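To make that concrete, here's a rough Python model of just the check-and-patch step. This is only a sketch of my reading of the decompilation: `image` is a hypothetical bytearray standing in for the mapped code, and the offset 0x0A is made up for the example.

```python
INT3 = 0xCC  # opcode of the one-byte breakpoint instruction


def check_and_clear_int3(image, offset):
    """Zero the byte at offset if it is still an INT 3; report whether it was."""
    if image[offset] == INT3:
        image[offset] = 0x00
        return True
    return False


# Hypothetical 16-byte "section" with an INT 3 at offset 0x0A.
image = bytearray(16)
image[0x0A] = INT3

first = check_and_clear_int3(image, 0x0A)   # byte was 0xCC, gets patched to 00
second = check_and_clear_int3(image, 0x0A)  # byte is already 00, nothing to do
```

Note that the check only succeeds once: after the first pass the byte is 00, so a second pass sees no INT 3.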

After the check, start calls sub_15f9046 with a number of parameters. Let's dissect this next.

int __stdcall sub_15F9046(int a1, unsigned int a2, int a3, int a4)
{
  int v4; // esi@1
  unsigned int i; // ecx@1
  int v7; // [sp+4h] [bp-4h]@0

  v4 = a1;
  for ( i = a2 >> 2; i; --i )
  {
    *(_DWORD *)v4 ^= a3;
    *(_DWORD *)v4 += a4;
    v4 += 4;
  }
  return v7;
}

The function takes a pointer to some location, a byte length which it divides by 4 (>> 2), and two further values which it uses to mangle the buffer.
The buffer is walked in 4-byte steps (sizeof(int), i.e. a dword) and the second parameter (a2) is divided by 4, so in essence we can treat the buffer as an array of dwords. Each dword in the array is decrypted by XORing it with a3 and then adding a4, so a3 and a4 can both be considered a form of key.
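The same transform is easy to model outside IDA. The following is a minimal sketch in plain Python, assuming little-endian dwords (this is x86) and 32-bit wraparound on the add. The function names and the demo keys are mine, picked purely for illustration; `encrypt_dword` is simply the inverse I derived (subtract, then XOR) and I have no idea whether the packer's build step actually does it that way.

```python
import struct

MASK32 = 0xFFFFFFFF  # keep results in 32 bits, as the CPU would


def decrypt_dword(value, key1, key2):
    """XOR with key1, then add key2 (mod 2**32)."""
    return ((value ^ key1) + key2) & MASK32


def encrypt_dword(value, key1, key2):
    """Inverse of decrypt_dword: subtract key2 (mod 2**32), then XOR with key1."""
    return ((value - key2) & MASK32) ^ key1


def decrypt_buffer(data, key1, key2):
    """Decrypt len(data) >> 2 little-endian dwords; trailing bytes are left alone."""
    count = len(data) >> 2
    fmt = "<%dI" % count
    words = struct.unpack_from(fmt, data)
    out = struct.pack(fmt, *(decrypt_dword(w, key1, key2) for w in words))
    return out + bytes(data[count * 4:])


# Round trip with made-up demo keys (NOT the keys from the target).
DEMO_KEY1, DEMO_KEY2 = 0x11111111, 0x22222222
scrambled = encrypt_dword(0xDEADBEEF, DEMO_KEY1, DEMO_KEY2)
restored = decrypt_dword(scrambled, DEMO_KEY1, DEMO_KEY2)
```

Because both XOR and modular addition are reversible, the round trip gets back the original dword, which is a handy sanity check before patching anything.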

Cleaning it up, it looks as follows:

int __stdcall decrypt_buffer(int *p_buf, unsigned int len, int key1, int key2)
{
  unsigned int i; // ecx@1
  int res; // [sp+4h] [bp-4h]@0

  for ( i = len >> 2; i; --i )
  {
    *p_buf ^= key1;
    *p_buf += key2;
    ++p_buf; // advance to the next dword
  }
  return res;
}

Looking at the caller again:

// @ 0x15f9037
decrypt_buffer((int *)(debug_ins_location - 0x2A200A),  

We can work out the address of the buffer by taking the location of the INT 3 instruction (0x15f900a) and subtracting 0x2a200a, which gives 0x1357000.
The length is 0x1000 bytes, which means the dword array has 0x400 (1024) elements. The two 'keys' can be seen as well.
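The arithmetic is worth double-checking, since it's easy to mix up hex and decimal here. A quick sketch using only the values quoted above (the constant names are mine):

```python
INT3_EA = 0x15F900A    # address of the INT 3 byte
BUF_DELTA = 0x2A200A   # constant subtracted in the call
LEN_BYTES = 0x1000     # length argument, in bytes

buf_ea = INT3_EA - BUF_DELTA   # start of the encrypted block
num_dwords = LEN_BYTES >> 2    # the loop count: length divided by 4
end_ea = buf_ea + LEN_BYTES    # first address past the block

print(hex(buf_ea), num_dwords, hex(end_ea))
```

So the block runs from 0x1357000 up to (but not including) 0x1358000, and the loop runs 0x400 times, i.e. 1024 dwords.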

To perform the decryption itself within the IDB we can use some IDAPython magic to decrypt and patch out the relevant bytes:

def decrypt_dword(dword, key1, key2):
    # XOR with key1, then add key2; the mask emulates 32-bit wraparound
    return ((dword ^ key1) + key2) & 0xFFFFFFFF

def decrypt_dword_ea(ea, key1, key2):
    # Dword/PatchDword are the older IDAPython names
    # (get_wide_dword/patch_dword in IDA 7 and later)
    return decrypt_dword(Dword(ea), key1, key2)

def decrypt_block_ea(ea, num_elements, key1, key2):
    for i in range(num_elements):
        val = decrypt_dword_ea(ea + i * 4, key1, key2)
        PatchDword(ea + i * 4, val)

# perform the decrypt on the relevant block: 0x1000 bytes = 0x400 dwords
decrypt_block_ea(0x1357000, 0x400, 0x6F44AB66, 0x6AAB8CDA)

The start of the block before decryption:

The start of the block after decryption:

We can clearly see now that there is code here (see below), which moves us on to our next layer...