Decoding obfuscated strings in WhatsApp Dalvik Executable bytecode

I was poking around in the WhatsApp Android DEX this evening. One notices immediately that it uses several obfuscations, one of which is string obfuscation (this is NOT DexGuard; maybe Zelix KlassMaster?).
It took me a little while to learn some Dalvik bytecode, but the string obfuscation turns out to be fairly straightforward.

Let's take a look at one such function, the static initializer of com.whatsapp.n0 (IDA's cross-references refer to it as n0__clinit_@V):

CODE:0010EFB8   Method 6552 (0x1998):_  
CODE:0010EFB8    static void  
CODE:0010EFB8   com.whatsapp.n0.()  
CODE:0010EFB8   const/16                        v2, 0x5B  
CODE:0010EFBC   const/16                        v3, 0x39  
CODE:0010EFC0   const/16                        v1, 0x19  
CODE:0010EFC4   const/16                        v4, 9  
CODE:0010EFC8   const/4                         v6, 0  
CODE:0010EFCA   const/4                         v0, 2  
CODE:0010EFCC   new-array                       v9, v0,   
CODE:0010EFD0   const-string                    v0, aK_20 # "k>^"  
CODE:0010EFD4   invoke-virtual                  {v0},   
CODE:0010EFDA   move-result-object              v0  
CODE:0010EFDC   array-length                    v5, v0  
CODE:0010EFDE   move                            v7, v5  
CODE:0010EFE0   move                            v8, v6  
CODE:0010EFE2   move-object                     v5, v0  
CODE:0010EFE4 loc_10EFE4:                             # CODE XREF: n0__clinit_@V+9Aj  
CODE:0010EFE4   if-gt                           v7, v8, loc_10F034  
CODE:0010EFE8   new-instance                    v0,   
CODE:0010EFEC   invoke-direct                   {v0, v5}, (ref) imp. @ unk_2EB58>  
CODE:0010EFF2   invoke-virtual                  {v0},   
CODE:0010EFF8   move-result-object              v0  
CODE:0010EFFA   aput-object                     v0, v9, v6  
CODE:0010EFFE   const/4                         v8, 1  
CODE:0010F000   const-string                    v0, aE_10 # "E"  
CODE:0010F004   invoke-virtual                  {v0},   
CODE:0010F00A   move-result-object              v0  
CODE:0010F00C   array-length                    v5, v0  
CODE:0010F00E   move                            v7, v6  
CODE:0010F010   move                            v6, v5  
CODE:0010F012   move-object                     v5, v0  
CODE:0010F014 loc_10F014:                             # CODE XREF: n0__clinit_@V+CCj  
CODE:0010F014   if-gt                           v6, v7, loc_10F066  
CODE:0010F018   new-instance                    v0,   
CODE:0010F01C   invoke-direct                   {v0, v5}, (ref) imp. @ unk_2EB58>  
CODE:0010F022   invoke-virtual                  {v0},   
CODE:0010F028   move-result-object              v0  
CODE:0010F02A   aput-object                     v0, v9, v8  
CODE:0010F02E   sput-object                     v9, n0_z  
CODE:0010F032 locret:  
CODE:0010F032   return-void  
CODE:0010F034 # ---------------------------------------------------------------------------  
CODE:0010F034 loc_10F034:                             # CODE XREF: n0__clinit_@V:loc_10EFE4j  
CODE:0010F034   aget-char                       v10, v5, v8  
CODE:0010F038   rem-int/lit8                    v0, v8, 5  
CODE:0010F03C   packed-switch                   v0, switchdata_10F098  
CODE:0010F042 # ---------------------------------------------------------------------------  
CODE:0010F042 loc_10F042:                             # CODE XREF: n0__clinit_@V+84j  
CODE:0010F042   move                            v0, v4 # default:  
CODE:0010F044 loc_10F044:                             # CODE XREF: n0__clinit_@V+9Ej  
CODE:0010F044                                         # n0__clinit_@V+A2j ...  
CODE:0010F044   xor-int/2addr                   v0, v10  
CODE:0010F046   int-to-char                     v0, v0  
CODE:0010F048   aput-char                       v0, v5, v8  
CODE:0010F04C   add-int/lit8                    v0, v8, 1  
CODE:0010F050   move                            v8, v0  
CODE:0010F052   goto                            loc_10EFE4  
CODE:0010F054 # ---------------------------------------------------------------------------  
CODE:0010F054 loc_10F054:                             # CODE XREF: n0__clinit_@V+84j  
CODE:0010F054   move                            v0, v1 # case 0: // (0x0)  
CODE:0010F056   goto                            loc_10F044  
CODE:0010F058 # ---------------------------------------------------------------------------  
CODE:0010F058 loc_10F058:                             # CODE XREF: n0__clinit_@V+84j  
CODE:0010F058   move                            v0, v2 # case 1: // (0x1)  
CODE:0010F05A   goto                            loc_10F044  
CODE:0010F05C # ---------------------------------------------------------------------------  
CODE:0010F05C loc_10F05C:                             # CODE XREF: n0__clinit_@V+84j  
CODE:0010F05C   move                            v0, v3 # case 2: // (0x2)  
CODE:0010F05E   goto                            loc_10F044  
CODE:0010F060 # ---------------------------------------------------------------------------  
CODE:0010F060 loc_10F060:                             # CODE XREF: n0__clinit_@V+84j  
CODE:0010F060   const/16                        v0, 0x75 # case 3: // (0x3)  
CODE:0010F064   goto                            loc_10F044  
CODE:0010F066 # ---------------------------------------------------------------------------  
CODE:0010F066 loc_10F066:                             # CODE XREF: n0__clinit_@V:loc_10F014j  
CODE:0010F066   aget-char                       v10, v5, v7  
CODE:0010F06A   rem-int/lit8                    v0, v7, 5  
CODE:0010F06E   packed-switch                   v0, switchdata_10F0B0  
CODE:0010F074 # ---------------------------------------------------------------------------  
CODE:0010F074 loc_10F074:                             # CODE XREF: n0__clinit_@V+B6j  
CODE:0010F074   move                            v0, v4 # default:  
CODE:0010F076 loc_10F076:                             # CODE XREF: n0__clinit_@V+D0j  
CODE:0010F076                                         # n0__clinit_@V+D4j ...  
CODE:0010F076   xor-int/2addr                   v0, v10  
CODE:0010F078   int-to-char                     v0, v0  
CODE:0010F07A   aput-char                       v0, v5, v7  
CODE:0010F07E   add-int/lit8                    v0, v7, 1  
CODE:0010F082   move                            v7, v0  
CODE:0010F084   goto                            loc_10F014  
CODE:0010F086 # ---------------------------------------------------------------------------  
CODE:0010F086 loc_10F086:                             # CODE XREF: n0__clinit_@V+B6j  
CODE:0010F086   move                            v0, v1 # case 0: // (0x0)  
CODE:0010F088   goto                            loc_10F076  
CODE:0010F08A # ---------------------------------------------------------------------------  
CODE:0010F08A loc_10F08A:                             # CODE XREF: n0__clinit_@V+B6j  
CODE:0010F08A   move                            v0, v2 # case 1: // (0x1)  
CODE:0010F08C   goto                            loc_10F076  
CODE:0010F08E # ---------------------------------------------------------------------------  
CODE:0010F08E loc_10F08E:                             # CODE XREF: n0__clinit_@V+B6j  
CODE:0010F08E   move                            v0, v3 # case 2: // (0x2)  
CODE:0010F090   goto                            loc_10F076  
CODE:0010F092 # ---------------------------------------------------------------------------  
CODE:0010F092 loc_10F092:                             # CODE XREF: n0__clinit_@V+B6j  
CODE:0010F092   const/16                        v0, 0x75 # case 3: // (0x3)  
CODE:0010F096   goto                            loc_10F076  
CODE:0010F096 # ---------------------------------------------------------------------------  
CODE:0010F098 switchdata_10F098:                      # DATA XREF: n0__clinit_@V+84r  
CODE:0010F098   .short 0x100  
CODE:0010F09A   .short 4  
CODE:0010F09C   .int 0  
CODE:0010F0A0   .int 0xC, 0xE, 0x10, 0x12  
CODE:0010F0B0 switchdata_10F0B0:                      # DATA XREF: n0__clinit_@V+B6r  
CODE:0010F0B0   .short 0x100  
CODE:0010F0B2   .short 4  
CODE:0010F0B4   .int 0  
CODE:0010F0B8   .int 0xC, 0xE, 0x10, 0x12  
CODE:0010F0B8   Method End  

The first thing we notice is that the function loads literal values into registers v1-v4 (v1 = 0x19, v2 = 0x5B, v3 = 0x39, v4 = 9). Nothing in the decoding code writes to these registers afterwards, so they act as constants.

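Moving along, the const-string at 0x0010EFD0 loads an obfuscated string literal (IDA names it aK_20 and displays it as "k>^").
Pulling the string out of the DEX up to its NULL terminator gives the encoded bytes; they appear in hex as encoded_str in the usage example at the end of this post.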
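In a DEX file each string is stored as a string_data_item: a ULEB128 count of UTF-16 code units, followed by the MUTF-8 bytes and a terminating 0x00 byte. Below is a minimal Python 3 sketch of pulling one out, assuming you already know the item's file offset (normally reached through the string_ids table, which is not covered here). The helper names read_uleb128 and read_string_data_item are hypothetical.

```python
def read_uleb128(buf, off):
    """Decode an unsigned LEB128 value at buf[off]; return (value, next_offset)."""
    result = 0
    shift = 0
    while True:
        byte = buf[off]
        off += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):  # high bit clear: this was the last byte
            return result, off
        shift += 7


def read_string_data_item(buf, off):
    """Return (utf16_size, raw MUTF-8 bytes) for the string_data_item at off."""
    utf16_size, off = read_uleb128(buf, off)
    # MUTF-8 encodes U+0000 as two bytes, so the only 0x00 byte is the terminator.
    end = buf.index(0, off)
    return utf16_size, bytes(buf[off:end])


# A synthetic item holding only the first five encoded characters of aK_20.
item = bytes([5]) + b"k>^\x1cz" + b"\x00"
print(read_string_data_item(item, 0))
```

The raw bytes it returns are still encoded; they are what the XOR loop described next operates on.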


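The function then loops over each character of the string and decodes it in place with a simple XOR. The string is first turned into a character array (the result of the first invoke-virtual is fed to array-length and aget-char), and each character is XORed, truncated back to a char and written back (xor-int/2addr, int-to-char, aput-char). Once the loop finishes, the array is wrapped in a new object, presumably a String, and stored at index 0 of a two-element array; the same steps then decode the second string, "E", into index 1, and the array is finally stored in the static field n0_z.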

Remember those four registers we loaded with literals at the beginning? They hold the XOR values. In fact there are five values in this function: the fifth, 0x75, is loaded as a direct literal (const/16 v0, 0x75 in the case 3 handler) instead of being kept in a register.
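To make the mapping from listing to algorithm concrete, here is a rough Python 3 transliteration of the first decoding loop (loc_10EFE4 and the handlers starting at loc_10F034), roughly one statement per instruction. It is a hand-written sketch, not something generated from the DEX: the register names mirror the listing, and the function name emulate_decode_loop is made up for illustration. Unlike a byte-based decoder, it works on UTF-16 code units, the way the Dalvik code does.

```python
def emulate_decode_loop(chars):
    """Decode a list of UTF-16 code units in place, mirroring the bytecode.

    chars stands in for the char array the first invoke-virtual returns
    (presumably String.toCharArray(); the call target is elided in the listing).
    """
    # Constants loaded at the top of the method and never written again.
    v1, v2, v3, v4 = 0x19, 0x5B, 0x39, 9
    v7 = len(chars)                   # array-length v5, v0 ; move v7, v5
    v8 = 0                            # loop index, starts at v6 = 0
    while v7 > v8:                    # if-gt v7, v8, loc_10F034
        v10 = chars[v8]               # aget-char v10, v5, v8
        v0 = v8 % 5                   # rem-int/lit8 v0, v8, 5
        if v0 == 0:                   # packed-switch v0, switchdata_10F098
            v0 = v1
        elif v0 == 1:
            v0 = v2
        elif v0 == 2:
            v0 = v3
        elif v0 == 3:
            v0 = 0x75                 # the fifth value is a literal, not a register
        else:
            v0 = v4                   # no case for 4: falls through to the default
        v0 ^= v10                     # xor-int/2addr v0, v10
        chars[v8] = v0 & 0xFFFF       # int-to-char + aput-char v0, v5, v8
        v8 += 1                       # add-int/lit8 v0, v8, 1 ; move v8, v0
    return chars


decoded = emulate_decode_loop([ord(c) for c in "k>^\x1cz"])
print("".join(map(chr, decoded)))  # regis
```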

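The packed-switch instruction picks the XOR value from the current character index MOD 5 (rem-int/lit8 v0, v8, 5). Cases 0 to 3 select 0x19, 0x5B, 0x39 and 0x75; index MOD 5 == 4 has no case entry, so execution falls through to the default branch, which uses v4 = 9. The decoder therefore cycles through the five XOR 'keys' over and over.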
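The switchdata_10F098 and switchdata_10F0B0 blocks at the end of the listing are packed-switch payloads: an ident of 0x0100, a size, a first_key, then size branch targets, each a signed offset in 16-bit code units relative to the packed-switch instruction itself. A small Python 3 sketch (parse_packed_switch and handler_value are hypothetical names) that resolves the first payload back to its case handlers and, from those, to the key order:

```python
import struct


def parse_packed_switch(payload, switch_addr):
    """Map each case value to the byte address of its handler."""
    ident, size, first_key = struct.unpack_from("<HHi", payload, 0)
    if ident != 0x0100:
        raise ValueError("not a packed-switch payload")
    targets = struct.unpack_from("<%di" % size, payload, 8)
    # Offsets count 16-bit code units, so scale by 2 to get byte addresses.
    return {first_key + i: switch_addr + 2 * t for i, t in enumerate(targets)}


# switchdata_10F098 rebuilt from the listing; its packed-switch is at 0x10F03C.
payload = struct.pack("<HHi4i", 0x0100, 4, 0, 0xC, 0xE, 0x10, 0x12)
cases = parse_packed_switch(payload, 0x10F03C)

# Value each handler puts in v0, read off the listing by hand.
handler_value = {0x10F054: 0x19, 0x10F058: 0x5B, 0x10F05C: 0x39, 0x10F060: 0x75}
# Case 4 has no entry, so index % 5 == 4 takes the default path (v4 = 9).
key = [handler_value[cases[k]] for k in range(4)] + [9]

for k, addr in sorted(cases.items()):
    print(k, hex(addr))
print([hex(v) for v in key])
```

Calling the same function with the second switch's address (0x10F06E) yields 0x10F086, 0x10F08A, 0x10F08E and 0x10F092, the case labels of the second loop, so both strings are decoded with the same key.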

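Here's a Python function that performs the decoding. It works on raw bytes rather than Java chars, which is fine for this string because every encoded character is below 0x80 and therefore occupies exactly one byte in the DEX's MUTF-8 encoding: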

def decode_string(encoded_str, key):
    # XOR byte i with key[i % len(key)]; with the five-value key this is the
    # same index MOD 5 cycle that the packed-switch implements.
    return ''.join(chr(b ^ key[i % len(key)])
                   for i, b in enumerate(bytearray(encoded_str)))

And in use:

import binascii

key = [0x19, 0x5b, 0x39, 0x75, 0x09]  # case 0, 1, 2, 3, then default
encoded_str = binascii.unhexlify("6b3e5e1c7a6d3e4b5a7971345710267a344c1b7d6b224e147d7a335c0726783d4d107b6d3e41016a713a57126c7d7b551a66722e4936666c354d07705a345d10297f295618295a344c1b7d6b22691d66773e701b6f767b5f1460753e5d")

print(decode_string(encoded_str, key))

Which in this case yields:

register/phone/countrywatcher/aftertextchanged lookupCountryCode from CountryPhoneInfo failed

Happy Hacking!