x64 spoon

Wed 16 February 2011 by ivan

While coding and debugging low-level stuff I sometimes need to write a little piece of assembly code to check that I'm right. Until now, I was writing code into a process debugged with OllyDbg and stepping through it. Pretty ugly, but it works when you want to know what a "smsw eax" is doing. Last time, I was confronted with the X64 reality, where no public tool like OllyDbg allows you to debug a 64-bit process on Windows. So I decided to write a little application to see how some 64-bit instructions behave.

Just for fun I started this project on Linux X86_64, and just because it's cool the code runs in a "sandboxed" environment.

I know Linux provides a nice feature for hardened sandboxing called SECCOMP. The idea is that you can only use 4 syscalls: read(), write(), exit() and sigreturn(). Of course read() and write() can only be called on already opened file descriptors, so this can be pretty safe. This feature is even used by Chrome for its Linux sandbox. If you are interested, more details are given by Nicolás Bareil in Sandboxing based on SECCOMP for Linux kernel.

Also, I'm using Linux X86_64 but I wanted to run both 64-bit and 32-bit code snippets without having 2 different processes. You know 64-bit kernels (CONFIG_X86_64) allow you to run 32-bit native applications if you compile in support for CONFIG_IA32_EMULATION.

A few words about X64 computing. For the CPU this mode is called IA-32e mode (or long mode). To enable it you need to set bit LME (Long Mode Enable) in MSR IA32_EFER (see Intel manual vol 3A: Initializing IA-32e Mode). Then if you want to run 64-bit code you must set up a code segment (CS) with bit L (64-bit code segment) at 1 and D (operand-size) at 0 (Intel manual vol 3A: Segment Descriptors). So if you want to run both 32-bit and 64-bit code with the same kernel, you must have at least two segment descriptors in your GDT (Global Descriptor Table), or have 2 GDTs. One entry must have L=1 and D=0, for 64-bit code, and the other L=0 and D=1 for 32-bit code. Linux uses 1 GDT, with 2 segments named GDT_ENTRY_DEFAULT_USER_CS and GDT_ENTRY_DEFAULT_USER32_CS. For reference, cs64=0x33 (RPL:3 TABLE:0 INDEX:6) and cs32=0x23 (RPL:3 TABLE:0 INDEX:4).
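To double-check those two selector values: a selector is a 2-bit RPL, a 1-bit table indicator and a 13-bit index. Here is a minimal Ruby sketch of the decoding (decode_selector is a helper name made up for this illustration, it is not part of spoon.rb):

```ruby
# Split an x86 segment selector into its three fields.
# decode_selector is a hypothetical helper, not taken from spoon.rb.
def decode_selector(sel)
  {
    :rpl   => sel & 0x3,         # bits 0-1: requested privilege level
    :table => (sel >> 2) & 0x1,  # bit 2: 0 = GDT, 1 = LDT
    :index => sel >> 3           # bits 3-15: descriptor index
  }
end

[0x33, 0x23].each do |sel|
  f = decode_selector(sel)
  puts "0x%02x -> RPL:%d TABLE:%d INDEX:%d" % [sel, f[:rpl], f[:table], f[:index]]
end
# 0x33 -> RPL:3 TABLE:0 INDEX:6
# 0x23 -> RPL:3 TABLE:0 INDEX:4
```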

So if you want to run pure 32-bit code in a 64-bit task you "just" need to update your code segment selector. This can be tricky because you cannot change the CS segment selector with a MOV instruction; other ways are provided to do this.

In order to perform this hack I chose the famous Metasm framework. So here is my configuration:

ivan@converge:~$ ruby -v
ruby 1.8.7 (2010-08-16 patchlevel 302) [x86_64-linux]
ivan@converge:~$  hg clone https://metasm.cr0.org/hg/metasm
requesting all changes
adding changesets
adding manifests
adding file changes
added 2239 changesets with 4294 changes to 306 files
updating to branch default
217 files updated, 0 files merged, 0 files removed, 0 files unresolved
ivan@converge:~$ export RUBYLIB=~/metasm

For my POC I want to have my Ruby process and:

  • Grab some assembly (64 or 32) and compile it.
  • Create a subprocess with restrictions: limited access to resources and SECCOMP mode enabled. The subprocess can exchange data with its parent only via a dedicated pipe.
  • Run the shellcode in a basic environment: all GPRs (General Purpose Registers) cleared and a clean stack. Code and data segments are flat (they address all the memory). FS and GS are not supported.
  • Read the result with the parent. I'll use the GPRs final values as result.
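The parent/child part of this plan can be sketched with core Ruby only. This is a simplified skeleton under my own assumptions, not the code from spoon.rb: run_in_child is a made-up helper, the block stands in for "run the shellcode and dump the GPRs", and no rlimit or SECCOMP restriction is applied here.

```ruby
# Parent/child skeleton: the child reports back over a dedicated pipe.
# run_in_child is a hypothetical helper, not taken from spoon.rb.
def run_in_child
  rd, wr = IO.pipe
  pid = fork do
    rd.close                 # the child only needs the write end
    wr.write(yield)          # stand-in for "run the code, dump the registers"
    wr.close
    exit!(0)                 # leave without running the parent's at_exit handlers
  end
  wr.close                   # the parent only needs the read end
  result = rd.read           # returns once the child has closed its end
  rd.close
  Process.wait(pid)
  result
end

puts run_in_child { "hello from child #{Process.pid}" }
```

Closing the unused end on each side matters: the parent's read only sees end-of-file once every copy of the write end is closed.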

Do it yourself

For the first part we will use the Shellcode class from Metasm. With its assemble method we can easily assemble what we want by specifying the target CPU. We will use the X86_64 CPU for now, so we only assemble 64-bit code. You can obtain the raw machine code with the encode_string method on the Shellcode object. For example:

asm=Metasm::Shellcode.assemble(Metasm::X86_64.new, shellcode_src).encode_string

And you have your code in the asm variable.

The next part requires creating a subprocess. For this we just fork() and after that, in the child, we set up some limits:

  • CPU user time and memory allocation are limited with setrlimit() calls, on RLIMIT_CPU and RLIMIT_AS. The setrlimit and fork methods are directly provided by the core ruby class Process.
  • To close all possibly open file descriptors (because Ruby's runtime could have opened some FDs I'm not aware of) I decided to use the close() syscall, because I cannot find a sysclose() method in Ruby's IO class (even if sysopen() is present ...). But it's impossible to directly use libc functions from Ruby. Impossible? Not for Metasm ! I used Metasm's DynLdr module, whose role is close to that of Python's ctypes module. It provides a wrapper for any native library, so you can wrap a function like memfrob() and invoke it from your Ruby code.

Moreover, the DynLdr module allows you to compile and run asm or C code directly in the Ruby process. Pretty useful, isn't it?
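For readers without Metasm at hand, the standard Fiddle library shipped with current Ruby versions gives a rough idea of the "wrap a native function" half of this (it does not compile C or asm on the fly like DynLdr does). A minimal sketch, assuming a Linux system where libc is already loaded in the Ruby process; libc_getpid is a name I made up:

```ruby
require 'fiddle'

# Call libc's getpid() through Fiddle.
# Fiddle.dlopen(nil) opens the running process itself, which already maps libc.
# libc_getpid is a hypothetical helper name used only for this illustration.
def libc_getpid
  libc = Fiddle.dlopen(nil)
  Fiddle::Function.new(libc['getpid'], [], Fiddle::TYPE_INT).call
end

puts libc_getpid == Process.pid
```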

So if I want to close all fds except mine, I can write:

dl.new_func_c <<EOS
int close(int);
void closefds(int max, int keep)
{
        int fd;
        for(fd=0; fd<=max; fd++)
                if (fd != keep)
                        close(fd);
}
EOS
dl.closefds(maxfd, wr.fileno)

Notice the new_func_c method, it parses the C source and compiles it on the fly ! You can even call stdlib functions.

  • To enable SECCOMP mode you call prctl() with option PR_SET_SECCOMP (22). Very easy now with DynLdr:
# enter SECCOMP mode
dl.new_api_c("int prctl(int option, long, long, long, long);", "/lib/libc.so.6")
if not dl.respond_to? :prctl
        puts "[-] prctl not found"
        exit 1
end

status=dl.prctl(22, 1, 0, 0, 0)
if status!=0
        puts "[-] prctl call failed"
        exit 1
end

This time I didn't use new_func_c but new_api_c. With this one you can declare an extern C function from its prototype to make it available from Ruby. Here the second argument (libc.so.6) is not necessary because all GNU libc exports are already defined in metasm/os/gnu_exports.rb ; but now the reader knows how to interface with other libraries :]

Our child process is now able to run our code safely. If we want to dump the GPRs after the code execution we need to wrap it. For this, I use new_func_asm to create a new function written in assembly, compile it and load it inside the Ruby process address space. In a few words, this function clears the GPRs, calls the code and stores the GPRs into a buffer given as argument. Then I just have to write the results to the pipe so that our parent gets them.

So if I decide to run :

ivan@converge:~$ cat shellcode
inc rax
mov rdi, 0xdeadbeefc0febabe
xchg rax, rbx
push rdi
pop rsi
xor rdi, rdi
ivan@converge:~$ ruby spoon.rb shellcode --x64
Rax: 0x0 | Rbx: 0x1 | Rcx: 0x0 | Rdx: 0x0 | Rsi: 0xDEADBEEFC0FEBABE | Rdi: 0x0

To sum up, Metasm assembled the code from file 'shellcode', forked a child, set up some restrictions, ran the code and printed the GPR state after its execution. All of this happened in the Ruby process. Here, I chose not to print registers R8 to R15.
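The parent side of that last step could look like the sketch below. The wire format is an assumption of mine (six little-endian 64-bit values in the order rax, rbx, rcx, rdx, rsi, rdi), and format_gprs is a made-up helper; spoon.rb may well transfer the registers differently.

```ruby
# Hypothetical wire format: the child sends six little-endian 64-bit values
# (rax, rbx, rcx, rdx, rsi, rdi); the parent decodes and prints them.
REG_NAMES = %w[Rax Rbx Rcx Rdx Rsi Rdi]

def format_gprs(raw)
  values = raw.unpack('Q<6')
  REG_NAMES.zip(values).map { |name, v| "#{name}: 0x#{v.to_s(16).upcase}" }.join(' | ')
end

# Register values taken from the run shown above.
raw = [0, 1, 0, 0, 0xDEADBEEFC0FEBABE, 0].pack('Q<6')
puts format_gprs(raw)
# Rax: 0x0 | Rbx: 0x1 | Rcx: 0x0 | Rdx: 0x0 | Rsi: 0xDEADBEEFC0FEBABE | Rdi: 0x0
```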

And if I try to run this :

ivan@converge:~$ cat shellcode
jmp $
ivan@converge:~$ ruby spoon.rb shellcode --x64
 [wait 2s]
Something wrong happened !

The child was killed by the kernel because it consumed too much user CPU time.
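The same effect can be reproduced with core Ruby alone. In this sketch the limits (soft 1 s, hard 2 s) are illustrative values, not necessarily the ones used in spoon.rb, and spin_under_cpu_limit is a made-up helper name:

```ruby
# Fork a child that spins forever under a small CPU-time budget and
# return its exit status. The helper name and the limits are illustrative.
def spin_under_cpu_limit(soft = 1, hard = 2)
  pid = fork do
    Process.setrlimit(Process::RLIMIT_CPU, soft, hard)
    loop { }                 # the Ruby equivalent of "jmp $"
  end
  _, status = Process.wait2(pid)
  status
end

status = spin_under_cpu_limit
if status.signaled?
  puts "Something wrong happened ! (child killed by signal #{status.termsig})"
end
```

The kernel sends SIGXCPU once the soft limit is reached and SIGKILL at the hard limit, so the parent sees a child terminated by a signal rather than a normal exit.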

Gimme moar power !

In this part we will see how to run 32-bit code in our 64-bit Ruby process. The plan is :

  1. Run 32-bit code in a 64-bit task.
  2. ???
  3. Profit !

As I said at the beginning, I want to run both 32-bit and 64-bit code in the same process. To achieve that I could reuse the GDT_ENTRY_DEFAULT_USER32_CS code segment selector, but I wanted to see whether Linux handles this well, so I'll create my own code segment with the modify_ldt() syscall. Here I'm allocating my own LDT (Local Descriptor Table), which is the same as a GDT but only for my process. My own segment descriptors will be placed in this LDT. For example I can have a dedicated 16-bit stack segment. Of course you cannot do whatever you want with the new segment descriptors, since the kernel validates them first, but you know sometimes you're doing it wrong !

So basically, I need to:

  1. Map my code and stack at some 32-bit address. We use mmap() with flag MAP_32BIT for this.
  2. Initialize the stack and write the code in memory.
  3. Perform a 64 to 32 transition.
  4. Execute the 32-bit code.
  5. Go back to a 64-bit code segment to print the result.

The first point is easy, now armed with Metasm:

dl.new_api_c <<EOS
long mmap(long addr, long length, long prot, long flags, long fd, long offset);
long munmap(long addr, long length);

struct user_desc {
  unsigned __int32 entry_number;
  unsigned __int32 base_addr;
  unsigned __int32 limit;
  int seg_32bit:1;
  int contents:2;
  int read_exec_only:1;
  int limit_in_pages:1;
  int seg_not_present:1;
  int useable:1;
};

long modify_ldt(int func, struct user_desc*ptr, long bytecount);

#define PROT_R 1
#define PROT_W 2
#define PROT_X 4

#define MAP_PV 0x2
#define MAP_ANON 0x20
#define MAP_32 0x40
EOS
if not dl.respond_to? :mmap
        puts "[-] mmap not found"
        exit 1
end

if not dl.respond_to? :modify_ldt
        puts "[-] modify_ldt not found"
        exit 1
end

# alloc memory in 32b user-space
code32=dl.mmap(0, MEMSIZE, dl::PROT_R|dl::PROT_W|dl::PROT_X, dl::MAP_PV|dl::MAP_ANON|dl::MAP_32, -1, 0)
if code32==-1
        puts "[-] mmap failed (code32)"
        exit 1
end
puts "[+] Allocated code32 at: #{code32.to_s(16)}"

stack32=dl.mmap(0, MEMSIZE, dl::PROT_R|dl::PROT_W, dl::MAP_PV|dl::MAP_ANON|dl::MAP_32, -1, 0)
if stack32==-1
        puts "[-] mmap failed (stack32)"
        exit 1
end
puts "[+] Allocated stack32 at: #{stack32.to_s(16)}"

# Our LDT:
# 0 empty
# 1 code32 segment
# ...
# create an LDT entry, map code segment to our memory and copy shellcode into
# entry=1
# ldt_entry base_addr size_in_pages
# 32bits:1 type:2 (2=code) readonly:0 limit_in_pages:1 seg_not_present:0 usable:1
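# struct is the user_desc allocated with alloc_c_struct and filled with the values above
if dl.modify_ldt(1, struct, struct.length)!=0
        puts "[-] modify_ldt failed"
        exit 1
end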

As you can see, Metasm supports structure declarations. You initialize them with alloc_c_struct.
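To make the layout of that structure concrete, here is a sketch in plain Ruby (without Metasm) of the 16 bytes modify_ldt() receives for the entry described in the comments above. It assumes the usual GCC/x86 convention of packing bitfields LSB-first into one 32-bit word; pack_user_desc is a made-up helper, and the base and limit values are invented for the example.

```ruby
# Byte layout of struct user_desc as declared above: three 32-bit fields
# followed by the bitfields packed LSB-first into one 32-bit word.
# pack_user_desc is a hypothetical helper; base and limit are made-up values.
def pack_user_desc(entry, base, limit, f)
  flags  = f[:seg_32bit]
  flags |= f[:contents]        << 1   # 2 bits wide, 2 = code
  flags |= f[:read_exec_only]  << 3
  flags |= f[:limit_in_pages]  << 4
  flags |= f[:seg_not_present] << 5
  flags |= f[:useable]         << 6
  [entry, base, limit, flags].pack('V4')
end

desc = pack_user_desc(1, 0x41440000, 0xf,
                      { :seg_32bit => 1, :contents => 2, :read_exec_only => 0,
                        :limit_in_pages => 1, :seg_not_present => 0, :useable => 1 })
puts desc.unpack('V4').map { |v| "0x%x" % v }.join(' ')
# 0x1 0x41440000 0xf 0x55
```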

Now the hard part: how do we switch from a 64-bit to a 32-bit code segment? We can perform a far-call, a far-jmp, even a far-ret ; but I prefer to use an iret instruction because it allows you to change CS, EIP, SS and ESP all at the same time. The iret instruction is allowed here because we are moving to a code segment with the same privilege level (DPL, Descriptor Privilege Level). In short, we stay in ring3 user-land :]

To call iret we need a proper stack describing what the CPU state will be. To be precise we must provide EIP, CS, EFLAGS, ESP and SS. Even though we still have a 64-bit stack, we need to push the arguments as if we were in 32-bit mode, so the stack will look like this:

 Highdw    Lowdw
[ cs     | eip    ] <- rsp
[ esp    | eflags ]
[ 0      | ss     ]

Then we launch iret and land in our new code segment.
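As a sanity check of that frame, here is a small Ruby sketch that builds the five dwords a 32-bit iret pops (EIP, CS, EFLAGS, ESP, SS, lowest address first) and prints them as 64-bit stack slots, high dword on the left. Every value is an assumption made for the illustration: 0x0f would select LDT entry 1 at RPL 3, 0x2b is the usual user data selector on Linux x86_64, and the addresses are invented.

```ruby
# Build the frame popped by a 32-bit iret, lowest address first.
# iret_frame is a hypothetical helper; a trailing zero dword pads the
# frame to three full 64-bit stack slots.
def iret_frame(eip, cs, eflags, esp, ss)
  [eip, cs, eflags, esp, ss, 0].pack('V6')
end

frame = iret_frame(0x41440000, 0x0f, 0x202, 0x41450ff0, 0x2b)
frame.unpack('Q<3').each do |q|
  puts "%08x | %08x" % [q >> 32, q & 0xffffffff]   # high dword | low dword
end
# 0000000f | 41440000
# 41450ff0 | 00000202
# 00000000 | 0000002b
```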

Last question: how do we come back from the 32-bit to the 64-bit code segment ? Remember we moved our code to 32-bit addressable memory, but our original 64-bit code (the caller) is not necessarily in this range, so we need a 64-bit stager inside the 32-bit addressable memory to jump back to our caller. At the end of the 32-bit code we put a far-ret. When we initialized the stack, two things were pushed for it: the original 64-bit code segment selector (GDT_ENTRY_DEFAULT_USER_CS if you prefer) and a pointer to the 64-bit code located just after the far-ret.

At the end of the 32-bit code execution, we do a far-ret, which comes back in 64-bit mode, but still in the 32-bit addressable memory, and then we can call our far-jmp to our caller.

Last thing, the far-jmp has to know where to find the caller RIP and RSP.

Here is the code ('shellcode' holds our binary 32bit compiled shellcode):

stager=Metasm::Shellcode.assemble(Metasm::X86_64.new, <<EOS).encode_string
; our 32-bit shellcode goes here
db #{shellcode.inspect}
retf ; return in 64bits to next mov instruction

; stack crafted so that retf lands here, in 64-bit mode
mov rsp, [rip-$_+1f]
jmp [rip-$_+2f]
1: dq 0xdeadbeefc0feb4be ; caller rsp, patch me !
2: dq 0xdeadbeefc0feb4be ; caller rip, patch me !
EOS
dl.memory_write(code32, stager)

Now we are able to run 32-bit code in our 64-bit process ! For example :

ivan@converge:~$ cat shellcode
inc eax
mov edi, 0xdeadbeef
xchg eax, ebx
push edi
pop esi
xor edi, edi
ivan@converge:~$ ruby spoon.rb shellcode
Eax: 0x0 | Ebx: 0x1 | Ecx: 0x0 | Edx: 0x0 | Esi: 0xDEADBEEF | Edi: 0x0

And if I try:

ivan@converge:~$ cat shellcode
inc rax
ivan@converge:~$ ruby spoon.rb shellcode
/home/ivan/metasm/metasm/parse.rb:59:in `parse_instruction': invalid opcode arguments "inc rax", allowed : [[:reg], [:modrm], [:modrm]] near "inc" at "\"<unk>\"" line 1 (Metasm::ParseError)
        from /home/ivan/metasm/metasm/parse.rb:330:in `parse'
        from /home/ivan/metasm/metasm/exe_format/shellcode.rb:69:in `assemble'
        from /home/ivan/metasm/metasm/exe_format/main.rb:70:in `assemble'
        from spoon.rb:237:in `run_shellcode'
        from spoon.rb:315

I can haz syscallz ??

Now you should be like Bruce Dang: [image: wtfhead]

The point is: "Ok, I'm seccomped now, but I can still use write on the opened file descriptor, so wut can happen?" Knowing the opened pipe file descriptor number is 4 you can run:

ivan@converge:~$ cat shellcode
mov rbx, 'i<3bruce'
push rbx
inc rax ; syscall write in 64bits
mov rdi, 4 ; fd
lea rsi, [rsp]
inc rdx
shl rdx, 3 ; 8 chars
syscall
pop rbx
ivan@converge:~$ ruby spoon.rb shellcode  --x64
i<3bruceRax: 0x8 | Rbx: 0x6563757262333C69 | Rcx: 0x7FFC4E5F4082 | Rdx: 0x8 | Rsi: 0x7FFF111A99F0 | Rdi: 0x4

You can show your love ! You have just written your own data to the pipe, not very useful ... Moreover, I have to admit it, the x64 syscall calling convention is a bit scary.
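If the Rbx value in that output looks cryptic, it is simply the 8-byte string read back as one little-endian 64-bit integer. A quick Ruby check (the second line shows the same bytes as two 32-bit halves, which is how a 32-bit snippet has to handle them):

```ruby
# The string as one little-endian 64-bit integer...
puts "0x%X" % "i<3bruce".unpack('Q<').first
# 0x6563757262333C69

# ...and as two little-endian 32-bit halves ('i<3b', then 'ruce').
puts "i<3bruce".unpack('V2').map { |v| "0x%X" % v }.join(' ')
# 0x62333C69 0x65637572
```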

A cool thing is you can always call 32-bit syscalls from a 64-bit process:

ivan@converge:~$ cat shellcode
push 'ruce'
push 'i<3b'
mov eax, 4 ; sys_write in 32-bit
mov ebx, 4 ; fd
mov ecx, esp
mov edx, 8 ; 8 chars
int 0x80
add esp, 8
ivan@converge:~$ ruby spoon.rb shellcode
i<3bruceEax: 0x8 | Ebx: 0x4 | Ecx: 0x41443FF0 | Edx: 0x8 | Esi: 0x0 | Edi: 0x0

Time to die pilot !

Now I know what "smsw rax" does in 64-bit mode, and that changed my life. By the way, while testing this code we noticed a few bugs with some Ruby x64 packages. The Ruby process is flooded by rt_sigprocmask() syscalls and you need to recompile your Ruby to avoid that, as explained here. If you don't, the child process is killed as soon as it enters SECCOMP mode ... Well, it was not designed to be SECCOMP safe =]

There is no public release of an OllyDbg-like 64-bit debugger for now. But when I have to deal with x64 assembly this little script is convenient. Of course you can do the same on Windows, you just have to remove all the Linux-dependent things :]

Anyway, if you need to do some ASM hacks, Metasm is very convenient. I know it's Ruby but one day you have to evolve to use some real tools, and that is a big one. You can find the Ruby source, spoon.rb, attached to this post. Thanks a lot to @metasm for his help !

Have fun with Metasm !