Linux syscall ABI
Tue 05 July 2011 by jj
A quick post to summarize the Linux kernel syscall ABI on the x86 architecture (i386 and x86_64).
It is hard to come by a short summary of how to do direct syscalls under the Linux kernel. This is not intended to be exhaustive or authoritative, but at least now I'll have something to bookmark :)
Linux kernel - X86
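With int 80h, the kernel ABI allows up to 6 register arguments: the syscall number goes in eax, and the arguments in ebx, ecx, edx, esi, edi and ebp, in that order.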
After the syscall, the return value is stored in eax, and execution continues after the int 80h instruction. All other register values are preserved.
    mov eax, __NR_write
    mov ebx, 1
    mov ecx, string_label
    mov edx, string_length
    int 80h
    ret
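For readers who prefer C, here is a minimal sketch of the same write call using GCC-style inline assembly. The helper name int80_write is made up for this example. The int 80h path is only compiled on 32-bit x86 Linux; on any other target the sketch falls back to the libc syscall() wrapper, which is an assumption made purely so that the example builds everywhere.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>   /* SYS_write */
#include <unistd.h>        /* syscall() */

/* int80_write is a hypothetical helper name, not a kernel or libc API. */
long int80_write(int fd, const void *buf, size_t len)
{
    assert(buf != NULL || len == 0);
#if defined(__linux__) && defined(__i386__)
    long ret;
    __asm__ volatile (
        "int $0x80"
        : "=a"(ret)                 /* eax: return value          */
        : "0"((long)SYS_write),     /* eax: syscall number        */
          "b"((long)fd),            /* ebx: first argument        */
          "c"(buf),                 /* ecx: second argument       */
          "d"(len)                  /* edx: third argument        */
        : "memory");
    return ret;                     /* negative value on failure  */
#else
    /* Assumption: on other targets, use libc so the sketch still builds.
       libc reports failure as -1 with errno set. */
    return syscall(SYS_write, fd, buf, len);
#endif
}
```

Calling int80_write(1, "hi\n", 3) writes three bytes to standard output and returns the number of bytes written. Both paths return a negative value on failure, so a caller can test for ret < 0.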
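I think the sysenter instruction is specific to Intel CPUs. Registers are used as for int 80h, except for the sixth argument, which is read from the user stack:
|arg 6||dword ptr [ebp]|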
Due to the CPU design, after the syscall execution resumes at a fixed address, which under linux is defined at boot to be somewhere in the vdso.
The kernel restores esp to the value ebp had during sysenter, and jumps to the following code:
    pop ebp
    pop edx
    pop ecx
    ret
This means that after the syscall, the situation is:
|final value||values at sysenter|
|eax||syscall return value|
|eip||dword ptr [ebp+12]|
|ecx||dword ptr [ebp+8]|
|edx||dword ptr [ebp+4]|
|ebp||dword ptr [ebp]|
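The table above can be checked with a toy model of the return stub. This is only a simulation in plain C: no real sysenter is executed. The struct, the function name and the sample values are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register snapshot used only by this simulation. */
struct regs { uint32_t ebp, edx, ecx, eip, esp; };

/* Models "pop ebp; pop edx; pop ecx; ret" starting with esp equal to the
   value ebp had at sysenter. stack_at_ebp[i] stands for dword ptr [ebp+4*i]. */
struct regs sysenter_return(const uint32_t *stack_at_ebp, uint32_t ebp_at_sysenter)
{
    struct regs r;
    uint32_t esp = ebp_at_sysenter;       /* kernel: esp = ebp at sysenter */

    assert(stack_at_ebp != NULL);
    r.ebp = stack_at_ebp[0]; esp += 4;    /* pop ebp: dword ptr [ebp]      */
    r.edx = stack_at_ebp[1]; esp += 4;    /* pop edx: dword ptr [ebp+4]    */
    r.ecx = stack_at_ebp[2]; esp += 4;    /* pop ecx: dword ptr [ebp+8]    */
    r.eip = stack_at_ebp[3]; esp += 4;    /* ret:     dword ptr [ebp+12]   */
    r.esp = esp;
    return r;
}
```

With a four-word stack { 0x11, 0x22, 0x33, 0x8048000 } the model ends with ebp = 0x11, edx = 0x22, ecx = 0x33 and eip = 0x8048000, and esp has moved up by 16 bytes.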
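    mov eax, __NR_write
    mov ebx, 1
    mov ecx, string_label
    mov edx, string_length
    push syscall_ret    ; return address, taken by the final ret of the vdso code
    sub esp, 12         ; room for the ebp, edx and ecx slots popped by the vdso code
    mov ebp, esp        ; the kernel reloads esp from this value
    sysenter
    ud2                 ; never reached: execution resumes in the vdso
syscall_ret:
    ret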
I'm not sure how all this would work in the event of a sys_restart.
Also note that ebp must point to valid memory, even if the syscall neither returns nor uses stack arguments (e.g. __NR_exit).
In 32-bit mode, the syscall instruction is specific to AMD CPUs.
I was not able to test this one, which I believe to be similar to sysenter, except that syscall saves its return address, so the kernel resumes execution right after the syscall instruction instead of the fixed vdso address.
The correct way to make a syscall under Linux is to use the vdso trampoline, which the kernel initializes with the correct opcode sequence for your CPU.
The ABI is the same as the int 80h one.
Note that the glibc loader is responsible for setting up gs:10h; the kernel will *not* do that on its own. The dynamic loader ld-linux.so sets up the gs segment using set_thread_area() and stores at gs:10h the vdso entry point found in the auxiliary vector at the process entrypoint.
    mov eax, __NR_write
    mov ebx, 1
    mov ecx, string_label
    mov edx, string_length
    call gs:[10h]
    ret
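A program can look up the vdso in the auxiliary vector itself. The sketch below assumes a libc that provides getauxval() (glibc 2.16 or later) and the AT_SYSINFO and AT_SYSINFO_EHDR constants from <elf.h>. As far as I know, AT_SYSINFO_EHDR is the address of the vdso ELF image and AT_SYSINFO is the trampoline address on i386, reported as 0 where the kernel does not supply it. The function names are made up for this example.

```c
#include <assert.h>
#include <elf.h>        /* AT_SYSINFO, AT_SYSINFO_EHDR, ELFMAG, SELFMAG */
#include <string.h>     /* memcmp() */
#include <sys/auxv.h>   /* getauxval() */

/* Address of the vdso ELF image, or 0 if none was mapped. */
unsigned long vdso_image_base(void)
{
    return getauxval(AT_SYSINFO_EHDR);
}

/* Address of the syscall trampoline (i386), or 0 where not provided. */
unsigned long vdso_syscall_entry(void)
{
    return getauxval(AT_SYSINFO);
}

/* True if addr is non-zero and points at an ELF header. */
int looks_like_elf(unsigned long addr)
{
    return addr != 0 && memcmp((const void *)addr, ELFMAG, SELFMAG) == 0;
}

/* Sanity check: if a vdso is mapped, it must start with the ELF magic. */
void check_vdso(void)
{
    unsigned long base = vdso_image_base();
    assert(base == 0 || looks_like_elf(base));
}
```

On an i386 process, the value returned by vdso_syscall_entry() is the kind of pointer that the call gs:[10h] sequence jumps through.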
Linux kernel - x64 (aka X86_64, amd64)
Both AMD and Intel use the syscall instruction. The syscall number goes in rax, and the arguments in rdi, rsi, rdx, r10, r8 and r9, in that order.
Execution resumes after the syscall instruction, with the return value in the rax register.
The values of rcx and r11 are not preserved across the syscall; all other registers are.
    mov rax, __NR_write
    mov rdi, 1
    mov rsi, string_label
    mov rdx, string_length
    syscall
    ret
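The same call can be sketched in C with GCC-style inline assembly, where the rcx and r11 clobbers have to be declared explicitly. The helper name raw_write64 is made up for this example. The syscall path is only compiled on x86_64 Linux; on any other target the sketch falls back to the libc syscall() wrapper, which is an assumption made so that the example builds everywhere.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>   /* SYS_write */
#include <unistd.h>        /* syscall() */

/* raw_write64 is a hypothetical helper name, not a kernel or libc API. */
long raw_write64(int fd, const void *buf, size_t len)
{
    assert(buf != NULL || len == 0);
#if defined(__linux__) && defined(__x86_64__) && defined(__LP64__)
    long ret;
    __asm__ volatile (
        "syscall"
        : "=a"(ret)                    /* rax: return value            */
        : "0"((long)SYS_write),        /* rax: syscall number          */
          "D"((long)fd),               /* rdi: first argument          */
          "S"(buf),                    /* rsi: second argument         */
          "d"(len)                     /* rdx: third argument          */
        : "rcx", "r11", "memory");     /* rcx and r11 are not preserved */
    return ret;                        /* negative value on failure    */
#else
    /* Assumption: on other targets, use libc so the sketch still builds.
       libc reports failure as -1 with errno set. */
    return syscall(SYS_write, fd, buf, len);
#endif
}
```

Calling raw_write64(1, "hi\n", 3) writes three bytes to standard output. Leaving rcx and r11 out of the clobber list would let the compiler keep live values in registers that the syscall instruction overwrites.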
On kernels compiled with the 'CONFIG_IA32_EMULATION' feature, X64 code can call legacy 32-bit syscalls using int 80h.
The ABI is the same as for x86 (first argument in ebx, ...).
Note that this mode cannot reference memory above 0xffffffff, and that the syscall number stored in eax is the X86 one.
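Both constraints are easy to get wrong, so here is a small C sketch of the checks a caller might do before using int 80h from 64-bit code. The enum and the function name are made up for this example. The two numbers shown are the i386 values of exit and write; on x86_64 the same syscalls are numbered 60 and 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* i386 syscall numbers, which are the ones int 80h expects in eax. */
enum { I386_NR_exit = 1, I386_NR_write = 4 };

/* Returns 1 if the whole buffer [buf, buf+len) lies at or below 0xffffffff,
   i.e. if it can be passed to a 32-bit syscall, and 0 otherwise. */
int int80_addressable(const void *buf, size_t len)
{
    uint64_t start = (uint64_t)(uintptr_t)buf;
    const uint64_t last_allowed = 0xffffffffu;

    assert(buf != NULL || len == 0);
    if (start > last_allowed)
        return 0;
    /* Number of bytes available from start up to 0xffffffff inclusive. */
    return (uint64_t)len <= last_allowed - start + 1;
}
```

A stack or heap buffer in a typical 64-bit process fails this check, so data passed through int 80h usually has to live in memory mapped in the low 4 GB.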
It is possible to use the sysenter instruction in X64 binaries, but I don't know the ABI here. The program seems to segfault every time, and I did not investigate further.
To be continued!