7 Jun 2013
System calls is the way user processes communicate to the kernel. Look at the following program, for example.
#include <stdio.h> int main(int argc, char **argv) { printf("Hello, world!"); return 0; }
When you call the program, even before it is started, the shell makes a
couple of system calls such as fork()
and exec()
. The program itself
then makes several more system calls before the write()
and exit()
system calls represented by the two lines in the code.
System calls can be performed in several ways, but one of the most
common is through a special software interrupt with the int
instruction. For example, linux and most unix-like hobby kernels I've
studied use int 0x80
. That's also what I chose to use in my kernel.
Next is the problem of passing data. The simplest way is using registers, and that's what most projects seem to use. For this, I chose a combination of a single register and the processes own stack.
Let's look at how read()
would be implemented. I've not actually
implemented it in my kernel yet, but here's how it would work.
First the definition in the c library:
int read(int file, char *ptr, int len) { return _syscall_read(file, ptr, len); }
Simply a wrapper for an assembly function:
[global _syscall_read] _syscall_read: mov eax, SYSCALL_READ int 0x80 mov [syscall_error], edx ret
This function puts an identifier for the system call in the eax
register and then execute the system call interrupt.
Note: Here I return the error code through register
edx
. In the actual code at this point, I used the
register ebx
. I should have looked up Calling
Conventions more carefully.
Of course, this can be simplified with a macro to
[global _syscall_read] DEF_SYSCALL(read, SYSCALL_READ)
In the kernel, the system call is caught by the following function:
registers_t *syscall_handler(registers_t *r) { if(syscall_handlers[r->eax]) r = syscall_handlers[r->eax](r); else r->edx = ERR_NOSYSCALL; return r; }
If the system call is registered correctly in the kernel (through the
macro KREG_SYSCALL(read, SYSCALL_READ)
), this will pass everything
onto the following function:
KDEF_SYSCALL(read, r) { process_stack stack = init_pstack(); r->eax = read((int)stack[0], (char *)stack[1], (int)stack[2]); r->edx = errno; return r; }
The init_pstack()
macro expands to (unitptr_t *)(r->useresp + 0x4)
and this lets us read the arguments passed to the system call from where
they are pushed on call.
Then the read()
function has the same definition as the library version.
int read(int file, char *ptr, int len) { ... }
Spoiler alert: Keeping a version of read()
(and in fact every
syscall function) inside the kernel will turn out to have some really
cool advantages...
This works for c compiled with the cdecl
calling convention. For other
languages or calling conventions, the asm functions will have to be
adjusted.
The methods described in this post has been implemented in git commit 8a26e26163.