System calls

7 Jun 2013

System calls are the way user processes communicate with the kernel. Look at the following program, for example.

#include <stdio.h>

int main(int argc, char **argv)
{
    printf("Hello, world!");

    return 0;
}

When you run the program, the shell makes a couple of system calls, such as fork() and exec(), before the program has even started. The program itself then makes several more system calls before the write() and exit() system calls represented by the two lines in the code.
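
As a rough illustration of those last two calls, here is a minimal sketch of what the two lines boil down to once the C library's buffering is taken out of the picture. It assumes a POSIX host rather than my kernel, and the names say_hello() and hello_and_exit() are made up for this example.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: writes the greeting straight to a file descriptor,
   bypassing stdio buffering. Returns the byte count from write(), or -1. */
ssize_t say_hello(int fd)
{
    static const char msg[] = "Hello, world!";
    return write(fd, msg, strlen(msg));
}

/* Hypothetical stand-in for main(): greet on stdout, then terminate. */
void hello_and_exit(void)
{
    say_hello(STDOUT_FILENO);
    _exit(0);   /* like returning 0 from main(), minus the stdio cleanup */
}
```

With printf() the text sits in a stdio buffer until the program exits, so the write() call only happens during the cleanup that runs after main() returns.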

System calls can be performed in several ways, but one of the most common is a software interrupt raised with the int instruction. For example, Linux (on 32-bit x86) and most Unix-like hobby kernels I've studied use int 0x80. That's also what I chose to use in my kernel.

Next is the problem of passing data. The simplest way is to use registers, and that's what most projects seem to do. I chose a combination of a single register and the process's own stack.

Sample system call

Let's look at how read() would be implemented. I've not actually implemented it in my kernel yet, but here's how it would work.

User side

First, the definition in the C library:

int read(int file, char *ptr, int len)
{
    return _syscall_read(file, ptr, len);
}

It is simply a wrapper for an assembly function:

[global _syscall_read]
_syscall_read:
    mov eax, SYSCALL_READ
    int 0x80
    mov [syscall_error], edx
    ret

This function puts an identifier for the system call in the eax register and then executes the system call interrupt.

Note: Here I return the error code through register edx. In the actual code at this point, I used ebx, which the cdecl convention expects a function to preserve. I should have looked up Calling Conventions more carefully.

Of course, this can be simplified with a macro:

[global _syscall_read]
DEF_SYSCALL(read, SYSCALL_READ)

Kernel side

In the kernel, the system call is caught by the following function:

registers_t *syscall_handler(registers_t *r)
{
    if(syscall_handlers[r->eax])
        r = syscall_handlers[r->eax](r);
    else
        r->edx = ERR_NOSYSCALL;

    return r;
}
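
To make the table lookup concrete, here is a small hosted-C sketch of the same dispatch idea. It is not the code from my kernel: the registers_t here holds only the two fields the example touches, the numeric values of NUM_SYSCALLS, SYSCALL_READ and ERR_NOSYSCALL are invented, and register_syscall(), syscall_dispatch() and dummy_read() are hypothetical names. It also adds a range check on the system call number, which the handler above does not have.

```c
#include <stddef.h>
#include <stdint.h>

/* Invented values for illustration; the post does not give the real ones. */
#define NUM_SYSCALLS   64
#define SYSCALL_READ   3
#define ERR_NOSYSCALL  1

/* Only the registers this sketch touches. The real registers_t holds
   the full CPU state saved by the interrupt stub. */
typedef struct registers {
    uint32_t eax;   /* in: system call number, out: return value */
    uint32_t edx;   /* out: error code */
} registers_t;

typedef registers_t *(*syscall_t)(registers_t *);

/* One slot per system call number; NULL means "not registered". */
static syscall_t syscall_handlers[NUM_SYSCALLS];

/* Returns 0 on success, -1 if the number does not fit in the table. */
int register_syscall(uint32_t num, syscall_t handler)
{
    if (num >= NUM_SYSCALLS)
        return -1;
    syscall_handlers[num] = handler;
    return 0;
}

/* Same idea as syscall_handler(), plus a bounds check on r->eax. */
registers_t *syscall_dispatch(registers_t *r)
{
    if (r->eax < NUM_SYSCALLS && syscall_handlers[r->eax])
        return syscall_handlers[r->eax](r);

    r->edx = ERR_NOSYSCALL;
    return r;
}

/* Stand-in handler: pretends zero bytes were read and reports no error. */
registers_t *dummy_read(registers_t *r)
{
    r->eax = 0;
    r->edx = 0;
    return r;
}
```

The range check matters because eax is set by the user process, so an unchecked index would let it read a function pointer from outside the table.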

If the system call is registered correctly in the kernel (through the macro KREG_SYSCALL(read, SYSCALL_READ)), this will pass everything onto the following function:

KDEF_SYSCALL(read, r)
{
    process_stack stack = init_pstack();

    r->eax = read((int)stack[0], (char *)stack[1], (int)stack[2]);
    r->edx = errno;

    return r;
}

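The init_pstack() macro expands to (uintptr_t *)(r->useresp + 0x4). The saved user stack pointer points at the return address pushed when read() called _syscall_read, so skipping those four bytes lets us read the arguments passed to the system call from where they were pushed on call.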
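
The stack arithmetic can be tried out in an ordinary user-space program. The sketch below is only an illustration under a few assumptions: pstack_args(), read_args_t and decode_read_args() are names invented here, the "user stack" is just an array, and the offset is written as sizeof(uintptr_t) so the example also runs on a 64-bit host. On 32-bit x86 that size is 4, which matches the 0x4 in the kernel.

```c
#include <stdint.h>

/* Skip the return address at the top of the saved user stack and
   return a pointer to the first argument slot. */
static inline uintptr_t *pstack_args(uintptr_t useresp)
{
    return (uintptr_t *)(useresp + sizeof(uintptr_t));
}

/* Hypothetical container for the three arguments of read(). */
typedef struct {
    int file;
    char *ptr;
    int len;
} read_args_t;

/* Pull the arguments out in the order cdecl pushed them:
   the first argument sits closest to the return address. */
read_args_t decode_read_args(uintptr_t useresp)
{
    uintptr_t *stack = pstack_args(useresp);
    read_args_t args;

    args.file = (int)stack[0];
    args.ptr  = (char *)stack[1];
    args.len  = (int)stack[2];
    return args;
}
```

Feeding it an array laid out as { return address, file, ptr, len } gives back the three arguments in order, which is what the kernel-side read handler relies on.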

The kernel's read() function then has the same signature as the library version.

int read(int file, char *ptr, int len)
{
    ...
}

Spoiler alert: Keeping a version of read() (and in fact every syscall function) inside the kernel will turn out to have some really cool advantages...

This works for C compiled with the cdecl calling convention. For other languages or calling conventions, the asm functions will have to be adjusted.

Git

The methods described in this post have been implemented in git commit 8a26e26163.

© 2012 Thomas Lovén - @thomasloven - GitHub