Thomas Lovén - Thread Stacks

Since the x86 architecture has relatively few processor registers, a programmer may need additional space to store temporary values. For most compilers and languages, this space is the stack. C, for example, (gcc and clang at least) uses the stack to store local variables, function arguments and return addresses. In other words, the stack comes in use every time there is a function call.

The common way a function call is handled by a c compiler is this:

Push each argument to the stack (in reverse order)
Execute the CALL instruction (which pushes the address of the next

instruction to the stack and jumps to the callee)

The callee does the following:

Push the base pointer to the stack
Sets the base pointer to the current stack pointer
Subtracts the stack pointer to reserve place for local variables.
Do its thing
Increase the stack pointer to free the space used by local variables.
Pop the base pointer from stack.
Execute the RET instruction (which puts the return value in EAX and

jumps to the position at the top of the stack.

While the callee is doing its thing it now has access to all the pushed arguments at addresses (ebp + 8) and forwards and all local variables at addresses up to ebp. The return address is reachable at (ebp + 4) if you'd ever want that.

This convention makes it really easy to have functions which takes an undefined number of arguments, like printf does.

Stacks in context switching

It also makes for really simple context switching. Since the return address is stored on the stack, if you were to switch stacks inside a function, when you return, you'll be somewhere else. This is a common way of making usermode threads. Ponder the following:

void switch_thread()
{
    push_all_registers();
    switch_stack_pointer();
    pop_all_registers();
    return;
}

void a()
{
    while(1)
    {
        do_something();
        switch_thread();
    }
}

void b()
{
    while(1)
    {
        do_something_else();
        switch_thread();
    }
}

Imagine two threads - A and B running, A runs a() and B runs b(). Each has a stack somewhere in memory, and A is currently running. The top of the stacks looks like:

                           +-----------------------+
                           |switch_stack_pointer RA|
                           |all registers          |
+----------ESP----------+  |switch_thread RA       |
|a RA                   |  |b RA                   |
|          ...          |  |          ...          |

where RA means Return Address and ESP is where the stack pointer is currently pointing. As execution of A continues, the processor will do_something() and then call switch_thread()...

                           +-----------------------+
                           |switch_stack_pointer RA|
+----------ESP----------+  |all registers          |
|switch_thread RA       |  |switch_thread RA       |
|a RA                   |  |b RA                   |
|          ...          |  |          ...          |

switch_thread() pushes all registers to the stack and calls switch_stack_pointer()

+----------ESP----------+  +-----------------------+
|switch_stack_pointer RA|  |switch_stack_pointer RA|
|all registers          |  |all registers          |
|switch_thread RA       |  |switch_thread RA       |
|a RA                   |  |b RA                   |
|          ...          |  |          ...          |

switch_stack_pointer() performs some scheduling to find out which thread is to run next, and then switches the stack pointer over to the top of B's stack.

+-----------------------+  +----------ESP----------+
|switch_stack_pointer RA|  |switch_stack_pointer RA|
|all registers          |  |all registers          |
|switch_thread RA       |  |switch_thread RA       |
|a RA                   |  |b RA                   |
|          ...          |  |          ...          |

The processor keeps on executing code, and switch_stack_pointer() soon returns

+-----------------------+
|switch_stack_pointer RA|  +----------ESP----------+
|all registers          |  |all registers          |
|switch_thread RA       |  |switch_thread RA       |
|a RA                   |  |b RA                   |
|          ...          |  |          ...          |

switch_thread() pops all registers and returns...

+-----------------------+
|switch_stack_pointer RA|
|all registers          |
|switch_thread RA       |  +----------ESP----------+
|a RA                   |  |b RA                   |
|          ...          |  |          ...          |

... and we're now in b() with all registers of B loaded.

Stacks in the kernel

When an interrupt or exception happens in user mode, a new stack is loaded from the tss and (usually) all registers are pushed onto it before the kernel starts the Interrupt Service Routine.

Wait... all registers are pushed onto it? I like the sound of that. That's, like, half the work of changing threads, right? Right!

If you've been following a kernel development tutorial (like James Molloys or Brandon Friesens) you probably have something like this to handle interrupts:

int_stub:
    pusha

    xor eax, eax
    mov ax, ds
    push eax

    mov eax, 0x10
    mov ds, ax
    mov es, ax
    mov fs, ax
    mov gs, ax

    call int_handler

    pop eax
    mov ds, ax
    mov es, ax
    mov fs, ax
    mov gs, ax

    popa

    add esp, 8

    iret

void int_handler(registers_t r)
{
    do_stuff();
}

In fact, if you've been following one of those tutorials, you probably have the above code twice, for some reason...

Anyway. This would take care of both pushing and poping all registers, and with only a small modification, it becomes very easy to switch the stacks too...

int_stub:
    pusha

    xor eax, eax
    mov ax, ds
    push eax

    mov eax 0x10
    mov ds, ax
    mov es, ax
    mov fs, ax
    mov gs, ax

    push esp ;Pass stack pointer to int_handler
    call int_handler
    mov esp, eax ;int_handler returns a new stack pointer

    pop eax
    mov ds, ax
    mov es, ax
    mov fs, ax
    mov gs, ax

    popa

    add esp, 8

    iret

registers_t *int_handler(registers_t *r)
{
    do_stuff();
    r = get_next_thread(r);
    return r;
}

This gives a pointer to the threads registers as input to the ISR and expect a pointer to some registers in return. They may or may not be the same.

Keeping track of the stacks

The saved registers are a large part of what defines each thread, but there are actually a few things more that are needed.

First of all, the kernel may want some extra information associated with each thread, such as scheduling information and a list of signal handlers.

Sometimes a thread in user mode will need help from the kernel which it cannot offer immediately. The thread may for example issue a read request to a file that's on a drive which has some spin-up time before it can be read. The kernel may then switch to another thread while the disk spins up. Therefore it's a good idea to have a separate kernel stack space for each thread.

With some thought, those three things can be easily combined into a single data structure. So let's think about it for a while.

While the thread is running we want some information stored somewhere in kernel space about it.

+-----------------------+
|thread information     |
+-----------------------+

Then, when an interrupt or syscall happens, a new stack is loaded and some stuff is pushed onto it. If we want this near our thread information it will have to go right before it, since the stack grows backwards.

+-----------------------+
|thread registers       |
|thread information     |
+-----------------------+

Finally, we want the kernel mode stack. Well... the stack pointer is right at the start of the registers now, so why not just continue the stack from there?

+-----------------------+
|   ...                 |
|kernel mode stack      |
|thread registers       |
|thread information     |
+-----------------------+

Setting this up

To set this up, the thread information structure has to be set up something like:

struct thread_info_struct
{
    uint8_t stack_space[KERNEL_STACK_SIZE];
    registers_t r;
    struct thread_data_struct thread_data;
} my_thread_info;

When the thread is running in user mode, the TSS should be set up in such a way that the stack pointer loaded at an interrupt points to the end of the registers, i.e. the beginning of the thread data.

TSS.esp0 = &my_thread_info.thread_data;

And that's really all there is to it. Unbelievable, really, how many years it took for me to figure this out.

In the process, I've found inspiration in Rhombus by Nick Johnson and linux.

Some considerations

In order to do the actual switching of threads, I implemented a special syscall which can be called only from kernel mode.

Let's say a user mode program calls yield(). This performs a syscall in the form of an interrupt instruction INT 0x80 and thus we jump into the kernel.

The kernel performs some housekeeping and selects a new thread to run. It then performs the special switching interrupt INT 0x82.

Since we're already in kernel mode, no new stack is loaded but the registers are pushed onto the old one. The top of the kernel stack will then contain a registers_t structure and a pointer to it is saved in a kernel_stack variable in the thread_data portion of the thread information structure.

Next, the thread information structure of the new thread is read and the kernel_stack pointer from it is returned to the int_stub as above. The IRET instruction brings us back to wherever we were before (probably in kernel mode, but could as well be user mode). If the new thread was swapped out while in kernel mode, it will carry on from wherever it was and eventually return to user mode.

This way of handling kernel stacks also makes for really clean nesting of interrupts.

Usage

This method has been implemented in git commit 756852fc66

A warning

I recently learned - the hard way - that the clang compiler does not use this calling convention for functions which do not in turn call other functions. I.e

int double_integer(int a)
{
    return 2*a;
}

int main(int argc, char **argv)
{
    double_integer(5);
}

If this code is compiled with clang double_integer will (in some cases) not push ebp to stack.

This severely hinders many debuggers and should be considered a bug in my oppinion.

ThomasLovén.com

Thread Stacks