Thomas Lovén - Virtual File System 2

Once again, several months have passed since I wrote anything here. I also worked very little on the kernel during this time, though. So no losses there, at least. Instead I've been busy with a special side project that I will write a bit about later and a lot of personal stuff that you probably don't care about if you're reading those posts. Unless you're my wife, that is (I got married! :) ). Then again, if you're my wife you're probably not reading those posts anyway, so I guess the entire point of this paragraph was to tell you I got married(!).

Ahem...

The virtual filesystem

So, as I said in my last post I recently rewrote my VFS layer. Although, I must admit I'm not quite satisfied with it yet...

...

Know what? Let's rewrite it again!

What I want

First of all: What do I want from the virtual filesystem?

The VFS should be an abstraction layer for the files used by the kernel and user processes. The files in this case could be files on a disk, the disk hardware itself, pipes, files stored in ram, read-only files generated on-the-fly by the kernel, network connections(?) etc.

Further, I want the VFS to be independent of any disk file system, e.g. the VFS shouldn't have user-group-other read-write-execute tuples just because it's designed with ext2 in mind. It might have those tuples - I haven't decided yet - but it it does it won't be because ext2 uses them. Nor will the VFS be designed with ext2 or any other disk file system in mind.

The VFS should offer the functions

open() // Open or create a file
close() // Close a file
read() // Read data from opened file
write() // Write data to opened file
move() // Move a file or directory
link() // Put a file or directory in path tree
unlink() // Remove a file or directory from path tree
stat() // Get more info about a file or directory
isatty() // Returns true if the file is a terminal
mkdir() // Create a directory
readdir() // Get a directory entry
finddir() // Find a file by name from a directory

for all files and directories regardless of their underlying device or driver.

The file system should have a tree structure with a single root.

Filesystems should be mountable at any path in the tree provided the path points to a directory or non-existing node within an existing parent directory. I.e. if the empty directory /foo, a filesystem can be mounted to /foo or /foo/bar but not to foo/baz/bar. If /foo is not empty, it should a filesystem can still be mounted to it, unless it is already a mountpoint. All contents of /foo are then hidden untill the filesystem is unmounted. Mounting filesystems to non-directories should not be possible. Mounted filesystems does not have to have a root directory, but can consist of a single file.

Those are just some rules for mounting that I thought of pretty much arbitrarily. I might change my mind later, but this will do for now.

Implementation

The most important data structure of the VFS is the inode. Each file used by the kernel gets an inode which keeps track of some important information of it, such as what type of file it is or which driver controls it.

Some inodes live in the mount tree, which keeps track of all mounted filesystems. Looking up a file by absolute path always starts in the mount tree and is performed by a function called namei (name to inode).

Example:

A user process wants the file /mnt/floppy/foo/bar.txt
namei starts at the VFS root /
namei searches for /mnt in the VFS tree, finds it and gets it's inode.
namei searches for /mnt/floppy in the VFS tree, finds it and gets it's inode.
namei searches for /mnt/floppy/foo, which is not found.
namei asks the /mnt/floppy inode for /mnt/floppy/foo and gets it's inode.
namei asks the /mnt/floppy/foo inode for /mnt/floppy/foo/bar.txt and gets it's inode.
namei returns the inode for /mnt/floppy/foo/bar.txt

A good starting point for the inode structure might be some pointers to allow it to be placed in a tree, then.

struct vfs_node_st;
typedef vfs_node_t * INODE;

typedef struct vfs_node_st
{
    char name[VFS_NAME_SZ];
    INODE parent;
    INODE child;
    INODE older, younger;
    uint32_t type;
} vfs_node_t;

This does waste a bit of memory, since most inodes that are used by the system won't be in the VFS tree, but four size_t isn't that much, and it's far from the worst memory thief in this kernel anyway.

The type field in the VFS node struct is used by the VFS tree to make sure stuff is only mounted onto directories.

I typedef INODE as a pointer to keep the code a bit cleaner and easier to maintain.

Then, we need a way to keep track of the driver, i.e. the functions called to access the file. To do this, I define a new struct:

typedef struct vfs_driver_st
{
    uint32_t (*open)(INODE, uint32_t);
    uint32_t (*close)(INODE);
    uint32_t (*read)(INODE, void *, uint32_t, uint32_t);
    uint32_t (*write)(INODE, void *, uint32_t, uint32_t);
    uint32_t (*link)(INODE, INODE, const char *);
    uint32_t (*unlink)(INODE, const char *);
    uint32_t (*stat)(INODE, struct stat *st);
    uint32_t (*isatty)(INODE);
    uint32_t (*mkdir)(INODE, const char *);
    dirent_t *(*readdir)(INODE, uint32_t);
    INODE (*finddir)(INODE, const char *);
} vfs_driver_t;

and add vfs_driver_t *d to the inode struct. I also added a length value, a void pointer for arbitrary data used by the drivers and a flags value - also for use by the drivers. The inode struct now looks like this:

typedef struct vfs_node_st
{
    char name[VFS_NAME_SZ];
    void *parent;
    void *child;
    void *older, *younger;
    uint32_t type;
    vfs_driver_t *d;
    void *data;
    uint32_t flags;
    uint32_t length;
}

Vfs functions

Next, I create some wrapper functions to call the driver functions.

uint32_t vfs_open(INODE ino, uint32_t mode)
{
    if(ino->d->open)
        return ino->d->open(ino, mode);
    return 0;
}

and similar for all functions except readdir and finddir which contain code to handle . and .. for mount roots.

dirent_t *vfs_readdir(INODE ino, uint32_t num)
{
    if(ino->type & FS_MOUNT)
    {
        if(num == 0)
        {
            dirent_t *ret = calloc(1, sizeof(dirent_t));
            ret->ino = ino;
            strcpy(ret->name, ".");
            return ret;
        } else if(num == 1) {
            dirent_t *ret = calloc(1, sizeof(dirent_t));
            ret->ino = ino->parent;
            strcpy(ret->name, "..");
            return ret;
        }
    }
    if(ino->d->readdir)
    return ino->d->readdir(ino, num);
    return 0;
}

INODE vfs_finddir(INODE ino, const char *name)
{
    if(ino->type & FS_MOUNT)
    {
        if(!strcmp(name, "."))
        {
            return ino;
        } else if(!strcmp(name, "..")) {
            return ino->parent;
        }
    }
    if(ino->d->finddir)
    return ino->d->finddir(ino, name);
    if(ino->d->readdir)
    {
    // Backup solution
    int num = 0;
    dirent_t *de;
    while(1)
    {
        de = vfs_readdir(ino, num);
        if(!de)
            return 0;
        if(!strcmp(name, de->name))
            break;
        free(de->name);
        free(de);
        num++;
    }
    INODE ret = de->ino;
    free(de->name);
    free(de);
    return ret;
    }
    return 0;
}

Finally, I needed a function for mounting filesystems in the mount tree and the namei function, which can actually be combined since they both need to traverse the entire path.

Warning: Pointer-pointers ahead!

First: a function for traversing the mount tree as far as possible

INODE vfs_find_root(char **path)
{
    // Find closest point in mount tree
    INODE current = vfs_root;
    INODE mount = current;
    char *name;
    while((name = strsep(path, "/")))
    {
        current = current->child;
        while(current)
        {
            if(!strcmp(current->name, name))
            {
            mount = current;
            break;
            }
            current = current->olderyounger;
        }
        if(!current)
        {
            if(*path)
            {
                *path = *path - 1;
                *path[0] = '/';
            }
            *path = name;
            break;
        }
    }

    return (INODE)mount;
}

Pretty self explanatory. No? Well, strsep is a library function which picks out one part of the path at a time and also advances the path pointer. Then, for each part, we look through the children of the node we're at for one with the right name. If it is not found, the path pointer is backed up one step and the last node we found is returned.

The namei/mount function then uses this as a starting point:

INODE vfs_namei_mount(const char *path, INODE root)
{
    char *npath = strdup(path);
    char *pth = &npath[1];
    // Find closest point in mount tree
    INODE current = vfs_find_root(&pth);
    char *name;
    while(current && (name = strsep(&pth, "/")))
    {
        // Go through the path
        INODE next = vfs_finddir(current, name);

        if(root)
        {
        // If we want to mount someting
            if(!next)
            {
                // Create last part of path if it doesn't exist
                // But only if it is the last part.
                if(pth)
                    return 0;
                next = calloc(1, sizeof(vfs_node_t));
                strcpy(next->name, name);
                next->type = FS_DIRECTORY;
            }

            // Add path to mount tree
            next->parent = current;
            next->older = current->child;
            current->child = next;
        }

        if(!next)
            return 0;
        if(!current->parent)
            free(current);

        current = next;
    }
    free(npath);

    if(root && current->type == FS_DIRECTORY)
    {
        // Replace node in mount tree
        root->parent = current->parent;
        if(root->parent->child == current)
            root->parent->child = root;
        root->older = current->older;
        if(root->older)
            root->older->younger = current;
        root->younger = current->younger;
        if(root->younger)
            root->younger->older = current;
        strcpy(root->name, current->name);
        root->type = FS_MOUNT;
        if(current == vfs_root)
            vfs_root = root;

        free(current);
    }
    return current;
}

Note how pth is changed by vfs_find_root() to only contain the part of the path that wasn't found. After that, we ask each node for the next until we reach the target or a dead end. If the dead end is at the very last part of the path (/foo/bar in the example above) and we want to mount something a new node is created. Otherwise the function returns. Also, if the goal is to mount something, each part of the path is added to the mount tree. Finally, the mounting is performed - if requested and the final inode is returned.

I also made two simple wrappers for this function:

INODE vfs_namei(const char *path)
{
    return vfs_namei_mount(path, 0);
}

INODE vfs_mount(const char *path, INODE root)
{
    return vfs_namei_mount(path, root);
}

And finally, a function for unmounting file systems:

INODE vfs_umount(const char *path)
{
    char *npath = strdup(path);
    char *pth = &npath[1];
    INODE ino = vfs_find_root(&pth);
    if(!ino || pth)
    {
        free(npath);
        return 0;
    }
    if(ino->child)
    {
        free(npath);
        return 0;
    } else {
        // Remove node from mount tree
        if(ino->parent->child == ino)
            ino->parent->child = ino->older;
        if(ino->younger)
            ino->younger->older = ino->older;
        if(ino->older)
            ino->older->younger = ino->younger;
        free(npath);
        return ino;
    }
}

And that's it for now. A lot of code this time, but that's because I don't want to push my changes to github quite yet, so I can't give you a commit link.

Next time, I'll look at some file related system calls.

ThomasLovén.com

Virtual File System 2

The virtual filesystem

What I want

Implementation

Vfs functions

Comments