11 Dec 2013
Once again, several months have passed since I wrote anything here. I also worked very little on the kernel during this time, though. So no losses there, at least. Instead I've been busy with a special side project that I will write a bit about later and a lot of personal stuff that you probably don't care about if you're reading those posts. Unless you're my wife, that is (I got married! :) ). Then again, if you're my wife you're probably not reading those posts anyway, so I guess the entire point of this paragraph was to tell you I got married(!).
Ahem...
So, as I said in my last post I recently rewrote my VFS layer. Although, I must admit I'm not quite satisfied with it yet...
...
Know what? Let's rewrite it again!
First of all: What do I want from the virtual filesystem?
The VFS should be an abstraction layer for the files used by the kernel and user processes. The files in this case could be files on a disk, the disk hardware itself, pipes, files stored in ram, read-only files generated on-the-fly by the kernel, network connections(?) etc.
Further, I want the VFS to be independent of any disk file system, e.g. the VFS shouldn't have user-group-other read-write-execute tuples just because it's designed with ext2 in mind. It might have those tuples - I haven't decided yet - but it it does it won't be because ext2 uses them. Nor will the VFS be designed with ext2 or any other disk file system in mind.
The VFS should offer the functions
open() // Open or create a file close() // Close a file read() // Read data from opened file write() // Write data to opened file move() // Move a file or directory link() // Put a file or directory in path tree unlink() // Remove a file or directory from path tree stat() // Get more info about a file or directory isatty() // Returns true if the file is a terminal mkdir() // Create a directory readdir() // Get a directory entry finddir() // Find a file by name from a directory
for all files and directories regardless of their underlying device or driver.
The file system should have a tree structure with a single root.
Filesystems should be mountable at any path in the tree provided the
path points to a directory or non-existing node within an existing
parent directory. I.e. if the empty directory /foo
, a filesystem
can be mounted to /foo
or /foo/bar
but not to foo/baz/bar
. If
/foo
is not empty, it should a filesystem can still be mounted to
it, unless it is already a mountpoint. All contents of /foo
are then
hidden untill the filesystem is unmounted. Mounting filesystems to
non-directories should not be possible. Mounted filesystems does not
have to have a root directory, but can consist of a single file.
Those are just some rules for mounting that I thought of pretty much arbitrarily. I might change my mind later, but this will do for now.
The most important data structure of the VFS is the inode. Each file used by the kernel gets an inode which keeps track of some important information of it, such as what type of file it is or which driver controls it.
Some inodes live in the mount tree, which keeps track of all mounted filesystems. Looking up a file by absolute path always starts in the mount tree and is performed by a function called namei (name to inode).
Example:
/mnt/floppy/foo/bar.txt
namei
starts at the VFS root /namei
searches for /mnt
in the VFS tree, finds it and gets it's inode.namei
searches for /mnt/floppy
in the VFS tree, finds it and gets it's inode.namei
searches for /mnt/floppy/foo
, which is not found.namei
asks the /mnt/floppy
inode for /mnt/floppy/foo
and gets it's inode.namei
asks the /mnt/floppy/foo
inode for /mnt/floppy/foo/bar.txt
and gets it's inode.namei
returns the inode for /mnt/floppy/foo/bar.txt
A good starting point for the inode structure might be some pointers to allow it to be placed in a tree, then.
struct vfs_node_st; typedef vfs_node_t * INODE; typedef struct vfs_node_st { char name[VFS_NAME_SZ]; INODE parent; INODE child; INODE older, younger; uint32_t type; } vfs_node_t;
This does waste a bit of memory, since most inodes that are used by the
system won't be in the VFS tree, but four size_t
isn't that much,
and it's far from the worst memory thief in this kernel anyway.
The type
field in the VFS node struct is used by the VFS tree to make
sure stuff is only mounted onto directories.
I typedef INODE
as a pointer to keep the code a bit cleaner and easier
to maintain.
Then, we need a way to keep track of the driver, i.e. the functions called to access the file. To do this, I define a new struct:
typedef struct vfs_driver_st { uint32_t (*open)(INODE, uint32_t); uint32_t (*close)(INODE); uint32_t (*read)(INODE, void *, uint32_t, uint32_t); uint32_t (*write)(INODE, void *, uint32_t, uint32_t); uint32_t (*link)(INODE, INODE, const char *); uint32_t (*unlink)(INODE, const char *); uint32_t (*stat)(INODE, struct stat *st); uint32_t (*isatty)(INODE); uint32_t (*mkdir)(INODE, const char *); dirent_t *(*readdir)(INODE, uint32_t); INODE (*finddir)(INODE, const char *); } vfs_driver_t;
and add vfs_driver_t *d
to the inode struct. I also added a length
value, a void pointer for arbitrary data used by the drivers and a flags
value - also for use by the drivers. The
inode struct now looks like this:
typedef struct vfs_node_st { char name[VFS_NAME_SZ]; void *parent; void *child; void *older, *younger; uint32_t type; vfs_driver_t *d; void *data; uint32_t flags; uint32_t length; }
Next, I create some wrapper functions to call the driver functions.
uint32_t vfs_open(INODE ino, uint32_t mode) { if(ino->d->open) return ino->d->open(ino, mode); return 0; }
and similar for all functions except readdir
and finddir
which
contain code to handle .
and ..
for mount roots.
dirent_t *vfs_readdir(INODE ino, uint32_t num) { if(ino->type & FS_MOUNT) { if(num == 0) { dirent_t *ret = calloc(1, sizeof(dirent_t)); ret->ino = ino; strcpy(ret->name, "."); return ret; } else if(num == 1) { dirent_t *ret = calloc(1, sizeof(dirent_t)); ret->ino = ino->parent; strcpy(ret->name, ".."); return ret; } } if(ino->d->readdir) return ino->d->readdir(ino, num); return 0; }
INODE vfs_finddir(INODE ino, const char *name) { if(ino->type & FS_MOUNT) { if(!strcmp(name, ".")) { return ino; } else if(!strcmp(name, "..")) { return ino->parent; } } if(ino->d->finddir) return ino->d->finddir(ino, name); if(ino->d->readdir) { // Backup solution int num = 0; dirent_t *de; while(1) { de = vfs_readdir(ino, num); if(!de) return 0; if(!strcmp(name, de->name)) break; free(de->name); free(de); num++; } INODE ret = de->ino; free(de->name); free(de); return ret; } return 0; }
Finally, I needed a function for mounting filesystems in the mount tree
and the namei
function, which can actually be combined since they both
need to traverse the entire path.
Warning: Pointer-pointers ahead!
First: a function for traversing the mount tree as far as possible
INODE vfs_find_root(char **path) { // Find closest point in mount tree INODE current = vfs_root; INODE mount = current; char *name; while((name = strsep(path, "/"))) { current = current->child; while(current) { if(!strcmp(current->name, name)) { mount = current; break; } current = current->olderyounger; } if(!current) { if(*path) { *path = *path - 1; *path[0] = '/'; } *path = name; break; } } return (INODE)mount; }
Pretty self explanatory. No? Well, strsep
is a library function which
picks out one part of the path at a time and also advances the path
pointer. Then, for each part, we look through the children of the node
we're at for one with the right name. If it is not found, the path
pointer is backed up one step and the last node we found is returned.
The namei/mount function then uses this as a starting point:
INODE vfs_namei_mount(const char *path, INODE root) { char *npath = strdup(path); char *pth = &npath[1]; // Find closest point in mount tree INODE current = vfs_find_root(&pth); char *name; while(current && (name = strsep(&pth, "/"))) { // Go through the path INODE next = vfs_finddir(current, name); if(root) { // If we want to mount someting if(!next) { // Create last part of path if it doesn't exist // But only if it is the last part. if(pth) return 0; next = calloc(1, sizeof(vfs_node_t)); strcpy(next->name, name); next->type = FS_DIRECTORY; } // Add path to mount tree next->parent = current; next->older = current->child; current->child = next; } if(!next) return 0; if(!current->parent) free(current); current = next; } free(npath); if(root && current->type == FS_DIRECTORY) { // Replace node in mount tree root->parent = current->parent; if(root->parent->child == current) root->parent->child = root; root->older = current->older; if(root->older) root->older->younger = current; root->younger = current->younger; if(root->younger) root->younger->older = current; strcpy(root->name, current->name); root->type = FS_MOUNT; if(current == vfs_root) vfs_root = root; free(current); } return current; }
Note how pth
is changed by vfs_find_root()
to only contain the part
of the path that wasn't found. After that, we ask each node for the next
until we reach the target or a dead end. If the dead end is at the very
last part of the path (/foo/bar
in the example above) and we want to
mount something a new node is created. Otherwise the function returns.
Also, if the goal is to mount something, each part of the path is added
to the mount tree. Finally, the mounting is performed - if requested and
the final inode is returned.
I also made two simple wrappers for this function:
INODE vfs_namei(const char *path) { return vfs_namei_mount(path, 0); } INODE vfs_mount(const char *path, INODE root) { return vfs_namei_mount(path, root); }
And finally, a function for unmounting file systems:
INODE vfs_umount(const char *path) { char *npath = strdup(path); char *pth = &npath[1]; INODE ino = vfs_find_root(&pth); if(!ino || pth) { free(npath); return 0; } if(ino->child) { free(npath); return 0; } else { // Remove node from mount tree if(ino->parent->child == ino) ino->parent->child = ino->older; if(ino->younger) ino->younger->older = ino->older; if(ino->older) ino->older->younger = ino->younger; free(npath); return ino; } }
And that's it for now. A lot of code this time, but that's because I don't want to push my changes to github quite yet, so I can't give you a commit link.
Next time, I'll look at some file related system calls.