Thomas Lovén - TAR Filesystem

It's finally time to implement an actuall filesystem, and all the hard work with the VFS framework will pay off.

For my first filesystem, I chose the tape archive format - also known as a tarball. It's a bit like what James Molloy did but with support for directories.

Tar file format

The tar file format - as the name implies - was designed for storing files on magnetic tape, which can only be read and written sequentially. Therefore, there is no index of the files in the archive, but each file is preceded by a data block in human-readable format.

In the POSIX standard of 1988, the data block has the following format:

typedef struct
{
    unsigned char name[100];
    unsigned char mode[8];
    unsigned char uid[8];
    unsigned char gid[8];
    unsigned char size[12];
    unsigned char mtime[12];
    unsigned char checksum[8];
    unsigned char type[1];
    unsigned char linkname[100];
    unsigned char tar_indicator[6];
    unsigned char tar_version[2];
    unsigned char owner[32];
    unsigned char group[32];
    unsigned char device_major[8];
    unsigned char device_minor[8];
    unsigned char prefix[155];
}__attribute__((packed)) tar_header_t;

The data block is 500 bytes long, but the data starts 512 (one standard disk sector) bytes after the start of the data block The data itself is also padded so that the next data block starts at a 512 byte boundary.

For historical reasons all numerical values (i.e. uid, gid, size, mtime, checksum, device_major and device_minor) are in octal ascii format (ascii '0' to '7') padded with leading zeroes and terminated with a null character (0x00) or space (0x20).

Reshaping the tree

The TAR format is very good for storing data on tape where read and write operations are sequential, but it makes less sense once the files are loaded into ram and you might want to access them randomly.

Instead, I reshape the file list into a file tree:

tree_t *build_tar_tree(tar_header_t *tar)
{
    ...
    while(tar->name[0])
    {
        tartree_add_node(tree, tar, (char *)&tar->name);
        uint32_t size;
        sscanf((char *)&tar->size, "%o", &size);
        tar = (tar_header_t *)((size_t)tar + size + 512);
        if((size_t)tar % 512)
            tar = (tar_header_t *)((uint32_t)tar + 512 - ((uint32_t)tar%512))
    }
    ...
}

void tartree_add_node(tree_t *tree, tar_header_t *tar, char *path)
{
    ...
    tree_node_t *node = tree->root;
    char *p;
    for(p = strtok(path, "/"); p; p = strtok(NULL, "/"))
    {
        int found = 0;
        list_t *l;
        for_each_in_list(&node->children)
        {
            ...
            if(!strcmp(entry->name, p))
            {
                found = 1;
                node = tn;
                break;
            }
        }
        if(!found)
        {
            ...
            tarfs_entry_t *n = malloc(sizeof(tar_entry_t));
            n->name = strdup(p);
            n->tar = tar;
            ...
            tree_make_child(node, new);
            node = new;
        }
    }
}

Note that this assumes that the files and directories of the tar archive are in top-down order, e.g. that /bin is before /bin/echo, otherwise strange things happen. Deterministic and expected things, mind you, but strange.

Mounting the filesystem

The tarfs driver makes use of the data field in the vfs node to store the tar tree.

INODE tarfs_init(tar_header_t *tar)
{
    vfs_node_t *node = calloc(1, sizeof(vfs_node_t));
    strcpy(node->name, "tarfs");
    node->d = &tarfs_driver;
    node->type = FS_DIRECTORY;

    tree_t *tar_tree = build_tar_tree(tar);
    node->data = tar_tree->root;
    free(tar_tree);

    return node;
}

I then add the following to my kernel initialization function:

tar_header_t *tarfs_location = assert_higher((tar_header_t *)mods[0].mod_start);
vfs_mount("/", tarfs_init(tarfs_location));

where mods is the multiboot modules table as loaded by grub.

The tarfs driver

Finally, the tarfs driver functions. For now I only need to implement read() and finddir().

INODE tar_finddir(INODE dir, const char *name)
{
    tree_node_t *tn = (tree_node_t *)dir->data;
    list_t *l;
    for_each_in_list(&tn->children, l)
    {
        tree_node_t *cn = list_entry(l, tree_node_t, siblings);
        ...
        if(!strcmp(entry->name, name)
        {
            INODE node = calloc(1, sizeof(vfs_node_t));
            strcpy(node->name, entry->name);
            node->d = &tarfs_driver;
            node->data = (void *)cn;
            sscanf((char *)&entry->tar->size, "%o", &node->length);
            if(entry->tar->type[0] == TAR_TYPE_DIR)
                node->type = FS_DIRECTORY;
            else
                node->type = FS_FILE;
            return node;
        }
    }
    return 0;
}

Finddir allocates space for a new inode for each file that is searched for. It's up to the caller to free the inode when it's done with it. This should be true for all finddir functions, but I think I've missed it at a few places... Someday I will clean up all my memory leaks.

Anyway, finddir also finds the right node in the tarfs tree and puts it in the data field of the inode.

uint32_t read_tar(INODE node, void *buffer, uint32_t size, uint32_t offset)
{
    tree_node_t *tn = (tree_node_t *)node->data;
    tarfs_entry_t *te  = (tarfs_entry_t *)tn->item;
    tar_header_t *tar = te->tar;

    uint32_t tsz;
    sscanf((char *)&tar->size, "%o", &tsz);
    if(offset > tsz) return EOF;

    if((size + offset) > tsz) size = tsz - offset;

    offset = offset + (uint32_t)tar + 512;
    memcpy(buffer, (void *)offset, size);
    if(size == tsz - offset)
        ((char *)buffer)[size] = EOF;

    return size;
}

Using it

Now, all I need to do in order to make read-only files accessible to my kernel is put them in a directory and run

$ tar -cf tarfs.tar tarfs/*

and then make sure tarfs.tar is loaded as a multiboot module by qemu

$ qemu-system-i386 -kernel kernel -initrd tarfs.tar

or by adding a line to the grub menu.lst file.

module /boot/tarfs.tar

Final note

While writing this post, I got back to polishing this code and added readdir for the tarfs. I also added buffering of inodes, so that if the same file is asked for twice, the same inode is returned. This meant I had to add a flush function to the driver structure.

I also rewrote a bit of both vfs_finddir and vfs_readdir. This in turn forced me to go back on a design decision. The VFS can now only mount new devices onto already existing paths (excluding /).

Just in case I'd write something in coming posts that might confuse you...

Git

I finally decided to push upstream, so here's the latest commit: 843520405e. It still contains some stuff I haven't written about, though.

ThomasLovén.com

TAR Filesystem