Devlog 03: File formats

Hello World!

Today we’re talking about how we’re going to package some of the data we’ll need to make the game run. Obviously we don’t want to hard-code all the locations, scenes, regions, items, scripts, etc. directly into the code.

Blitz Basic 2.1 is an old language and lacks a lot of the nicer utilities of modern languages, like dynamic arrays. As a matter of fact, the language looks and feels like it grew organically to fit a need while maintaining backward compatibility with previous BASICs, which gives us some funkiness like having two types of arrays that get accessed differently and have different starting indices depending on where they were defined… Oof…

One of the features of Blitz Basic is that you can define your own types that are binary compatible with C structs (I’ll be using C++ notation). So these two code snippets are equivalent:

    struct neon_header
    {
        char        magic[4];       // Must be "NEON"
        uint16_t    major_version;
        uint16_t    minor_version;
    };

    NEWTYPE .neon_header
        magic.b[4]      ; .b is for byte
        major_version.w ; .w is for word
        minor_version.w
    END NEWTYPE

Here I’ve defined the beginning of what a header could look like. The first 4 bytes of the file spell “NEON”, which lets us know we’re reading the right file format, and the next four form a MAJOR.MINOR version number for the file, in case we decide to extend it in the future.
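To make the header check concrete, here is a minimal C++ sketch of what a loader or tool might do with it. The name is_neon_header and the rule that only the major version has to match are assumptions for illustration, not something settled above. It also assumes the version fields have already been converted to the host's byte order.

```cpp
#include <cstdint>

struct neon_header
{
    char     magic[4];       // Must be "NEON"
    uint16_t major_version;
    uint16_t minor_version;
};

// The struct has to stay exactly 8 bytes to line up with the Blitz NEWTYPE.
static_assert(sizeof(neon_header) == 8, "unexpected padding in neon_header");

// Hypothetical check: accept the file when the magic matches and the major
// version is the one this loader understands (an assumed policy).
constexpr bool is_neon_header(neon_header const& header, uint16_t supported_major)
{
    return header.magic[0] == 'N' && header.magic[1] == 'E'
        && header.magic[2] == 'O' && header.magic[3] == 'N'
        && header.major_version == supported_major;
}
```

Comparing the magic byte by byte avoids relying on a null terminator, since the four characters fill the array completely.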

Defining our data structures

Let’s begin by defining the data structures we know we need. These won’t be their final forms, as we will be adding more to them in the future as we work out new features. For now we’re concerned with doing the minimum we need in order to achieve our goal: load a scene from a location and show hover text when the mouse is over a region. I’ll be prefixing all the structures with “neon” to keep things tidy and avoid any naming collisions. Let’s start with the most “primitive” of the types we need and work our way upward to the most complex.

Strings

If we’re going to show hover text, we’re going to need strings. While C-style strings are null terminated, strings in Blitz Basic store their length in an extra 4 bytes just before the start of the character data.

struct neon_string
{
    uint32_t length;
    char body[1];
};

The body[1] member isn’t really a character array with one item, it’s the first item in a variable-sized char array. We don’t need to make an equivalent type in Blitz because this is exactly how their strings work. We would simply create a new string, size it to the amount we need, then copy the relevant memory into the buffer, like so:

    ; Assume the pointer in the file is just before our neon_string
    string_size.l = 0   ; .l denotes long
    ; Read in the size of the string
    ReadMem file_handle, &string_size, SizeOf .l

    ; Allocate a string with enough space
    MaxLen the_string$ = string_size
    ReadMem file_handle, &the_string, string_size

Almost none of the strings we’ll need will actually have to be converted to real Blitz strings because we’re going to process them ourselves in custom text printing routines. The notable and immediate exception is reading background images which will require a call to the Blitz built-in function LoadBitmap.

Regions

For the moment, regions are going to be nothing more than fancy rectangles. Eventually we’ll assign scripts and even images to them. For now we’ll keep them as rectangles with a description. Here we get our first concern then: how do we store the string in a region? Two options that come to mind are storing the strings as part of the region structure, or storing all strings in a separate string table and referencing them here by ID. Each has its pros and cons. Storing the strings in the region makes it easy to get the string data, but now each region is variable in size and iterating through an array of them will be a pain. On the other hand, if we store an ID, there is an added level of indirection but our structures remain a constant size.

Let’s go with the latter approach:

struct neon_region
{
    uint16_t x, y;
    uint16_t width, height;
    uint16_t description_id;
};

NEWTYPE .neon_region
    x.w
    y.w
    width.w
    height.w
    description_id.w
END NEWTYPE

Pretty simple and uncomplicated, let’s enjoy that while it lasts.
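Since the goal is hover text, here is a rough C++ sketch of the hit test a region implies. region_contains is a hypothetical helper, and treating the right and bottom edges as exclusive is an assumption made here, not a decision from the format itself.

```cpp
#include <cstdint>

struct neon_region
{
    uint16_t x, y;
    uint16_t width, height;
    uint16_t description_id;
};

// Hypothetical helper: true when the point lies inside the rectangle.
// Assumes the left/top edges are inclusive and the right/bottom edges are
// exclusive, so two regions side by side never both claim the same pixel.
constexpr bool region_contains(neon_region const& region, int px, int py)
{
    return px >= region.x && px < region.x + region.width
        && py >= region.y && py < region.y + region.height;
}
```

The 16-bit fields are promoted to int before the additions, so x + width cannot wrap around even for a region that touches the largest coordinate.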

Scenes

Scenes also contain a description string and a background image. Both of these can be referenced by ID (more on backgrounds later). That leaves the varying number of regions in each scene. Again we have multiple options. We could embed the regions in the scene, which shouldn’t be too hard since the regions are a constant size, but we’d be passing the whole “variably sized structure array” issue upwards to the locations. We could set an arbitrary limit on the number of regions, which would make the scenes a constant size but both waste memory and add a constraint on our scenes. We could also do the same thing we did for strings and create a region table. Provided we assert that the regions a scene references are contiguous, we could just store the index into the region table and the number of regions the scene has.

struct neon_scene
{
    uint16_t name_id;
    uint16_t background_id;

    uint16_t region_count;
    uint16_t first_region_id;
};

NEWTYPE .neon_scene
    name_id.w
    background_id.w

    region_count.w
    first_region_id.w
END NEWTYPE
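The index-plus-count trick could be sketched like this in C++. scene_region is a hypothetical accessor, and region_table stands for the array of every region loaded from the file; neither name comes from the format itself.

```cpp
#include <cstdint>

struct neon_region
{
    uint16_t x, y;
    uint16_t width, height;
    uint16_t description_id;
};

struct neon_scene
{
    uint16_t name_id;
    uint16_t background_id;

    uint16_t region_count;
    uint16_t first_region_id;
};

// Hypothetical accessor: the n-th region of a scene, fetched through the
// shared region table. Returns nullptr when n is out of range for the scene.
constexpr neon_region const* scene_region(neon_scene const& scene,
                                          neon_region const* region_table,
                                          uint16_t n)
{
    return n < scene.region_count
        ? &region_table[scene.first_region_id + n]
        : nullptr;
}
```

Because a scene's regions are contiguous, walking them is just a loop from 0 to region_count - 1 through this accessor, and the scene structure itself never changes size.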

Location

Just like everything else under it, there are many ways to write location data, and since we’ve gone with the index/table approach, I don’t see a reason to stop now.

Once again, if we assume a scene table, we can do the same trick we did for regions.

struct neon_location
{
    uint16_t name_id;
    
    uint16_t background_count;
    uint16_t first_background_id;

    uint16_t scene_count;
    uint16_t first_scene_id;
};

NEWTYPE .neon_location
    name_id.w
    background_count.w
    first_background_id.w
    scene_count.w
    first_scene_id.w
END NEWTYPE

String table

So far we’ve managed to postpone dealing with variable-sized objects (other than the string definition above). If the strings are variably sized and occupy a contiguous chunk of memory, how do we turn an index into a memory location?

One approach would be for the index to be a pointer to the string memory. This would make accessing the strings pretty easy, but it comes with the disadvantage that we would have to update every reference to a string in every object once we load the neon file, since we can’t know ahead of time where the string will end up in memory.

Indirection is here to save us again! As we load each string and allocate space for it, we can store its starting pointer in a new array of longs. The string id used by the other data structures references this array from which we get the string’s memory location. Since we control how the strings are written to disk, we know the order the strings will be in so the indices in the string table will always be correct.
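Here is a sketch of that indirection in C++, the language of the tools rather than of the game. build_string_table is a hypothetical name. It assumes the strings sit back to back in one blob, each with a 4-byte big-endian length in front, and it records offsets into the blob where the game would record pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical loader step: walk a blob of length-prefixed strings and note
// where each one starts. The position in the returned table is the string id.
std::vector<std::size_t> build_string_table(std::vector<unsigned char> const& blob)
{
    std::vector<std::size_t> table;
    std::size_t offset = 0;

    while (offset < blob.size())
    {
        // There must be room for the 4-byte length prefix.
        assert(blob.size() - offset >= 4);

        // Assemble the length with shifts, most significant byte first.
        uint32_t const length = (uint32_t{blob[offset]}     << 24)
                              | (uint32_t{blob[offset + 1]} << 16)
                              | (uint32_t{blob[offset + 2]} << 8)
                              |  uint32_t{blob[offset + 3]};

        // There must be room for the string body as well.
        assert(blob.size() - offset - 4 >= length);

        table.push_back(offset);
        offset += 4 + length;
    }

    return table;
}
```

Looking up string 1 is then a single table read, and nothing stored in regions, scenes, or locations has to be patched after loading.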

File layout

At this point our neon files should look like this:

Name         Size (bytes)
Header       8
Locations    10 x number of locations
Scenes       8 x number of scenes
Strings      Variable

Note: The string table isn’t stored in the neon file, it’s built at runtime on load.
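The sizes listed above can be double-checked at compile time on the tool side. This is a minimal sketch that assumes the sections are packed back to back in the order listed, with no padding between them. scenes_offset and strings_offset are hypothetical helpers, and where the two counts come from is left open here.

```cpp
#include <cstddef>
#include <cstdint>

struct neon_location
{
    uint16_t name_id;
    uint16_t background_count;
    uint16_t first_background_id;
    uint16_t scene_count;
    uint16_t first_scene_id;
};

struct neon_scene
{
    uint16_t name_id;
    uint16_t background_id;
    uint16_t region_count;
    uint16_t first_region_id;
};

// If either of these fails, the compiler added padding and the structs no
// longer match the file layout.
static_assert(sizeof(neon_location) == 10, "neon_location must be 10 bytes");
static_assert(sizeof(neon_scene) == 8, "neon_scene must be 8 bytes");

constexpr std::size_t header_size = 8;

// Byte offset of the first scene, given how many locations precede it.
constexpr std::size_t scenes_offset(std::size_t location_count)
{
    return header_size + location_count * sizeof(neon_location);
}

// Byte offset of the first string, after all locations and scenes.
constexpr std::size_t strings_offset(std::size_t location_count, std::size_t scene_count)
{
    return scenes_offset(location_count) + scene_count * sizeof(neon_scene);
}
```

For a file with 3 locations and 5 scenes, the scenes would start at byte 8 + 3 x 10 = 38 and the strings at byte 38 + 5 x 8 = 78.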

Big or Little Endian?

The last thing we have to worry about is the endianness of the architecture the game is running on. The Amiga uses the Motorola 68000 family of processors, which are big-endian, but Intel processors are little-endian.

Endianness refers to the order in which the bytes of a multi-byte value are stored in memory. Big-endian systems store the most significant byte first (at the lowest address) while little-endian systems store the least significant byte first.

For example, casting these bytes in memory 11 22 33 44 as a long on a big-endian system would give you the hex value 0x11223344 whereas a little-endian system would give you 0x44332211.
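That example can be checked with a few lines of C++. The two helper names are made up for illustration. They build the value with shifts instead of a cast, so they give the same answer on any machine.

```cpp
#include <cstdint>

// Interpret four bytes as a 32-bit value, first byte most significant.
constexpr uint32_t read_big_endian(unsigned char const* bytes)
{
    return (uint32_t{bytes[0]} << 24) | (uint32_t{bytes[1]} << 16)
         | (uint32_t{bytes[2]} << 8)  |  uint32_t{bytes[3]};
}

// Interpret the same four bytes with the first byte least significant.
constexpr uint32_t read_little_endian(unsigned char const* bytes)
{
    return  uint32_t{bytes[0]}        | (uint32_t{bytes[1]} << 8)
         | (uint32_t{bytes[2]} << 16) | (uint32_t{bytes[3]} << 24);
}
```

Feeding both functions the bytes 11 22 33 44 yields 0x11223344 from the first and 0x44332211 from the second.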

This is a problem when we want to write our tools to run on modern OSes running on an Intel chip but want the game to run on the big-endian Motorola-based Amiga.

We need to do a conversion either when we write the data or when we read it. Since it’s always better to do everything you can at “compile” time rather than at runtime, we’ll output the file in big-endian format.

Naturally, this complicates matters a little. If we didn’t have to worry, we could write out all the locations like so:

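// locations is a vector of neon_location
// neonfile is an output file stream
// write() takes a char const*, so the struct pointer has to be cast
neonfile.write(reinterpret_cast<char const*>(locations.data()), locations.size() * sizeof(neon_location));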

Easy.

However that will store the data in little-endian format. Since file streams don’t convert, we’ll need to write out each field ourselves, after we swap the bytes.

We’re only dealing with bytes, words, and longs, and only the last two need to be swapped, so we need two functions:

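uint32_t byteswap(uint32_t value) noexcept
{
    // Reverse the four bytes with shifts and masks. Working on the value
    // itself avoids signed char sign extension from a pointer cast.
    return (value >> 24)
         | ((value >> 8) & 0x0000FF00u)
         | ((value << 8) & 0x00FF0000u)
         | (value << 24);
}

uint16_t byteswap(uint16_t value) noexcept
{
    // Swap the two bytes; the cast drops the bits shifted above 16.
    return static_cast<uint16_t>((value << 8) | (value >> 8));
}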

We don’t need more than that since we’re not going to be swapping 64-bit values. Sadly, serialization now looks like this:

// byteswap() returns a temporary, so it needs a named home before we can
// take its address and hand it to write().
auto write_word = [&neonfile](uint16_t value)
{
    value = byteswap(value);
    neonfile.write(reinterpret_cast<char const*>(&value), sizeof(value));
};

for (auto const& location : locations)
{
    write_word(location.name_id);
    write_word(location.background_count);
    write_word(location.first_background_id);
    write_word(location.scene_count);
    write_word(location.first_scene_id);
}

Ugh… Better in the editor than in the game itself, I suppose.
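One alternative worth noting, as a sketch rather than what the editor will necessarily use: build the big-endian bytes with shifts. That never looks at how the host stores the value, so the same code stays correct on both little-endian and big-endian machines. to_big_endian and write_big_endian are hypothetical names.

```cpp
#include <array>
#include <cstdint>
#include <ostream>

// Split a word into two bytes, most significant first.
constexpr std::array<unsigned char, 2> to_big_endian(uint16_t value)
{
    return {{ static_cast<unsigned char>(value >> 8),
              static_cast<unsigned char>(value & 0xFF) }};
}

// Split a long into four bytes, most significant first.
constexpr std::array<unsigned char, 4> to_big_endian(uint32_t value)
{
    return {{ static_cast<unsigned char>(value >> 24),
              static_cast<unsigned char>((value >> 16) & 0xFF),
              static_cast<unsigned char>((value >> 8) & 0xFF),
              static_cast<unsigned char>(value & 0xFF) }};
}

// Write a 16- or 32-bit field through the matching overload above.
template <typename T>
void write_big_endian(std::ostream& out, T value)
{
    auto const bytes = to_big_endian(value);
    out.write(reinterpret_cast<char const*>(bytes.data()),
              static_cast<std::streamsize>(bytes.size()));
}
```

The trade-off is one small array per field instead of an in-place swap, which hardly matters in an offline tool.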

Next time we’ll be looking at the beginnings of an editor that’ll allow us to write out neon files since we definitely don’t want to be doing this by hand!

See you next game!