In this first article of two I will talk about some issues I've encountered while working on data formats for the Nordenfelt engine. First I'm going to explain the meaning of "data-driven design" (DDD). After that I will show you some aspects of file formatting for games, or programs in general. I'm going to use only files as the storage medium here. Data can also be stored in databases or sent over networks, but that's not part of this topic.

What Is "Data-Driven Design"?

A quick look at Wikipedia shows that there is hardly any information about what data-driven design really is. The meaning derived from the term itself is "design according to the program's data". In games, data is stuff like properties of units (e.g. hit points or position), system configuration (e.g. screen resolution) or more complex information like AI scripts. Another, more detailed definition identifies data as the information that controls program flow. Scripts are the best example here: they are external definitions of program parts. But aren't scripts code, just like the compiled main program that executes them? Don't hit points control the lifespan of a unit? Images and sounds can be compiled or linked into the program itself. We can define everything in our C/C++/Java/C#/Delphi/whatever source files. Why do we need DDD at all?

Simple answer: development speed and flexibility.

Compiling and linking every time we change a sprite in a game would be very time-consuming. Games have huge amounts of assets, so that would slow down development heavily. Deferring the "linking" of assets by loading them after the program has started makes this step unnecessary.

Another reason is that integrating assets into executable binaries is programmer's work. Artists would depend on the availability of a programmer to test changed assets in the game. DDD decouples artists and programmers here.

But it's not just about artistic resources. Writing code for artificial intelligence, for example, is often done by trial and error. Extracting that code from the baked executable into script files streamlines the development process too.

The conclusion is that DDD is a separation of program flow and appearance (graphics, sounds, etc.) from the program itself in order to make both easier to modify. Therefore the program has to load data from additional files. Loading information from files, or more generally from chunks of bytes, is known as unmarshalling or deserialization. I'm going to use the word serialization as an umbrella term for conversion between game data and bytes, in either direction.

The following paragraphs will show you some issues of designing file formats and writing serialization code.

Format

The first thing you have to do in DDD: know your data. Think about what information will be stored outside the program. Common external game files are images for textures or sprites, sounds, meshes or configuration files. Most of these types have sophisticated formats. Images can be BMPs, JPGs, PNGs, TGAs or DDS. Sounds may come as MP3, OGG or WAV. 3DS, X, OBJ or DAE are meshes. You can easily find engines, SDKs or libraries which support these formats.

The more interesting formats are your proprietary ones. How will you format data for levels, units, configuration or scores? A simple solution would be dumping everything as plain text into a TXT file. This is straightforward but may become painful when parsing the file to reload it into the game. For example, lists may need to store their size before their entries. Otherwise the parser doesn't know how many entries it should read from the file. So you have to explicitly save some metadata for correct reloading.
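
To make the list problem concrete, here is a minimal sketch in Python (chosen only for brevity). The text layout and the names `save_scores` and `load_scores` are made up for illustration and are not taken from Nordenfelt. The writer stores the entry count first, so the reader knows how many lines belong to the list:

```python
def save_scores(scores):
    """Serialize a list of integers as plain text, size first."""
    lines = [str(len(scores))]            # metadata: number of entries
    lines += [str(s) for s in scores]     # one entry per line
    return "\n".join(lines) + "\n"


def load_scores(text):
    """Parse the text produced by save_scores back into a list."""
    lines = text.splitlines()
    count = int(lines[0])                 # read the size before the entries
    entries = lines[1:1 + count]
    if len(entries) != count:
        raise ValueError("file ends before all entries were read")
    return [int(e) for e in entries]


text = save_scores([1200, 950, 400])
print(load_scores(text))
```

Even this tiny format needs a hand-written parser and a rule for truncated files, which is the pain the paragraph above hints at.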

Formats like XML help you design your file formats. The syntax simplifies format design decisions and is well known to developers. There are some good, free parsers available, e.g. Xerces (C++ and Java) or TinyXML (C++).

Other options are JSON and YAML, which are more specific to serialization. The next paragraphs will show you some guidelines on how to choose the right format for your project. And by right format I mean XML ;)

Serialization

This functionality of a program covers converting game data to bytes as well as the other way around. As mentioned above, there are some parsers which do this job for you. Programming languages like Java or C# natively support serialization. Other languages have libraries which deliver this functionality. One of the best serialization libraries for my "mother tongue" C++ is Boost.Serialization. It supports multiple file formats, reference management and extensibility, and it needs very little code for saving and loading your data. But be warned: there are some drawbacks like limited DLL support, bloated documentation (yes, there can be too much documentation!) and deep high-level C++ template code which is not for the fainthearted. As long as you keep it simple there won't be a problem. Just don't wake the beast by doing odd tricks!
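
As a rough illustration of both directions, here is a small round trip sketched in Python with its standard `xml.etree.ElementTree` module. This is not Boost.Serialization, and the `Unit` type, its fields and the two function names are assumptions made up for this example:

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Unit:
    # Hypothetical game data; real units would carry more fields.
    name: str
    hit_points: int


def unit_to_bytes(unit):
    """Serialize: game data -> bytes."""
    elem = ET.Element("unit", name=unit.name, hitPoints=str(unit.hit_points))
    return ET.tostring(elem, encoding="utf-8")


def unit_from_bytes(data):
    """Deserialize: bytes -> game data."""
    elem = ET.fromstring(data)
    return Unit(name=elem.get("name"), hit_points=int(elem.get("hitPoints")))


original = Unit("gunboat", 120)
restored = unit_from_bytes(unit_to_bytes(original))
assert restored == original    # the round trip preserves the data
```

A library takes over exactly this kind of boilerplate, which is why it pays off once the number of data types grows.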

Readability

Nordenfelt's prototype had many files which contained just a sequence of values. Because the files did not say what each value meant, I had to switch over to the parsing code and check the meaning again and again. I've already written about this here. Eventually the lack of semantics made me go nuts, so I switched the files over to XML. That sounds simple but was a load of work, so my advice is: use XML from the very beginning.

Now semantics and values are combined. Furthermore, Visual Studio highlights the XML structure, auto-completes my input and has these nice -/+ signs next to the element blocks for collapsing and expanding them. All these things combined make reading the files a breeze.
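
The difference is easy to see side by side. The following Python sketch loads the same made-up weapon data once from a bare sequence of values and once from XML; the attribute names and both loader functions are hypothetical and only serve the comparison:

```python
import xml.etree.ElementTree as ET

# Bare sequence: the meaning of each value lives only in the parsing code.
PLAIN = "120 3 7.5"

# XML: every value carries its name.
XML = '<weapon damage="120" ammo="3" reloadSeconds="7.5"/>'


def load_plain(text):
    damage, ammo, reload_seconds = text.split()   # order must be memorized
    return {"damage": int(damage), "ammo": int(ammo),
            "reloadSeconds": float(reload_seconds)}


def load_xml(text):
    elem = ET.fromstring(text)
    return {"damage": int(elem.get("damage")), "ammo": int(elem.get("ammo")),
            "reloadSeconds": float(elem.get("reloadSeconds"))}


assert load_plain(PLAIN) == load_xml(XML)   # same data, different readability
```

Both loaders return the same values, but only the XML file can be understood without opening the parser.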

Validity

JSON, YAML and XML can be validated against schemas, for example XML Schema (XSD) for XML or JSON Schema for JSON. Schema files define the structure and data types of your game data files. The data files can then be checked automatically against the corresponding schema files. This speeds up bug hunting a lot.

Schema validation provides only limited semantic checks. Parsers can detect wrong structures and wrong data types with schema files, and XML Schema can even restrict a number to a range (the minInclusive and maxInclusive facets), but rules that depend on context, such as which resolutions the target hardware actually supports, are out of reach. Unless the schema spells out such limits, a screen resolution of 50000000x13 passes validation. It is good advice to check values explicitly while loading. Missing checks are loopholes for hackers. I remember surfing to a website where an image was declared to be 1 million x 2 million in size. The browser (IE, what else) froze, as did the whole workstation. That became a running gag in our office :)
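
An explicit check while loading can be as small as the following Python sketch. The `<screen>` element, the function name `load_resolution` and the two limits are assumptions for illustration; pick limits that match your own target hardware:

```python
import xml.etree.ElementTree as ET

# Hypothetical upper bounds, not taken from any real engine.
MAX_WIDTH = 7680
MAX_HEIGHT = 4320


def load_resolution(xml_text):
    """Read <screen width=".." height=".."/> and reject implausible values."""
    elem = ET.fromstring(xml_text)
    width = int(elem.get("width"))
    height = int(elem.get("height"))
    if not (1 <= width <= MAX_WIDTH and 1 <= height <= MAX_HEIGHT):
        raise ValueError(f"implausible screen resolution {width}x{height}")
    return width, height


print(load_resolution('<screen width="1280" height="720"/>'))
try:
    load_resolution('<screen width="50000000" height="13"/>')
except ValueError as err:
    print("rejected:", err)
```

The file with the absurd resolution is well-formed XML, so only the range check in the loader stops it.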

Versioning

I've learned this lesson while rewriting about 50 files for a single new attribute in an XML element:

Include a version number in your file format.

When a project grows, it also gets more files to handle. At first you will have a few files lurking beneath your program. Over time they multiply into dozens, like busy bunnies :) Changing a file format will force you to either detect older versions during parsing or update each affected file. Game development is an iterative process (well, for most of us) where changes are commonplace. You can save yourself a lot of work by adding a version number to the main elements like character, weapon or level. This advice is even more important for shipped products and network protocols. Compatibility is king.
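
Detecting older versions during parsing could look roughly like this Python sketch. The `<weapon>` element, its attributes, the default range of 100 and the rule that a missing version attribute means version 1 are all assumptions made for this example:

```python
import xml.etree.ElementTree as ET

CURRENT_VERSION = 2


def load_weapon(xml_text):
    """Load a weapon from any known format version."""
    elem = ET.fromstring(xml_text)
    # Files written before versioning existed count as version 1.
    version = int(elem.get("version", "1"))
    if version > CURRENT_VERSION:
        raise ValueError(f"weapon format version {version} is too new")
    weapon = {"damage": int(elem.get("damage"))}
    if version >= 2:
        weapon["range"] = int(elem.get("range"))
    else:
        weapon["range"] = 100     # default for the attribute added in v2
    return weapon


print(load_weapon('<weapon damage="5"/>'))
print(load_weapon('<weapon version="2" damage="5" range="300"/>'))
```

With the version check in place, the old files keep loading and only new files need the new attribute.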

Preview Part Two

OK, what's coming in part two? I will talk about some optimization techniques like baking and security improvements like obfuscation. Interoperability of data will be touched on too.

Phew, that became a really long post. I'm finished now - the Mortal Kombat way of finished :)
Time for having a ball in the local pub!

 

Cheers,
Thomas

 
