Data Format Design For Data-Driven Games
In this first article of two I will talk about some issues I've encountered while working on data formats for the Nordenfelt engine. First I'm going to explain the meaning of "data-driven design" (DDD). After that I will show you some aspects of file formatting for games or programs in general. I'm going to use only files as the storage medium here. Data can also be stored in databases or sent over networks but that's not part of this topic.
What Is "Data-Driven Design"?
A quick look at Wikipedia shows that there is hardly any information about what data-driven design really is. The meaning derived from the term itself says "design according to the program's data". In games data is stuff like properties of units (e.g. hit points or position), system configuration (e.g. screen resolution) or more complex information like AI scripts. Another source gives a more detailed definition of DDD. It identifies data as the information that controls program flow. Scripts are the best examples here. They are external definitions of program parts. But aren't scripts code, just like the compiled main program which executes them? Aren't hit points controlling the lifespan of a unit? Images or sounds can be compiled/linked into the program itself. We can define everything in our C/C++/Java/C#/Delphi/whatever source files. Why do we need DDD at all?
Simple answer: development speed and flexibility.
Compiling/linking every time we change a sprite in a game would be very time-consuming. Games have huge amounts of assets so that would slow down development heavily. Deferring the "linking" of assets until after the program has started, by loading them at runtime, makes this step obsolete.
Another reason is that integrating assets into executable binaries is programmer's work. Artists would be bound to the availability of a programmer for testing the changed assets in the game. DDD decouples artists and programmers here.
But it's not just about artistic resources. Writing code for artificial intelligence for example is often done by trial and error. Extracting the code from the baked executable into script files streamlines the development process too.
The conclusion is that DDD is a separation of program flow and appearance (graphics, sounds, etc.) from the program itself in order to make them easier to modify. Therefore the program has to load data from additional files. Loading information from files, or more generally from chunks of bytes, is known as unmarshalling or deserialization. I'm going to use the word serialization as an umbrella term for conversion between game data and bytes, in both directions.
The following paragraphs will show you some issues of designing file formats and writing serialization code.
Format
The first thing you have to do in DDD: know your data. Think about what information will be stored outside the program. Common external game files are images for textures or sprites, sounds, meshes or configuration files. Most of these types have sophisticated formats. Images can be BMP, JPG, PNG, TGA or DDS files. Sounds may come as MP3, OGG or WAV. Meshes come as 3DS, X, OBJ or DAE. You can easily find engines, SDKs or libraries which support these formats.
The more interesting formats are your proprietary ones. How will you format data for levels, units, configuration or scores? A simple solution would be dumping everything as plain text into a TXT file. This is straightforward but may become painful when parsing the file to reload it into the game. E.g. lists may need to store their size before their entries. Otherwise the parser doesn't know how many entries it should read from the file. So you have to explicitly save some metadata for correct reloading.
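The list problem can be sketched in a few lines of C++. This is only an illustration under assumptions, not Nordenfelt's actual code: the names SaveScores and LoadScores and the "count first, then one value per line" layout are made up for this example.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Writes the number of entries first, then one entry per line.
// The count is the metadata the loader needs later on.
std::string SaveScores(const std::vector<int>& scores)
{
    std::ostringstream out;
    out << scores.size() << '\n';
    for (std::size_t i = 0; i < scores.size(); ++i)
        out << scores[i] << '\n';
    return out.str();
}

// Reads the count, then exactly that many entries.
// Returns false (and leaves 'scores' untouched) if the text ends early
// or contains something that is not a number.
bool LoadScores(const std::string& text, std::vector<int>& scores)
{
    std::istringstream in(text);
    std::size_t count = 0;
    if (!(in >> count))
        return false;

    std::vector<int> loaded;
    for (std::size_t i = 0; i < count; ++i)
    {
        int score = 0;
        if (!(in >> score))
            return false;
        loaded.push_back(score);
    }
    scores.swap(loaded);
    return true;
}
```

Even in this tiny case the loader has to know the layout by heart. Nothing in the text itself says what the first number means.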
Formats like XML help you design your file formats. The syntax streamlines format design decisions and is well known by developers. There are some good and free parsers available, e.g. Xerces (C++ and Java) or TinyXML (C++).
Further formats are JSON or YAML which are more specific to serialization. The next paragraphs will show you some guidelines how to choose the right format for your project. And by right format I mean XML ;)
Serialization
This functionality of a program covers converting game data to bytes as well as the other way around. As mentioned above there are some parsers which do this job for you. Programming languages like Java or C# natively support serialization. Other languages have libraries which deliver this functionality. One of the best serialization libraries for my "mother tongue" C++ is Boost.Serialization. It supports multiple file formats, reference management and extensibility, and needs very little code for saving/loading your data. But be warned: there are some drawbacks like limited DLL support, bloated documentation (yes, there can be too much documentation!) and deep high-level C++ template code which is not for the fainthearted. As long as you keep it simple there won't be a problem. Just don't wake the beast by doing odd tricks!
Readability
Nordenfelt's prototype had many files which contained just a sequence of values. Without the value semantics I had to switch over to the parsing code and check the meaning again and again. I've already written about this here. In the end the lack of semantics made me go nuts. So I switched the files over to XML. That sounds simple but was a load of work. My advice: use XML from the very beginning.
Now semantics and values are combined. Furthermore, Visual Studio highlights the XML structure, auto-completes my input and has these nice -/+ signs beside the element blocks for collapsing and expanding them. All these things combined make file reading a breeze.
Validity
JSON, YAML and XML support schema validation. Schema files can be used for defining structure and data types in your game data files. The data files can be checked automatically with the corresponding schema files. This speeds up bug hunting a lot.
Schema validation provides only limited semantic checks (correct me if I'm wrong). Parsers can detect wrong structures and data types with schema files, and some schema languages (XML Schema, for example) can restrict a number to a fixed range. But a schema does not know what makes sense for your game at runtime, e.g. whether a screen resolution of 50000000x13 is something the machine can display. It is good advice to check values explicitly while loading. Missing checks are loopholes for hackers. I remember surfing to a website where an image was declared to be 1 million x 2 million in size. The browser (IE, what else) froze, and so did the whole workstation. That became a running gag in our office :)
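An explicit check while loading could look like the following sketch. The limits, the Resolution struct and the function names are assumptions for illustration only. Pick whatever range your game really supports.

```cpp
struct Resolution
{
    int width;
    int height;
};

// Hypothetical limits, chosen only for this example.
const int MinWidth  = 320;
const int MaxWidth  = 7680;
const int MinHeight = 200;
const int MaxHeight = 4320;

// Value check that runs after parsing, independent of any schema.
bool IsPlausible(const Resolution& r)
{
    return r.width  >= MinWidth  && r.width  <= MaxWidth
        && r.height >= MinHeight && r.height <= MaxHeight;
}

// Accepts the loaded value only if it is plausible,
// otherwise falls back to a known good default.
Resolution SanitizeResolution(const Resolution& loaded, const Resolution& fallback)
{
    return IsPlausible(loaded) ? loaded : fallback;
}
```

Falling back to a default is only one option. Rejecting the whole file with an error message is just as valid and often easier to debug.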
Versioning
I've learned this lesson while rewriting about 50 files for a single new attribute in an XML element:
Include a version number in your file format.
When a project grows it also gets more files to handle. First you will have a few files lurking beneath your program. Over time they become dozens of files, like busy bunnies :) Changing a file format will force you to either detect older versions during parsing or update each affected file. Game development is an iterative process (well, for most of us) where changes are commonplace. You can save yourself much work by adding a version number for the main elements like character, weapon or level. This advice is even more important for shipped products and network protocols. Compatibility is king.
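Detecting older versions during parsing can be as simple as the sketch below. Everything in it is assumed for the example: the Weapon struct, the plain-text layout with the version as first token, and the rule that version 2 added a range value. The same idea works with a version attribute on an XML element.

```cpp
#include <sstream>
#include <string>

struct Weapon
{
    std::string name;
    int damage;
    int range;   // hypothetical attribute added in format version 2
};

const int CurrentWeaponVersion = 2;
const int DefaultRange = 100;   // used for version 1 data without a range

// Assumed layouts:
//   version 1: "<version> <name> <damage>"
//   version 2: "<version> <name> <damage> <range>"
bool LoadWeapon(const std::string& text, Weapon& weapon)
{
    std::istringstream in(text);
    int version = 0;
    Weapon loaded = { "", 0, DefaultRange };

    if (!(in >> version >> loaded.name >> loaded.damage))
        return false;

    // Refuse files that are newer than this program, or plain nonsense.
    if (version < 1 || version > CurrentWeaponVersion)
        return false;

    // Only version 2 files carry the new attribute.
    if (version >= 2 && !(in >> loaded.range))
        return false;

    weapon = loaded;
    return true;
}
```

With this in place a new attribute needs one more branch in the loader instead of an edit in every existing file.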
Preview Part Two
OK, what's coming in part two? I will talk about some optimization techniques like baking and security improvements like obfuscation. Interoperability of data will be touched on too.
Phew, that became a really long post. I'm finished now - the Mortal Kombat way of finished :)
Time for having a ball in the local pub!
Cheers,
Thomas
Circle Collision Detection Beauty
Two months ago I wrote an article about collision detection optimization for Nordenfelt. At that time the basic problem emerged from the fact that the engine used rectangles for collision detection. In this article I want to explain why I dropped the rectangle approach.
Simple Collision Shapes
There are two simple solutions for collision detection: using rectangles or circles. Testing rectangles for intersection is as simple as the following code shows:
bool Intersect(Rectangle& a, Rectangle& b)
Collision detection between circles is a no-brainer too:
bool Intersect(Circle& a, Circle& b)
The calculation of the distance length is rather expensive due to the need for a square root (length = Squareroot(x*x + y*y)). Therefore we use a common trick and compare the squared values (length*length = x*x + y*y):
bool Intersect(Circle& a, Circle& b)
Detecting points inside a rectangle or circle is a special case where the point is a rectangle/circle with an area of zero:
bool Intersect(Rectangle& r, Vector2D& v)
I don't know how well triangles would fit into collision detection. I did not investigate this shape. The simplicity of rectangles and circles was enough for me.
A Wrong Choice
Back in the days when I planned to use pixel art for Nordenfelt I chose rectangles as physical representation atoms. Airplanes, turrets, ships and other war machines have angular shapes. So rectangles were the first choice. The only drawback was rotation. Only axis-aligned rectangles provide simple collision detection routines. But pixel art is not rotatable anyway (without ugly distortions) so that was no problem. After some graphical experiments I dropped the pleasing idea of pixel art - sob :( - and switched over to rendered 3D models and higher screen resolutions. Now rotations became possible and were included right away. This created the demand for rectangle rotation which was solved by including polygons for physical representation.
Complex algorithms and some trickery were needed to make them usable for the engine. Circles have a huge advantage over rectangles and polygons: rotation is faster (rotating one center compared to rotating every polygon corner) and does not change the shape, so axis alignment is no longer an issue. I thought about using circles instead of polygons. Laziness stopped me pondering.
Profound Refactoring
Integration of the new power-ups (coming in Nordenfelt 0.2) created a refactoring demand for state machines, animations and sprite management. Why not kill two birds with one stone and replace the awkward polygon shapes with circles? So I dived deep into the "physics" code again. After four days of refactoring the results manifested as hierarchical circle structures. These structures can easily be scaled, translated and rotated. The hierarchical structure speeds up collision tests. When the envelope circles do not intersect, neither do the inner circles.
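The envelope idea can be sketched like this. It is not the engine's code: the CircleBody struct, the single level of hierarchy, the const references and the decision to count touching circles as a hit are all assumptions for the example. The circle test compares squared values, so no square root is needed.

```cpp
#include <cstddef>
#include <vector>

struct Vector2D
{
    float x;
    float y;
};

struct Circle
{
    Vector2D center;
    float radius;
};

// One envelope circle that encloses all finer circles of a body.
// (Only one level deep here; a real hierarchy may nest further.)
struct CircleBody
{
    Circle envelope;
    std::vector<Circle> parts;
};

// Squared distance against squared radius sum, no square root.
bool Intersect(const Circle& a, const Circle& b)
{
    const float dx = a.center.x - b.center.x;
    const float dy = a.center.y - b.center.y;
    const float radii = a.radius + b.radius;
    return dx * dx + dy * dy <= radii * radii;
}

// Cheap envelope test first; the inner circles are only
// compared when the envelopes overlap.
bool Intersect(const CircleBody& a, const CircleBody& b)
{
    if (!Intersect(a.envelope, b.envelope))
        return false;

    for (std::size_t i = 0; i < a.parts.size(); ++i)
        for (std::size_t j = 0; j < b.parts.size(); ++j)
            if (Intersect(a.parts[i], b.parts[j]))
                return true;

    return false;
}
```

For two bodies that are far apart the whole test costs a handful of multiplications, no matter how many inner circles each body has.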
Bonus
Another advantage of circles over polygons and rectangles appeared while coding: line intersection is a breeze and can easily be extended with line thickness:
float GetSquaredLength(Vector2D& v)
float GetDotProduct(Vector2D& v1, Vector2D& v2)
This intersection would be much more complex, slower and uglier for rectangles and polygons. Line intersection is needed e.g. for shots which need to check their trace. Otherwise they may fly through an object without detecting the collision.
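One possible way to build such a test on the two helpers is sketched below. Only the helper names come from the post. The Intersect signature, the const references and the way thickness is handled (half of it is added to the radius) are assumptions. The closest point on the shot's trace is found by projecting the circle center onto the segment.

```cpp
struct Vector2D
{
    float x;
    float y;
};

struct Circle
{
    Vector2D center;
    float radius;
};

float GetSquaredLength(const Vector2D& v)
{
    return v.x * v.x + v.y * v.y;
}

float GetDotProduct(const Vector2D& v1, const Vector2D& v2)
{
    return v1.x * v2.x + v1.y * v2.y;
}

// True if the segment start-end, widened by 'thickness',
// touches or overlaps the circle.
bool Intersect(const Circle& c, const Vector2D& start, const Vector2D& end, float thickness)
{
    const Vector2D line = { end.x - start.x, end.y - start.y };
    const Vector2D toCenter = { c.center.x - start.x, c.center.y - start.y };

    // Project the center onto the segment, clamped to its end points.
    float t = 0.0f;
    const float squaredLength = GetSquaredLength(line);
    if (squaredLength > 0.0f)
    {
        t = GetDotProduct(toCenter, line) / squaredLength;
        if (t < 0.0f) t = 0.0f;
        if (t > 1.0f) t = 1.0f;
    }

    const Vector2D closest = { start.x + t * line.x, start.y + t * line.y };
    const Vector2D diff = { c.center.x - closest.x, c.center.y - closest.y };

    // Half the line thickness simply enlarges the circle.
    const float reach = c.radius + 0.5f * thickness;
    return GetSquaredLength(diff) <= reach * reach;
}
```

A shot would call this with its position in the previous and the current frame as start and end, so fast projectiles cannot skip over a thin target.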
Conclusion
I don't know if there are better collision detection solutions out there. I bet there are. In the end, hierarchical circle structures are a beautiful, fast and easy-to-understand method for body representation. They also apply to 3D very well. Simply add a Z axis to the vectors and it works in three dimensions. Try this with polygons :)
Cheers,
Ship Design, Release Interval
Call For Your Opinion
Nordenfelt is approaching version 0.2. An important point on the TODO list is fixing the player ship graphics. Therefore I did some draft sketches and posted them on the Nordenfelt blog. It would be great if you could give me some feedback on which designs you like the most. Please drop me your comments here.
Shortening Release Interval
The release of Nordenfelt 0.1 was four weeks ago. Version 0.2 will be out next week. In my opinion a release interval of roughly five weeks is far too long. It's like driving a car with both eyes closed and taking a look only once a minute. Continuously watching the street would be the best case here but that's impossible, at least for me. An attentive look every few seconds is enough. So I'm going to shorten this interval. One week would be nice and is possible in most cases. Let's see how this will work. There may be releases without any visible changes but that's the nature of such short dev cycles.
Cheers,
The Pimpl Idiom
Currently I'm refactoring the input system of Nordenfelt. The reason for this is to enable mapping input from mouse, keyboard and joystick to any actions in the game. E.g. it should be possible to control the ship by WASD for movement and SHIFT for fire as well as by cursor keys for movement and any mouse button for fire. Mouse button functionality should be swappable for left-handers, etc. Briefly: players will be allowed to set their input scheme as they please. After two days of refactoring one design guideline proved to be well suited for this kind of work: the pimpl idiom.
The pimpl idiom stands for "private implementation idiom". Modern object-oriented languages like Java or C# favour public implementation: the implementation details are intermixed with the interface declaration. There are design patterns for hiding details like interfaces, proxies or factories. Nevertheless, modern languages expose their class internals like private methods or properties in their interface. Firstly this clutters the interface with needless "information" and secondly it creates unnecessary dependencies. More dependencies result in longer recompile times. Recompilation is not really an issue for modern languages. Unfortunately C++ has complex features like templates or argument-dependent lookup which make compiling cpp files a time-consuming task. An easy way to avoid dependencies is writing pure interfaces which stay the same, regardless of their internals.
Let's say we have the following class declaration in the C++ header Character.hpp:
#include <rectangle.hpp>
There are two hidden methods and many properties in this declaration. They are not available to foreign operations. So why should we expose them here at all? The above-mentioned dependencies are a further reason for hiding the guts. What if we replace the animation/body pairs and combine them in a new class Action?
The declaration of Character will change and every cpp file including Character.hpp has to be recompiled. Characters are a common unit in games so many classes would be affected. Let's use the pimpl idiom to hide the internals:
class Character
Oh yeah, that's much better! No private detail is visible in the declaration (well, nearly) and all includes are gone. Only the class Pimpl is there. Pimpl will contain all the private stuff we kicked out. The forward declaration of Pimpl does not reveal anything about its structure, neither private nor public. As long as we keep construction, changes and deletion of pimpl in the cpp file we have no need for its interface anyway. Let's take a look at the implementation in Character.cpp:
#include <animation.hpp>
As you can see: a pure interface is clean, has fewer dependencies and therefore is more refactoring-friendly. The pimpl idiom adds hardly any coding overhead compared to writing common class declarations (at runtime it costs one heap allocation and one pointer indirection) and it will boost your development cycle in the long run.
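A compact sketch of the idiom, with header and cpp part shown in one listing, might look like this. The members (a name and hit points) and the methods are invented for the example and stand in for the animations and bodies of the real class. std::unique_ptr is also an assumption here. A raw pointer that is deleted in the destructor works the same way.

```cpp
#include <memory>
#include <string>

// ---- Character.hpp (sketch) ----
// No private details and no includes for internal types are needed here.
class Character
{
public:
    explicit Character(const std::string& name);
    ~Character();   // defined in the cpp part, where Pimpl is complete

    void TakeDamage(int amount);
    bool IsAlive() const;
    const std::string& GetName() const;

private:
    class Pimpl;                    // forward declaration only
    std::unique_ptr<Pimpl> pimpl;   // also makes Character non-copyable
};

// ---- Character.cpp (sketch) ----
// Everything below can change without touching the header.
class Character::Pimpl
{
public:
    std::string name;
    int hitPoints;
};

Character::Character(const std::string& name)
    : pimpl(new Pimpl)
{
    pimpl->name = name;
    pimpl->hitPoints = 100;   // hypothetical starting value
}

Character::~Character()
{
}

void Character::TakeDamage(int amount)
{
    pimpl->hitPoints -= amount;
    if (pimpl->hitPoints < 0)
        pimpl->hitPoints = 0;
}

bool Character::IsAlive() const
{
    return pimpl->hitPoints > 0;
}

const std::string& Character::GetName() const
{
    return pimpl->name;
}
```

Adding a member to Pimpl or swapping its types only recompiles Character.cpp. Files that include the header are not affected.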
Code well,
New Nordenfelt Blog is Online
I just switched www.nordenfelt-thegame.com over to its blog interface. Everything concerning Nordenfelt will be published over there from now on. Feel free to subscribe to the email newsletter or the blog's RSS feed to keep in touch with the development process. See you there!
Cheers,



