Data Representation

Herein you will find some of my thoughts about cartographic data representation (meaning data structures, not visual representation)

Background

Forces at Work

Fast

The data must be rendered extremely quickly. We want to be able to pan around maps with ease, maps should load and display snappily, and changes made should be reflected immediately.

Small

The data should be as concise as possible. Map data can get big, and we want people to be able to edit as large a map as possible.

Faulting

Data should be faultable. It should not be necessary to load the entire map into main memory in order to edit and view it. Data should be faulted on-demand. Ideally, data could be faulted out also when it ceases to be necessary.

Types of Data

We need two broad sorts of data.

Topographic data: We need topographical data, which is very large, and stores altitude points for the landscape.

Cartographic data: We also need point data, line data, area data, and volume data (the latter, I hope we don't need much) for things like cities, points of interest, political regions, etc. This will be (relatively) small, but should still be as concise, quickly renderable, and faultable as possible.

Limits, Sizes, and Resolutions

Since we're going to be thinking about sizes and limits, we should probably do some research on what sorts of measurements are reasonable.

Earth Diameter: 7,926 miles (12,756 km)
Earth Circumference: 24,902 miles (40,076 km)
Tallest Mountain: 29,029 feet (5 miles)
Earth's Effective Atmosphere: 10 miles (16 km)
Moon Diameter: 1/4 that of earth
MaxInt: 2,147,483,647
MaxLong: 9,223,372,036,854,775,808 (wow!)

5,280 feet in a mile

A mapping program that mapped to a resolution of one foot would be more than anybody could ever ask for. We might do better than this, but if altitudes were measured using Java ints, we could have mountains that were 2147483647 feet high, or 406,720 miles high. That's enough for anybody: it lets us have mountains or chasms that are 80 times the height of the earth's. Height in feet using a signed int is enough for altitude, even for fantasy mapping software.

For latitude and longitude, however, things are different. The earth is 24,902 miles around it's middle, and about that around the other way (maybe slightly less). If we used a pair of ints for coordinates, we would be able to map a planet with 406,720 miles squared area, which is about 20 times the size of the earth in each dimension, so about 400 times as much surface area.

This wouldn't let us map Phade, 'cuz Phade's too big. But we could map a particular continent. If we used longs, we would dramatically increase our coverage. If we used longs in both dimensions, we would have more coverage with no tricks than any mapping software has any right to.

Should probably use longs then, unless there's some truly dramatic real performance penalty to it.

Additional Considerations

Metadata

Cartographic data needs to be associated with arbitrary metadata: groups, names, various things that one can query on, type (forest, swamp, biome, political boundary), etc.

Projections

A lot of cartography is obsessed with the idea of map projections. There's a bunch of math behind this, but basically, you can't draw a flat map of a round globe. In fantasy mapping software, this is likely to be much worse, as you might have all sorts of wacky shapes of planets.

As a result, we're forced to confront the idea of map projections ourselves.

It is unclear what to do about it. We certainly do not want to deal with it constantly, we mostly want to edit a rectangular grid of space, and be done with it. Many people will like this the best anyway, and it's the right thing to do for flat planets.

What are the consequences of map projections? Distance calculation is screwed up -- if you want to make a 40km diameter hole in the earth, it will look bigger, or distorted, in some areas of the map.

Is it worth worrying about? It's a dramatic complication, and the tool would be fun and exciting without it. But it's also the sort of thing that if you don't think about from the beginning, you may in fact be screwed by it.

Dimensionality

Coordinates can be 2D (on the surface, like a city), 3D (above or below the ground), or 4D (specified at a point in time, but I think we won't go there...)

Topology

Adaptive Quadtrees: Introduction

My original thoughts were to have a tree of grids, each node being 16x16 elements square, with that many data points. Doing research into similar techniques brought me up against the standard for doing this sort of thing: quadtrees.

The best introduction I've seen yet is a wonderful article by Thatcher Ulrich about using adaptive quadtrees for continuous LOD terrain meshing.

It turns out that this is quite a common problem in the world of game programming (like flight simulators and car race games) and 3d terrain visualizations for GIS systems, and that there's been a bunch of research on these sorts of techniques.

The basic idea is that you have a tree of 3x3 grids of dots. Each node is split in half vertically and horizontally, and in the 4 squares thus created, four subtrees are rooted.

Note that the points in the four corners of a node are given by the parent node, so each node actually only needs to store a plus-shaped arrangement of dots.

           x....O....x    
           .    |    .                                       
           .    |    .    O - point in this quadtree node
           O----O----O    x - point specified by parent node
           .    |    .
           .    |    .
           x....O....x

Note that in adjacent quadtree nodes, edge vertices are shared. This is a concious decision: some redundant data in exchange for more efficient rendering. This will create some (minor) difficulties when using this as a data structure for editing.

The quadtree thus specified defines a bunch of triangles -- 8 triangles per node, actually. This is because there are implicit triangle edges along each edge, and from the center to each of the edge points, and to each of the corner points.

Disabled Vertices and Subtrees

When rendering a very zoomed-out version of the data, or when animating 3d views of the system, rendering all of these triangles can make the system very slow.

There is a technique for reducing the number of triangles substantially without affecting display quality substantially.

Any vertex can be disabled. This means that there is no such vertex, and that any triangles involving it are coalesced.

In addition, an entire subtree can be disabled, which means that it contributes no triangles to the rendering.

A vertex is disabled if it is not depended on by anybody (see literature), and if the distance from the true value specified to the value that would be linearly interpolated is less than some error value.

A subtree can be disabled if it is not depended on, is not the root, and if its most extreme value is not outside the error metric of the linearly interpolated plane that it would be replaced with.

If a subtree is absent, it is always disabled.

Faulting

Nobody actually seems to do faulting, but everybody seems to think that it's a good idea and certainly possible. The idea seems to be that a subtree can be left out of memory as long as its extremes are known. Then, if rendering (or editing) would ever display the subtree, it could be faulted in from disk.

The literature seems to think that a major difficulty with this would be that this would affect framerates, which we care (relatively) little about, being mostly an editor, and not a dynamic fly-througher.

More specifics

For more details, see the article, and its references

Code could be based on the demo code that is shipped along with the wonderful article.

This is mostly a done deal.

Cartography

Types of Cartographical Data

Cartographical data is different. There are a few major types of data that describe features other than altitude.

Point Data

Points are straightforward. You need an X,Y position for sure, and an altitude (above or below the current ground level) for some point data (hovering objects, underground objects).

A triple of longs is all you need (or can have, for that matter).

Line Data

Lines are more tricky. A series (at least two) of points makes a line, but there are spline curves, smoothed lines, and other sorts of lines that we may or may not choose to support.

For attractive maps, smoothed lines are often essential.

Area Data

Areas are trickier yet. There are several types of areas you might want.

...plus perhaps more.

Volume Data

Volumetric data is confusing, hard to implement, and rarely needed. I'm thinking this is a chopping-block feature, unless something comes up for which it is vitally necessary.

Metadata

Cartographic data must be associated with metadata that can be queried using some sort of query interface.

Types of Metadata

Some sorts of data are: links to external documents describing the phenomenon, the name, the identifier (some unique ID), the type of data (like city, road, biome, etc), etc.

Logical versus physical structures

I'm thinking that it might in fact be the other way around, mostly. We have a bunch of things like cities and roads (what we're talking about here as metadata), and one of their many properties is their location on the map (like a point, line, area, or whatever).

Queries

Very commonly you will want to do queries for various cartographical features, like for layering effects, show/hide features, as well as just user queries (bring me to this city, how far is it from this city to that, etc). Thus, there has to be some sort of indices of this information that are maintained.