Cluster computing with Linux & Beowulf
S. Dale Morrey
sdalemorrey at gmail.com
Sat Feb 22 04:22:46 MST 2014
What you are describing is scatter/gather or a map/reduce approach.
This is handy for computing HUGE datasets but has a severe lag in
realtime. Hadoop is the new hawtness on this front. Study it and you'll
pretty much know all you need to know about the subject.
If you're really thinking of an MMORPG, MMOFPS or MMO-anything there are
actually much easier ways of scaling the load.
Note that what follows here is based on my experience designing an MMO
engine that is currently still closed source.
The most important thing when working with something that needs to be
realtime or near realtime is to filter your dataset and only send what is
strictly necessary to ensure adequate game play response characteristics.
We have several tricks to accomplish this.
Understand that there are 3 sets of coordinates.
Model Space, World Space, Visual Field
Model Space refers to the internal dimensions of the model. Most models
have a 0 point in the center of the cube that defines the collision box of
World Space is a grid that represents the locations of all objects in the
world. It is rare that any single server should know the complete world
space. Instead each server is responsible for it's own "local world
space". However we still call local worldspace, worldspace for the
purposes of this discussion.
Visual field is a cube that defines the maximum visual field of the
player. This is calculated as though the worldspace is completely empty or
actually free of obstructions. In otherwords no occlusion culling occurs.
Visual Field is centered (usually) in the models head in what we call a
camera, thus this is also called "Camera Space". The 0 point is the
player's own eye or the player's screen as the case may be. Negative z is
behind the player positive z is in front the player. To limit the z a
"fog" is calculated and applied. Things closer up receive more frequent
and more accurate updates than objects further out from the player. This
fog is spherical and has "zones". This is one reason why most clients have
a "fog" you can set and when you increase your fog you'll notice that
framerates increase accordingly. This is because there is less information
coming in and thus less to render. It is literally limiting the z (some
engines call it y, but we defined camera z as the eye line and model z as
the height line).
The client then draws a progressively thicker fog to obscure the fact that
you really just aren't getting updates. As an exercise download a 3D game,
set the fog to maximum distance and watch with delight as objects further
away appear to "get stuck" running or whatever. Again the server doesn't
bother to update distant objects as frequently.
Now as to how you increase the player count without increasing the server
Firstly static content such as worldmaps, models, textures etc are
delivered ahead of time to the player by a package update mechanism or
content distribution network. World of Warcraft uses bittorrent for this,
we used subversion (don't ask, it wasn't my decision).
Next the layout of all objects in worldspace were sent out at "zone in"
time. This was a list of all objects in the scenegraph and their proximate
locations in 3d space.
You ended up with something like "label,model, x,y,z, a,b,c" for each
object, with abc representing the rotation, pitch and yaw of the object
described from it's center as defined by the model's own "model space"
coordinates. Once this is sent, a camera was placed in the head of the
The client calculates and sends nothing more than a viewport which is a
series of x,y,z coordinates projected into world space. This defines what
the camera sees.
The server projects a cube in world space for each client starting with the
max visual extent and boxing a cube around the player, then calculating a
fog sphere accordingly.
This cube & sphere are updated everytime the player sends a positional
It is important to note that this is done against something called a
scenegraph which is a literal tree (or grid) of nodes (fixed points in 3d
space) and contains a list of things that are near those nodes.
The server sends updates to each client based upon what is happening only
within the players visual cube and the frequency of those updates is
determined by the player's "fog sphere".
These updates come through as a series of deltas, and once per second (or
some other semi-infrequent timeframe) a new baseline set is pushed for the
node and all adjacent nodes.
This allows the updates to be tiny. This is compacted even further through
bitpacking because all fields are fixed width, and bitpacking allows the
stream to get even smaller.
You can squeeze even more performance by pushing realtime event data (gun
shot, spell cast, animation start/stop, etc) out of band on UDP packets to
all clients at once using anycasting. The down side to this is if the
packet gets dropped the event gets missed. We had several times where
people would be running in place because they would run to a new spot and
the packet to stop the run animation never made it through. This would be
corrected by having them do something else. Eventually we moved all
animations client side and tied them to specific events i.e. player moves x
ft in n seconds play run animation, player stops moving, stop run
animation.. Nevertheless it was humorous and I'm not sure we ever totally
worked all the bugs out of that particular problem.
Because everything in the world is calculated against a scene graph, the
world itself can be sharded or segmented. High traffic areas might require
several servers. Whereas it might take several boxes to serve a single
high traffic area, perhaps even 1 or 2 per node in a really high traffic
area, multiple low traffic areas may easily be served by a single box.
Because each server is in direct communication with every other adjacent
server in the scenegraph, as the player transitions from zone to zone (a
zone here is a collection of nodes on a scenegraph) the servers hand off
the computational duties as the player reaches a border area. In everquest
this hand off was the source of the "zoning please wait!" message. In more
modern MMORPGs the handoff is much smoother because servers handling
adjacent areas actually communicate more with one another as the player
approaches a border area.
The final consideration is that computation and communication are handled
by different servers. The server that maintains the network link to the
client is not the same server that is handling the computations on the
scenegraph (and it's probably not even the chat server). Usally the client
communication server is a box situated at a network edge as close as
possible to the player. Whereas the world servers (the things actually
computing the scenegraph) are likely centered somewhere else entirely. In
our case each zone was comprised of a world server and an ai/rules server.
Whereas the db server, auth server and a chat servers were global.
This means a single edge server can maintain connections to tens of
thousands of clients without breaking a sweat. The worldservers only need
to concern themselves with calculating changes to the scenegraph and thus
can handle millions of simultaneous changes without failing and they don't
incur the communications overhead of tens of thousands of TCP connections.
AI & rules are handled on their own servers and basically functioned as a
single client controlling multiple characters which allowed for very
efficient updates to MoBs. The limitation to this was that MoBs couldn't
chase you across multiple zones, but we were able to explain that in game
as them getting tired of the chase and turning around to go home. We did
occasionally allow AI to talk to one another which produced some very epic
cross world battles, but that's a story for another day.
Anyways, suffice it to say that if later on it is decided a given
worldserver is too crowded, the scenegraph can be updated by the game
designers to include more nodes. These nodes can be picked up by any box
that has the spare capacity to handle the calculations. But the edge
servers are the ones primarily responsible for communication with the
client, not the world servers. The edge server's primary limitation is
I hope that's helpful. Designing MMO engines really gave me a whole new
insight into the right way to scale a large computing task. I recommend
everyone build one at least once in their life. :D
On Sat, Feb 22, 2014 at 2:42 AM, Dan Egli <ddavidegli at gmail.com> wrote:
> Hey pluggers,
> I've a couple questions on the Linux cluster computing software, Beowulf (I
> think that's what it was called). Anyone know if there's a maximum number
> of nodes that a cluster can support? And can the cluster master add new
> nodes on the fly so to speak?
> My understanding of cluster computing is that you can send a job to the
> master server, and the master server will in turn break the job between all
> the nodes in the cluster, then combine the results. I think it's not all
> that dissimilar to multi-processor specific tasks. Is this correct?
> Here's a scenario I could envision. Please tell me if there's a serious
> flaw in the idea. I hope it explains what I'm thinking of.
> Program X is intended to handle quite literally up to hundreds of millions
> of simultaneous TCP connections across a number of network interfaces (the
> exact number is unimportant). So as program X runs, the overall load begins
> to rise as X has to do more and more work for each of the connections. That
> much is a given. Take, for instance, an MMORPG. The more people login to
> the game and interact with the world, the more work is involved for the
> actual server. Same idea here. My thought was that if X was built to
> support clustering and it was run on a Beowulf cluster, that would slow the
> initial growth of the load. Then, supposing that the dynamic addition is
> possible, I can watch the system and if the system load average on the
> cluster gets too large (due to overwhelming the CPU, not the internet
> pipe), someone could reduce without the end user noticing anything simply
> by adding new machines. Is this right? Or do I have a major flaw in my
> If Beowulf isn't the software I'm thinking of, what is? And what other
> issues, other than perhaps memory usage, would be involved? And speaking of
> memory, is there a way to distribute the memory at all? I know that there
> are 8GB dimms available easily these days, and a high-end motherboard can
> usually support 8 dimms. So that's 64GB of ram. But for hundreds of
> millions of connections, I could see the possibility of exceeding memory
> capacity being a potential issue. I suppose I could find bigger dimms (I've
> heard rumors of 16GB and even 32GB dimms, but never seen them) but that's
> going to seriously jack up the server costs if I do it that way. So if
> there's a way to distribute the memory usage along with distributing the
> CPU work, that would be of great assistance.
> Thanks in advance, folks!
> --- Dan
> PLUG: http://plug.org, #utah on irc.freenode.net
> Unsubscribe: http://plug.org/mailman/options/plug
> Don't fear the penguin.
More information about the PLUG