[Bibliography] [Bibtex Citation]
Game Usage Mining: Information Gathering for Knowledge
Discovery in Massive Multiplayer Games
Department of Computer and Information Science
Norwegian University of Science and Technology
Gisle B. Tveit
Department of Thermal Energy
Norwegian University of Science and Technology
Multiplayer games provide rich sources of information for data
mining. The primary purpose of data mining in games is to find
patterns of behavior, structure or content in order to improve the
overall gameplay, hence keeping players longer and increasing the
revenue of the game service. In this paper we define the term
game usage mining and describe it from web usage mining
perspective, with the main emphasis on data gathering. A
classification of game types from a data mining perspective is also
suggested. Based on the discussion we propose content for a
common game log format suitable for game usage mining.
Keywords: Massive Multiplayer Games, Web Usage Mining,
Multiplayer games, both on the wireless and the traditional Internet,
are excellent sources of information for data mining. We can mine
usage patterns of both human and virtual players (avatars), or
discover patterns in the games content and structure.
In this paper we compare the process of information gathering for
behavior mining in massive multiplayer games (game usage mining) to
the similar process known from web usage mining.
The main motivation for performing data mining in computer games is to
discover patterns, e.g. rules or statistics, that can possibly be used
to improve the game. Then paying players become more satisfied and
stay longer, which again increases the revenue of the game
service. This is of particular importance for wireless games, since
keeping a player is equivalent of making more money. Wireless Internet
revenue models are either proportional to money per time unit
spent by the player (e.g. WAP over GSM), or increasingly more common,
money per byte served to the player (e.g. UMTS, I-Mode and WAP
Areas and connections between areas (e.g. doors or tunnels), are
frequent building structures in games. From a web usage mining
perspective they can be seen as relatively analog to web pages and
links (i.e. URLs), respectively. Player behavior can also be detected,
including actions and speech act utterances. This makes it possible
generate game logs that resemble web logs in structure, and hence can
gain from applying web usage mining methods with only minor adaptions.
In order to have a clear vocabulary to describe data mining in
multiplayer games (from now on called game mining) we define
three main types: 1) game content mining - discovery of
patterns in multimedia or textual content in games (e.g. room layout),
2) game structure mining - discovery of structural patterns in
form of paths and connections binding the game world together
(e.g. hallways between rooms), and 3) game usage mining -
discovery of human and avatar behavior patterns. The described types
of game mining are inspired by well-known types of web mining - web
content mining, web structure mining and web usage mining
What are the similarities and differences of information
gathering for web usage mining and game usage mining?
The rest of this paper is organized as follows. Section 2 describes a
game classification schema. Section 3 describes the Game Usage Mining
concept. Section 4 compares information gathering in a web and game
context. Section 5 describes the proposed game log approach, and
finally the conclusion with future work.
From a information gathering viewpoint, games are proposed classified
according to figure 1, with examples for
Computer Game Classification
In the discrete game-state class we consider games that have a
discrete search space and a turn-based gameplay, e.g. in chinese
checkers it is not possible to do fractional (non-integer) moves of
marbles, and players have to wait until their turn. In Quake
 the search space (from a practical view)
is non-discrete, and players don't have to wait until their turn,
hence Quake belongs to the non-discrete game-state class.
In the singleplayer game class, the games only have one
simultaneous (human) player, as in case of Solitaire where the human
plays, and the computer only shuffles the cards.
The difference between the multiplayer and the massive
multiplayer classes is not absolute, but the prior covers games with
2-100 simultaneous players (or same order of magnitude), and the
latter covers games with the number of players being 2#2. An
example of a (non-discrete game-state) massive multiplayer game is
NCSoft's Lineage with approximately 110,000 simultaneous players
. We were not able to find
examples of discrete gamestate massive multiplayer games.
Another classification schema for computer games is based on genres
(e.g. action games, role-playing games, adventure games, sports game
etc.) . Genres are suited to classify
what the game is about, but not so well suited to classify the amount
and what type of data that can be gathered from the game, hence being
less useful in a data mining context.
Games can also be classified according to their network architecture
(e.g. single node, peer-to-peer, client/server or server-network)
, which is useful for describing where to
collect the data, but doesn't say anything about what type of data to
In figure 2 a mobile massive
multiplayer game setting is shown. Players interact with the Massive
Multiplayer Game Service (MMGS). The MMGS sends events about user
actions to the Game Usage Miner (GUM), if the GUM discovers patterns
of interest (e.g. player logoff probability) they get passed on to
Recommender Service (RS). When the RS receives a pattern it will give
a recommendation back to the MMGS about how to alter the player
experience. The purpose of this is to improve the gaming experience
for players. Methods used in the RS can e.g. be collaborative
filtering or case-based reasoning. Another purpose of the GUM is to
provide realtime metrics of the game, typically macro numbers such as
mood of the community (e.g. number of players likely to log off in the
Game Usage Mining Concept
In this section we compare information gathering in non-discrete
game-state massive multiplayer games with information gathering from
web browsing. The purpose of the information gathering is to enable
game usage mining and web usage mining, respectively.
For web usage mining, the primary information source is the highly
standardized web log (i.e. common or extended log format) created by
the web server. Unfortunately, there are currently no standardized
information sources that supports game usage mining in massive
User session time tsession is an important concept in web usage
mining, it is the estimate of user's active period, i.e. before
browsing to another web site or stop surfing.
In game usage mining we propose an approach for estimating player
sessions. However, if the player pays per time unit to play, it is
usually very simple to determine tsession, since the user
probably will explicitly log off instead of idling, or a least be
logged out automatically after a particular idle time of the client
(in order to avoid high bills). If the client tells the game server
that the logout was caused by timeout for a period tidle, the
accurate session time tsession can easily be determined by:
Where tstart and tstop are timestamps for the start and stop
of player session, respectively.
Logging of usage data in web usage mining is purely event driven. When
the client (browser) requests a web page or an object it causes a log
entry line (LogEntryt) to be written to the web server log. The
time resolution of the log is usually limited to whole seconds, so if
the web server receives several requests in a second, they are written
to the web log in an arbitrary order.
Game playing generate many more events than web browsing, since events
are based on real-time actions in the game (e.g driving a car, talking
to other avatars, flying on a dragon's back etc.). To avoid overload
of the game service's logging mechanism, it has to collect events in
memory for a period tcollect, then possibly prune events of less
importance and write the important events during the period
tcollect to the log. If tcollect is relatively short
(e.g. 1s or less, depending on game type), we propose that game log
entries (from possibly several players) collected in that period can
have an arbitrary order as the case for web logs. The reason for this
is to avoid costly synchronization in parallel implementations.
This section describes information attributes of the player - the
human customer of the gaming service.
The user's IP address is the simplest identification mechanism in web
usage mining. However, since many users can have the same IP address
(e.g. if surfing through a firewall or proxy gateway), the IP address
may not provide a unique user identification.
User identification can be improved by using cookies (small
information files residing at the user's browser which can be written
to and read by the web server), but cookies have gotten a lot of
critics due to privacy issues. This has resulted in that many users
disable the cookie support in their browsers.
Player identification (playerid) is in games is usually much
simpler, since the player is becoming uniquely identified when
entering the game (e.g. by username or by mobile phone number) and
stays identified the whole playing session.
For wireless games, the position playerpos of the (human) player
may be possible to detect (e.g. using GPS positions obtained from the
player's wireless device). With the playerpos, interesting
attributes such as playerspeed and playerdirection can be
estimated (during a session), this can possibly add useful knowledge
in a game usage mining context.
On the web it is possible to get user system characteristics such as
browser version and operating system from the extended web log format.
Games on the other hand, can possibly obtain more interesting data
about users, information might include the player's real-life name,
address, e-mail, connecting IP address etc (based on manual input by
the player when registering).
This section describes information attributes of the avatar - the
virtual character that the player controls in the game.
A web browsing user generates actions in form of HTTP requests, the
most frequent request-type is ``GET'' used to retrieve pages or
objects, but also ``POST'' and ``HEAD'' occur relatively frequently.
Action types (avataraction) in games are usually much more
plentiful and represented as verbs, examples include ``jump'',
``fire'', ``walk'' and ``say'' etc.
Parameters of an action in web browsing are usually an encoded path
requesting a particular HTML file, application/script or multimedia
object. This path may contain its own parameters, e.g. if the path
requests a script that sets some variables.
Examples of action parameters (avataraction parameters) in games
include: natural language or slang phrases (corresponding to action
``say''), power or distance (corresponding to action ``fire'').
On the web users can browse between web pages, objects and scripts
(discrete movement) by clicking on URLs. However, if items are cached
(e.g. by a reverse or standard proxy server, a content provisioning
service or the browser cache), browsing will not be detected and
logged by the web server. To improve the quality of web
logs. java-applets that send messages from the browser to the web
server can be applied, other attempts to improve path quality include
various heuristic methods
In games it is easier to determine the complete path, but the path is
more complex to represent than on the web since it's not necessarily
discrete. The player can move the avatar in any direction and not only
select between a relatively small set of directions. Possible
approaches to represent paths include storing the global game
coordinates of the avatar's position (avatarpos) at every
tcollect period, or game area relative coordinates
(avatarrpos,area), generalized direction (avatardir) and
speed (avatarspeed) together with position data.
Avatar characteristics may include gender, type of avatar (e.g. elf,
troll), occupation (e.g. wizard, warrior) etc. This information can be
used in creating profiles of the players.
This section summarizes the presented usage attributes found in games,
and proposes content for a common game log.
To reduce duplicate information, the game log is proposed divided into
two files: 1) player.log - which contains information about the
user that doesn't need to be repeated for each action the player does.
This file is updated only for every new session, it's maximum update
frequency is 5#5, and 2) playeractions.log -
which contains information logs of the players action, it's maximum
update frequency is 6#6.
playerid, sessionid, timestamp, playercharacteristics,
playerid, sessionid, playerpos, timestamp,
avatarpos, avataraction, avataraction parameter
One problem with usage log files in massive multiplayer games with
thousands of players is that they're likely to grow (in size) very
rapidly, this has to be considered when storing the data.
An intuitive way of representing the data would be to define XML
Schemas or DTDs for player.log and playeractions.log. Unfortunately
XML is quite verbose, so an easily compressed, though open and
standardized, file format is probably preferable over XML. However,
the best approach is probably to allow several types of game log
encoding, e.g. both XML and a compressed format. The requirements for
a multi-format solution should be simple translation mechanims between
Game Usage Mining Process
Figure 3 shows the details the Game
Usage Mining part of figure 2.
Information gathering of game log files described earlier is shown in
figure 3, part A. In order the make
the system scale up to a large number of players, we need to create
higher-level aggregated information, e.g. a sequence of rapid avatar
position changes will be interpreted as running. This aggregation
process is shown in figure 3, part
An hierarchical approach of dividing high-level actions
(e.g. performing a large task) into several sub-tasks with
corresponding actions have shown useful in the creation of intelligent
Quake monsters . This supports our aggregation
approach of actions.
We suggest that the behavior of non-personal characters
(e.g. monsters) are also logged and processed in the same way as the
players. This in order to be able to recreate the sessions and the
experience of the player. These data must be combined with global
information about the game (storyline, major events, user interface)
in the mining process (part C). Results of the mining process are
patterns (e.g. rules or statistics) that can either be used as input
to a recommender service or as metrics to the game service
We would like to thank Magnus Lie Hetland for fruitful discussions
about computer games in general. We would also like to thank Professor
Mihhail Matskin. This work is supported by the Norwegian Research
Council in the framework of the Distributed Information Technology
Systems (DITS) program and the
The contribution of this paper has been threefold, first we defined
types of data mining in computer, second we provided a classification
of computer games from a data mining viewpoint, and third we compared
information gathering in web usage mining and game usage mining as
well as proposing a common game log format to enable game usage
Future work include determining the actual representation of game log
files (e.g. which attributes and which concrete efficient
representation), determine what type of usage mining is most useful in
massive multiplayer computer games (e.g clustering, classification and
incremental sequence mining [10,7]), and how existing web usage mining architectures
and systems can be adapted to a support scalable game usage mining
Ahmed Abdelkhalek, Angelos Bilas, and Andreas Moshovos.
Behavior and performance of interactive multi-player game servers.
In Proceedings of IEEE International Symposium on Performance
Analysis of Systems and Software. IEEE, November 2001.
Robert Cooley, Bamshad Mobasher, and Jaideep Srivastava.
Web mining: Information and pattern discovery on the world wide web.
In Proceedings of the 9th IEEE International Conference on Tools
with Artificial Intelligence (ICTAI'97). IEEE, November 1997.
Robert Cooley, Bamshad Mobasher, and Jaideep Srivastava.
Data preparation for mining world wide web browsing patterns.
Knowledge and Information Systems, 1(1):5-32, February 1999.
The champs in online games.
Business Week, pages 27-28, July 23, 2001.
It knows what you're going to do: Adding anticipation to a quakebot.
In Proceedings of the 5th International Conference on Autonomous
Agents. ACM, ACM Press, 2001.
John Laird and Michael Van Kent.
Human-Level AI's Killer Application: Interactive Computer Games.
AI Magazine, 22(2):15-26, 2001.
Florent Masseglia, Pascal Poncelet, and Maguelone Teisseire.
Web usage mining: How to efficiently manage new transactions and new clients.
In Proceedings of the 4th European Conference on Principles of
Data Mining and Knowledge Discovery (PKDD'00), September 2000.
Jouni Smed, Timo Kaukoranta, and Harri Hakonen.
Aspects of networking in multiplayer computer games.
In Loo Wai Sing, Wan Hak Man, and Won Wai, editors, Proceedings
of International Conference on Application and Development of Computer Games
in the 21st Century, pages 74-81, November 2001.
Jaideep Srivastava, Robert Cooley, Mukund Deshpande, and Pang-Ning Tan.
Web usage mining: Discovery and applications of usage patterns from web data.
SIGKDD Explorations, 1(1):12-23, January 2000.
Introduction to sequence learning.
In Ron Sun and C. Lee Giles, editors, Sequence Learning -
Paradigms, Algorithms, and Applications, volume 1828 of Lecture Notes
in Computer Science, pages 1-10. Springer-Verlag, 2001.