Amund Tveit
http://www.idi.ntnu.no/~amundt/
Department of Computer and Information Science
Norwegian University of Science and Technology
Gisle B. Tveit
Department of Thermal Energy
Norwegian University of Science and Technology
Multiplayer games provide rich sources of information for data mining. The primary purpose of data mining in games is to find patterns of behavior, structure or content in order to improve the overall gameplay, thereby retaining players longer and increasing the revenue of the game service. In this paper we define the term game usage mining and describe it from a web usage mining perspective, with the main emphasis on data gathering. We also suggest a classification of game types from a data mining perspective. Based on the discussion, we propose content for a common game log format suitable for game usage mining.
Keywords: Massive Multiplayer Games, Web Usage Mining, Information Gathering
In this paper we compare the process of information gathering for behavior mining in massive multiplayer games (game usage mining) to the similar process known from web usage mining.
Areas and connections between areas (e.g. doors or tunnels) are frequent building structures in games. From a web usage mining perspective they can be seen as roughly analogous to web pages and links (i.e. URLs), respectively. Player behavior can also be detected, including actions and speech act utterances. This makes it possible to generate game logs that resemble web logs in structure, and that can therefore benefit from web usage mining methods with only minor adaptations.
The rest of this paper is organized as follows. Section 2 describes a game classification schema. Section 3 describes the Game Usage Mining concept. Section 4 compares information gathering in a web and a game context. Section 5 describes the proposed game log approach. The paper closes with the conclusion and future work.
In the discrete game-state class we consider games that have a
discrete search space and turn-based gameplay. In Chinese
checkers, for example, it is not possible to make fractional
(non-integer) moves of marbles, and players have to wait for their
turn. In Quake [1] the search space is (from a practical view)
non-discrete, and players do not have to wait for their turn,
hence Quake belongs to the non-discrete game-state class.
In the singleplayer game class, the games have only one
simultaneous (human) player, as in the case of Solitaire, where the
human plays and the computer only shuffles the cards.
The difference between the multiplayer and the massive
multiplayer classes is not absolute, but the former covers games with
2-100 simultaneous players (or the same order of magnitude), and the
latter covers games where the number of players is $\gg 100$. An
example of a (non-discrete game-state) massive multiplayer game is
NCSoft's Lineage with approximately 110,000 simultaneous players
[4]. We were not able to find
examples of discrete game-state massive multiplayer games.
Another classification schema for computer games is based on genres
(e.g. action games, role-playing games, adventure games, sports
games) [6]. Genres are suited to classifying
what the game is about, but not so well suited to classifying the
amount and type of data that can be gathered from the game, which
makes them less useful in a data mining context.
Games can also be classified according to their network architecture (e.g. single node, peer-to-peer, client/server or server-network) [8], which is useful for describing where to collect the data, but doesn't say anything about what type of data to collect.
Figure 2 shows a mobile massive multiplayer game setting. Players interact with the Massive Multiplayer Game Service (MMGS). The MMGS sends events about user actions to the Game Usage Miner (GUM). If the GUM discovers patterns of interest (e.g. player logoff probability), they are passed on to the Recommender Service (RS). When the RS receives a pattern, it gives a recommendation back to the MMGS about how to alter the player experience. The purpose of this is to improve the gaming experience for players. Methods used in the RS can be, e.g., collaborative filtering or case-based reasoning. Another purpose of the GUM is to provide real-time metrics of the game, typically macro numbers such as the mood of the community (e.g. the number of players likely to log off in the near future).
For web usage mining, the primary information source is the highly standardized web log (i.e. common or extended log format) created by the web server. Unfortunately, there are currently no standardized information sources that support game usage mining in massive multiplayer games.
In game usage mining we propose an approach for estimating player sessions. However, if the player pays per time unit to play, it is usually very simple to determine $t_{session}$, since the user will probably log off explicitly instead of idling, or at least be logged out automatically after a particular idle time of the client (in order to avoid high bills). If the client tells the game server that the logout was caused by a timeout after an idle period $t_{idle}$, the accurate session time $t_{session}$ can easily be determined by:
$$t_{session} = t_{stop} - t_{start} - t_{idle} \qquad (1)$$
where $t_{start}$ and $t_{stop}$ are the timestamps for the start and stop of the player session, respectively.
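The session-time calculation can be illustrated with a minimal sketch. It assumes that the accurate session time is the span between the start and stop timestamps minus the idle timeout reported by the client, with all values in seconds; the function name and the validation rules are illustrative assumptions, not part of any game server API.

```python
def session_time(t_start: float, t_stop: float, t_idle: float = 0.0) -> float:
    """Return the active session length in seconds.

    t_start, t_stop: timestamps for the start and stop of the session.
    t_idle: idle period that triggered an automatic logout (0 if the
            player logged off explicitly). Hypothetical parameter names.
    """
    if t_stop < t_start:
        raise ValueError("t_stop must not precede t_start")
    if t_idle < 0 or t_idle > t_stop - t_start:
        raise ValueError("t_idle must lie within the session span")
    return t_stop - t_start - t_idle


if __name__ == "__main__":
    # A session logged out automatically after 300 s of idling.
    print(session_time(t_start=100.0, t_stop=1000.0, t_idle=300.0))
```

With an explicit logoff the idle term is simply zero, so the same function covers both cases.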
Game playing generates many more events than web browsing, since events are based on real-time actions in the game (e.g. driving a car, talking to other avatars, flying on a dragon's back). To avoid overloading the game service's logging mechanism, it has to collect events in memory for a period $t_{collect}$, then possibly prune events of less importance and write the important events from that period to the log. If $t_{collect}$ is relatively short (e.g. 1s or less, depending on game type), we propose that game log entries (from possibly several players) collected in that period can have an arbitrary order, as is the case for web logs. The reason for this is to avoid costly synchronization in parallel implementations.
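The collect, prune and write cycle described above could look roughly like the following sketch. The event fields, the integer `importance` score and the `sink` callback are hypothetical; the text does not prescribe how importance is measured or where entries are written.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class GameEvent:
    timestamp: float   # seconds
    player_id: str
    action: str
    importance: int    # hypothetical priority score; higher is more important


class EventCollector:
    """Buffer events in memory for t_collect seconds, prune, then write."""

    def __init__(self, t_collect: float, min_importance: int,
                 sink: Callable[[List[GameEvent]], None]) -> None:
        self.t_collect = t_collect
        self.min_importance = min_importance
        self.sink = sink                      # e.g. appends a batch to the log
        self._buffer: List[GameEvent] = []
        self._window_start: Optional[float] = None

    def add(self, event: GameEvent) -> None:
        # Close the current window once t_collect has elapsed.
        if (self._window_start is not None
                and event.timestamp - self._window_start >= self.t_collect):
            self.flush()
        if self._window_start is None:
            self._window_start = event.timestamp
        self._buffer.append(event)

    def flush(self) -> int:
        """Prune unimportant events, write the rest, return how many were kept."""
        kept = [e for e in self._buffer if e.importance >= self.min_importance]
        self._buffer.clear()
        self._window_start = None
        if kept:
            self.sink(kept)
        return len(kept)
```

Events inside one batch are passed on in arrival order here, but nothing in the sketch relies on that order, which matches the proposal that entries within one $t_{collect}$ period may be arbitrarily ordered.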
User identification can be improved by using cookies (small information files residing in the user's browser which can be written to and read by the web server), but cookies have drawn a lot of criticism due to privacy issues. As a result, many users disable cookie support in their browsers.
Player identification ($player_{id}$) in games is usually much simpler, since the player is uniquely identified when entering the game (e.g. by username or by mobile phone number) and stays identified throughout the playing session.
Games, on the other hand, can possibly obtain more interesting data about users. This information might include the player's real-life name, address, e-mail and connecting IP address (based on manual input by the player when registering).
This section describes information attributes of the avatar - the virtual character that the player controls in the game.
Action types ($avatar_{action}$) in games are usually much more plentiful and are represented as verbs; examples include "jump", "fire", "walk" and "say".
Examples of action parameters ($avatar_{action}$ parameters) in games include natural language or slang phrases (corresponding to the action "say") and power or distance (corresponding to the action "fire").
In games it is easier to determine the complete path, but the path is more complex to represent than on the web since it is not necessarily discrete. The player can move the avatar in any direction, rather than selecting from a relatively small set of directions. Possible approaches to representing paths include storing the global game coordinates of the avatar's position ($avatar_{pos}$) at every $t_{collect}$ period, or storing game-area-relative coordinates ($avatar_{rpos,area}$), generalized direction ($avatar_{dir}$) and speed ($avatar_{speed}$) together with position data.
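As a hedged illustration of the second representation, the sketch below derives a generalized direction and a speed from positions sampled once per collection period. It assumes 2D coordinates with the y axis pointing north and interprets "generalized direction" as one of eight compass sectors; both are assumptions made for the example, and the function name is hypothetical.

```python
import math
from typing import List, Optional, Tuple

Position = Tuple[float, float]

# Eight compass sectors, counter-clockwise from east (assumes y points north).
SECTORS = ["E", "NE", "N", "NW", "W", "SW", "S", "SE"]


def path_segments(positions: List[Position], t_collect: float
                  ) -> List[Tuple[Position, Optional[str], float]]:
    """Turn sampled positions into (position, direction, speed) entries.

    positions: avatar positions sampled every t_collect seconds.
    Returns one entry per interval; direction is None when the avatar
    did not move during the interval.
    """
    segments = []
    for (x0, y0), (x1, y1) in zip(positions, positions[1:]):
        dx, dy = x1 - x0, y1 - y0
        speed = math.hypot(dx, dy) / t_collect
        if dx == 0 and dy == 0:
            direction = None
        else:
            angle = math.degrees(math.atan2(dy, dx)) % 360.0
            direction = SECTORS[int((angle + 22.5) // 45) % 8]
        segments.append(((x1, y1), direction, speed))
    return segments
```

Storing a sector label and a speed instead of raw coordinates loses precision but gives a discrete alphabet, which is closer to the link-following paths that web usage mining methods expect.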
To reduce duplicate information, we propose dividing the game log into two files: 1) player.log, which contains information about the user that does not need to be repeated for each action the player performs. This file is updated only for every new session, so its maximum update frequency is $1/t_{session}$; and 2) playeractions.log, which contains log entries for the player's actions. Its maximum update frequency is $1/t_{collect}$.
An intuitive way of representing the data would be to define XML Schemas or DTDs for player.log and playeractions.log. Unfortunately, XML is quite verbose, so an easily compressed, though open and standardized, file format is probably preferable to XML. However, the best approach is probably to allow several types of game log encoding, e.g. both XML and a compressed format. A multi-format solution requires simple translation mechanisms between the formats.
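A translation mechanism between two encodings might be sketched as follows. The compact encoding chosen here (JSON compressed with zlib) and the element names are assumptions made purely for illustration; the text does not fix a concrete format. All field values are kept as non-empty strings so that both round trips are lossless.

```python
import json
import xml.etree.ElementTree as ET
import zlib
from typing import Dict, List

Entry = Dict[str, str]  # hypothetical flat log entry: attribute name -> value


def entry_to_xml(entry: Entry) -> str:
    """Encode one log entry as a small XML fragment."""
    elem = ET.Element("action")
    for key, value in entry.items():
        ET.SubElement(elem, key).text = value
    return ET.tostring(elem, encoding="unicode")


def xml_to_entry(xml_text: str) -> Entry:
    """Decode the XML fragment produced by entry_to_xml."""
    return {child.tag: child.text or "" for child in ET.fromstring(xml_text)}


def entries_to_compact(entries: List[Entry]) -> bytes:
    """Encode a batch of entries as zlib-compressed JSON."""
    text = json.dumps(entries, separators=(",", ":"))
    return zlib.compress(text.encode("utf-8"))


def compact_to_entries(blob: bytes) -> List[Entry]:
    """Decode a batch produced by entries_to_compact."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))
```

Because both encodings map to the same in-memory dictionary, converting XML to the compact form is just `entries_to_compact([xml_to_entry(x) for x in fragments])`, and the reverse direction is equally short.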
Figure 3 shows the details of the Game
Usage Mining part of figure 2.
The gathering of game log files described earlier is shown in figure 3, part A. In order to make the system scale up to a large number of players, we need to create higher-level aggregated information, e.g. a sequence of rapid avatar position changes will be interpreted as running. This aggregation process is shown in figure 3, part B.
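The aggregation step can be sketched as labelling each sampled speed and collapsing consecutive identical labels into one higher-level action. The labels and the two speed thresholds are hypothetical and would depend on the game; this is only one simple way to interpret rapid position changes as running.

```python
from itertools import groupby
from typing import Iterable, List, Tuple


def label_speed(speed: float, walk_threshold: float = 0.1,
                run_threshold: float = 4.0) -> str:
    """Map a speed sample to a coarse movement label (thresholds are assumed)."""
    if speed < walk_threshold:
        return "stand"
    if speed < run_threshold:
        return "walk"
    return "run"


def aggregate_movement(speeds: Iterable[float]) -> List[Tuple[str, int]]:
    """Collapse per-sample speeds into (label, number_of_samples) runs."""
    labels = (label_speed(s) for s in speeds)
    return [(label, sum(1 for _ in group)) for label, group in groupby(labels)]


if __name__ == "__main__":
    # Seven low-level samples become four higher-level actions.
    print(aggregate_movement([0.0, 1.0, 1.5, 6.0, 7.0, 6.5, 0.0]))
```

The aggregated sequence is much shorter than the raw samples, which is what allows the later mining stage to scale to many players.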
A hierarchical approach of dividing high-level actions
(e.g. performing a large task) into several sub-tasks with
corresponding actions has proven useful in the creation of intelligent
Quake monsters [5]. This supports our approach of
aggregating actions.
We suggest that the behavior of non-player characters (e.g. monsters) is also logged and processed in the same way as that of the players, in order to be able to recreate the sessions and the experience of the player. These data must be combined with global information about the game (storyline, major events, user interface) in the mining process (part C). The results of the mining process are patterns (e.g. rules or statistics) that can be used either as input to a recommender service or as metrics for the game service operator(s).
Future work includes determining the actual representation of game log files (e.g. which attributes and which concrete, efficient representation), determining what type of usage mining is most useful in massive multiplayer computer games (e.g. clustering, classification and incremental sequence mining [10,7]), and investigating how existing web usage mining architectures and systems can be adapted to support a scalable game usage mining setting.
The translation was initiated by Amund Tveit on 8/6/2002