ments, the teams select one or several maps (typically
3) by a deterministic negotiation protocol. The game
is played once for each map and a winner of each map
is recognized. The team that wins a certain number of
maps (e.g., 2 of 3 maps) wins the whole match.
In the remaining text, we use the term ‘game’ instead
of ‘match’, since the word ‘match’ may become
ambiguous when describing player matching.
Each team may buy weapons and equipment for virtual
money before each round. The virtual money is
earned in the game for winning rounds and killing
opponents. Furthermore, players who survive the round
keep most of their equipment.
2.1 Demofiles and Data Sources
A game (one map) can be recorded into a DEM file
called a demo file. It is basically a serialization of the
data transferred over the network between the server
and the players. The demofile can be recorded by the
server itself, but it can also be recorded by a spectator
(a player present in the game who is invisible to other
players and cannot affect the game).
The demofile for the analysis must be provided by
the tournament organizer, or spectator access must
be granted to the game server. The demofile can
also be processed on the fly while the game is running
(i.e., read whilst it is being written) to provide
real-time game analysis. A very similar data stream is
provided by GOTV, a broadcasting channel integrated in
the game, which may be enabled on the server and
broadcasts game data to subscribed spectators. However,
many tournaments delay this data (e.g., by 90
seconds), so that the data cannot be fed back to the
players via covert channels such as a phone.
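Reading a demofile while it is still being written can be sketched as a simple polling loop. The following is a minimal illustration only, not the authors' implementation: the function name `follow`, its parameters, and the idle-poll stopping rule are assumptions made for this example.

```python
import time
from typing import BinaryIO, Iterator


def follow(stream: BinaryIO, chunk_size: int = 4096,
           poll_interval: float = 0.5,
           max_idle_polls: int = 10) -> Iterator[bytes]:
    """Yield chunks from a file that another process is still appending to.

    Stops after `max_idle_polls` consecutive polls without new data,
    a hypothetical stand-in for detecting the end of the recording.
    """
    idle = 0
    while idle < max_idle_polls:
        chunk = stream.read(chunk_size)
        if chunk:
            idle = 0
            yield chunk
        else:
            # Nothing new yet: wait and try again from the same position.
            idle += 1
            time.sleep(poll_interval)
```

A sequential parser would consume the yielded chunks in order; because the stream is never rewound, the decoding state only has to move forward.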
There is also a huge community interested in
CS:GO that manages data about players and games.
Perhaps the largest site dedicated to this game is
HLTV. It registers all important events and tournaments
and gathers their results. The site also gathers
recorded demofiles and provides them for download.
Unfortunately, HLTV administrators have little interest
in sharing the data on a large scale; hence, there is
no API and all data has to be scraped from web pages.
Another issue with HLTV data is the player matching
– i.e., interlinking existing player profiles with players
in demofiles. Despite the fact that the site has identi-
fied the players internally, there is no direct linkage
between the demofiles and the web.
HLTV player matching is a special case of a more
general data matching problem. Although many commercial,
open-source, and research data-matching systems
have been developed, such as BigMatch, D-Dupe,
the R package RecordLinkage, and many others (Christen,
2012), none of them is able to take into account the
particular needs of HLTV matching. The problem is
similar to nickname identification, which is addressed
especially in the domain of social networks. Some of
the proposed methods use supervised learning
(Peled et al., 2013), but they cannot be used in
HLTV matching due to the absence of relevant labeled
training data of sufficient size.
Another class of methods uses its own specific
models for matching individual accounts in particular
social networks. These methods compute a similarity
score from profile information (Jamjuntra et al.,
2017) or combine various identity search methods
exploiting distinct profile attributes to match accounts
across social networks (Jain et al., 2013). All of
these methods utilize additional information available
in user profiles to match the accounts. To the best of our
knowledge, none of the published methods is
applicable to HLTV matching, as such additional information
is not available in demofiles. Therefore, we propose
our own method, which is described in Section 4.
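To make the matching task concrete, the sketch below shows a naive baseline that compares only nicknames, since that is roughly all a demofile offers. It is not the method of Section 4: the helper names and the example nicknames are hypothetical, and plain string similarity is used here purely for illustration.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lower-case a nickname and keep only letters and digits."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two normalized nicknames."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_candidate(demo_name: str,
                   profile_names: list[str]) -> tuple[str, float]:
    """Return the profile nickname most similar to a demofile nickname."""
    scored = ((p, name_similarity(demo_name, p)) for p in profile_names)
    return max(scored, key=lambda pair: pair[1])


# Hypothetical nicknames: a clan tag in the demofile lowers the score
# but the closest profile is still "alpha".
print(best_candidate("[TEAM] Alpha-", ["br4vo", "alpha", "charlie"]))
```

Such a baseline is likely to fail whenever the nickname in a demofile differs substantially from the one on the profile page, which is one reason a dedicated method is needed.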
3 GAME DATA PARSING
The game recordings are saved in demo files, a proprietary
format of Valve Corp. that basically captures
all network traffic (Breu, 2007) between the game
server and clients. A demo file is fixed to one map,
so if multiple maps are played in a game, multiple
demo files are required. On the other hand, it captures
a period of time in a game, so the game on one map
may be (and sometimes is) divided into multiple demo
files. A demo file uses three levels of encoding: network
packets, messages encoded using Google’s Protocol
Buffers (Varda, 2008), and Valve’s proprietary
data compression.
All encoding levels are bitwise-oriented. Parsing
one demo file must be done sequentially, and merely
maintaining the decoding state is rather complicated.
Furthermore, the third layer of encoding is
not very well documented (as it is proprietary) and
changes with new versions of the game.
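The consequence of bitwise-oriented encoding is that values need not start on byte boundaries, so the reader must track its position in bits. The sketch below illustrates this with a minimal bit reader that can decode a Protocol Buffers base-128 varint from any bit offset. The class name and the least-significant-bit-first order are assumptions made for this example; the sketch does not cover the proprietary compression layer.

```python
class BitReader:
    """Sequential reader over bytes, least-significant bit first (assumed)."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self._pos = 0  # current position, counted in bits

    def read_bits(self, count: int) -> int:
        """Read `count` bits and return them as an unsigned integer."""
        if self._pos + count > len(self._data) * 8:
            raise EOFError("not enough bits left")
        value = 0
        for i in range(count):
            byte = self._data[self._pos >> 3]
            bit = (byte >> (self._pos & 7)) & 1
            value |= bit << i
            self._pos += 1
        return value

    def read_varint(self) -> int:
        """Read a base-128 varint that may start at any bit offset."""
        result = 0
        shift = 0
        while True:
            byte = self.read_bits(8)
            result |= (byte & 0x7F) << shift  # low 7 bits carry the payload
            if not byte & 0x80:               # high bit clear: last byte
                return result
            shift += 7


# Byte-aligned varint encoding of 300 is the byte pair 0xAC 0x02.
print(BitReader(bytes([0xAC, 0x02])).read_varint())  # 300
```

Because every read advances the shared bit position, a single misread field shifts everything that follows, which is why the file has to be parsed strictly in sequence.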
The CS:GO server (built on the Source engine) uses a tick
as the logical time unit for the game simulation. All
client inputs, actions, and interactions with each other
and world objects are resolved periodically in these
ticks. A typical tick rate for tournament servers is 128
ticks per second (one tick lasts approximately 7.8 ms).
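The relation between tick numbers and wall-clock time is a simple division, sketched below. The function names are hypothetical, and the tick rate of 128 is only the typical tournament value quoted above; other servers may use a different rate.

```python
TICK_RATE = 128  # ticks per second (typical tournament value, an assumption)


def tick_duration_ms(tick_rate: int = TICK_RATE) -> float:
    """Length of one tick in milliseconds."""
    return 1000.0 / tick_rate


def tick_to_seconds(tick: int, tick_rate: int = TICK_RATE) -> float:
    """Game time in seconds that corresponds to a tick number."""
    return tick / tick_rate


print(tick_duration_ms())     # 7.8125
print(tick_to_seconds(1280))  # 10.0
```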
Probably due to size reasons, the demo file stores
information from 8 consecutive ticks together in a single
burst. This burst has two fixed parts: a list of
events and a list of delta changes. Both parts are quite
important, so we describe them in more detail.