with 1 being the best, and can be compared across data
sets. It was further pointed out (Newman, ) that the
formation of the modularity matrix is closely analogous
to that of the covariance matrix, whose eigenvectors are
the basis for Principal Component Analysis (PCA).
Modularity optimization can thus be regarded as a PCA for
networks. Related methods include those based on the
Laplacian matrix of the graph, also known as the
admittance matrix, and spectral clustering (Ng et al.,
2002). Newman's modularity assumes that a subgraph must
deviate substantially from its expected total number of
edges to be considered anomalous and interesting;
therefore, all the clusters or communities (i.e.,
popular, emerging and anomalous themes) found by the
community detection algorithm are considered to be
interesting. However, this anomalousness metric does not
consider the differences among the communities.
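To make the PCA analogy concrete, the following is a minimal
sketch and not the procedure used in LLA. It assumes the
standard form of the modularity matrix, B_ij = A_ij - k_i k_j / (2m),
which is not written out in the text above, and a hypothetical
six-node toy graph. It takes the eigenvector of the largest
eigenvalue, as PCA does with a covariance matrix, and uses its
signs to split the graph into two communities.

```python
import numpy as np

# Hypothetical toy graph (an assumption for illustration):
# two triangles, 0-1-2 and 3-4-5, joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6

# Symmetric adjacency matrix A.
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

k = A.sum(axis=1)   # node degrees
m = k.sum() / 2.0   # total number of edges

# Modularity matrix: observed edges minus the number expected under
# a degree-preserving random model, B_ij = A_ij - k_i * k_j / (2m).
B = A - np.outer(k, k) / (2.0 * m)

# As in PCA, keep the eigenvector of the largest eigenvalue.
# eigh returns eigenvalues in ascending order for symmetric input.
eigenvalues, eigenvectors = np.linalg.eigh(B)
leading = eigenvectors[:, -1]

# The signs of the leading eigenvector define a two-way split.
s = np.where(leading >= 0, 1.0, -1.0)

# Modularity of that split, Q = s^T B s / (4m).
Q = float(s @ B @ s) / (4.0 * m)
print("community labels:", s)
print("modularity Q:", Q)
```

On this toy graph the split separates the two triangles. The
overall sign of an eigenvector is arbitrary, so only the grouping
of the labels is meaningful, not which side is +1.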
In LLA, we improve the modularity metric by
considering a game-theoretic framework, detailed in
Section 3.
In a social network, the most connected nodes are
typically considered the most important nodes. However,
in a text document, we consider emerging and anomalous
information to be more interesting and more closely
correlated with innovations. Also, for a piece of
information, the combination of its popular, emerging
and anomalous content contributes to the total value of
the information. Therefore, we define a value metric as
follows:
Let P(i), E(i) and A(i) be the popular, emerging and
anomalous values of the information i, respectively, as
computed from LLA. The total value V(i) for i is then
V(i) = P(i) + E(i) + A(i)    (1)
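As a small illustration of Equation (1), the sketch below sums
the three component values and ranks items by total value. The
item names, the scores, and the helper names `ThemeScores` and
`total_value` are hypothetical; in practice P(i), E(i) and A(i)
would come from LLA.

```python
from dataclasses import dataclass


@dataclass
class ThemeScores:
    """Component values for one piece of information i."""
    popular: float    # P(i)
    emerging: float   # E(i)
    anomalous: float  # A(i)


def total_value(scores: ThemeScores) -> float:
    """Equation (1): V(i) = P(i) + E(i) + A(i)."""
    return scores.popular + scores.emerging + scores.anomalous


# Hypothetical scores, for illustration only (not LLA output).
scores = {
    "theme_a": ThemeScores(popular=5.0, emerging=1.0, anomalous=0.0),
    "theme_b": ThemeScores(popular=1.0, emerging=3.0, anomalous=4.0),
    "theme_c": ThemeScores(popular=2.0, emerging=2.0, anomalous=1.0),
}

# Rank items from highest to lowest total value.
ranked = sorted(scores, key=lambda i: total_value(scores[i]), reverse=True)
for item in ranked:
    print(item, total_value(scores[item]))
```

With these made-up numbers, the item that is mostly emerging and
anomalous outranks the item that is mostly popular, which is the
behavior the additive metric is meant to allow.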
In the use case in Section 4, we show that the
value metrics are correlated with 1) the innovations
selected and analyzed by human analysts, which can
be viewed as "ground truth"; and 2) the number of posts
following the information, as a measure of actual
interest.
3 GAME-THEORETIC FRAMEWORK OF LLA
Previously, game-theoretic frameworks of search engines
and information retrieval have been studied, but they
are rarely content-based (Zhai, 2015). Also, it is
important to point out that the game of information
ranking and retrieval is not a zero-sum game; in this
sense, it differs from games such as chess or poker.
As we discussed, value can be defined differently
in different contexts. Once it is defined, the value of
a piece of information can be learned using supervised
machine learning methods under two conditions:
1) data can be collected, and values can be measured and
labeled; 2) the definition of value in the context
does not keep changing, so that historical training
data can be used for prediction.
In real life, such data are difficult to collect, and
value changes dynamically in many contexts; therefore,
supervised machine learning methods are difficult
to apply. We introduce a game-theoretic perspective
to justify the value metric in (1).
Game theory is a field of applied mathematics.
It formalizes the conflict between collaborating and
competing players and has found applications ranging
from economics to biology (Nowak and Sigmund, 1999;
Rasmusen, 1995). The players can both cooperate and
compete to exploit their environment to maximize their
own rewards. This can often be modeled as a process of
searching for a Nash equilibrium: the whole system,
including all the players, reaches a stable state in
which no player can unilaterally change her actions to
improve her reward.
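The Nash condition described above can be checked directly on a
small example. The sketch below is illustrative only and is not
part of LLA: it enumerates the pure-strategy Nash equilibria of a
two-player game given as payoff tables. The function name
`pure_nash_equilibria` and the prisoner's-dilemma payoffs are
assumptions chosen because that game is not zero-sum.

```python
import numpy as np


def pure_nash_equilibria(payoff_row, payoff_col):
    """Return all (i, j) action pairs from which neither player can
    raise her own payoff by changing only her own action."""
    payoff_row = np.asarray(payoff_row)
    payoff_col = np.asarray(payoff_col)
    equilibria = []
    for i in range(payoff_row.shape[0]):
        for j in range(payoff_row.shape[1]):
            # Row player: is action i a best response to column j?
            row_best = payoff_row[i, j] >= payoff_row[:, j].max()
            # Column player: is action j a best response to row i?
            col_best = payoff_col[i, j] >= payoff_col[i, :].max()
            if row_best and col_best:
                equilibria.append((i, j))
    return equilibria


# Hypothetical prisoner's-dilemma payoffs; action 0 = cooperate,
# action 1 = defect. Entry [i, j] is the payoff when the row player
# plays i and the column player plays j.
row_payoffs = [[3, 0],
               [5, 1]]
col_payoffs = [[3, 5],
               [0, 1]]

print(pure_nash_equilibria(row_payoffs, col_payoffs))
```

Only pure strategies are examined here; games with no
pure-strategy equilibrium would need a mixed-strategy solver,
which this sketch does not attempt.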
When designing a good value metric for an information
player, there are a couple of other factors that
need attention:
The whole system has to be Pareto efficient or
superior. That is to say, a system in which no player
can be made better off without making at least one other
player worse off is said to be in a Pareto efficient
state. Here, better off is often interpreted as having
higher value or being in a preferred position, for
example, more central or with a higher degree. If no
Pareto improvement can be made in a system, the system
is Pareto efficient. Searching for a Nash equilibrium
may not achieve full Pareto efficiency at the collective
level, nor maximize the so-called social welfare
measure, i.e., the total value of a set of players.
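The gap between a Nash equilibrium and Pareto efficiency can be
seen in a minimal sketch. The payoffs are hypothetical
prisoner's-dilemma values, and the helper names `dominates` and
`pareto_efficient` are assumptions for illustration. In this
standard game, mutual defection is the only pure-strategy Nash
equilibrium, yet the check below shows that it is not Pareto
efficient and does not maximize social welfare.

```python
def dominates(u, v):
    """True if payoff vector u is a Pareto improvement over v: no
    player is worse off and at least one is strictly better off."""
    no_one_worse = all(a >= b for a, b in zip(u, v))
    someone_better = any(a > b for a, b in zip(u, v))
    return no_one_worse and someone_better


def pareto_efficient(outcomes):
    """Return the outcomes that no other outcome Pareto-dominates."""
    return {
        name
        for name, payoff in outcomes.items()
        if not any(dominates(other, payoff) for other in outcomes.values())
    }


# Hypothetical payoffs (player 1, player 2); C = cooperate, D = defect.
outcomes = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

efficient = pareto_efficient(outcomes)

# Social welfare: the total value summed over the set of players.
welfare = {name: sum(payoff) for name, payoff in outcomes.items()}
best_welfare = max(welfare, key=welfare.get)

print("Pareto efficient outcomes:", sorted(efficient))
print("Outcome with highest social welfare:", best_welfare)
```

Here ("D", "D") is dominated by ("C", "C"), so a value metric that
only rewards equilibrium behavior could settle on a collectively
worse state.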
LLA can be set up as a game-theoretic framework:
one player is an information provider, and the rest of
the world is the other player, who responds with
interest in the information generated by the
information provider player (or simply player).
In Figure 2, an LLA player has two rewards: the
authority reward and the expertise reward. The authority
reward comes from the popular information, and the
expertise reward comes from the emerging and anomalous
information. An authority reward measures the
correlation (r_ij) of Player i to Player j. The
expertise reward b_j(X_t) for Player j measures her own
unique information to the whole system.
Traditional search engine algorithms only consider
the cumulative authority part of the recursion
(e.g., using the power method to compute the
eigenvector for the largest eigenvalue of the adjacency
or correlation matrix). LLA introduces the expertise
part of the recursion as the total value of
collaborative learning agents. By weighing expertise more than